Blog

  • AI Robots

    As Nvidia’s CEO Jensen Huang said at a conference in Taipei in June 2024, ‘The next wave of AI is physical AI. The era of robotics has arrived.’ AI is going to be integrated into the physical world. The US leads in AI advances — the software and internet revolution emanates from Silicon Valley. Asian tech giants, however, are good at the hardware side of things.

    Things are moving beyond chatbots and software into the physical world. There will be more AI-enabled hardware and robotics, and much of it will come from Asia.

    According to projections by Citigroup, there will be 1.3 billion AI robots globally by 2035 and 4 billion by 2050. Their work will range from household chores to delivering parcels. China will have an impressive presence here: it holds more than 70 per cent of all robotics patents. Japan and Korea hold 7 and 5 per cent respectively, while the US contributes just 3 per cent. It is Asian dominance.

    Japan leads in technologies that adapt to automation. AI-driven hardware and software are being developed across all types of work.

    Whether 2025 will see the rise of AI robots is doubtful but it is certain that they are coming, and they are coming from Asia.

  • Veo 2 Scores Over Rivals

    We have already noted that Sam Altman feels video is the key to achieving AGI. Sora heralded OpenAI’s debut in this field. Rival Google strikes back with two AI models — Veo 2 for video and Imagen 3 for images. These are not available to the public yet but are expected by early 2025. In internal testing, Veo outperforms its competitors (China’s Kling, Meta’s Movie Gen and OpenAI’s Sora) in prompt adherence and output quality.

    Veo excels at creating nature- and animal-related clips and captures detailed movement. Veo 2 delivers lifelike visuals with better realism and leverages physics when depicting movement.

    Though not perfect, it shows significant improvement over current state-of-the-art models.

    Sora offers more control options and longer clips, so it is not directly comparable to Veo 2. However, Veo 2 leaves the Chinese models far behind; Google considers the Chinese model Kling its biggest competitor. Google has the advantage of owning YouTube, which facilitates training these models on physics. Veo 2 can reproduce a gymnast’s routine, showing its grasp of human movement; Sora cannot model such complex movements.

    Veo 2 supports 4K resolution and can produce videos longer than two minutes, though on the experimental platform it is at present restricted to 720p and eight seconds. It still scores over Sora in resolution and video duration.

    The simultaneous release of the foundation model Genie 2, capable of generating 3D environments, is another significant development. Genie 2 is critical for training embodied AI agents and accelerates the pace of Google’s AGI vision.

    Thus 2025 will be the year of advanced world models, and Google leads here. Google’s acquisition of DeepMind was a strategic decision that puts it far ahead in the race to AGI. Elon Musk quips that it is as if DeepMind has acquired Google.

    Google these days has rejuvenated itself and is acting like a startup. Its releases are impressive — Gemini 2, Willow, GenCast and updates to NotebookLM.

  • AI Training through Copyrighted Work: UK Proposal

    The UK government proposes that AI companies may use copyrighted material to train their models unless creative professionals and companies opt out of the process. This is a point of contention between AI companies and creative people.

    UK copyright law will make an exception allowing AI companies to use copyrighted material to train their models. At the same time, writers, artists and composers will be allowed to ‘reserve their rights’ – the right not to allow their work to be used in the AI training process, or to ask for a licence fee for doing so.

    This proposal offers a win-win for both sides. It gives greater control to creators and rights holders, may lead to more licensing of content, and will open up a new revenue stream for creators. The unlicensed use of creative work for AI models is an unjust threat to creators’ livelihoods.

    AI developers may be required to disclose the sources of the content used to train their models. The new measures would have to be accessible and effective before being adopted.

    Another issue is the status of models that have already been deployed, such as ChatGPT and Gemini. The government is seeking views on this.

    Another issue is the US-style ‘right of personality’. It protects celebrities from having their voice or likeness replicated by AI without permission. A voice resembling Scarlett Johansson’s appeared in an OpenAI voice assistant; it was paused after users noted the similarity.

  • OpenAI o1 Model

    OpenAI has released its o1 model, which takes a strong lead in reasoning. Rather than a large language model (LLM), it is called a large reasoning model (LRM). The model uses extra inference-time compute cycles to ‘think’ more, so it can solve complex reasoning problems.

    OpenAI is secretive about this model and refuses to reveal details of how it works. LRMs generate extra tokens before reaching the final response. These are called the model’s ‘thoughts’ or reasoning chain. Where a typical LLM generates code directly, an LRM first generates reasoning tokens to assess the problem and plan the structure of the code, and it may generate multiple candidate solutions before arriving at the final answer.
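    The split between hidden reasoning tokens and the visible answer can be sketched in a few lines. This is a toy illustration only; OpenAI has not disclosed how o1 actually works, and the function name, token budget and fake ‘thoughts’ here are invented for the sketch:

    ```python
    # Toy mock-up of how a large reasoning model (LRM) spends its token
    # budget: hidden reasoning tokens first, then the visible answer.
    # NOT OpenAI's actual (undisclosed) o1 mechanism.

    def answer_with_reasoning(problem: str, reasoning_budget: int = 4):
        reasoning_chain = []
        # A real LRM samples these tokens from the model; we fake them here.
        for step in range(1, reasoning_budget + 1):
            reasoning_chain.append(f"[thought {step}] analyse '{problem}'")
        final_answer = f"Answer to '{problem}' after {len(reasoning_chain)} thoughts"
        # o1-style models return only the answer and conceal the chain;
        # open models expose both.
        return {"visible": final_answer, "hidden": reasoning_chain}

    result = answer_with_reasoning("sort a list in O(n log n)")
    print(result["visible"])
    ```

    An o1-style API would surface only the `visible` field; an open model would let the developer inspect `hidden` as well.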

    The o1 model conceals this thinking process and reveals only the final response. That avoids clutter and makes the experience smoother. OpenAI also treats the reasoning chain as a trade secret; it does not want competitors to learn its magic.

    Training costs are very high, so AI companies become secretive to maintain their competitive advantage. The secrecy, however, also makes independent red-teaming of the model harder.

    On the other hand, open-source models such as Alibaba’s Qwen are fully transparent: they reveal their reasoning tokens. Revealing the reasoning process makes it easier to integrate the model with tools and applications, and it gives the developer full control of the model.

    Qwen and R1 are still experimental and o1 has the lead. However, many more open-source models are in the offing. They will be good alternatives to the proprietary models where visibility and control are crucial.

  • Reimagining Automation

    AI agents are emerging as game changers in the movement towards automation. In the past three years, since the advent of ChatGPT, generative AI tools have advanced by leaps and bounds. However, attention is now shifting to AI agents capable of thinking, acting and collaborating autonomously.

    There is a rapid journey towards automation: from chatbots to retrieval-augmented generation (RAG) to autonomous multi-agent AI. By 2028, an estimated 33 per cent of enterprise software applications will include agentic AI, against less than 1 per cent in 2024.
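    The middle step of that journey, retrieval-augmented generation, can be sketched minimally: retrieve the most relevant document, then place it in the prompt handed to a generator. The toy corpus and word-overlap scoring below are stand-ins for a real vector store and embedding model:

    ```python
    # Minimal RAG sketch: retrieve the best-matching document, then build
    # a grounded prompt for a language model. Corpus and scoring are toys.

    CORPUS = [
        "RPA bots automate fixed, rule-based back-office workflows.",
        "AI agents plan, act and collaborate with minimal supervision.",
        "RAG grounds a language model's answers in retrieved documents.",
    ]

    def retrieve(query: str) -> str:
        q = set(query.lower().split())
        # Score each document by word overlap with the query.
        return max(CORPUS, key=lambda doc: len(q & set(doc.lower().split())))

    def build_prompt(query: str) -> str:
        context = retrieve(query)
        return f"Context: {context}\nQuestion: {query}\nAnswer:"

    print(build_prompt("What do AI agents do?"))
    ```

    A production system would swap the overlap score for embedding similarity and pass the prompt to an LLM; the shape of the pipeline is the same.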

    Agentic workflows will expand the set of tasks that AI can do. Organizations will move from predefined processes to dynamic, intelligent workflows.

    Traditional Automation: Limitations

    Traditional automation tools are rigid and costly. We have dealt with RPA, or robotic process automation: its workflows struggle with ill-defined processes or with unstructured data. The systems are brittle and need vendor attention when processes change.

    Chatbots do reason and generate content, but they lack the capability of autonomous execution and require human involvement for complex tasks.

    Automation has to go beyond predefined processes to dynamic intelligent workflows.

    Vertical AI Agents

    In this evolution, we are shifting to vertical AI agents. These are smarter and more proactive. They accomplish tasks across domains and improve over time. They remember past activities, sense your intent and recognize patterns in your behaviour.

    SaaS models optimize existing workflows; vertical AI agents reimagine them entirely. They reduce the need for operational teams, since execution is autonomous. Reimagining workflows brings capabilities hitherto non-existent, and with them a competitive advantage.

    Transition from RPA to Multi-agent AI

    Multi-agent AI systems are capable of autonomous decision making. By 2028, they are projected to make around 15 per cent of day-to-day work decisions. Agents will be true collaborators, changing enterprise workflows and systems.

    There will be multi-modal systems of record yielding actionable insights. Complex tasks will be broken into manageable components. Tooling will change — there could be an AI Agent Studio. Co-workers will be freed up for strategic tasks. Agents will have better memory, advanced coordination capabilities and advanced reasoning.

    Agents will move from doing tasks to managing workflows to doing entire jobs. Then arises the challenge of their accuracy. An AI agent executes a single task with 85 per cent overall accuracy. When it chains two tasks, the accuracy drops to about 72 per cent (0.85 × 0.85). As the chain grows, the accuracy drops further. Is that acceptable? We have to optimize to an accuracy level of 90-100 per cent.
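    The compounding arithmetic is easy to verify. A minimal sketch, assuming each step succeeds independently with the same probability:

    ```python
    # Compounding error across chained agent tasks: if each step succeeds
    # with probability p, a workflow of n independent steps succeeds with
    # probability p**n. The 85 per cent figure is from the text above.

    def workflow_accuracy(per_step: float, steps: int) -> float:
        return per_step ** steps

    for n in range(1, 6):
        print(n, round(workflow_accuracy(0.85, n), 3))
    # Two chained tasks: 0.85 * 0.85 = 0.7225, i.e. roughly 72 per cent.
    ```

    Conversely, hitting 90 per cent over a five-step chain requires each step to succeed about 98 per cent of the time, which is why per-step reliability matters so much.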

    We, therefore, require robust evaluation frameworks. There should be continuous feedback. There should be automated optimization tools.

    AI agents are here to stay as our co-workers. They will transform enterprise operations and unlock unprecedented efficiency. It is time to act. Are you ready?

  • AI Agents

    We say we are in the third wave of AI — agentic AI. What exactly is an AI agent? At its simplest, an AI agent is AI-powered software that can do a variety of tasks for you — tasks that would otherwise be performed by a customer service agent, an HR person or an IT help-desk employee. You demand services from an AI agent, and it renders them. At times, while doing so, it interacts with multiple systems and goes well beyond answering simple questions.

    Perplexity has released an AI agent that assists people with their holiday shopping. Google’s Project Mariner is an AI agent that can find flights and hotels, shop for household items, find recipes and handle myriad other tasks.

    Though the concept is simple, it still lacks clarity; even Big Tech has not built consensus about the agent’s role. Google considers agents task-based assistants — say, doing coding for developers.

    An agent may act like an extra employee. Agents are also customer-experience tools: they help customers solve more complex problems than a chatbot can handle.

    In the absence of a cohesive definition, there is ambiguity about what an agent should be doing. Still, across the various definitions, agents help us complete tasks in automated ways, with the least human intervention. All agree that an agent is an intelligent software system designed to perceive its environment, reason about it, make decisions, and take actions to attain certain objectives autonomously.

    A number of technologies can make this happen — AI/ML techniques such as natural language processing (NLP), machine learning and computer vision (CV) — enabling agents to operate in dynamic environments autonomously or alongside other agents and human users.
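    The perceive–reason–act cycle behind that definition can be sketched as a simple loop. The ticket-queue environment, the decision rule and the step budget are all invented for illustration; a real agent would call an LLM and external tools at the reason and act steps:

    ```python
    # Toy perceive-reason-act agent loop over an invented ticket queue.

    def perceive(environment: dict) -> dict:
        # Observe the current state of the environment.
        return {"tickets": environment["open_tickets"]}

    def reason(observation: dict) -> str:
        # Decide on an action; a real agent would consult an LLM here.
        return "escalate" if observation["tickets"] > 10 else "resolve"

    def act(action: str, environment: dict) -> None:
        # Apply the chosen action back to the environment.
        if action == "resolve":
            environment["open_tickets"] -= 1

    def run_agent(environment: dict, max_steps: int = 20) -> int:
        steps = 0
        while environment["open_tickets"] > 0 and steps < max_steps:
            act(reason(perceive(environment)), environment)
            steps += 1
        return steps

    env = {"open_tickets": 3}
    print(run_agent(env))  # works the queue down autonomously
    ```

    The `max_steps` bound is the kind of safety rail real agent frameworks add so that an agent cannot loop indefinitely when its actions stop changing the environment.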

    Agents will evolve further, and AI will evolve with them. Crossing between systems is hard; some legacy systems lack basic API access. This is more challenging than it seems.

    The challenge is to allow the machine to handle contingencies in an automated way. The true test is to allow the agent to take over and apply true automation.

    There could be AI agent infrastructure — a tech stack designed specifically to create agents.

    Over time, reasoning will slowly improve. Frontier models will handle more of the workflows.

    Maybe, agents will be powered by multiple LLMs, rather than a single LLM.

    The agentic future is not tied to a single LLM.

    Industry is moving towards agents operating independently.

    This is a period of transition. It is a promising field, and we are moving in the right direction.

  • Generative AI and Analytical AI

    ChatGPT arrived with a bang in November 2022 and attracted everyone’s attention to generative AI. It was a positive development: it inaugurated a new technology.

    Long before that, however, many companies were using another kind of AI — analytical AI — and many still do. Though there are some applications where generative AI and analytical AI are used together, mostly each is used independently and separately.

    Let us understand the differences between these two technologies. Both differ in purpose, capabilities, methods and data.

    Generative AI uses foundation models, or deep neural networks, to generate new content — text, images, music, code. The output resembles human creation.

    Analytical AI refers to statistical machine learning. It performs specific tasks such as classification, prediction or decision making, and it works on structured data. Analytical AI will indicate which product to promote to which customer; generative AI would write the copy and generate the visuals that accompany it.

    Generative AI content is often indistinguishable from human-created content. Analytical AI is designed to perform specific prediction tasks efficiently, based on predictive statistical models.

    Generative AI uses transformers which convert a sequence of inputs into coherent outputs, and attention mechanisms which predict the next word based on the context of the words preceding it.

    Analytical AI uses ML including supervised learning, unsupervised learning and reinforcement learning. Models are trained on past data and applied ‘in inference’ to predict new data. It uses structured data, typically rows and columns of numbers.
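    A toy contrast makes the difference concrete. Both miniature datasets below are invented: the bigram counter is a crude stand-in for a transformer’s learned next-token distribution, and the nearest-neighbour rule stands in for a trained analytical model over structured rows:

    ```python
    # Generative vs analytical AI in miniature.
    from collections import Counter, defaultdict

    # Generative flavour: predict the next word from bigram counts.
    text = "the cat sat on the mat the cat ran".split()
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        bigrams[prev][nxt] += 1

    def next_word(prev: str) -> str:
        # Return the most frequent follower of `prev` in the corpus.
        return bigrams[prev].most_common(1)[0][0]

    print(next_word("the"))

    # Analytical flavour: 1-nearest-neighbour classification on
    # structured rows of (spend, visits) -> promote / skip.
    train = [((100, 5), "promote"), ((10, 1), "skip"), ((80, 4), "promote")]

    def classify(row):
        dist = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
        return min(train, key=lambda t: dist(t[0], row))[1]

    print(classify((90, 4)))  # prints: promote
    ```

    The first half generates plausible content from learned patterns; the second applies a model trained on past rows ‘in inference’ to predict a label for a new row, exactly the split described above.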


  • Claude 3.5 from Anthropic

    The favorite chatbot of tech-savvy people is Claude from Anthropic. Many fall for its wit and charm; it is a sensitive chatbot. If you are in search of the most eligible bachelor in San Francisco, it must be Claude. Still, the chatbot lags behind OpenAI’s ChatGPT in popularity. However, it is the chatbot of choice for techies. It assists them with everything from legal counsel to therapy. It has raw power, plus a willingness to express its opinions. It thus becomes a close companion rather than a mere tool.

    Many feel Claude is a real person, though they know chatbots are prediction machines responding to their prompts. Still, Claude is in a league of its own. It is special: creative and empathetic. Its answers feel genuine rather than prosaic or generic. Claude is known for emotional processing; that is its forte.

    AI systems are usually evaluated on how good they are at coding, answering questions and other tasks. By these metrics, Claude 3.5 Sonnet is close to the other models from OpenAI and Google. But its USP is its emotional intelligence, an abstract quality.

    The earlier versions of Claude were not so charming. In fact, they were prudish and dull, behaving like a church lady with moral overtones.

    Anthropic wanted to vest Claude with a personality and therefore designed character training for it. After pretraining, the model was fine-tuned to offer responses in sync with desirable human traits: open-mindedness, thoughtfulness and curiosity. Claude improved its responses against these traits and internalized the feedback data.

    Anthropic’s training may please individual users but may not go down as well with corporate users. Anthropic raised capital from Amazon to improve Claude, and it developed a personality that appeals to a wide variety of people.

    Anthropic’s Claude is like a well-liked traveller: it adapts to the values of its user.

    LLMs tend to be sycophantic. Claude, however, manages to be genuinely helpful.

    ChatGPT scores over Claude in mainstream awareness. Claude does not have the ability to generate images or search the internet live. However, the trend that starts in San Francisco tends to spread far and wide.

  • Sora’s Game Content

    OpenAI has not revealed the source of data it has used to train Sora, its video-generating AI released on December 9, 2024. It is speculated that at least some of the data might have been sourced from Twitch streams and walkthroughs of games.

    Sora responds to a text prompt or image and generates up to 20-second-long videos in a range of aspect ratios and resolutions.

    OpenAI first revealed Sora in February 2024 and acknowledged that the model had been trained on Minecraft videos.

    Sora can generate a fighter in the style of a 90s Ninja Turtles game. It knows what a Twitch stream looks like, which means it has seen a few. It can feature the likeness of the popular Twitch streamer who goes by the name Auronplay, and it has generated a character similar in appearance to Imane Anys, better known as Pokimane. OpenAI has used filtering to prevent Sora from generating clips of trademarked characters. Even so, it seems game content has gone into the training data.

    In the past, then-CTO Mira Murati would not outright deny that Sora was trained on YouTube, Instagram and Facebook content. In the tech specs, OpenAI acknowledged it used ‘publicly available’ data along with licensed data from stock media libraries like Shutterstock to develop Sora.

    Generative AI models like Sora are probabilistic, learning patterns in the data to make predictions. In the process, they may produce near-copies of their training examples. It offends the creators. There are legal issues. AI companies claim their outputs are transformative and not plagiaristic.

    Gameplay videos have three layers: the unique video shot by the player or videographer, the contents of the game owned by the developer, and possibly a third layer of user-generated content appearing in the software. There are many protectable elements in a game. AI companies could prevail in legal disputes if the output is transformative; it gets tricky, however, when a model regurgitates a copyrighted work.

    AI companies have indemnity clauses to cover these situations. There are issues of fair use. They affect the video game industry.

  • Real AI Revolution Yet to Come: LeCun

    At a recent conference in Seoul, South Korea, Yann LeCun, Chief AI Scientist at Meta, expressed the opinion that the real AI revolution is yet to come, and that it will redefine how we humans interact with technology. In the near future, our interaction with the digital world will be mediated by AI assistants or agents. LLMs such as ChatGPT and Llama have limitations: they deal with natural language, and language is simple and discrete; it cannot capture the complexity of the real world. These systems do not have the ability to reason, plan and understand the physical world the way humans do.

    To overcome these limitations, Meta is working to develop a new AI architecture capable of observing and learning from the physical world.

    LeCun also advocates an open-source AI ecosystem. Models should be trained across diverse cultural contexts, languages and value systems. What works on the west coast of the US may not be suitable elsewhere.

    He also advocates light-touch regulation; regulation should not stifle innovation.