Blog

  • Normalization of LLMs

Large language models use normalization to ensure stable and efficient training. The commonly used normalization techniques are:

1 Layer normalization The activations of each layer are normalized independently for each sample, across its features; normalization does not extend over the batch. It is used for RNNs and transformers, where batch sizes vary and can be small.

2 Batch normalization The activations of each layer are adjusted and scaled to have zero mean and unit variance over the mini-batch during training. It stabilizes and speeds up training by reducing internal covariate shift.

    3 Instance normalization Activations are normalized across spatial dimensions independently for each sample. It is used for style transfer and image generation.

4 Group normalization The channels are divided into groups, and activations within each group are normalized separately. It is useful when batch sizes are small or where batch normalization is not suitable (for example, fine-tuning pre-trained models).
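The axis along which statistics are computed is what distinguishes these schemes. A minimal NumPy sketch (toy activations invented for illustration; real frameworks also learn scale and shift parameters) contrasts layer normalization with batch normalization:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each sample across its features (axis=1): per-row zero mean, unit variance.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # Normalize each feature across the mini-batch (axis=0): per-column zero mean, unit variance.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# A toy mini-batch of 2 samples with 3 features each.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

ln = layer_norm(x)   # each row now has ~zero mean and unit variance
bn = batch_norm(x)   # each column now has ~zero mean and unit variance
```

Because layer normalization needs no batch statistics, it behaves identically whatever the batch size, which is why it suits variable-length sequence models.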

  • Beam Search

In NLP tasks such as machine translation and language generation, the beam search algorithm is commonly used as an extension of the greedy search algorithm (which selects the most probable next token at each step of sequence generation).

Instead of selecting only the top-scoring token at each step, beam search tracks a fixed number of candidates (the beam width or beam size). At each step, the algorithm expands each candidate by considering all possible next tokens and retains the top candidates based on their combined probability scores. This enables the algorithm to consider multiple potential sequences.

The steps in short: initialization starts with a single candidate sequence (just the start token). The next set of candidates is generated by considering all possible next tokens for each existing candidate. A score is calculated for each candidate based on its probability. The top candidates are retained based on their scores (as determined by the beam width). The expansion and pruning steps are repeated until a termination condition is met (reaching the maximum length or generating an end token). The final candidate is selected based on some criterion (the highest overall score or reaching a specific termination condition).
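The steps above can be sketched with a toy vocabulary (the transition probabilities here are invented purely for illustration; a real model would supply them):

```python
import math

# Hypothetical next-token distribution: given the last token,
# the probabilities of each possible next token.
PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_width=2, max_len=5):
    # Each candidate is (cumulative log-probability, token sequence).
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "</s>":               # finished sequences carry over unchanged
                candidates.append((logp, seq))
                continue
            for tok, p in PROBS[seq[-1]].items():  # expansion: try every next token
                candidates.append((logp + math.log(p), seq + [tok]))
        # Pruning: keep only the top `beam_width` candidates by score.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break                                # termination: all beams ended
    return beams

best_logp, best_seq = beam_search()[0]
```

With a beam width of 1 this degenerates to greedy search; widening the beam lets lower-scoring prefixes survive long enough to win overall.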

The technique suffers from flaws such as repetition and generic responses. Other techniques, such as length normalization, diverse beam search and nucleus sampling, help overcome these issues.

  • Convolutional Neural Networks: CNNs

CNNs are specialized deep neural networks designed for data with a grid-like topology, such as visual imagery. They are useful in image recognition, classification, object detection and segmentation, thanks to their ability to learn hierarchical feature representations. They are used for computer vision (CV) tasks and have revolutionized fields such as medical imaging and image recognition.

Convolutional refers to the convolution layers, which perform the main operation. Here a small filter, or kernel, is slid over the input image, computing the dot product between the filter and the portion of the image it currently covers. The operation is repeated across the whole image, producing a feature map that captures relevant patterns.
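This sliding dot product can be sketched in NumPy (the image and edge-style kernel below are invented for illustration; real CNNs learn their kernels during training, and frameworks use padding, strides and many channels):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take the dot product at each position
    # (technically cross-correlation, as in CNN "convolution" layers; no padding, stride 1).
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
# A simple vertical-edge-style kernel (hypothetical example).
kernel = np.array([[1., -1.],
                   [1., -1.]])
feature_map = conv2d(image, kernel)   # 2x2 map of local responses
```

Note how a 3x3 input and a 2x2 kernel yield a 2x2 feature map: each output cell summarizes one local patch.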

  • Recurrent Neural Networks: RNNs

RNNs are another type of neural network, used for sequential data. They are applied in time series analysis, NLP and speech recognition.

    Unlike feedforward neural networks, RNNs have connections which form directed cycles. They thus show dynamic temporal behaviour. They are able to process sequences of inputs and capture dependencies over time.

RNNs maintain a hidden state that retains information about previous inputs in the sequence. The hidden state is updated at each time step, allowing the network to capture context and long-range dependencies in sequential data.
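A single recurrent update can be sketched as follows (hypothetical dimensions and randomly initialized weights; a real RNN learns these weights via backpropagation through time):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One recurrent update: the new hidden state mixes the current input
    # with the previous hidden state, carrying context across time steps.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4                   # toy sizes
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                       # initial hidden state
sequence = rng.normal(size=(5, input_dim))     # 5 time steps of input
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)      # same weights reused at every step
```

The repeated multiplication by W_hh inside this loop is also the source of the vanishing gradient problem discussed below.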

RNNs suffer from the vanishing gradient problem, which makes them struggle to capture long-term dependencies effectively.

RNN variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks overcome this by regulating the flow of information and gradients through gating mechanisms.

  • Generative Adversarial Networks: GANs

GANs are innovative algorithms used in unsupervised ML, implemented as two neural networks competing with each other in a zero-sum game.

    GANs generate new data with the same statistics as the training set. (they can generate photographs that look authentic).

GANs consist of two main parts: the generator, which generates data, and the discriminator, which evaluates it.

They are used in image generation, photorealistic image modification, art creation and generating realistic human faces.
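The adversarial setup can be sketched with two tiny models (a hypothetical linear generator and logistic discriminator in NumPy; real GANs use deep networks trained by alternating gradient updates, which are omitted here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Generator: maps random noise z to a "fake" sample (a single linear layer here,
# purely as a structural sketch of the generator's role).
G_w, G_b = rng.normal(size=(2, 1)), np.zeros(1)
def generator(z):
    return z @ G_w + G_b

# Discriminator: outputs the probability that a sample is real.
D_w, D_b = rng.normal(size=(1, 1)), np.zeros(1)
def discriminator(x):
    return 1.0 / (1.0 + np.exp(-(x @ D_w + D_b)))   # sigmoid

real = rng.normal(loc=4.0, scale=0.5, size=(8, 1))  # toy "real" data
z = rng.normal(size=(8, 2))                         # noise input
fake = generator(z)

# Zero-sum objective: the discriminator wants to score real samples near 1
# and fakes near 0, while the generator wants fakes scored as real.
d_loss = -np.mean(np.log(discriminator(real)) + np.log(1 - discriminator(fake)))
g_loss = -np.mean(np.log(discriminator(fake)))
```

Training alternates between lowering d_loss (improving the critic) and lowering g_loss (improving the forger) until the generated statistics match the real data.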

  • Transformers

These neural architectures are the foundation of modern NLP. They followed the 2017 paper by Vaswani et al. titled Attention Is All You Need. Transformers differ from RNNs and CNNs by avoiding recurrence and processing data in parallel, significantly reducing training time.

Here an attention mechanism is utilized to weigh the influence of different words on each other. Transformers can handle data sequences without the need for sequential processing, which makes them effective for various NLP tasks such as translation, text summarization and sentiment analysis.

Transformers have achieved state-of-the-art results in various NLP tasks; BERT and GPT are well-known variants. The transformer architecture consists of an encoder and a decoder, each composed of multiple layers of self-attention, which enables it to capture long-range dependencies in the input sequence.

The encoder processes the input sequence and the decoder generates the output sequence. Because the architecture does not rely on recurrent connections, it is highly parallelizable and more efficient.
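The core operation, scaled dot-product attention from the Vaswani et al. paper, can be sketched in NumPy (a single head with random toy inputs; real transformers project learned queries, keys and values and use many heads):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # how strongly each query matches each key
    weights = softmax(scores, axis=-1)     # each row is a distribution over positions
    return weights @ V, weights            # weighted mix of values, plus the weights

rng = np.random.default_rng(1)
seq_len, d_k = 4, 8                        # toy sequence length and model width
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
out, weights = scaled_dot_product_attention(Q, K, V)
```

Every position attends to every other in one matrix multiplication, which is what makes the computation parallelizable across the whole sequence.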

  • AI Regulation

The European Union has taken the lead in introducing an Act that regulates AI. Broadly, AI has been analyzed in terms of the risks it poses. At one extreme, there is AI that poses an unacceptable risk: AI that affects people’s rights, biometric systems, facial recognition systems, social scoring, predictive policing, and systems that manipulate human behaviour or exploit people’s vulnerabilities. Such AI is, of course, prohibited. At the other extreme, there are minimal-risk AI systems, which could remain unregulated. What is actually regulated is AI that falls between these two extremes: neither in the unacceptable-risk category nor in the minimal-risk category.

    The European legislation establishes the EU AI Authority (nodal agency for implementation and enforcement of the AI Act). It has extra-territorial jurisdiction.

    Securities Market

Generative AI has sneaked into the securities market. Some thirty years back, this area hardly had any use for technology. Technology entered the scene with some seriousness after the introduction of dematerialized shares. Gradually, we have reached the stage of a T+0 settlement cycle.

AI is transforming the securities market, but this raises the issue of data privacy. AI-generated algorithms have spread into many sectors, including the securities market. Regulators have to design laws that govern these technologies.

Algo trading is carried out by automated means, and SEBI has suggested regulating it. AI can write code based on instructions fed to it, bypassing a trained programmer, and the resulting algo can violate securities laws. Who is then accountable? Perhaps the person who allowed the AI to create the algo: the principle applied could be that of the person behind the machine. However, as AI advances, this could require a revisit.

There is also robo-advisory, which may take center stage in the near future. It analyzes vast numbers of data points very quickly and may require a revision of the regulatory framework.

    Judicial Systems

    AI can be used for dispensation of justice by integrating it with our judicial system — say resolution of traffic offences or enforcement of securities laws. There is a focus on online dispute resolution. AI can serve as an arbiter or mediator.

    Pattern Recognition and Predictive Analysis

AI models can recognize patterns and, by doing so, predict the future. It all depends upon the data points the AI has access to and how it is coded to think. Algos are deployed to track suspicious activities. SEBI has suggested the use of blockchain technology to verify information and ensure transparency. AI can safeguard investor interest, but such technology should be used with caution, with strict safeguards to prevent any misuse.

  • Microsoft’s New AI Model: MAI-1

As we have already observed, Microsoft has released Phi-3 mini and is on the way to releasing two more Phi-3 versions. According to Microsoft, these small versions attract a wider client base as they are cost-effective options.

As we know, OpenAI has been financially backed by Microsoft so that it can take the lead in deploying ChatGPT-infused generative AI software.

These days Microsoft is training a new model, internally called MAI-1. This AI language model is large enough to compete with rival models from Google and OpenAI. Its training is being overseen by Mustafa Suleyman, who previously worked for DeepMind and Inflection AI.

Being bigger, this model will be more expensive. Microsoft is setting up a cluster of servers powered by Nvidia GPUs and is making a large amount of data available to improve the model.

    Roughly, this model will have 500 billion parameters. GPT-4 has one trillion parameters. Phi-3 mini has 3.8 billion parameters.

    The exact purpose of this model has not been determined yet.

  • Rejuvenated Zoom

Zoom is a communication technology company. During the pandemic, it facilitated communication within and across companies. Zoom Meetings are the heart of the platform.

    Apart from meetings, Zoom is becoming a collaboration platform. Here there are two components — Zoom Workplace (meetings, chats, mail, calendar, productivity, engagement tools) and Business Services (virtual agents, webinars, events, contact centers and integration of these applications). It is an omnichannel experience. AI cuts across all the offerings and functions. They have introduced Zoom AI Companion. It provides meeting summaries, synchronized chats, emails etc. It is available in various languages.

Zoom has a 50 per cent market share in video conferencing, where it competes with Google Meet and Microsoft Teams. They allow users to fire up a Zoom Meeting right from a Teams chat or join a Google Meet from Zoom. This is a differentiator for them: users can exercise their own choice of technology.

    India has two development centers of Zoom — one in Chennai and the other in Bangalore. Chennai center is for R&D. Bangalore center is for global tech support.

They intend to bring Zoom Phone to India. Here users can transfer calls across devices and convert a voice call into a video conference without interruption. They also intend to set up a cloud contact center and an employee engagement center.

    They have a varied clientele. LensKart video contact center allows the customers to come in touch with an eye specialist at the store. They are also working with pharma brands such as Glenmark. They have worked with Goa police to maintain law and order on New Year’s Eve.

    Zoom Voice, their platform for video, voice and chat, is useful where internet connectivity is an issue.

  • Generative AI and AGI

    While dealing with AI, we come across two concepts — generative AI and Artificial General Intelligence (AGI). Both the concepts are revolutionary enough to transform the world, but both are different.

Generative AI is used to generate content. It is akin to a parrot repeating human language: it learns the complex patterns of language and predicts the next word to create content. Like a parrot, it does not truly understand the language. When dealing with images, it predicts the next stroke.

    A poet draws on his emotional reservoir to compose a poem. Generative AI depends on its vast database. Its writing is more mechanical than emotive. Generative AI is good at commercial work, economics and summarization. It fails to grasp complex human experiences and cannot perform those tasks for which it has not been trained.

AGI is a big theoretical leap. Here a machine goes beyond narrow tasks: it understands and imitates the cognitive abilities of a human being. It can innovate and adapt. AGI could enable a machine to drive a car or make a medical diagnosis, replicating human tasks by understanding the context.

AGI still remains chimerical. It does not exist right now, and there is a lot of speculation about it. Some experts see AGI looming large over our shoulders; some think it is a distant dream.

There are formidable technical hurdles in achieving AGI, including issues of context and generalization. AGI has to be intuitive: it should grasp how different pieces of information relate to each other. It is not just processing power that is needed; artificial cognition is also required, connecting disparate ideas and experiences.

Human beings have sensory perception and interact with the physical world. AGI will have to perceive its environment and recognize the things in it; the whole thing builds a context.

Even with little information and data, AGI must adapt to different situations. This is called transfer learning.

    Current models just regurgitate information learnt. They do not go beyond their programming. There is a limitation to the capability of generative AI models.

    Generative AI has no real understanding but depends on algorithms and statistics. By contrast, AGI will have to develop understanding of the world around it.

Generative AI is applied to raise productivity and generate content. AGI, as and when realized, will transform the world by autonomously performing tasks. AGI would be able to reason, learn and understand complex concepts just like humans.

    Super AI refers to AI that surpasses human intelligence. It will solve complex problems beyond human capabilities. It would learn and adapt at a rate faster than human intelligence. It is still a hypothetical concept. It is the ultimate goal of AI research.

    GPT-4 does not possess self-awareness or introspection abilities, which are essential components of AGI. AGI deals with consciousness and sentience. AGI is able to learn new skills and knowledge on its own just like a human (without explicit programming).

GPT-5 is likely to come as close to AGI as we have ever been. OpenAI has been showcasing demos of GPT-5 to some enterprise customers.

    AGI could revolutionize many industries and solve complex problems in medicine, climate change and exploration.