Blog

  • AI Regulation

    The EU has taken the initiative to regulate AI: the European Parliament has approved the AI Act, which now goes to the European Council.

    Europe's approach is risk-based regulation: the greater the risk an AI system poses, the tighter the restrictions.

    The Act makes clear that generative AI companies will have to disclose the data used to train their systems.

    It proposes to ban uses of AI that threaten people's livelihood, safety and rights. It also disallows real-time facial recognition and biometric identification in public.

    India is concerned about this subject, but is yet to begin consultations with stakeholders. India's Digital India Bill may bring some principles and guardrails for AI.

    Regulation could have three aspects: state intervention, industry-based certification and a code of conduct.

    The difficulty is that the multinational tech platforms are interconnected and operate across geographical boundaries.

    Europe may get the first-mover advantage; it is for the rest of the world to decide how far to align with the Act or deviate from it. Europe's General Data Protection Regulation (GDPR) has already become a benchmark that the rest of the world looks to.

    Real-time surveillance systems should be banned. So should facial recognition in public places.

  • In-context Learning

    Building with LLMs, or large language models, can be done in several ways: one can train a model from scratch, fine-tune an open-source model, or use hosted APIs.

    Many developers start with in-context learning. In-context learning makes the models usable off the shelf; no fine-tuning is necessary. Their behaviour is managed through clever prompting and conditioning on private contextual data.

    If you are building a chatbot to answer questions about pharmaceuticals, all the relevant documents could be pasted into a ChatGPT or GPT-4 prompt, with the question put at the end. This works for very small datasets, but it is not scalable: roughly 50 pages of input text can be processed at most, and performance deteriorates in both inference time and accuracy as that limit is approached. This limit is the context window.

    In-context learning addresses this issue cleverly. Instead of sending all the documents with each prompt to the LLM, only a few relevant documents are sent.

    And the relevance is decided by the LLM itself.

    At a high level, the workflow can be divided into three stages.

    Data pre-processing/embedding

    Here the data about pharmaceuticals is stored so that it can be retrieved later. Typically, the documents are split into chunks of text, the chunks are embedded, and the embeddings are stored in a vector database, as in the sketch below.
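
    A minimal sketch of this stage, assuming the sentence-transformers package as the embedding model and a plain NumPy array standing in for the vector database (a hosted embedding API and a real vector store would slot in the same way); the pharmaceutical snippets are invented for illustration:

        import numpy as np
        from sentence_transformers import SentenceTransformer

        def chunk(text, size=500, overlap=50):
            # Split a long document into overlapping chunks of roughly `size` characters.
            step = size - overlap
            return [text[start:start + size] for start in range(0, len(text), step)]

        # Hypothetical pharmaceutical documents, used only for illustration.
        documents = [
            "Amoxicillin is a penicillin-class antibiotic prescribed for bacterial infections ...",
            "Azithromycin is a macrolide antibiotic often used for respiratory infections ...",
        ]

        model = SentenceTransformer("all-MiniLM-L6-v2")
        all_chunks = [c for doc in documents for c in chunk(doc)]
        # One embedding vector per chunk; normalised so a dot product equals cosine similarity.
        vector_store = np.asarray(model.encode(all_chunks, normalize_embeddings=True))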

    Prompt construction/retrieval

    Here the user submits a query, say 'what are the antibiotics?'. The application constructs a series of prompts to submit to the LLM. A compiled prompt combines a prompt template coded by the developer, illustrations of valid outputs (few-shot examples), information retrieved from external APIs, and relevant documents retrieved from the vector database.
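
    Continuing the sketch above, retrieval and prompt construction might look like this; the template and the few-shot example are invented placeholders:

        def retrieve(query, k=3):
            # Return the k chunks whose embeddings are closest to the query embedding.
            q = model.encode([query], normalize_embeddings=True)[0]
            scores = vector_store @ q          # cosine similarity (vectors are normalised)
            return [all_chunks[i] for i in np.argsort(-scores)[:k]]

        PROMPT_TEMPLATE = """You are a pharmaceutical assistant.
        Example question: What is paracetamol used for?
        Example answer: Paracetamol is used to treat pain and fever.

        Context:
        {context}

        Question: {question}
        Answer:"""

        def build_prompt(question):
            # Combine the template, the few-shot example and the retrieved documents into one prompt.
            context = "\n\n".join(retrieve(question))
            return PROMPT_TEMPLATE.format(context=context, question=question)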

    Prompt execution/inference

    Once compiled, the prompts are submitted to a pre-trained LLM for inference, whether through proprietary model APIs or self-hosted open-source models. Some developers add operational systems such as logging, caching and validation at this stage.
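
    A sketch of this stage with simple logging and caching, assuming the official openai Python client and a GPT-4 endpoint; any proprietary API or self-hosted model could be substituted:

        import logging
        from functools import lru_cache
        from openai import OpenAI

        logging.basicConfig(level=logging.INFO)
        client = OpenAI()                      # reads OPENAI_API_KEY from the environment

        @lru_cache(maxsize=256)                # cache answers to identical questions
        def answer(question):
            prompt = build_prompt(question)    # from the previous sketch
            logging.info("Prompt length: %d characters", len(prompt))
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

        print(answer("What are the antibiotics?"))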

    All this seems like heavy work. It is, however, easier than the alternative of training or fine-tuning the LLM itself.

    In-context learning can be accomplished without a team of specialized ML engineers. No infrastructure needs to be hosted, and it is not necessary to buy a dedicated instance from OpenAI or Google.

    The pattern reduces an AI problem to a data engineering problem. For small datasets it works fine, and can even outperform fine-tuning.

    The bigger question is: what happens if the context window is expanded? It is possible, and it is an active area of research. There are, however, trade-offs of cost and time.

  • Translatotron 3 : Speech-to-Speech Translation

    Machine translation (MT) includes one important area, speech-to-speech translation (S2ST). In this area, Google is a significant player. It introduced its first S2ST system, Translatotron, in 2019, and released an improved version in 2021. DeepMind researchers presented a third iteration of the system, called Translatotron 3, in a paper published in May 2023.

    The preceding version, Translatotron 2, was very efficient too.

    The present version is an unsupervised, end-to-end model for direct speech-to-speech translation.

    The model is not trained on paired data in two languages, as is done conventionally. Instead, it finds consistent patterns and regularities in the given data on its own. In the training phase, the model learns from monolingual speech-text datasets. It relies on unsupervised cross-lingual embeddings for both languages, which are mapped into a shared space through self-learning.

    Initially, the model learns the structure of each language separately. The learning is then extended to find common ground, so that the model links and relates the properties of both languages. This leads to cross-lingual embeddings that initialize a shared encoder, which can handle both languages equally well.

    Further improvements in the model are attributed to a masked autoencoder. During encoding, the model is shown only a part of the data; during decoding, it has to infer or predict the hidden portion. The model, in other words, is pushed into a guessing game.

    Additionally, the model uses the back-translation technique as a self-check, which helps ensure coherence and accuracy in translation, as sketched below.
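
    A highly simplified sketch of the cycle-consistency idea behind back-translation, with toy PyTorch linear layers standing in for the real speech encoder and decoders (this illustrates the principle only, not Translatotron 3 itself):

        import torch
        import torch.nn as nn

        dim = 64
        shared_encoder = nn.Linear(dim, dim)    # shared encoder, used for both languages
        decoder_target = nn.Linear(dim, dim)    # decodes into the "target" language space
        decoder_source = nn.Linear(dim, dim)    # decodes back into the "source" language space

        source = torch.randn(8, dim)            # a batch of source-language representations

        # Translate source -> target, then translate the result back (back-translation).
        translated = decoder_target(shared_encoder(source))
        reconstructed = decoder_source(shared_encoder(translated))

        # Cycle-consistency loss: the round trip should land close to where it started.
        loss = nn.functional.mse_loss(reconstructed, source)
        loss.backward()
        print(float(loss))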

    Conventionally, S2ST used a pipeline of automatic speech recognition + machine translation (MT) + text-to-speech synthesis. Translatotron relies on a different architecture: it maps source-language speech directly to target-language speech, with no reliance on an intermediate representation, which makes it more effective.

    It also captures, so claim the researchers, non-verbal communication (NVC) cues.

  • Human Element Indispensable in AI

    Generative AI models such as Bard and ChatGPT depend on human-generated content for training; they need the human element to succeed. Researchers from prestigious institutions such as Imperial College London, Oxford and Cambridge conducted a study called 'The Curse of Recursion : Training on Generated Data Makes Models Forget'. It argues that LLMs face a major threat in future.

    If LLMs are trained on AI-generated content rather than human-generated content, there are significant risks. Their strength rests on existing data, data that was originally created by human beings.

    If Bing is asked about drones, its answer draws on material collected from articles written by human beings; the data could be in the form of papers, books or photos.

    If the models instead rely on AI-generated content, there are adverse effects, which the study calls 'model collapse'. The models deviate from reality and become corrupted: a degenerative process in which models gradually forget the true underlying data distribution. Over long-term learning, this process may set in.

    The errors in AI-generated content multiply as time passes. The effect is cumulative: each future generation of content is more distorted than the last. A toy illustration of this recursion is sketched below.
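
    A toy sketch of the recursion, repeatedly fitting a simple Gaussian to samples drawn from the previous generation's fit; the estimates drift away from the original distribution and, over enough generations, the variance tends to collapse (this illustrates the general idea, not the paper's experiments):

        import numpy as np

        rng = np.random.default_rng(0)
        data = rng.normal(loc=0.0, scale=1.0, size=100)    # "human-generated" data

        for generation in range(1, 11):
            mu_hat, sigma_hat = data.mean(), data.std()    # "train" a model on the current data
            data = rng.normal(mu_hat, sigma_hat, size=100) # next generation learns only from generated data
            print(f"generation {generation}: mean={mu_hat:+.3f}, std={sigma_hat:.3f}")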

    Such a model collapse can also perpetuate model bias on sensitive attributes such as ethnicity, gender and complexion.

    To curtail the risks in future, it is necessary to preserve original human-generated content and data.

    At present, there is no fool-proof method to distinguish between AI-generated and human-generated data.

    In future, human-generated data is likely to become a valuable resource for training AI models.

  • Backpropagation and Gradient Descent

    In a neural network, each neuron takes inputs, multiplies them by weights, adds a bias value, and runs the result through an activation function.

    Each neuron generates an output, which becomes the input to other neurons. The neurons in the last layer produce the output of the network. This is the feed-forward pass.

    A neural network has input neurons, hidden neurons and output neurons.

    Let us consider the idea of backpropagation with gradient descent. The whole network is treated as a multivariate function. A loss function calculates a number that denotes how well the network performs (the output is compared to the known good results).

    The set of input data coupled with the desired good results is called the training set. The loss function is designed so that its value increases as the network's behaviour moves further away from correct.

    Gradient descent algorithms take the loss function and use partial derivatives to determine what each variable (the weights and biases) in the network contributes to the loss value. The algorithm then works backwards, visiting each variable and adjusting it to decrease the loss value.
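
    A minimal hand-worked sketch of this loop: a single neuron (one weight, one bias), a squared-error loss, and gradient descent driven by the partial derivatives of the loss with respect to the weight and the bias (the training set is invented for illustration):

        # Training set: inputs paired with the desired "known good" outputs (target rule: y = 2x + 1).
        training_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

        w, b = 0.0, 0.0                        # weight and bias, initialised arbitrarily
        learning_rate = 0.05

        for epoch in range(200):
            for x, target in training_set:
                output = w * x + b             # feed-forward (identity activation)
                error = output - target
                dloss_dw = 2 * error * x       # partial derivative of the squared-error loss w.r.t. w
                dloss_db = 2 * error           # partial derivative of the squared-error loss w.r.t. b
                w -= learning_rate * dloss_dw  # adjust each variable to decrease the loss
                b -= learning_rate * dloss_db

        print(f"learned w = {w:.2f}, b = {b:.2f}")   # approaches w = 2, b = 1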

    Calculus of Gradient Descent

    Concepts from calculus are necessary to understand gradient descent.

    The notion of a derivative must be understood first. A derivative gives the slope (or rate of change) of a function at a single point; in other words, the derivative of a function gives its rate of change at a given input.

    The partial derivative is another concept. It applies to a multi-dimensional or multi-variable function: it isolates one of the variables and finds the slope along that dimension alone.

    What is the rate of change (slope) of a function at a specific point? Derivatives answer this question. Given multiple input variables to an equation, what is the rate of change with respect to just one variable? Partial derivatives answer that question.
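
    A small numeric illustration of these two questions, using finite differences to approximate a derivative and two partial derivatives (the functions are chosen only for illustration):

        def f(x):                  # single-variable function
            return x ** 2

        def g(x, y):               # multi-variable function
            return x ** 2 + 3 * y

        h = 1e-6
        slope_f = (f(2 + h) - f(2)) / h            # ~4: rate of change of f at x = 2
        slope_g_x = (g(1 + h, 5) - g(1, 5)) / h    # ~2: vary x only, hold y fixed
        slope_g_y = (g(1, 5 + h) - g(1, 5)) / h    # ~3: vary y only, hold x fixed
        print(slope_f, slope_g_x, slope_g_y)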

    Gradient descent utilizes these ideas. Each variable of the equation is visited and adjusted to minimize the output of the equation; this is our training goal. If the loss function is plotted graphically, the movement is incrementally towards the minimum of the function. Ideally, we want to find the global minimum.

    The size of the increment is known as the learning rate in ML.
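
    A sketch of that incremental movement on a simple surface, f(x, y) = x^2 + y^2, whose global minimum is at (0, 0); the learning rate sets the size of each step (the function and starting point are chosen only for illustration):

        learning_rate = 0.1
        x, y = 3.0, -4.0                   # arbitrary starting point

        for step in range(50):
            grad_x, grad_y = 2 * x, 2 * y  # partial derivatives of f with respect to x and y
            x -= learning_rate * grad_x    # step each variable against its slope
            y -= learning_rate * grad_y

        print(x, y)                        # both values approach 0, the global minimum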

  • Coke Studio

    Coke Studio is a well-recognised music property of Coca Cola. It was initiated as Coke Studio Pakistan in 2008 as a TV series featuring established and emerging artists from various music genres. The Coca Cola company partnered with Rohail Hyatt to create the Pakistani version of the show, and the first season premiered in June 2008. The artists collaborated live in studio sessions; the show later shifted to a closed-studio format, which remains its format to this day. Coke Studio Pakistan was later available across channels and on YouTube. Rohail Hyatt ran it till season six, after which Strings took over for four seasons.

    Coke Studio was first launched in India in 2011 on TV (MTV India and DD National), and some seasons were also aired on Big FM and AIR. The first season was produced by Leslie Lewis; later seasons had various other producers. The show continued till 2015, after which Coke Studio remained dormant for some years. After an eight-year hiatus, it staged a comeback in February 2023, launched in two avatars: Coke Studio Bharat and Coke Studio Tamil. This time around, the format is digital-friendly, and the songs can be experienced across devices and platforms: TV, YouTube, or audio OTTs such as Spotify, Gaana, Saavn, Wynk Music and Audible.

    India is a country which is crazy for film music. Coke Studio has to stand on its own in such an environment.

    In 2023, Coke Studio Bharat has been created by Ankur Tiwari and Kausar Munir; it covers the indie music scene.

    Coke Studio is located at Churchgate, Mumbai.

    Coke Studio is an extension of Coca Cola's 'Real Magic' philosophy. Music has the ability to unite and uplift; it is the connection point, a cultural connect and a new experience.

    Coke Studio is monitored through technology and data analytics. No doubt Coke Studio Bharat and Coke Studio Tamil are creative properties, but the money to create them comes from Coca Cola's advertising budget. Therefore, these properties must deliver in terms of weekly-plus audiences, that is, audiences who pick up the bottle on a weekly basis. However, the company looks at it on a long-term basis: one cannot equate a song to an ad and expect it to deliver as an ad would. Every song is not an ad.

    Coke bottles and packs are the brand's biggest real estate. Each bottle has been turned into a portal that can transport a person straight into Coke Studio. The bottle carries a dynamic QR code which, on scanning, offers an AR, 360-degree experience of the show, along with a karaoke option. To enter the studio, one needs to feed in one's phone number, which facilitates tracking: how many people picked up a bottle of Coke, and how often.

    Most of the people who take the AR experience are youngsters below 25 years of age.

    The content is updated every month, and consumers' liking for it is tracked. The data is then available to change the content or the media metrics, and it also shows which takers come back to the franchise. The impact of Coke Studio is measured through consumer engagement.

    A lot of song discovery happens through the shorts format, such as Reels. The research team has changed its approach and intends to leverage shorts in its strategy.

    It is not a typical ad model. Here the artist is at the centre, and may not like any tweaks to the creative composition. The artists have their own ecosystem that gives them feedback.

    In a person’s usual day, there are breaks and there are meals. Coke wants to be a part of both.

    To make the experience phygital, that is, physical as well as digital, the company intends to hold concerts.

    Coke Studio adds fizz to Coca Cola.

  • Memory-Efficient Zeroth Order Optimizer (MeZO)

    Pre-trained LLMs are fine-tuned to adapt to specialized domains, accommodate human instructions and cater to individual preferences. The LLM is adjusted on a smaller, domain-specific dataset.

    Scaled-up LLMs are computationally demanding to fine-tune, and the backpropagation process is memory-intensive as well.

    Princeton University researchers have addressed the memory issue by developing a new optimizer, MeZO, a memory-efficient zeroth-order optimizer. It is an adaptation of the classical zeroth-order SGD (ZO-SGD) method of estimating gradients: a zeroth-order method can estimate gradients using only two forward passes, which makes it memory-efficient. MeZO can also optimize non-differentiable objectives (such as accuracy or F1 score) while using the same amount of memory as inference. In experiments, MeZO outperforms zero-shot, in-context learning (ICL) and linear probing.
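
    A hedged toy sketch of the zeroth-order idea (SPSA-style): estimate a gradient direction from just two forward passes with a random perturbation, so no backpropagation graph and its memory are needed. A small quadratic loss stands in for an LLM fine-tuning objective; this illustrates the principle, not the MeZO implementation itself:

        import numpy as np

        rng = np.random.default_rng(0)
        theta = rng.normal(size=10)            # stand-in for the model parameters

        def loss(params):                      # forward pass only, no gradients tracked
            return float(np.sum(params ** 2))

        eps, lr = 1e-3, 0.01
        print("initial loss:", loss(theta))
        for step in range(300):
            z = rng.normal(size=theta.shape)   # random perturbation direction
            # Two forward passes give a directional-derivative estimate along z.
            g = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
            theta -= lr * g * z                # SPSA-style update using only that estimate
        print("final loss:", loss(theta))      # decreases toward 0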

    In evaluations, MeZO was able to train a 30-billion-parameter model using a single Nvidia A100 80 GB GPU, whereas backpropagation can train only a 2.7-billion-parameter language model under the same memory constraints.

  • Software Bots

    Traditionally, robots have been imagined as physical machines. However, there are software bots too, which mimic human work, and they are used in Robotic Process Automation (RPA). If what is being done is a structured process, tools such as bots can be used; they can be created in a short time to do routine work. Employees take pleasure in calling a bot their own, and in fact give it a name. There are software tools for process discovery that help an organisation learn which processes can be automated. To begin with, organisations identify 5-10 processes, but the utility of automation and of bots soon dawns on them, and they are ready to have hundreds or thousands of bots.

    Bots often imitate or replace a human user’s behaviour. They can be used to automate certain tasks (they run without specific instructions from humans).

    Bots can help you comply with regulations and make submissions to the authorities. They can handle bookings for a travel company, or operate recipe machines in quick-service restaurants that formerly relied on chefs.

    Some examples of bots are chatbots, web crawlers, social bots and malicious bots.
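
    A minimal sketch of one of these bot types, a web crawler: it fetches a page, extracts links and follows them up to a small limit (the start URL is a placeholder, and the requests and BeautifulSoup packages are assumed to be installed):

        from collections import deque
        import requests
        from bs4 import BeautifulSoup

        def crawl(start_url, max_pages=10):
            seen, queue, visited = set(), deque([start_url]), []
            while queue and len(visited) < max_pages:
                url = queue.popleft()
                if url in seen:
                    continue
                seen.add(url)
                try:
                    page = requests.get(url, timeout=5)
                except requests.RequestException:
                    continue                   # skip pages that fail to load
                visited.append(url)
                # Queue every absolute link found on the page.
                for link in BeautifulSoup(page.text, "html.parser").find_all("a", href=True):
                    if link["href"].startswith("http"):
                        queue.append(link["href"])
            return visited

        print(crawl("https://example.com"))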

    Bots have taken over mundane tasks. They do them faster and more accurately.

  • JEPA : Break from the Generative AI

    Yann LeCun is one of the godfathers of AI, along with Yoshua Bengio and Geoffrey Hinton. LeCun is the chief AI scientist at Facebook. At an event in Paris, he declared that generative AI is overhyped and has reached a dead end. He is working on new AI models that would show human-like rationality.

    LeCun is aware that computers lack common sense. He talked about the image-based Joint Embedding Predictive Architecture (JEPA), which he expects to surpass the capabilities of generative AI: machines will be able to conceptualise abstract ideas, whereas at present they just spew out existing online information.

    He feels that in a few years generative AI models will be replaced, and that AI should align with human understanding of the world: models should perceive the world and make predictions about it.

    Facebook has adopted generative AI for its platforms discreetly, and has released open-source AI models that require less computing power than generative AI.

    Facebook is spearheading the development of new AI models that aim to replicate human rationality.

  • New Gold Rush in California

    There was a Gold Rush to California in 1848, when Marshall found gold there: a rapid influx of some three lakh (300,000) fortune seekers from the rest of the US, which reached its peak in 1852. Of late, Silicon Valley, the region between San Francisco and San Jose, experienced gloomy days, when people left for greener pastures elsewhere, say Miami, LA, New York or Puerto Rico. The pandemic drove people out of the Bay Area, and the headlines were dominated by the cryptocurrency disaster, discouraging share prices and the collapse of Silicon Valley Bank, which funded startups.

    Every cloud has a silver lining, and this time it came as AI, to be specific, generative AI. It started the new Gold Rush to California. Some 300 enthusiasts and entrepreneurs gathered for a Generative AI Meeting of the Minds in May 2023 at Shack15, a swanky social club on the second floor of the Ferry Building, San Francisco.

    The mood was elevated by the host, Peter Leyden, a futurist, who assured the gathering that with the advent of AI something new is cracking open. The whole experience made tech circles buoyant once again.

    AI has become the talk of the town. The industry had beaten a retreat, but has now come forward to offer AI solutions.

    Sam Altman's OpenAI came up with ChatGPT and Google with Bard. There are image-generation tools, DALL-E and Midjourney. Altman was assisted financially by Microsoft, which poured $10 billion into OpenAI.

    Generative AI brought the gravitas back to Silicon Valley. It all started with a group of eight researchers at Google who laid the foundation with the 2017 paper by Vaswani and others called 'Attention Is All You Need'. OpenAI took cognizance of this development.

    Long sequences of data, or chunks of text, were processed, with each word weighed in relation to what preceded it and grammatical structure taken into account. It was a breakthrough when computers could predict the next word.

    The Bay Area, along with Seattle in the north, has been home to Big Tech, which had hired AI talent for years. Now Google's Brain team and DeepMind are combining. There are many startups that are mission-driven rather than salary-driven; they expect to be on par with OpenAI in the next five years. AI has in fact brought a revival to Silicon Valley. The Palo Alto offices of Character AI are creating chatbots.

    Employees here process data in front of high-powered computers. These people have long practised what OpenAI is doing today.

    Of course, there are AI skeptics: AI models lack accuracy, there are issues of bias, and there could be an AI hangover or a meltdown. However, AI is not another bubble; young engineers and entrepreneurs are making it more and more power-packed.