Blog

  • Flexible AI Norms

    Countries across the world are working on regulatory frameworks for AI. Google advocates a risk-based approach (rather than uniform rules) for AI applications. A ‘one size fits all’ approach, it argues, would hinder innovation.

    Different AI models pose different risks, and regulation should be framed in proportion to the risks posed. Regulation should be directed at the application level rather than the technology level.

    The application layer for generative AI means the stage where the technology is being deployed for use cases.

    Google is doing continuous research on bias — what bias means and how to address it. Broadly, it can be addressed by training models on good data; models should not be trained on unsafe data.

    Of late, the government released an advisory that if there is bias in content generated by algorithms, search engines or AI models (such as ChatGPT and Bard), there will be no protection under the safe harbour clause of Section 79 of the IT Act.

    In order to reduce bias, there should be a cross-border flow of trusted data. Such a flow facilitates training on diverse demographic data, which helps address bias.

    The Indian government will share the public data available with it only with firms that have a proven track record and can be called trusted sources. Google supports this stand.

  • Wish You a Merry Xmas, 2023: OTT Platforms

    OTT stands for over-the-top and has become a new way to consume media. OTT delivers audio and video content directly to the audience over the internet, bypassing traditional channels such as cable, satellite and broadcast TV. Hence the name: it goes over the top of the old-school methods.

    Content providers use Content Delivery Networks (CDNs) to distribute their offerings. CDNs have servers all across the globe, so viewers can stream content quickly regardless of location. The content is accessed through internet-connected devices such as consoles, computers, smartphones and tablets.

    Apps or web interfaces act as gateways to content libraries, where users can browse, select and play what they want, e.g. YouTube or Netflix.

    Some key technologies are involved. Content is encoded into formats suitable for different devices and transcoded into versions adjusted for different internet speeds. Streaming protocols then deliver it to the viewer, chiefly HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH).
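    The adaptive idea behind HLS and DASH can be sketched in a few lines: the player measures its bandwidth and picks the best rendition it can sustain. The renditions and bitrate thresholds below are hypothetical, and real players use more sophisticated buffer- and throughput-based heuristics:

```python
# Toy sketch of the adaptive-bitrate idea behind HLS/DASH. The renditions
# and bandwidth thresholds are illustrative, not from any real player.
RENDITIONS = [("1080p", 5000), ("720p", 2800), ("480p", 1400), ("240p", 400)]

def pick_rendition(measured_kbps: float) -> str:
    """Return the best-quality rendition the measured bandwidth can sustain."""
    for label, required_kbps in RENDITIONS:  # listed best-first
        if measured_kbps >= required_kbps:
            return label
    return RENDITIONS[-1][0]  # fall back to the lowest rung

print(pick_rendition(6000))  # 1080p
print(pick_rendition(1500))  # 480p
```

    In a real player this decision is re-made every few seconds per segment, which is why quality visibly steps up or down as your connection changes.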

    OTT differs from linear TV, which is watched by appointment; OTT content can be watched at your convenience. There is a wider choice from a library of content, with diversity across genres and niche interests. One is at liberty to choose a convenient subscription plan and cancel it anytime. The content is personalized through recommendation algorithms and curated playlists.

    OTT has disrupted the traditional media landscape. It encourages independent content creators and enables new methods of monetizing content.

    Technological advances such as 5G and fiber internet promise faster streaming. Content is becoming more personalized, and platforms are experimenting with interactive and immersive techniques such as AR/VR.

  • To Peak, AI Should Go Beyond the Transformer

    As we enter 2024, we have been in a post-ChatGPT world for almost a year. That is perhaps too short a time to ask whether this technology of generative AI has peaked.

    Instead, this is the time to leverage generative AI in different fields. Google released its much-awaited AI model Gemini in December 2023, some nine months after GPT-4. Gemini was expected to push the envelope further; however, Gemini Ultra barely inches ahead of GPT-4 on performance benchmarks. No model has yet been seen that beats GPT-4. Is this the limit of LLMs? How do we jump from here to artificial general intelligence (AGI), which would put a model’s cognitive ability on par with that of human beings? LLMs have taken us this far, but no further. Of course, there is a chance that such a model will emerge eventually.

    The transformer architecture, in use since 2017, scales up by increasing the number of parameters. OpenAI has not disclosed the size of its newer models, but GPT-3 is documented at 175 billion parameters. LLM performance improves with the amount of data and compute thrown at training, but such scaling cannot continue indefinitely: it is an expensive proposition. Thus the transformer architecture has limitations that may prevent LLMs from reaching AGI.
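    The published GPT-3 configuration (96 layers, hidden size 12,288) lets us sanity-check that parameter count with the common rule of thumb of roughly 12·L·d² weights for a decoder-only transformer, ignoring embeddings and biases:

```python
def approx_params(n_layers: int, d_model: int) -> float:
    # Rule of thumb for a decoder-only transformer: each layer holds about
    # 4*d^2 attention weights plus 8*d^2 MLP weights; embeddings and
    # biases are ignored, so this slightly undercounts.
    return 12 * n_layers * d_model ** 2

# GPT-3's published configuration: 96 layers, hidden size 12288.
print(f"{approx_params(96, 12288) / 1e9:.0f}B parameters")  # ~174B
```

    The estimate lands within about one per cent of the published 175-billion figure, which is why the 12·L·d² shortcut is widely used for back-of-the-envelope sizing.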

    Transformers are not good at generalizing beyond their training data, which limits their capability to reach AGI.

    Something has to be conceived on top of the transformer to give it some capacity for reasoning.

  • Generative AI Commercialization

    The year 2023 was the year generative AI such as ChatGPT entered the collective consciousness of people all over the world: a ‘hype cycle’. The year 2024 will be the year of commercialization of generative AI. Across industries there will be licensing of generative AI, as organizations plan to purchase software such as Microsoft Copilot. It will bring artificial intelligence more directly into the lives of workers and customers.

    Technology officers have shown deep interest in buying enterprise-level generative AI software such as Microsoft Copilot; almost half of them expect to buy it within the next six months. Many have not yet made a spending decision, and very few remain uninterested in the technology.

  • Large Language Models (LLMs) of 2023

    We have been hearing about LLMs for a while, but they became a part of our consciousness in 2023. LLMs are the foundation of chatbots, and many big tech companies are now in a race to build them.

    LLMs are advanced AI models that perform natural language processing (NLP). Trained on massive corpora of data, they understand the relationships between words. They can answer our queries, translate from one language to another, generate text (they are the harbingers of generative AI) and summarize a voluminous document into a concise format.

    LLMs are now becoming multi-modal, trained not only on text but also on images and audio.

    Let us learn about the LLMs available in 2023.

    GPT-4: Released in March 2023 by OpenAI, it has become the current benchmark. It processes both text and images. Its training methodology has not been revealed; it is estimated to have a trillion-plus parameters (roughly six times GPT-3’s 175 billion). It has been fine-tuned with Reinforcement Learning from Human Feedback (RLHF): human preference data are used to further train the model, which enhances its performance. It shows the fewest hallucinations. In November 2023, a new version called GPT-4 Turbo was released, with training data updated to April 2023 and the capacity to handle larger prompts.

    Gemini: Google released its Gemini family of multi-modal LLMs in December 2023 in three versions — Nano, Pro and Ultra. Its chatbot Bard now runs on Gemini Pro. A separate article has been written on Gemini.

    GPT-3.5: Released towards the end of November 2022 by OpenAI, it is the underlying model of ChatGPT. GPT-3.5 handles only text and hallucinates more than GPT-4; ChatGPT Plus runs on GPT-4. Gemini Pro sits between GPT-3.5 and GPT-4 in capability, and since Google has now released Gemini Ultra, a new version of the Bard brand, called Bard Advanced, will appear.

    Llama 2: Released by Meta (Facebook) in July 2023, it is an open-source AI model, available in versions with 7 billion and with 70 billion parameters. GPT-4 outperforms both Llama 2 and Google’s PaLM 2.

    PaLM 2: Launched by Google in May 2023, it is very powerful, has reasoning capabilities and has been trained on some 100 languages. (Google has not disclosed its parameter count; its predecessor PaLM had 540 billion parameters.) The older version of Bard was based on PaLM 2.

    Claude 2: Developed by Anthropic, a company founded by former OpenAI employees, and released in July 2023, it has a huge context length (the number of tokens a model considers in its input). Claude 2.1, released in November 2023, has a higher context length than GPT-4.

    Mistral 7B: The Paris-based startup Mistral has built not a larger language model but a niftier one. Mistral 7B was released in September 2023, and a mixture-of-experts version, Mixtral 8x7B, has since been launched. These are far smaller models than GPT-4 and compete with Meta’s Llama 2.

  • Natural Language Processing Techniques

    As we know by now, LLMs are used for natural language processing. Text is linguistic data, and it is always pre-processed using a number of techniques.

    Tokenization

    A token is a word segment. Dividing text into tokens is a vital step: lengthy strings of text are dissected into more manageable and meaningful units. Tokens are the building blocks of NLP and provide a structured framework.
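    A minimal regex tokenizer illustrates the dissection step. Production LLMs use subword schemes such as BPE, but this word-level sketch shows the idea:

```python
import re

def tokenize(text: str) -> list[str]:
    # \w+ captures runs of letters/digits; lowercasing normalizes tokens.
    return re.findall(r"\w+", text.lower())

print(tokenize("Tokens are the building blocks of NLP."))
# ['tokens', 'are', 'the', 'building', 'blocks', 'of', 'nlp']
```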

    Stemming and lemmatization

    After tokenization come stemming and lemmatization. These processes distill the root form of words from their morphological variations. To illustrate, ‘stick’ appears in various forms — stuck, sticking, sticks, unstuck. In stemming we simply strip prefixes and suffixes. Lemmatization leads us to the dictionary root form of a word (commonly called the lemma); it surpasses the limitations of stemming by identifying the true root word. Consider ‘change’ for stemming versus lemmatization: its forms include changing, changes, changed and changer. Stemming gives us ‘chang’, whereas lemmatization leads us to ‘change’.
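    The ‘change’ example can be reproduced with a toy suffix-stripping stemmer and a small lemma table. Real systems use the Porter stemmer and dictionary-backed lemmatizers such as WordNet; the suffix list and lemma table here are illustrative only:

```python
def toy_stem(word: str) -> str:
    # Naive suffix stripping in the spirit of stemming (not the actual
    # Porter algorithm): chop off a known suffix and keep the remainder.
    for suffix in ("ing", "es", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A lemmatizer instead maps each surface form to its dictionary headword
# (lemma); this table stands in for a real vocabulary such as WordNet.
LEMMAS = {"changing": "change", "changes": "change",
          "changed": "change", "changer": "change"}

for form in ("changing", "changes", "changed", "changer"):
    print(f"{form}: stem={toy_stem(form)}, lemma={LEMMAS[form]}")
# every form stems to the truncated 'chang' but lemmatizes to 'change'
```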

    Morphological Segmentation

    Some words are monomorphemic: ‘table’ and ‘lamp’ consist of a single morpheme. Others have more than one: ‘sunrise’ has two morphemes, ‘sun’ and ‘rise’, and the fusion of the two leads to a holistic understanding of the word’s meaning.

    ‘Unachievability’ has four morphemes — ‘un’, ‘achiev’, ‘abil’ and ‘ity’.

    Morphological segmentation prepares the text for subsequent analysis.

    Stop Words Removal

    This is a crucial pre-processing step. Here we eliminate extraneous linguistic elements that do not contribute much to the meaning of the text, such as ‘and’, ‘because’, ‘under’ and ‘in’. These are filler words.

    ‘Marketingganga — a marketing portal for the market savvy’ contains stop words. Without them, it would read: Marketingganga, marketing, portal, market, savvy.

    ‘I like reading, so I read’ is with stop words. Remove them and it reads: like, reading, read.
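    Stop-word removal can be sketched directly. The stop-word set here is a tiny illustrative sample; real NLP toolkits ship lists of a hundred-plus words:

```python
import re

# Tiny illustrative stop-word set; real toolkits ship far longer lists.
STOP_WORDS = {"a", "and", "because", "for", "i", "in", "so", "the", "this", "under"}

def remove_stops(text: str) -> list[str]:
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stops("I like reading, so I read"))  # ['like', 'reading', 'read']
```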

    Text Classification

    Text classification covers a number of techniques for organizing vast quantities of unprocessed textual data.

    The ultimate aim is to convert unstructured data into structured format.

    Sentiment Analysis

    Also called emotion AI or opinion mining, it examines user-generated content to gauge attitudes. It can be leveraged to address evolving needs and enhance the consumer experience.
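    A minimal lexicon-based scorer shows the idea behind opinion mining. The word lists here are illustrative stand-ins, not a real sentiment lexicon such as VADER:

```python
# Minimal lexicon-based sentiment scorer (illustrative word lists only).
POSITIVE = {"great", "love", "excellent", "good", "enhance"}
NEGATIVE = {"bad", "poor", "terrible", "slow", "broken"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    # Count positive hits minus negative hits; sign decides the label.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("great product love it"))  # positive
print(sentiment("terrible and slow"))      # negative
```

    Real systems go further, handling negation (‘not good’), intensity and sarcasm, usually with trained classifiers rather than fixed word lists.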

    Topic Modelling

    Here the underlying themes and topics of a text are identified. Topic modelling operates as an unsupervised ML process: the topics within a corpus are identified and categorized, and the essential keywords can be extracted while sifting through a document. It identifies subjects of interest within a textual dataset.

    Text Summarization

    Here the text is condensed into a cohesive summary, either extraction-based (lifting key sentences verbatim) or abstraction-based (generating new sentences).
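    A crude extraction-based summarizer can be built by scoring each sentence on the frequency of its words across the whole text and keeping the top scorers. This is a sketch of the idea, not a production summarizer:

```python
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 1) -> str:
    # Crude sentence splitter: break after ., ! or ? followed by a space.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    # Words that recur across the text mark the important sentences.
    freq = Counter(re.findall(r"\w+", text.lower()))
    def score(sentence: str) -> int:
        return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))
    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)

text = "Cats sleep. Cats sleep a lot every day. Dogs bark."
print(extractive_summary(text, 1))  # Cats sleep a lot every day.
```

    Abstraction-based summarization, by contrast, is what LLMs do: they generate fresh sentences rather than copying existing ones.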

    Parsing

    Parsing unravels the grammatical framework of a sentence. In parsing we come across Named Entity Recognition (NER), which extracts information that identifies ‘named entities’, using pre-defined categories such as people, organizations and places.

    Then there is TF-IDF, an acronym for term frequency-inverse document frequency. It is a statistical methodology that assesses the significance of a word within a document relative to a collection of documents. A word pervasive across all documents attracts a lower score, even though its occurrence is frequent.
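    The TF-IDF computation can be written out directly. The three toy ‘documents’ below are illustrative:

```python
import math

def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    tf = doc.count(term) / len(doc)      # share of the doc the term occupies
    df = sum(term in d for d in corpus)  # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

docs = [["spectrum", "auction", "auction"],
        ["spectrum", "satellite"],
        ["spectrum", "broadband"]]

# 'spectrum' is pervasive across all documents, so it scores zero despite
# occurring frequently; 'auction' is distinctive to the first document.
print(tf_idf("spectrum", docs[0], docs))           # 0.0
print(round(tf_idf("auction", docs[0], docs), 3))  # 0.732
```

    Library implementations (e.g. scikit-learn’s TfidfVectorizer) add smoothing and normalization on top of this basic formula.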

  • Verses Claims Breakthrough in AI

    Verses is a US-based company developing AI systems patterned after the ‘wisdom and genius of nature’. It set up a billboard outside OpenAI’s headquarters hoping for a tie-up, and published an open letter in the New York Times dated 19 December 2023 claiming a breakthrough that could lead to a more advanced form of AI.

    The efforts of researchers are now directed at artificial general intelligence (AGI), which could match or exceed human capability. That goal is called the singularity, later leading to superintelligence: AI outperforming human beings. Big Tech is in a race to develop AGI, which could benefit humanity. OpenAI’s founding mission, written large on its website, is to create AGI that benefits the whole of humanity.

    Verses’ assertions on the billboard and in its open letter do not disclose technical details but speak of a breakthrough in Active Inference, a mathematical framework under which sentience in living organisms can be comprehended. It is this framework, Verses claims, that could unlock AGI. (Active Inference, 2022, is a book co-authored by Karl Friston, an acclaimed neuroscientist.) Active Inference builds on the ‘free energy principle’.

    Verses’ breakthrough could make current models more reliable, efficient and aligned with human goals. Sam Altman of OpenAI knows that LLMs alone are not capable of pushing AI models into AGI; he believes another breakthrough is required.

    Verses’ letter also talks about safety concerns. It considers its work worthy of attention, and volunteers to develop general intelligence and superintelligence in collaboration, with safety and beneficial effects in mind.

  • Satellite Internet: Spectrum Licensing

    In the new Telecom Bill, the question of whether to auction satellite broadband spectrum or offer it at administrative prices has been resolved in favour of administrative prices. Among the telecom operators, Bharti-backed OneWeb pushed for administrative prices, whereas Reliance’s Jio demanded an auction. The DoT preferred an auction, and TRAI called for wider consultation. The Indian Space Association (ISpA), which represents key satellite players, with support from the Tatas’ Nelco, Amazon’s Kuiper and Musk’s Starlink, holds that administrative allocation is the global trend and India should not be an outlier.

    In terrestrial operations, operators require earmarked, interference-free frequencies, so spectrum is auctioned. In satellite services, spectrum is shared globally by satellites, which mostly work within a particular band; its use is coordinated by satellite companies through a dynamic automated system on a good-faith basis. Hence, the argument goes, spectrum should be assigned administratively. Countries such as Thailand, Mexico and Brazil tried auctions but shifted to administrative allocation. Sharing makes spectrum use efficient. Startups will also get access to the spectrum and could, for example, provide connectivity to fishermen at sea; an expensive auction route would stifle such startups.

    Jio opposes the administrative-price route. Telecom operators who pay for spectrum offer broadband services; satellite players paying only administrative prices would offer the same broadband services without bearing any spectrum cost. It is thus not a level playing field. There is another concern: early entrants in satellite communications are given preferred orbital slots by the International Telecommunication Union on a first-come, first-served basis, leaving newer players at a disadvantage.

    DOT sent a reference to TRAI in September 2021, and TRAI initiated a consultation process. TRAI has not yet stated its final views. However, the government has decided to offer spectrum through administrative route.

  • Krutrim: India’s Chatbot

    Ola’s co-founder and CEO Bhavish Aggarwal launched Krutrim, an LLM and generative AI platform on the lines of ChatGPT and Bard. It is an indigenous model, trained on 2 trillion tokens (pieces of textual information) representing Indian data.

    Krutrim is derived from Sanskrit and means ‘artificial’. There are two models — a base model and a Pro model. The current model answers people’s queries and prompts. It can understand 22 Indian languages and generate text in 10, including Gujarati, Marathi, Hindi, Bengali, Tamil, Kannada, Telugu, Malayalam and Odia. Krutrim Pro will be launched in the first quarter of 2024.

    After sign-up, Krutrim will be made available in batches, and it is expected to be open to all users by January 2024.

    Developers will access Krutrim APIs.

    India being a multi-cultural and multi-lingual country, the currently available LLMs are unable to capture its unique nature. Krutrim is India’s own AI.

    Krutrim is trained on India-specific data sets. The company is also working on creating AI cloud infrastructure, and wants to work on AI compute by developing GPU chips. Krutrim’s architecture has multiple chipsets to power different AI infrastructure, models and applications. Krutrim is already being used by the Ola group of companies. It is faster on Indian languages, generating responses in less time and with less compute, and in English it outperforms Meta’s Llama 2. Krutrim’s design was unveiled in 2023.

    The Indian startup Sarvam has also launched OpenHathi, the first Hindi LLM; Krutrim’s launch comes on its heels.

  • Generative AI’s Contribution to GDP

    Generative AI could contribute $1.2 to $1.5 trillion to India’s GDP over the next seven years (EY report). Generative AI has the potential to speed up India’s digital transformation.

    The most promising sectors for Gen AI adoption are services (IT, legal, consulting, outsourcing, rental of machinery and equipment, and others), financial services, education, retail and healthcare.

    Such adoption would result in enhanced productivity, operational efficiency and personalized engagement with customers.

    In 2029-30 alone, generative AI could add $359-438 billion to India’s GDP. It indicates an increase of 5.9-7.2 per cent above the baseline GDP.

    These are early days, and there are challenges to adopting AI: the skills gap, the lack of clear use cases and the risks to data privacy.

    AI-first approach is becoming acceptable. It leads to digital transformation.

    AI regulation in the initial stages should be light touch. It should be responsive. There has to be a balance between innovation and risk management. There should be regulatory sandboxes. AI-generated content could be watermarked. There should be standards for accountability to build trust in the AI systems.

    AI systems could be offered as public goods. A conducive environment must be provided: 5G, data centers, access to chips, AI-specific compute, access to talent and public funding of R&D.