Cached Transformer with a GRC

A cached transformer incorporates a memory cache to improve its ability to handle long-range dependencies in sequences. Traditional transformers struggle to capture relationships between distant elements in long sequences.

The key component of a cached transformer is the GRC, or Gated Recurrent Cache. It dynamically stores token embeddings based on their relevance and historical significance.

It serves as a differentiable memory cache. The model can attend to both current and previously seen information.

Tokenized inputs are converted into numerical vectors (embeddings). The GRC processes these vectors and stores relevant information in its cache. The transformer's self-attention mechanism can then attend both to the current input tokens and to the cached historical information: it performs standard self-attention over the combined current and cached representations. The output is further refined through normalization and feed-forward layers. The GRC continually updates its cache based on the current input tokens, ensuring that the stored information remains relevant.
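A minimal PyTorch sketch of this idea is given below. The fixed cache size, the mean-pooled token summary and the simple gating rule are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedRecurrentCache(nn.Module):
    """Illustrative gated recurrent cache: a fixed number of memory slots
    updated by a learned gate that blends the old cache with a summary of
    the current tokens. A simplification, not the paper's exact update."""
    def __init__(self, d_model: int, cache_len: int = 64):
        super().__init__()
        self.cache_len = cache_len
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, cache: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # cache: (batch, cache_len, d_model); x: (batch, seq_len, d_model)
        # Summarise current tokens by mean pooling (stand-in for the learned update).
        summary = x.mean(dim=1, keepdim=True).expand(-1, self.cache_len, -1)
        g = torch.sigmoid(self.gate(torch.cat([cache, summary], dim=-1)))
        return g * cache + (1.0 - g) * summary  # gated blend of old and new


class CachedSelfAttention(nn.Module):
    """Self-attention whose keys/values cover both the cache and the current tokens."""
    def __init__(self, d_model: int, n_heads: int = 8, cache_len: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.grc = GatedRecurrentCache(d_model, cache_len)

    def forward(self, x, cache):
        cache = self.grc(cache, x)            # refresh the cache with current input
        kv = torch.cat([cache, x], dim=1)     # keys/values: cached + current tokens
        out, _ = self.attn(x, kv, kv)         # queries: current tokens only
        return out, cache

# Example usage:
# layer = CachedSelfAttention(d_model=512)
# x = torch.randn(2, 16, 512)            # current token embeddings
# cache = torch.zeros(2, 64, 512)        # initial (empty) cache
# out, cache = layer(x, cache)
```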

Advantages:

There is improved handling of long-range dependencies. The model captures relationships between distant elements in long sequences, which enhances its performance in tasks such as language modelling, machine translation, image classification and instance segmentation.

It also reduces computation cost compared to traditional transformers. It is a promising advance in transformer architecture, and further research is expected to explore its full capabilities and potential impact on language and vision processing. Researchers from the Chinese University of Hong Kong, the University of Hong Kong and Tencent Inc. proposed this approach, called Cached Transformers with a GRC.

NYT Suit against OpenAI and Microsoft

The basic idea behind copyright is to ensure that creators have an incentive to do new work. It also leaves some space for derivative work, say fair use for criticism, comment, reporting, teaching, scholarship or research, where a small sample of copyrighted work can be duplicated.

Of late, AI has been testing the boundaries of copyright. It generates music, visuals, lyrics and scripts after ingesting the previous work of creative people. On receiving a prompt, AI processes the ingested material and delivers the outcome.

The New York Times has filed a suit against OpenAI and Microsoft, alleging that both companies have used LLMs that were trained on copyrighted articles from the NYT. This deprives the NYT of an audience that would otherwise have reached it, and substitutes for its content without permission or payment. According to the NYT, this is not fair use, since these models compete with the NYT and closely mimic it, having been trained on its content.

The lawsuit cites examples of NYT articles reproduced word for word. This bypasses the subscription paywall, which is critical for the paper's survival.

Even Microsoft's Bing search engine generates detailed summaries and excerpts from the articles, going far beyond fair use.

The NYT demands not only compensation and restrictions but also the destruction of all tools and models that incorporate its work.

The question may be asked: why did the NYT not block access to its content? The answer is that ChatGPT went live in November 2022, and by that time the underlying model, with 175 billion parameters, had already been trained on about 45 terabytes of data from various datasets. By the time this was realized, the model had already ingested the data.

The NYT points out that although the information is public, that does not mean it is free to copy.

Apple, by contrast, is negotiating with publishers and offering them monetary compensation for licensing content to train its AI tools. If such transactions happen, they will strengthen the NYT's case.

The NYT case was filed three months after the Authors Guild went to court against OpenAI.

Content created by someone is being used without either acknowledgement or payment. Both the Authors Guild and the NYT have accused the tech companies of freeriding.

Traditional AI used data for pattern recognition; those models were mostly predictive. Generative AI creates, or generates, content, taking the technology to another level. To do this, generative AI uses extraordinarily large datasets (ChatGPT-like apps use about 45 terabytes). It is trained on content created by others, its answers closely resemble the original content, and it can substitute for the original.

The issue is not one of stifling innovation; it is the use of data without express permission and payment. The principle is: pay for what you use.

This is an untested legal area.

The vernacular models being developed in India must respect copyrights. The government can frame laws to prevent freeriding. Original content creators must not be taken for a ride.

Controlling Superhuman Intelligence

Mankind faces problems. Innovations too pose challenges. It is paradoxical. Advanced mathematics is built on imaginary numbers. Black holes have validated many laws, and still they remain inscrutable. Similarly, AI poses certain challenges. The basic challenge is how a superintelligent system of the future would be controlled.

OpenAI laid down its Preparedness Framework in December 2023. It intends to adopt a scientific approach to assessing the catastrophic risks of advanced AI systems. The document describes processes to track, evaluate, forecast and protect against such risks.

If AGI is realized, it will require oversight.

The AI sector has developed the concept of superalignment to deal with AGI. It is a holistic approach that goes beyond technical specifications to consider societal impact and ethical issues.

So far, alignment has been restricted to aligning AI systems with human values during the training phase. Superalignment refers to continuous alignment throughout the life cycle of an AI system, including deployment, adaptation and evolution.

OpenAI suggests that a less potent LLM should serve as a proxy for human oversight of the more potent superintelligent AI.

OpenAI forecasts superintelligence could be a reality in the next 10 years.

OpenAI looked at how GPT-2, developed about five years ago, could supervise GPT-4, its latest LLM.
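A rough sketch of that weak-to-strong setup is given below: a small "weak" model labels data, and the larger "strong" model is fine-tuned on those noisy labels. The function names and training loop are placeholders assumed for illustration, not OpenAI's actual experimental code.

```python
# Hypothetical sketch of weak-to-strong supervision.
def weak_to_strong_supervision(weak_model, strong_model, unlabeled_texts, finetune):
    # 1. The weak supervisor (a GPT-2-class model) produces possibly noisy labels.
    weak_labels = [weak_model.predict(text) for text in unlabeled_texts]

    # 2. The strong model is fine-tuned on the weak labels; the open question
    #    is how much of its latent capability this form of oversight recovers.
    finetune(strong_model, list(zip(unlabeled_texts, weak_labels)))
    return strong_model
```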

AI could be an existential threat to mankind; this is the doomsday scenario. It distracts from the short-term risks of present-day AI systems, such as misinformation, bias, copyright violations and expensive compute. Industry should not be fixated on doomsday scenarios; such talk is highly hypothetical. The issue is how to deal with the technology that currently exists. Of course, future possibilities cannot be ignored. But as Andrew Ng puts it: 'Is there any engineering discipline where much attention is on hypothetical problems, rather than actual problems?'

AI is transformative. It has the potential to do much good.

Popular AI Tools

The year 2023 was a significant year for AI tools, though some of them were released in 2022. Generative AI demonstrated its ability to perform tasks once thought impossible for computer algorithms, from engaging in conversations to creating mind-blowing images.

OpenAI has become the torchbearer of AI. ChatGPT and DALL-E 2 are being replicated by others. In April 2022, the company released DALL-E 2, its image-generating tool. Later, in November 2022, it released its chatbot ChatGPT, powered by GPT-3.5, a large language model. Within five days of its release, it acquired one million users, and it became the fastest-growing app, garnering 100 million users by January 2023.

Let us acquaint ourselves with the most popular AI tools.

ChatGPT: It stands for Chat Generative Pre-Trained Transformer. It is a chatbot which converses with us and generates natural language text in response to a user’s prompt.

It is a tool for content creation, an alternative to search engines, and is used for coding. There is an OpenAI API which businesses can use for various tasks; a small example follows.
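Here is an illustrative call to the OpenAI API mentioned above, using the openai Python package (v1-style client). The model name and prompt are only examples, and an API key is assumed to be set in the OPENAI_API_KEY environment variable.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise this article in two sentences: ..."},
    ],
)
print(response.choices[0].message.content)  # the generated reply
```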

Midjourney: It generates images from natural-language descriptions called prompts. It has been created by Midjourney, Inc., a research lab based in the USA. Users invoke the 'Imagine' command to generate images based on their imagination. It is of great help to graphic and visual artists.

Bing: Bing is Microsoft's AI-powered search engine. The new AI-powered version was launched along with the Edge browser in February 2023, and the integration has enhanced its functionality. It uses key features of ChatGPT and GPT-3.5. Bing's search rankings have also been made more relevant.

Notion AI: It is a writing assistant, useful for editing, summarizing and brainstorming. It performs grammar checks too. It helps marketers write brand-new pages by extracting insights from databases and existing pages.

Runway Gen-2: It is a leading tool for video generation. It generates videos based on text prompts. It has incorporated other features, such as a motion-brush feature for animating specific parts of an image, camera movement, etc. It is useful for short films, music videos, animation, ad design and so on.

GitHub Copilot: It is a tool from GitHub, developed in collaboration with OpenAI. It is based on the OpenAI Codex model. It integrates with code editors such as Visual Studio Code. Developers get suggestions for entire lines or blocks of code as they type. It speeds up development, streamlines repetitive tasks and provides assistance with syntax.

MusicLM: It is a Google product that brings musical ideas to life with AI. The algorithm creates two versions of a song based on the input.

ElevenLabs: It is a text-to-speech AI tool. It is handy for video creators, podcasters and businesses.

Framer: It is used for website creation and publication.

Chandamama Kathalu: Telugu SLM

Chandamama was a popular magazine that told stories to children. Swecha, a non-profit organization, in collaboration with Ozonetel, decided to retell those stories by developing a small language model (SLM) in Telugu. The SLM will be launched in January 2024.

Let us understand the concept of an SLM. The genesis of SLMs lies in a paper authored by Microsoft research scientists titled TinyStories. SLMs are built on the same methodology as any LLM, but the neural network is smaller, it has fewer parameters, and it is trained on a smaller corpus of data.

Ozonetel, which collaborated with Swecha, decided to develop a Telugu SLM, assisted by IIIT Hyderabad. There was a dataset of Telugu stories, some 40,000 pages, preprocessed by some 8,000 students. The idea was to give children access to the kind of stories that used to appear in the Chandamama Kathalu magazine. Chandamama was available in Indian homes from the 1940s until it went out of print in 2012. It published long-running mythological and magical Indian stories.

After building the dataset, the team assessed whether and how the data should be tokenized. Tokens, as we know, are the basic units of text or code that a language model uses to process and generate language. Tokens could be words, parts of words, characters or other segments of text or code, as the small example below illustrates.
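The sketch below trains a byte-pair-encoding (BPE) tokenizer on a Telugu text file using the Hugging Face tokenizers library. The file name, vocabulary size and special tokens are assumptions made for illustration; the project's actual tokenization choices are still being researched.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# A BPE tokenizer that splits on whitespace first, then learns subword merges.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["telugu_stories.txt"], trainer=trainer)  # assumed corpus file

# Inspect the tokens for a short Telugu phrase ("Chandamama stories").
print(tokenizer.encode("చందమామ కథలు").tokens)
```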

Soon after the release of its TinyStories paper, Microsoft developed an SLM using 21 million stories. This SLM was capable of generating coherent text, which gave Swecha a lot of hope.

Swecha also worked on optical character recognition (OCR), the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text. They used an open-source OCR tool and converted 70 per cent of the text; the remaining 30 per cent was typed out by students. They ended up with a corpus of 45,000 stories. Big stories were added too. They generated half a million lines of text.
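The article does not name the OCR tool used, so the sketch below uses Tesseract (through the pytesseract wrapper), an open-source OCR engine with Telugu support, purely to illustrate converting a scanned story page into machine-encoded text. The file name is hypothetical.

```python
from PIL import Image
import pytesseract

page = Image.open("chandamama_page_001.png")          # a scanned page (assumed file)
text = pytesseract.image_to_string(page, lang="tel")  # "tel" = Telugu language data
print(text[:200])                                     # preview the recognised text
```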

The corpus was uploaded to Hugging Face so that other companies can use the dataset; the team wanted to open it up.

They are now researching what kind of tokenization would be needed if an LLM is to be built. They are interacting with IIIT Hyderabad. It will be four or five months before they have their LLM.

Flexible AI Norms

Countries across the world are working on regulatory frameworks for AI. Google advocates a risk-based approach (instead of uniform rules) for AI applications. There should not be a 'one size fits all' approach that hinders innovation.

Different AI models pose different risks, and the regulation should be framed in proportion to the risks posed. The regulation should be directed to the application level, rather than the technology level.

The application layer for generative AI means the stage where the technology is being deployed for use cases.

Google is doing continuous research on bias: what it means and how to address it. Basically, bias can be addressed by training models on good data; models should not be trained on unsafe data.

Of late, the government released an advisory that if there is bias in content generated by algorithmic research, search engines or AI models (such as ChatGPT and Bard), there will not be any protection under the safe harbour clause of Section 79 of the IT Act.

In order to reduce bias, there should be cross-border flow of trusted data. Such a flow will facilitate the use of diverse demographic data for training, and that is useful to address bias.

The Indian government will share the public data available with it only with firms that have a proven track record and can be regarded as trusted sources. Google supports this stand.

Wish You a Merry Xmas, 2023. OTT Platforms

OTT stands for over-the-top and has become a new way to consume media. OTT delivers audio and video content directly to the audience through the internet, bypassing traditional channels such as cable, satellite and broadcast TV. Hence the name: it goes over the top of the old-school methods.

Content providers use Content Delivery Networks (CDNs) to distribute their offerings. CDNs have servers all across the globe, so viewers can stream the content quickly, regardless of location. The content is accessed through internet-connected devices such as consoles, computers, smartphones and tablets.

Apps or web interfaces act as gateways to content libraries, where users can browse, select and play what they want, e.g. YouTube or Netflix.

There are some key technologies involved. The content is encoded into formats suitable for different devices and transcoded into multiple quality levels adjusted for different internet speeds. Streaming protocols such as HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH) then deliver it, adapting to the viewer's bandwidth, as sketched below.
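The toy sketch below shows the adaptive-bitrate idea behind HLS and DASH: the player measures available bandwidth and picks the highest-quality rendition that fits. The rendition table is illustrative, not from any real service.

```python
RENDITIONS = [          # (label, required bandwidth in kbit/s), highest first
    ("1080p", 6000),
    ("720p", 3000),
    ("480p", 1500),
    ("240p", 500),
]

def pick_rendition(measured_kbps: float) -> str:
    # Walk the list from highest to lowest quality and take the first that fits.
    for label, required in RENDITIONS:
        if measured_kbps >= required:
            return label
    return RENDITIONS[-1][0]  # fall back to the lowest quality

print(pick_rendition(3500))   # -> "720p"
```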

OTT is different from linear TV, which is watched by appointment; OTT content can be watched at your convenience. There is a wider choice from a library of content, with a diversity of genres and niche interests. One is at liberty to choose a convenient subscription plan and cancel it anytime. The content becomes personalized, since there are recommendation algorithms and playlists can be curated.

OTT has disrupted the traditional media landscape. It encourages independent content creators and enables new methods of monetizing content.

Technological advances such as 5G and fibre internet promise faster streaming. Content is becoming more personalized, and interactive and immersive techniques such as AR/VR are being adopted.

To Peak, AI Should Go Beyond the Transformer

We have been in a post-ChatGPT world for almost a year as we enter 2024. It is a short time in which to ask the question: has generative AI technology peaked?

Instead, this is the time to leverage generative AI technology in different fields. Google released its much-awaited AI model Gemini in December 2023, some nine months after GPT-4. It was expected that Gemini might push the envelope further; however, Gemini Ultra barely inches ahead of GPT-4 on performance benchmarks. No model has yet been seen that beats GPT-4. Is this the limit of LLMs? How can we jump from here to artificial general intelligence (AGI), which would put the cognitive ability of a model on par with that of human beings? LLMs have taken us this far, but no further. Of course, there is a chance that such a model will emerge eventually.

The transformer architecture, in use since 2017, scales up by increasing the number of parameters; GPT-3, for instance, has 175 billion parameters, and OpenAI has not disclosed the size of its later models. LLMs scale with the amount of data and compute, but such scaling is not practical indefinitely: it is an expensive proposition. Thus, the transformer architecture has limitations that prevent LLMs from reaching AGI.
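To see why scaling is expensive, here is a back-of-the-envelope estimate using the commonly cited approximation that training compute is roughly 6 × parameters × training tokens. The token count below is an assumed, illustrative figure, not a disclosed one.

```python
# Rough training-compute estimate: C ≈ 6 * N * D
N = 175e9          # GPT-3-scale parameter count
D = 300e9          # assumed number of training tokens (illustrative)
C = 6 * N * D      # total floating-point operations for one training run
print(f"{C:.2e} FLOPs")   # ~3.15e23 FLOPs
```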

Transformers are not good at generalizing, and this affects their capability to reach AGI.

Something has to be conceived on top of the transformer that gives it some capacity for reasoning.

Generative AI Commercialization

The year 2023 was the year when generative AI such as ChatGPT entered the collective consciousness of people all over the world. It was a 'hype cycle'. The year 2024 will be the year of commercialization of generative AI. Across industries, there will be licensing of generative AI, and organizations plan to purchase generative AI software such as Microsoft Copilot. It will bring artificial intelligence more directly into the lives of workers and customers.

Technology officers have shown deep interest in buying enterprise-level generative AI software such as Microsoft Copilot: almost half of them would buy generative AI software in the next six months. There are still many who have not yet made a spending decision. Very few are not interested in this technology at all.

Large Language Models (LLMs) of 2023

We have been hearing about LLMs for a while, but they became part of our consciousness in 2023. LLMs are the foundation of chatbots, and many big tech companies are now in a race to build them.

LLMs are advanced AI models that do natural language processing (NLP). They have been trained on massive corpora of data and understand the relationships between words. They are able to answer our queries, translate from one language to another, generate text (they are the harbingers of generative AI) and summarize a voluminous document into a concise format.

LLMs are now becoming multimodal, trained not only on text but also on images and audio.

Let us learn about the LLMs available in 2023.

GPT-4: It was released in March 2023 by OpenAI and has become the current benchmark. It processes both text and images. Its training methodology has not been revealed. It is estimated to have more than a trillion parameters (about six times the 175 billion parameters of GPT-3). It has been fine-tuned with Reinforcement Learning from Human Feedback (RLHF); the data generated through RLHF are used to train the model further, which enhances its performance. It shows the fewest hallucinations. In November 2023, a new version called GPT-4 Turbo was released; its training data is updated to April 2023, and it can handle larger prompts.

Gemini: Google released its Gemini family of multimodal LLMs in December 2023 in three versions: Nano, Pro and Ultra. Its chatbot Bard now runs on Gemini Pro. A separate article has been written on Gemini.

GPT-3.5: It was released towards the end of November 2022 by OpenAI and is the underlying model for ChatGPT. Since Google has now released Gemini Ultra, a new version of its chatbot, called Bard Advanced, will appear. Gemini Pro sits between GPT-3.5 and GPT-4. GPT-3.5 handles only text and hallucinates more. ChatGPT Plus works on GPT-4.

Llama 2: It was released by Meta (Facebook) in July 2023. It is an open-source AI model. There is a model with 7 billion parameters and another with 70 billion parameters. GPT-4 outperforms Llama 2 and Google's PaLM 2.

PaLM 2: It was launched by Google in May 2023. It is very powerful, with 540 billion parameters, and it has reasoning capabilities. It has been trained on 100 languages. The older version of Bard was based on PaLM 2.

Claude 2: It has been developed by Anthropic, a company founded by former OpenAI employees. Claude 2 was released in July 2023. It has a huge context length (the number of words a model considers in its input). Claude 2.1, a newer version released in November 2023, has a higher context length than GPT-4.

Mistral 7B: The Paris-based startup Mistral has built not a larger language model but a niftier one. Mistral 7B was released in September 2023. Another version, Mixtral 8x7B, has since been launched; it is described as a watered-down version of GPT-4. It competes with Meta's Llama 2.