Overcoming GPT Token Limit

The context window is the amount of information the model can receive plus the information it returns in its response. The sum of the input it receives and the output it creates is the context the model can operate with.

ChatGPT has a context window of 4 thousand tokens, GPT-4 has 8 thousand tokens, and GPT-3.5 Turbo has 16 thousand tokens. These are not enough to load a book or a website. Tokens are pieces of words that are used as inputs to an AI model. Before processing the prompt, the model breaks the input down into tokens. A token does not necessarily correspond to the start and end of a word; tokens can include trailing spaces and even sub-words. The number of tokens processed in a single API request depends on the length of the input and output text. One token is roughly equivalent to 4 characters or 0.75 words of English text.
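As a rough illustration, the sketch below shows how a sentence is broken into tokens and how the token count relates to the character count. It assumes the open-source tiktoken tokenizer package is installed; the encoding name used here is one of its standard encodings.

```python
# Sketch: counting tokens with the open-source tiktoken tokenizer.
# Assumes `pip install tiktoken`; "cl100k_base" is one of its standard encodings.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens include trailing spaces and even sub-words."
tokens = enc.encode(text)

print(len(text), "characters")             # character count
print(len(tokens), "tokens")               # token count, roughly characters / 4
print([enc.decode([t]) for t in tokens])   # the individual token strings
```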

The context window for GPT is the number of previous tokens the model factors in while generating the next token. The larger the context window, the more context the model has to generate the next word. Earlier GPT models, such as GPT-2, had a default context window of 1,024 tokens.

How to overcome this?

Vector Indexes

Suppose you have 50 documents containing 50 thousand tokens of information, roughly 35-40 thousand words. This information should be available to the model so it can answer user queries.

To accomplish this, the documents are split into chunks; say we get 20 pieces of 2,000 characters each. These 20 pieces are converted into vectors. When the user asks a question, the question is also transformed into a vector. We then use cosine distance to find the document-piece vectors closest to the question vector. The search looks for the vectors most likely to contain information on the topic.

The last step is to convert these closest pieces back into text and add them to the GPT context. The user's question is then asked again, now with this context attached.
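A minimal sketch of this retrieval step is shown below. It assumes a hypothetical embed() function that turns text into vectors (any embedding model could play this role) and uses cosine similarity computed with numpy to pick the chunks closest to the question.

```python
# Sketch of retrieval by cosine similarity over document chunks.
# embed() is a hypothetical placeholder for any text-embedding model.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_chunks(question, chunks, embed, k=3):
    q_vec = embed(question)
    scored = [(cosine_similarity(q_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

# The selected chunks are pasted into the prompt ahead of the question,
# and the question is then asked again with this added context.
```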

In short, vector search is a tool that lets you add only the relevant portion of all the data you have loaded to the model’s context. This overcomes the context window limit in GPT.

Google’s Ad Business

Google’s online advertising business is its most lucrative: it generated about 80 per cent of the company’s total revenue and amounted to $225 billion in 2022.

Google’s main source of revenue is online advertising. First, it sells ad space directly on its own websites and apps. Second, it acts as an intermediary between advertisers and publishers (third-party websites and apps) that supply such space.

Google is accused of abusing its advertising technology: its ad exchange favours Google over its rivals. The company plays a central role in each ad tech supply chain and charges high fees for its services. Google is present at almost all levels of the so-called ad tech supply chain.

This case by the EU anti-trust body is a direct attack on the black box of online advertising, in which Google automatically calculates and offers ad space and prices to advertisers and publishers as a user clicks on a web page. There were three earlier cases against Google in which it was fined for abuse of dominance. Such charge sheets can pave the way for fines of as much as 10 per cent of a firm’s global sales. However, fines seldom approach that level, and hence the impact on the earnings of Silicon Valley firms is often muted.

The EU’s anti-trust arm has asked Google to come forward with solutions, and has signalled that a break-up of its ad business may be the necessary remedy.

Google’s ad technology helps advertisers and publishers, who rely upon it for real-time ads not linked to a search query, for example banner ads on news websites.

There are three tools. First, publisher ad servers, used by publishers to manage ad space on their websites and apps. Second, ad buying tools, used by advertisers to manage automated ad campaigns. Third, ad exchanges, where publishers and advertisers meet to buy and sell ads. Google operates ad buying tools (Google Ads and DV 360), a publisher ad server (DoubleClick for Publishers, or DFP) and an ad exchange (AdX).

The EC has taken the view that Google breaches anti-trust rules by distorting competition in ad tech: it favours its own exchange, AdX, through its publisher ad server DFP and its programmatic ad buying tools Google Ads and DV 360.

European Union and US anti-trust regulators agree on one thing — the era of Google’s dominance in ad technology must end.

Since 2014, Google has favoured its own advertising exchange platforms by abusing access to information on rival bids for ad space. It has also harmed other ad exchanges by placing bids for advertising on its own platforms.

Google is active on both sides of the market. It has the publisher ad server. It also has ad buying tools. Thus it holds a dominant position on both the selling and buying side. It also operates the largest ad exchange. There are conflicts of interest in this situation.

Google spent 15 years acquiring firms to dominate the market. Its 2007 acquisition of online advertising company DoubleClick was worth $3.1 billion.

European Union anti-trust regulators feel that a break-up is a viable remedy for the California-based tech giant’s alleged monopoly abuses. There are, of course, legal obstacles to a break-up, but that does not mean a legal battle is inevitable. The Commission could be swayed by Google’s arguments or accept a settlement. Google has pointed out that breaking up its ad tech suite would diminish the availability of free, ad-supported content that benefits everyone.

Google has already entered into an agreement with the French competition regulator, and the company can lean on that to convince the regulators in Washington and Brussels of a less intrusive remedy to the alleged abusive behaviour.

AI Regulation

The EU has taken the initiative to regulate AI: the European Parliament has approved the AI Act, which is now to be sent to the European Council.

The approach of Europe is risk-based regulation. The greater the risk of AI, the higher the restrictions.

It is clarified that generative AI companies will have to disclose the data used in training the systems.

It proposes to ban the use of AI where it affects the livelihood, safety and rights of people. It also disallows real-time facial recognition and biometric identification in public.

India is concerned about this subject, but is yet to start consultations with stakeholders. India’s Digital India Bill may bring in some principles and guardrails for AI.

There could be three aspects — state interventions, industry-based certification and code of conduct.

The issue is that MNC tech platforms are interconnected and operate across geographical boundaries.

Europe may get the first-mover advantage. It is for the rest of the world to decide how far to align with it or deviate from it. Europe’s General Data Protection Regulation (GDPR) has become a benchmark that the rest of the world looks to.

Real-time surveillance systems should be banned. So should facial recognition in public places.

In-context Learning

There are several ways to build with LLMs, or large language models. One can train a model from scratch, fine-tune an open-source model, or use hosted APIs.

Many developers start with in-context learning. In-context learning makes the models usable off the shelf; no fine tuning is necessary. Their behaviour is managed through clever prompting and conditioning on private contextual data.

If you are building a chatbot to answer questions about pharmaceuticals, all the relevant documents could be pasted into a ChatGPT or GPT-4 prompt, with a question put at the end. This works for very small datasets but is not scalable. At most, around 50 pages of input text can be processed, and performance deteriorates, measured by inference time and accuracy, as the context window limit is approached.

In-context learning addresses this issue cleverly. Not all the documents are sent with each prompt to the LLM; only a few relevant documents are sent.

And the relevance is decided by the LLM itself.

At a high level, the workflow can be divided into three stages.

Data pre-processing/embedding

Here the data about pharmaceuticals is stored so that it can be retrieved later. Typically, the documents are split into chunks of text, the chunks are embedded, and the embeddings are stored in a vector database.
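A simple way to split documents into fixed-size chunks before embedding might look like the sketch below; the chunk size and overlap are arbitrary illustrative values.

```python
# Sketch: split a long document into overlapping character chunks
# before embedding them and storing the vectors in a vector database.
def chunk_text(text, chunk_size=2000, overlap=200):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap   # step forward, keeping some overlap
    return chunks
```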

Prompt construction/retrieval

Here the user submits a query, say "What are antibiotics?". The application constructs a series of prompts to be submitted to the LLM. A compiled prompt is a combination of a prompt template coded by the developer, illustrations of valid outputs (few-shot examples), information retrieved from APIs, and relevant documents retrieved from the vector database.
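A compiled prompt of this kind can be assembled with ordinary string templating. The sketch below uses hypothetical names for the template, few-shot examples and retrieved chunks, purely for illustration.

```python
# Sketch: compiling a prompt from a template, few-shot examples and
# chunks retrieved from the vector database. All names are illustrative.
PROMPT_TEMPLATE = """You are a pharmaceutical assistant.

Examples of good answers:
{few_shot}

Relevant documents:
{context}

Question: {question}
Answer:"""

def build_prompt(question, retrieved_chunks, few_shot_examples):
    return PROMPT_TEMPLATE.format(
        few_shot="\n".join(few_shot_examples),
        context="\n---\n".join(retrieved_chunks),
        question=question,
    )
```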

Prompt execution/inference

After the prompts are compiled, they are submitted to a pre-trained LLM for inference (proprietary model APIs or open-source, self-trained models). Some developers add operational systems such as logging, caching and validation at this stage.
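How the compiled prompt is submitted depends on the chosen model; the sketch below keeps the model call behind a hypothetical call_llm() placeholder and only illustrates the logging and caching layer mentioned above.

```python
# Sketch: wrapping prompt execution with simple logging and caching.
import functools
import logging

logging.basicConfig(level=logging.INFO)

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in a model API or a self-hosted model here."""
    raise NotImplementedError

@functools.lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    logging.info("LLM call, prompt length %d chars", len(prompt))
    return call_llm(prompt)   # repeated identical prompts are served from cache
```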

All this seems to be heavy work. It is, however, easier than the alternative of training or fine-tuning the LLM itself.

In-context learning can be accomplished without a team of specialized ML engineers. No infrastructure has to be hosted, and there is no need to buy a dedicated instance from OpenAI or Google.

The pattern reduces an AI problem to a data engineering problem. For small datasets it works fine, and it can even outperform fine tuning.

The bigger question is: what happens if the context window is expanded? It is possible, and it is an active area of research. There are, however, trade-offs of cost and time.

Translatotron 3: Speech-to-Speech Translation

Machine translation (MT) deals with one important area, speech-to-speech translation (S2ST). Google is a significant player in this area: it introduced its first S2ST system in 2019 and an improved version in 2021. DeepMind researchers presented a third iteration of S2ST, called Translatotron 3, in a paper published in May 2023.

The preceding version, Translatotron 2, was also very efficient.

The present version is an unsupervised, end-to-end model for direct speech-to-speech translation.

This model is not trained on paired data in two languages, as is done conventionally. It finds consistent patterns and regularities in the given data on its own. In the training phase, the model learns from monolingual speech-text datasets. It relies on unsupervised cross-lingual embeddings of both languages, which are mapped into a shared space through self-learning.

Initially, the model learns the structure of each language separately. The learning is then extended to find common ground, linking and relating the qualities of the two languages. This leads to cross-lingual embeddings that initialize a shared encoder, which can handle both languages equally well.

Further improvements in the model are attributed to a masked autoencoder. During encoding, only a part of the data is provided to this component; during decoding, it has to infer or predict the hidden information. The model, in other words, is pushed into a guessing game.

Additionally, the model uses a back-translation technique as a self-check, which ensures coherence and accuracy in translation.

Conventionally, S2ST used a pipeline of automatic speech recognition + machine translation (MT) + text-to-speech synthesis. Translatotron relies on a different architecture: it maps source-language speech directly to target-language speech, with no reliance on an intermediate representation, which makes it more effective.

It also captures, so the researchers claim, non-verbal cues in the speech.

Human Element Indispensable in AI

Generative AI models such as Bard and ChatGPT depend on human-generated content for training; they thus depend on the human element to succeed. Researchers from institutions including Imperial College London, Oxford and Cambridge conducted a study called ‘The Curse of Recursion: Training on Generated Data Makes Models Forget’, which argues that LLMs face a major threat in future.

If LLMs are trained on AI-generated content rather than human-generated content, there are significant risks. Their reliance today is on existing data that was originally created by human beings.

If Bing is asked about drones, its answer presents material collected from articles written by human beings. The data could be in the form of papers, books or photos.

If the models instead rely on AI-generated content for presenting the information, there are adverse effects, which the authors call ‘model collapse’. The models deviate from reality and become corrupted: a degenerative process in which models gradually forget the true underlying data distribution. Over long-term learning, this process may set in.
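The idea can be illustrated with a toy numeric experiment (not the study’s own setup): repeatedly fit a Gaussian to samples drawn from the previous generation’s fit and watch the estimated distribution drift away from the original.

```python
# Toy illustration of 'model collapse' (not the paper's experiment):
# each generation is fitted only to samples from the previous fit,
# so estimation errors accumulate across generations.
import numpy as np

rng = np.random.default_rng(0)

mean, std = 0.0, 1.0   # the "true" human-generated distribution
for generation in range(10):
    samples = rng.normal(mean, std, size=200)   # data produced by the current model
    mean, std = samples.mean(), samples.std()   # next model trained only on that data
    print(f"gen {generation}: mean={mean:+.3f} std={std:.3f}")
```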

The errors in AI-generated content multiply as time passes. There is a cumulative effect, and each future draft is more distorted.

Such a model collapse can perpetuate model bias on sensitive attributes such as ethnicity, gender, complexion, etc.

To curtail the risks in future, it is necessary to preserve original human-generated content and data.

At present, there is no fool-proof method to distinguish between AI-generated and human-generated data.

In future human-generated data is likely to become a valuable resource for training the AI models.

Backpropagation and Gradient Descent

In a neural network, the neurons take inputs, multiply them with weights, and add a bias value. All this is run through an activation function.

Neurons generate outputs, which become the inputs for other neurons. The output neurons in the last layer produce the output of the network. This is a feed-forward pass.

A neural network has input neurons, hidden neurons and output neurons.
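A feed-forward pass for one small layer might be sketched as below; the weights, biases and sigmoid activation are illustrative choices, not values from the text.

```python
# Sketch of a feed-forward pass: inputs are multiplied by weights,
# a bias is added, and the result is run through an activation function.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, weights, bias):
    return sigmoid(np.dot(weights, inputs) + bias)

x = np.array([0.5, -1.2, 3.0])       # input neurons
W = np.array([[0.1, 0.4, -0.2],      # weights for a hidden layer of two neurons
              [0.3, -0.1, 0.2]])
b = np.array([0.05, -0.3])           # bias values
hidden = forward(x, W, b)            # these outputs become inputs to the next layer
print(hidden)
```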

Let us consider the idea of backpropagation with gradient descent. The whole network is treated as a multi-variate function. The loss function calculates a number that denotes how well the network performs (here the output is compared to known good results).

The set of input data coupled with the desired good results is called the training set. The loss function is designed to increase in value as the network's behaviour moves further away from correct.

Gradient descent algorithms take the loss function and use partial derivatives to determine what each variable (the weights and biases) in the network contributes to the loss value. The algorithm then slides backwards, visiting each variable and adjusting it to decrease the loss value.
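For a single linear neuron with a squared-error loss, the partial derivatives and the backward adjustment can be written out directly, as in the sketch below; the training example and learning rate are illustrative.

```python
# Sketch: backpropagation with gradient descent for one linear neuron
# and a squared-error loss. Partial derivatives tell us how the weight
# and the bias each contribute to the loss; both are nudged downhill.
x, target = 2.0, 1.0     # one illustrative training example
w, b = 0.2, 0.0          # weight and bias
lr = 0.05                # learning rate

for step in range(20):
    y = w * x + b                  # feed-forward
    loss = (y - target) ** 2       # grows as behaviour drifts from correct
    dL_dy = 2 * (y - target)       # chain rule, working backwards
    dL_dw = dL_dy * x              # partial derivative w.r.t. the weight
    dL_db = dL_dy * 1.0            # partial derivative w.r.t. the bias
    w -= lr * dL_dw                # adjust to decrease the loss
    b -= lr * dL_db

print(round(loss, 6), round(w, 3), round(b, 3))
```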

Calculus of Gradient Descent

Concepts from calculus are necessary to understand gradient descent.

The notion of a derivative must be understood: it gives the slope (or rate of change) of a function at a single point. In other words, the derivative of a function gives its rate of change at a given input.

The partial derivative is another concept. It applies to a multi-dimensional or multi-variable function and isolates one of the variables to find the slope along that dimension.

What is the rate of change (slope) of a function at a specific point? Derivatives can answer this question. Given multiple input variables to the equation, what is the rate of change for just one variable? Partial derivatives answer this question.

Gradient descent utilizes these ideas. Each variable of the equation is visited and adjusted to minimize the output of the equation; this is our training goal. If the loss function is plotted graphically, the movement is incrementally towards the minimum of the function. We want to find the global minimum.

The size of the increment is known as the learning rate in ML.
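Putting these ideas together, the sketch below estimates partial derivatives numerically and takes steps of a fixed learning rate towards the minimum of a simple two-variable function; the function and step size are illustrative.

```python
# Sketch: gradient descent on a two-variable function using numerically
# estimated partial derivatives; the step size is the learning rate.
def loss(x, y):
    return (x - 3.0) ** 2 + (y + 1.0) ** 2    # minimum at x=3, y=-1

def partial(f, point, index, eps=1e-6):
    """Rate of change of f with respect to one variable, others held fixed."""
    plus = list(point); plus[index] += eps
    minus = list(point); minus[index] -= eps
    return (f(*plus) - f(*minus)) / (2 * eps)

point = [0.0, 0.0]
learning_rate = 0.1
for _ in range(100):
    grads = [partial(loss, point, i) for i in range(len(point))]
    point = [p - learning_rate * g for p, g in zip(point, grads)]

print([round(p, 3) for p in point])   # approaches the minimum (3, -1)
```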

Coke Studio

Coke Studio is a well-recognised music property of Coca-Cola. It was initiated as Coke Studio Pakistan in 2008, a TV series featuring established and emerging artists from various music genres. The Coca-Cola company partnered with Rohail Hyatt to create the Pakistani version of the show. The first season premiered in June 2008, with artists collaborating live in studio sessions. The show later shifted to a closed studio format, which remains its format to this day. Coke Studio Pakistan later became available across channels and on YouTube. Rohail Hyatt ran it till season six; Strings took over thereafter for four seasons.

Coke Studio was first launched in India in 2011 on TV (MTV India and DD National); some seasons were also aired on Big FM and AIR. The first season was produced by Leslie Lewis, and later seasons had various other producers. The show continued till 2015, after which Coke Studio remained dormant for some years. After an eight-year hiatus, it staged a comeback in February 2023, launched in two avatars: Coke Studio Bharat and Coke Studio Tamil. This time around, the format is digital friendly, and the songs can be experienced across devices and platforms, whether TV, YouTube or audio OTTs such as Spotify, Gaana, Saavn, Wynk Music and Audible.

India is a country which is crazy for film music. Coke Studio has to stand on its own in such an environment.

Coke Studio in 2023 has been created as Coke Studio Bharat by Ankur Tiwari and Kausar Munir. It covers the indie music scene.

Coke Studio is located at Churchgate, Mumbai.

Coke Studio is an extension of Coca Cola’s real magic philosophy. Music has an ability to unite and uplift. It is the connection point. There is a cultural connect. It is a new experience.

Coke Studio is monitored by technology and data analytics. No doubt, Coke Studio Bharat and Coke Studio Tamil are creative properties, but the money to create them comes from Coca Cola’s advertising budget. Therefore, these properties must deliver in terms of weekly-plus audiences — audiences who are picking up the bottle on a weekly basis. However, the company looks at it on a long term basis. One cannot equate a song to an ad, and expect it to deliver as an ad would. Every song is not an ad.

Coke bottles and packs are the company's biggest real estate. Each bottle has been turned into a portal that can transport a person straight into Coke Studio. The bottle carries a dynamic QR code which, on scanning, offers an AR, 360-degree experience of the show. It also has a karaoke option. To enter the studio, one needs to feed in one's phone number, which facilitates tracking: how many people picked up a bottle of Coke, and how often.

Most of the people who take the AR experience are youngsters below 25 years of age.

The content is updated every month, and consumers' liking for it is tracked. There is then data available to change the content or the media metrics. It also shows which consumers come back to the franchise. The impact of Coke Studio is measured through consumer engagement.

A lot of song discovery happens through the shorts format, such as Reels. The research team has changed its approach and intends to leverage shorts in its strategy.

It is not a typical ad model: here the artist is at the centre and may not want any tweaks to the creative composition. The artists have their own ecosystem that gives them feedback.

In a person’s usual day, there are breaks and there are meals. Coke wants to be a part of both.

To make the experience phygital (physical plus digital), the company intends to hold concerts.

Coke Studio adds fizz to Coca Cola.

Memory-Efficient Zeroth Order Optimizer (MeZO)

Pre-trained LLMs are fine-tuned to adapt to specialized domains, accommodate human instructions and cater to individual preferences. The LLM is adjusted on a smaller, domain-specific dataset.

Scaled-up LLMs are computationally demanding to fine-tune, and the backpropagation process is memory-intensive as well.

Princeton University researchers have addressed the memory issue by developing a new optimizer, MeZO, a memory-efficient zeroth-order optimizer. It is an adaptation of the traditional ZO-SGD method of estimating gradients: the zeroth-order method can estimate gradients using only two forward passes, which makes it memory-efficient. MeZO can also optimize non-differentiable goals (such as accuracy or F1 score) while still using the same amount of memory as inference. In experiments, MeZO outperforms zero-shot, in-context learning (ICL) and linear probing.
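The core zeroth-order idea can be sketched as below: perturb the parameters along a random direction, run two forward passes, and use the difference in loss to estimate a gradient, without storing the activations that backpropagation would need. This is a simplified illustration, not the researchers' implementation.

```python
# Sketch of a zeroth-order (ZO-SGD style) update: the gradient is
# estimated from two forward passes along a random perturbation,
# so no backpropagation activations need to be kept in memory.
# Simplified illustration only, not the MeZO implementation.
import numpy as np

rng = np.random.default_rng(0)

def zo_sgd_step(params, loss_fn, lr=0.01, eps=1e-3):
    z = rng.standard_normal(params.shape)     # random direction
    loss_plus = loss_fn(params + eps * z)     # forward pass 1
    loss_minus = loss_fn(params - eps * z)    # forward pass 2
    grad_estimate = (loss_plus - loss_minus) / (2 * eps) * z
    return params - lr * grad_estimate

# Toy usage: minimise a quadratic "loss" over a small parameter vector.
theta = np.ones(5)
for _ in range(500):
    theta = zo_sgd_step(theta, lambda p: float(np.sum(p ** 2)))
print(theta)
```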

In evaluation, MeZO was able to train a 30-billion-parameter model using a single Nvidia A100 80 GB GPU, whereas backpropagation can only train a 2.7-billion-parameter language model within the same memory constraints.

Software Bots

Traditionally, robots have been imagined as physical machines. However, there are software bots too, which mimic human work. They are used in Robotic Process Automation (RPA). If what is being done is a structured process, tools such as bots can be used; they can be created in a short time to do routine work. Teams take pleasure in calling a bot their own and, in fact, give it a name. There are also software tools for process discovery, which help an organisation learn which processes can be automated. To begin with, organisations identify 5-10 processes, but soon the utility of automation and bots dawns on them, and they are ready to have hundreds or thousands of bots.

Bots often imitate or replace a human user’s behaviour. They can be used to automate certain tasks (they run without specific instructions from humans).

Bots can help you comply with regulations. They can make submissions to the authorities. They can do bookings for a travel company. They can operate machines for recipes in quick-service restaurants that formerly relied on chefs.

Some examples of bots are chatbots, web crawlers, social bots and malicious bots.
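As a simple illustration of a software bot that runs without human instruction, the sketch below checks a web page on a schedule and logs whether it is reachable; the URL and interval are placeholders.

```python
# Sketch: a minimal software bot that periodically checks a page and
# logs the result. The URL and interval are illustrative placeholders.
import time
import logging
import urllib.request

logging.basicConfig(level=logging.INFO)
URL = "https://example.com/status"   # placeholder URL

def check_once(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            logging.info("%s responded with HTTP %s", url, resp.status)
    except Exception as exc:
        logging.warning("%s unreachable: %s", url, exc)

if __name__ == "__main__":
    for _ in range(3):       # a real bot would loop indefinitely
        check_once(URL)
        time.sleep(60)       # wait a minute between checks
```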

Bots have taken over mundane tasks. They do them faster and more accurately.