Blog

  • Retrieval Augmented Generation (RAG)

    In text embedding, what is encoded is semantic information: concepts, geographic locations, persons, companies, objects and so on.

    In RAG applications, what is encoded are the features of a company’s documents. Each embedding is stored in a vector store, where embeddings are recorded and compared. At inference time, the application computes an embedding for the new prompt and sends it to the vector database. The documents whose embeddings are closest to that of the prompt are retrieved, and the LLM generates a response grounded in these documents.
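
    A minimal sketch of this pipeline, assuming the sentence-transformers library and a plain in-memory store in place of a real vector database (the model name, documents and helper names are illustrative):

    ```python
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

    # Indexing phase: embed the company's documents and keep the vectors.
    documents = [
        "Our refund policy allows returns within 30 days.",
        "Support is available Monday to Friday, 9am to 5pm.",
        "Premium plans include priority onboarding.",
    ]
    doc_vectors = model.encode(documents, normalize_embeddings=True)

    def retrieve(prompt: str, k: int = 2) -> list[str]:
        """Return the k documents whose embeddings are closest to the prompt."""
        query = model.encode([prompt], normalize_embeddings=True)[0]
        scores = doc_vectors @ query        # cosine similarity (vectors are unit-length)
        top = np.argsort(scores)[::-1][:k]  # indices of the best matches
        return [documents[i] for i in top]

    print(retrieve("When can I return a product?"))
    ```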

    It is a simple mechanism that customizes LLMs to respond using proprietary documents or information that was not included in the training data.

    Retrieval is a core step towards augmenting an LLM with relevant context.

    Vector embeddings make it possible to work with any unstructured or semi-structured data. Semantic search is just one example. Dealing with data other than text, such as images, audio and video, is another big topic. Customer feedback, for instance, can be categorized by embeddings, as sketched below.
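
    One hedged way to do such categorization: embed a few labeled examples per category and assign each new message to the nearest category centroid. The categories and messages below are invented for illustration:

    ```python
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # same illustrative model as above

    category_examples = {
        "billing":  ["I was charged twice", "My invoice is wrong"],
        "shipping": ["The package arrived late", "My parcel never came"],
    }

    # One centroid vector per category, averaged from the labeled examples.
    centroids = {
        name: model.encode(examples, normalize_embeddings=True).mean(axis=0)
        for name, examples in category_examples.items()
    }

    def categorize(feedback: str) -> str:
        vec = model.encode([feedback], normalize_embeddings=True)[0]
        # The centroid with the largest dot product is the closest category.
        return max(centroids, key=lambda name: centroids[name] @ vec)

    print(categorize("Why did my card get billed two times?"))  # expected: "billing"
    ```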

    LLMs, though popular, are known for their hallucinations: plausible-sounding responses that are factually incorrect. That is where RAG, or Retrieval Augmented Generation, comes to our rescue. It combines the power of retrieved material with the generative ability of the model. Let us understand this concept.

    RAG retrieves facts from an external knowledge base to keep LLMs up to date. It supplements the model’s internal representation of information, and the model’s answers can be cross-referenced against the original content. RAG obviates the need to retrain the model continuously.

    It is like subjecting an LLM to an open-book exam. The LLM browses a document, as opposed to trying to remember facts from its memory.

    There are two phases: retrieval and content generation. In the retrieval phase, algorithms search for information relevant to a user’s query or prompt. In the generation phase, the retrieved material is stitched into the prompt so the model can answer from it, as sketched below.
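
    A minimal sketch of how the two phases fit together, reusing `retrieve()` from the earlier example; `call_llm` is a hypothetical stand-in for whatever completion API is in use:

    ```python
    def answer(question: str) -> str:
        # Retrieval phase: fetch the documents most relevant to the query.
        context = "\n".join(retrieve(question))

        # Generation phase: ground the model's answer in the retrieved text.
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}"
        )
        return call_llm(prompt)  # hypothetical completion call, not a real API
    ```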

    RAG came to the notice of developers after the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ was published in 2020 by Patrick Lewis and his team at Facebook AI Research.

    RAG makes an LLM more effective by tapping additional data sources without retraining. The answers are timely and contextual, and chatbots become smarter by using RAG.

    If RAG is to be implemented, there should be a vector database that allows rapid encoding of new data, and the results of searches against that data are fed into the LLM.

    Without semantic search, retrieval is limited to the specific words and phrases used in the prompt. Such keyword-based search is too literal and may miss relevant information. Semantic search goes beyond keyword matching by comparing meanings, and it is an integral part of RAG; the sketch below contrasts the two.
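
    A hedged illustration of the difference, reusing the `documents` and `retrieve()` defined earlier; the keyword matcher is deliberately naive:

    ```python
    def keyword_search(prompt: str) -> list[str]:
        """Literal matching: a document counts only if it shares a word with the prompt."""
        words = set(prompt.lower().split())
        return [d for d in documents if words & set(d.lower().split())]

    # "can i send an item back" shares no keyword with the refund document,
    # so the literal search misses it, while embedding-based retrieval finds it.
    print(keyword_search("can i send an item back"))  # likely []
    print(retrieve("can i send an item back", k=1))   # the refund policy document
    ```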

  • AI Still Untested

    AI is the buzzword. In conversations, it is AI, AI and more AI. The CEOs at the World Economic Forum, Davos (2024) say that generative AI still has a lot to prove. It is necessary to arrive at the real value of AI in different sectors. Though ChatGPT is a great conversational tool and can be used as a search engine, it is still not ‘enterprise ready’. There are issues of hallucinations and biases. The solutions proposed by AI must fit within regulatory frameworks and compliance requirements. C-suite executives would still like to wait until AI moves beyond the hype.

    Of course, the summarization and message-drafting skills of generative AI could be put to good use. AI companies believe AI could contribute to productivity, but this still has to show up in a company’s P&L account and balance sheet. Sales is one area where AI could make a positive contribution.

    In medicine, AI has a limited role at present. It could be a personal assistant to a physician and can facilitate medical transcription. However, it cannot replace a physician; here we are dealing with human lives. AI can also be used for drug design, which could be its most vital use.

  • Marc Benioff Counters Sam Altman

    Sam Altman calls it fair use when data drawn from copyrighted books or media is used to train AI models, since the models do not reproduce the material verbatim. Marc Benioff, Chief Executive of Salesforce and owner of Time magazine, disagrees. According to Benioff, the AI companies have ripped off IP to build their technology. In a Bloomberg interview at Davos (2024), he bluntly calls the training data stolen data.

    Time too is in negotiations with OpenAI to license its content, as are several other companies. OpenAI has signed an agreement with the Associated Press to access some of the news in its archives and has also reached a deal with a German media company. They are trying to arrive at a fair price for the data; the idea is to standardize payments. Salesforce markets its own AI-powered software, which includes a trust layer to prevent misuse of customer data.

  • Open Knowledge and Trade Secrets of LLMs

    Most of the expertise to develop LLMs is available in research and open-source projects. However, certain aspects are kept as closely guarded secrets.

    What is out in the open in the case of LLMs are the algorithms, such as transformers and the attention mechanism, the datasets, and the architecture of the model.

    What we can call trade secrets are the fine-tuning techniques companies use to train a model for a specific task, the proprietary software used to give it an edge, and the creative use of the model for product development, drug development, protein research and so on. Even the exact number of parameters is a trade secret. The training recipe used to optimize a model is a closely held secret, and custom-made hardware could be one too.

    Though datasets are publicly available, the specific selection of data for training, along with its cleaning and preprocessing, augments the performance of an LLM. Last but not least, the AI team, the pool of talent employed, makes a difference.

    It should be noted that in this developing field, what is a trade secret today becomes common knowledge tomorrow.

  • Brand Building of Microsoft’s Copilot

    As we know, Microsoft has invested in OpenAI, and its Copilot relies on ChatGPT. Microsoft has infused AI into its various products and has emerged as a leader in AI. Of late, Microsoft has been trying to lessen its dependence on OpenAI products to reduce its vulnerability.

    Though the Microsoft-OpenAI partnership is mutually beneficial, Microsoft would like to protect its market position against any upheaval at OpenAI such as the ouster of Sam Altman; what started over a weekend was settled by Monday. The patch-up saved Microsoft stock from nosediving. Though the whole episode ended happily, it exposed Microsoft’s Achilles’ heel, and the C-suite wanted to set matters right.

    There are plans to port Copilot to other backend LLMs. It is sensible to make Copilot able to survive without OpenAI, if necessary.

    As we have observed here, there are lawsuits against OpenAI for copyright infringement. An adverse court ruling could affect both OpenAI and Microsoft.

    New models of PCs and laptops will carry dedicated AI hardware. Microsoft continues to develop its own in-house models such as Phi-2, which it calls an SLM (small language model). These have the potential to reduce Microsoft’s reliance on ChatGPT.

    Microsoft would like to integrate open-source LLMs into Copilot. The users would not know whether Copilot is being powered by OpenAI or some other model; a sketch of such a backend-agnostic design appears below.
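
    One hedged way to picture such a backend-agnostic design, with invented class and method names purely for illustration (this is not Microsoft’s actual architecture):

    ```python
    from typing import Protocol

    class LLMBackend(Protocol):
        """Any model provider that exposes a common completion interface."""
        def complete(self, prompt: str) -> str: ...

    class OpenAIBackend:
        def complete(self, prompt: str) -> str:
            return f"[OpenAI-style completion for: {prompt}]"  # placeholder for a real API call

    class OpenSourceBackend:
        def complete(self, prompt: str) -> str:
            return f"[open-source model completion for: {prompt}]"  # placeholder

    class Copilot:
        """The product only ever sees the interface, never the provider behind it."""
        def __init__(self, backend: LLMBackend):
            self.backend = backend

        def ask(self, prompt: str) -> str:
            return self.backend.complete(prompt)

    # Swapping the backend changes nothing for the user-facing product.
    assistant = Copilot(OpenSourceBackend())
    print(assistant.ask("Summarize this document."))
    ```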

    Microsoft does not mind using its exclusive rights to OpenAI’s AI infrastructure, but it is pulling back a little on the partnership. The distinction being drawn between Copilot and ChatGPT is a smart move: Microsoft wants to build the Copilot brand and reinforce its association with AI. At present, there is a feeling that Copilot is playing second fiddle to ChatGPT.

    Copilot has been integrated into Windows 11, and on some devices it starts by default. Still, it is not yet a killer app. As time rolls on, Copilot will be further integrated, and users will feel the difference. Microsoft would like Copilot to get the brand recognition it deserves.

  • LLMs Run Short of Data

    LLMs have ingested much of the data on the Internet and have now run short of it. In future, LLMs will depend upon visual and sensory data; they will become multimodal.

    LLMs have billions of parameters, and they have become very powerful. They will get further enriched by images, audio and video.

    Some are developing LLMs for specialized areas; one LLM, for example, is being designed to solve math and geometry problems. Another can be made by training on medical data.

    Models are asked to respond to prompts such as ‘camels have wings, and they are flying in the skies of Dubai’. It is interesting to see the output: camels flying near the Burj Khalifa, the world’s tallest building.

  • Last Technology Humans Ever Invent

    Zack Kass, an ex-employee of OpenAI, is of the opinion that AI will have a tremendous impact on our lives. He is optimistic that AI will play a role in solving global problems. Kass was one of the first 100 employees of OpenAI, and the decision to leave the job was a tough one for him. From his home in Santa Barbara, he keeps promoting AI.

    There are concerns about AI, doom and gloom. In his view, however, AI is a bright technology that brings more joy and less suffering. He may be considered naive, but he does not mind wearing that label.

    AI will have an impact on culture, business, medicine and education. It should be harnessed with ethical considerations in mind. AI-powered teachers and AI-powered physicians in the developing world will bring about positive changes. Life sciences and biosciences will get exciting.

    AI could be the last technology humans ever invent, and it has the potential to provide us with more fulfilling lives, with less suffering.

    The doomsday predictors are worried about the destructive effect AI may exert.

    There are four risks: idiocracy, identity displacement, the alignment problem and bad actors.

    If AI solves most of our problems, what will human brains do? They could focus on hard problems; it is not necessary to think that human brains will cease to evolve. This is the idiocracy risk.

    Some people spend their whole lives pursuing a craft or trade, say ironsmiths and textile workers. Automation affects their work, and they lose their identities. This is identity displacement.

    The third risk, the alignment problem, is an existential one, and it is a reasonable apprehension. Can a model be trained to align with human interests? Of course it can. Models henceforth will be examined and must meet international alignment standards. You cannot allow AI to enslave you.

    The fourth risk, bad actors, stems from a lack of faith in the basic goodness of human beings, an idea we have internalized. One should not anthropomorphize AI.

    Some issues deter the building of AGI: a compute deficit, an energy deficit and wrong policy. Then there is the concept of AI versus AI, an arms race. AGI, if aligned properly with human interests, will force the actors to strive for the right things.

  • New Thinking on AI

    At the World Economic Forum, Davos (2024), in conversation with Bloomberg, Sam Altman said that artificial general intelligence (AGI) will come in the reasonably close future, but he countered the thinking that it would disrupt the world too much. It will not take over society.

    OpenAI, since its founding in 2015, has had a mission to achieve AGI; with the backing of Microsoft, it has been privately valued at $100 billion. OpenAI is interested in developing this technology safely.

    Sam had also previously been in conversation with Bill Gates. He dismissed the suggestion that tech startups are run by 24-year-old youngsters. Many startups have been set up by people with experience and maturity, with staff in their thirties, forties and even fifties. OpenAI too has many people in their forties and fifties.

    The IMF’s economist Gita Gopinath says the effect of AI on India will be relatively small, as a significant number of people still work in the agricultural sector. In advanced economies, about 60 per cent of jobs may be affected by AI; the number is 40 per cent in emerging markets and 26 per cent in low-income countries.

  • AI vs. Regulators

    Silicon Valley is not known for its cooperation with regulators. It holds veiled contempt for those who need concepts explained to them, and it firmly believes any regulation of a technology will result in its failure. It seems unaware that China effectively controls emerging technologies.

    Ultimately, there could be over-regulation by those who have no stake in the technology, with the judiciary force-fitting the technology into existing regulatory structures.

    The European Union has adopted a sweeping approach to regulating AI; French president Macron is its harshest critic. Instead of promoting the technology, the approach is to protect its consumers. The EU may lag behind in AI and could remain a laggard for a long time.

    The NYT has sued OpenAI and Microsoft in the US for copyright violation, since the training data was used without permission, payment or acknowledgement.

    AI companies will have to scale back their profit expectations and enter into agreements with publishers; that they are doing so suggests their case lacks strong legal grounds. Judges may write restrictions on AI use and training.

    Innovation in AI is happening in the USA, and the US itself may adopt a comprehensive regulatory approach. This is a classic but defining clash.

  • Embeddings and Vectors

    Vector embeddings are numerical representations of data: each data point is represented by a vector in a high-dimensional space. In this context, embeddings and vectors refer to the same thing.

    A vector is an array of numbers with a specific dimensionality. Embedding refers to the technique of representing data as such vectors, which capture the underlying structure or properties of the data. A small illustration follows.
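
    A minimal sketch of the idea, with toy 4-dimensional vectors (the numbers are invented; real embeddings are learned, not hand-written):

    ```python
    import numpy as np

    # Toy 4-dimensional vectors; real word embeddings have hundreds of dimensions.
    king  = np.array([0.8, 0.3, 0.9, 0.1])
    queen = np.array([0.8, 0.9, 0.9, 0.1])
    apple = np.array([0.1, 0.5, 0.2, 0.9])

    print(king.shape)  # (4,) -> the dimensionality of this embedding space
    ```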

    Vector embeddings are created through a machine-learning process: a model is trained to convert pieces of data into numerical vectors.

    A dataset is selected and preprocessed. A neural network model that meets our data goals is chosen, and the data is fed into it. The model learns patterns and relationships within the data by adjusting its internal parameters; to illustrate, it learns which words often appear together. After learning, the model generates numerical vectors, with each data point (say a word or an image) represented by a unique vector. At this point, the model’s effectiveness can be assessed by its performance on specific tasks or by asking humans to evaluate it. If the embeddings are functioning well, they can be put to work. The sketch below walks through these steps.
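
    A hedged end-to-end sketch of these steps using gensim’s Word2Vec; the tiny corpus is invented, so the learned vectors will be rough:

    ```python
    from gensim.models import Word2Vec

    # Steps 1-2: a (toy) preprocessed dataset of tokenized sentences.
    corpus = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat", "on", "the", "rug"],
        ["cats", "and", "dogs", "are", "pets"],
    ]

    # Steps 3-4: train a model that learns which words appear together,
    # adjusting its internal parameters as it goes.
    model = Word2Vec(sentences=corpus, vector_size=16, window=2, min_count=1, epochs=200)

    # Step 5: every word now has its own numerical vector.
    print(model.wv["cat"].shape)  # (16,)

    # Step 6: a rough evaluation, e.g. inspecting nearest neighbours.
    print(model.wv.most_similar("cat", topn=2))
    ```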

    Word embeddings can have dimensions ranging from a few hundred to a few thousand; humans cannot visualize such a space. Sentence and document embeddings may have even more dimensions.

    Vector embeddings are represented as a sequence of numbers. Each number in the sequence corresponds to a specific feature or dimension and contributes to the overall representation of the data point.

    The actual numbers within a vector are not meaningful on their own; what matters are the relative values and the relationships between vectors, as the comparison below shows.
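
    A small sketch of that relativity, reusing the toy vectors from the earlier illustration: the cosine similarity between vectors carries the meaning, not the raw coordinates.

    ```python
    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        """Similarity in [-1, 1]: direction matters, absolute magnitudes do not."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    king  = np.array([0.8, 0.3, 0.9, 0.1])
    queen = np.array([0.8, 0.9, 0.9, 0.1])
    apple = np.array([0.1, 0.5, 0.2, 0.9])

    print(cosine(king, queen))  # ~0.92: related concepts point the same way
    print(cosine(king, apple))  # ~0.38: unrelated concepts diverge
    ```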

    Applications of Vector Embeddings

    They are used in NLP, in search engines, in personalized recommendation systems, for visual content, for anomaly detection, in graph analysis, and in audio and music.