Retrieval Augmented Generation (RAG)

A text embedding encodes semantic information such as concepts, geographic locations, persons, companies, objects and so on.
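
As a rough illustration, the snippet below computes embeddings with the open-source sentence-transformers library; the library and the all-MiniLM-L6-v2 model are assumptions here, and any embedding model or API would serve the same purpose.

```python
# A minimal sketch of turning sentences into embedding vectors.
# Assumes the sentence-transformers package and the all-MiniLM-L6-v2 model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Apple opened a new office in Bangalore.",
    "The Eiffel Tower is in Paris.",
]

# Each sentence becomes a fixed-length vector that captures its meaning.
embeddings = model.encode(sentences)
print(embeddings.shape)  # e.g. (2, 384) for this particular model
```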

In RAG applications, what is encoded are the features of a company's documents. Each embedding is stored in a vector store, where embeddings can be recorded and compared. At inference time, the application computes an embedding of the new prompt and sends it to the vector database. The documents whose embeddings are closest to that of the prompt are retrieved, and the LLM generates its response based on these documents.
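
The sketch below walks through that loop with plain NumPy cosine similarity standing in for a real vector database; the embed() helper is a hypothetical placeholder for whatever embedding model the application actually uses.

```python
# A simplified sketch of the RAG retrieval loop described above.
# embed() is a placeholder; in practice it would call an embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. Encode the company's documents and keep them in a simple "vector store".
documents = [
    "Refund policy: customers may return items within 30 days.",
    "Support hours are 9 am to 6 pm IST on weekdays.",
]
store = [(doc, embed(doc)) for doc in documents]

# 2. At inference time, embed the new prompt.
prompt = "How long do I have to return a product?"
prompt_vec = embed(prompt)

# 3. Retrieve the document whose embedding is closest to the prompt's.
top_doc = max(store, key=lambda item: cosine(prompt_vec, item[1]))[0]

# 4. The LLM would then generate its response grounded in top_doc.
print(top_doc)
```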

It is a simple mechanism that customizes LLMs to respond using proprietary documents or information that was not included in the training data.

Retrieval is a core step towards augmenting an LLM with relevant context.

Vector embeddings make it possible to work with any unstructured or semi-structured data; semantic search is just one example. Dealing with data other than text, such as images, audio and video, is another big topic. Customer feedback, too, can be categorized using embeddings, as the sketch below illustrates.
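
One way to do this, sketched here with the same hypothetical embed() placeholder as above, is to assign each feedback item to the category whose description is nearest in embedding space.

```python
# A hedged sketch of categorizing customer feedback with embeddings.
# embed() is again a placeholder for a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embed a short description of each category once.
categories = {
    "billing": embed("questions about invoices, charges and refunds"),
    "shipping": embed("questions about delivery times and tracking"),
    "product quality": embed("complaints about defects or damaged items"),
}

def categorize(feedback: str) -> str:
    vec = embed(feedback)
    # Pick the category whose description embedding is most similar.
    return max(categories, key=lambda name: cosine(vec, categories[name]))

print(categorize("My parcel arrived two weeks late."))
```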

LLMs, though popular, are known for their hallucinations: they can generate plausible-sounding responses that are factually incorrect. That is where RAG, or Retrieval Augmented Generation, comes to our rescue. It combines the power of retrieved material with the generative ability of the model. Let us understand this concept.

RAG retrieves facts from an external knowledge base to keep LLMs updated. It supplements the model's internal representation of information, and the model's answers can be cross-referenced with the original content. RAG obviates the need to retrain the model continuously.

It is like subjecting an LLM to an open-book exam. The LLM consults a document, as opposed to trying to remember facts from its memory.

There are two phases: retrieval and content generation. In retrieval, algorithms search the knowledge base for information relevant to a user's query or prompt. In content generation, the LLM composes its answer using the retrieved material as context.
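
A minimal sketch of the content-generation phase is to prepend the retrieved documents to the user's question before calling the model; call_llm() below is a hypothetical stand-in for whichever LLM API is actually used.

```python
# A minimal sketch of the content-generation phase: the retrieved
# documents are placed in the prompt before calling the model.
# call_llm() is a hypothetical stand-in for a real LLM API.
def build_augmented_prompt(query: str, retrieved_docs: list[str]) -> str:
    context = "\n\n".join(retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def call_llm(prompt: str) -> str:
    return "<model response>"

docs = ["Refund policy: customers may return items within 30 days."]
answer = call_llm(build_augmented_prompt("How long is the return window?", docs))
```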

RAG came to the notice of developers after the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ was published in 2020 by Patrick Lewis and his team at Facebook AI Research.

RAG makes an LLM more effective by tapping additional data sources without retraining. The answers are timely and contextual, and chatbots become smarter by using RAG.

To implement RAG, there should be a vector database that allows new data to be encoded and indexed rapidly, so that searches against that data can be fed into the LLM.
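
The following sketch assumes the faiss library is installed; any vector database offering add and search operations would serve equally well, and the vectors here are random placeholders rather than real embeddings.

```python
# A sketch of rapidly indexing new embeddings and searching them.
# Assumes the faiss package; the vectors are random placeholders.
import numpy as np
import faiss

dim = 384
index = faiss.IndexFlatL2(dim)  # exact nearest-neighbour search over stored vectors

# New document embeddings can be added as they arrive, without retraining anything.
doc_vectors = np.random.rand(10, dim).astype("float32")
index.add(doc_vectors)

# At query time, the prompt embedding is searched against the index and
# the nearest documents are handed to the LLM as context.
query_vector = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query_vector, 3)
print(ids[0])  # positions of the three closest documents
```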

In the absence of RAG, the search is based on keywords and is limited to the specific words and phrases used in the prompt. Because it is too literal, it may miss relevant information. Semantic search goes beyond keyword-based search, matching on meaning rather than exact wording, and is an integral part of RAG.
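
The toy comparison below illustrates the difference; the documents and the keyword-matching rule are invented for illustration, and the semantic step is only described in comments.

```python
# A toy contrast between keyword search and semantic search.
documents = [
    "Our return window is 30 days from delivery.",
    "Warranty claims must include the original receipt.",
]

query = "How long can I send a product back?"

# Keyword search: none of the query's words appear in the first document,
# so a literal match retrieves nothing even though that document answers it.
query_words = set(query.lower().split())
keyword_hits = [
    d for d in documents
    if query_words & set(d.lower().replace(".", "").split())
]
print(keyword_hits)  # [] -- the literal match misses the relevant document

# Semantic search would instead compare embeddings of the query and the
# documents, ranking "Our return window is 30 days..." highest because
# "send a product back" and "return" are close in meaning.
```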
