We have already discussed RAG (retrieval-augmented generation), which accesses external knowledge sources to respond to user queries. Traditional, vanilla RAG has its limitations, however, and we need a more nuanced and adaptive form of RAG. Agentic RAG has emerged to fill this gap. It is an advanced architecture that combines the foundational principles of RAG with the autonomy and flexibility of AI agents.
Vanilla RAG is a linear pipeline: a user query passes through a single retrieval step, and the retrieved results feed the response. It is inflexible, and there is no iterative refinement. Agentic RAG addresses these shortcomings. Agents act autonomously and coordinate complex tasks such as planning, multi-step reasoning, and tool use. The retrieval system becomes dynamic.
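The contrast between a linear pipeline and iterative refinement can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the toy `CORPUS`, the word-overlap `retrieve`, the stubbed `generate`, and the `SYNONYMS` rewrite table are all assumptions made for the example. A real system would use a vector store and an LLM in their place.

```python
from typing import Callable, List

# Toy corpus standing in for an external knowledge source (illustrative data).
CORPUS = [
    "Agentic RAG adds planning and tool use to retrieval.",
    "Vanilla RAG retrieves once and then generates.",
    "Vector search ranks documents by embedding similarity.",
]

# Hypothetical rewrite table; a real agent would ask an LLM to rephrase.
SYNONYMS = {"lookup": "retrieval"}


def retrieve(query: str) -> List[str]:
    """Return corpus entries that share at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in CORPUS if words & set(d.lower().rstrip(".").split())]


def generate(query: str, docs: List[str]) -> str:
    """Stand-in for the LLM call: report which context would be used."""
    if not docs:
        return "No supporting context found."
    return "Answer based on: " + " | ".join(docs)


def rewrite_query(query: str) -> str:
    """Replace known terms with synonyms to give retrieval another chance."""
    return " ".join(SYNONYMS.get(w, w) for w in query.lower().split())


def vanilla_rag(query: str) -> str:
    """Linear pipeline: retrieve once, then generate."""
    return generate(query, retrieve(query))


def agentic_rag(query: str, rewrite: Callable[[str], str], max_steps: int = 3) -> str:
    """Iterative refinement: rewrite the query and retry when retrieval is empty."""
    current = query
    for _ in range(max_steps):
        docs = retrieve(current)
        if docs:
            return generate(query, docs)
        current = rewrite(current)
    return generate(query, [])


if __name__ == "__main__":
    print(vanilla_rag("lookup pipelines"))
    print(agentic_rag("lookup pipelines", rewrite_query))
```

With the query "lookup pipelines", the single-pass version finds no matching context, while the loop rewrites the query once and then retrieves a matching document. The `max_steps` cap is a design choice to bound the compute spent on retries.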
Agents are incorporated at various stages of the RAG pipeline. They decide whether external knowledge is required. They select apt retrieval tools, such as vector search, web search, or APIs. They formulate queries customized for the task. After retrieving data, agents validate it. Agents can resolve queries with accuracy and speed from internal sources, documentation, and community fora. In this respect, the process is similar to the fine-tuning of an LLM.
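The agent's decisions described above (whether to retrieve, which tool to use, how to phrase the query, and whether the results are valid) can be sketched as follows. Every name here is hypothetical: the `Tool` class, the keyword-based router, the stop-word list, and the stubbed search functions are assumptions for illustration only. In practice an LLM would usually make these decisions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

STOP_WORDS = {"please", "how", "do", "i", "can", "you", "the", "a", "what", "is"}
FRESHNESS_WORDS = {"latest", "today", "news"}
SMALL_TALK = {"hi", "hello", "thanks"}


@dataclass
class Tool:
    name: str
    run: Callable[[str], List[str]]


def vector_search(query: str) -> List[str]:
    # Stub: a real tool would query an embedding index.
    return ["Internal doc: reset a password from the account settings page."]


def web_search(query: str) -> List[str]:
    # Stub: a real tool would call a search API.
    return ["Web result: agentic RAG release notes.", "Web result: cooking recipes."]


TOOLS: Dict[str, Tool] = {
    "vector_search": Tool("vector_search", vector_search),
    "web_search": Tool("web_search", web_search),
}


def tokens(text: str) -> List[str]:
    """Lowercase, drop simple punctuation, and split into words."""
    for ch in "?!.:,":
        text = text.replace(ch, "")
    return text.lower().split()


def needs_retrieval(query: str) -> bool:
    """Assumed heuristic: small talk needs no external knowledge."""
    return " ".join(tokens(query)) not in SMALL_TALK


def select_tool(query: str, tools: Dict[str, Tool]) -> Tool:
    """Assumed router: time-sensitive queries go to web search."""
    if set(tokens(query)) & FRESHNESS_WORDS:
        return tools["web_search"]
    return tools["vector_search"]


def formulate_query(query: str) -> str:
    """Strip filler words so the tool receives only the key terms."""
    return " ".join(w for w in tokens(query) if w not in STOP_WORDS)


def validate(query: str, passages: List[str]) -> List[str]:
    """Keep only passages that mention at least one query term."""
    terms = set(query.split())
    return [p for p in passages if terms & set(tokens(p))]


def answer(query: str, tools: Dict[str, Tool]) -> Dict[str, object]:
    if not needs_retrieval(query):
        return {"tool": None, "passages": []}
    tool = select_tool(query, tools)
    search_query = formulate_query(query)
    return {"tool": tool.name, "passages": validate(search_query, tool.run(search_query))}
```

For example, `answer("How do I reset the password?", TOOLS)` routes to the stubbed vector search with the reduced query "reset password", while `answer("latest news on agentic RAG", TOOLS)` routes to web search and the validation step discards the unrelated passage.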
The architecture is not confined to a single agent. It can use multiple agents.
RAG has its limitations. An agentic RAG system may fail to respond because the information is not available in the database. In that case, the retrieval effort is a waste of compute. In addition, it does not scale with more compute.
Google has moved to RIG (retrieval interleaved generation).