There are several ways to build with LLMs, or large language models. One can train a model from scratch, fine-tune an open-source model, or use hosted APIs. Many developers start with in-context learning. In-context learning makes models usable off the shelf: no fine-tuning is necessary. Their behaviour is instead managed through clever prompting and conditioning on private, contextual data.
If you are building a chatbot to answer questions about pharmaceuticals, one could paste all the relevant documents into a ChatGPT or GPT-4 prompt and put the question at the end. This works for very small datasets, but it does not scale. At most, around 50 pages of input text can be processed, and both inference time and accuracy deteriorate well before that limit. This limit is the context window.
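To make the limitation concrete, a naive version of this approach might look like the minimal sketch below. Both the document list and the call_llm helper are hypothetical stand-ins for whatever corpus and chat-completion API are actually used.

    # Naive in-context approach: paste every document into a single prompt.
    # call_llm() is a placeholder for a chat-completion API call.
    def answer_naively(documents: list[str], question: str) -> str:
        context = "\n\n".join(documents)  # all documents, verbatim
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        # Fails or degrades once the combined text exceeds the context window.
        return call_llm(prompt)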
In-context learning addresses this issue cleverly. Instead of sending all the documents with each prompt to the LLM, only a few relevant documents are sent.
And the relevance is determined with the help of LLMs themselves.
We can divide the workflow, at a high level, into three stages.
Data pre-processing/embedding
Here the data about pharmaceuticals is stored so that it can be retrieved later. Typically, the documents are split into chunks of text, the chunks are embedded, and the embeddings are stored in a vector database.
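A minimal sketch of this stage is given below. The embed() function and the vector_store object are hypothetical stand-ins for an embedding-model API and a vector database client; the chunk sizes are illustrative defaults, not recommendations.

    # Stage 1: split documents into chunks, embed them, store them.
    # embed() and vector_store are stand-ins for an embedding API and a
    # vector database client (e.g. Pinecone, Weaviate, pgvector).
    def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
        # Fixed-size character chunks with a small overlap between neighbours.
        return [text[i:i + size] for i in range(0, len(text), size - overlap)]

    def index_documents(documents: list[str], vector_store) -> None:
        for doc_id, doc in enumerate(documents):
            for n, piece in enumerate(chunk(doc)):
                vector = embed(piece)                     # embedding model call
                vector_store.add(id=f"{doc_id}-{n}",      # hypothetical store API
                                 vector=vector,
                                 metadata={"text": piece})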
Prompt construction/retrieval
Here the user submits a query, say "What are the antibiotics?" The application constructs a series of prompts to submit to the LLM. A compiled prompt is typically a combination of a prompt template hard-coded by the developer, examples of valid outputs (few-shot examples), information retrieved from external APIs, and relevant documents retrieved from the vector database.
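A sketch of this stage, under the same assumptions as above (embed() and vector_store are hypothetical), might look like the following. Few-shot examples and data pulled from other APIs could be spliced into the same template in the same way; they are omitted here for brevity.

    # Stage 2: embed the user query, retrieve the most relevant chunks,
    # and fill a prompt template.
    PROMPT_TEMPLATE = """You are an assistant answering questions about pharmaceuticals.
    Use only the context below.

    Context:
    {context}

    Question: {question}
    Answer:"""

    def build_prompt(question: str, vector_store, k: int = 4) -> str:
        query_vector = embed(question)
        hits = vector_store.search(vector=query_vector, top_k=k)  # nearest chunks
        context = "\n\n".join(hit.metadata["text"] for hit in hits)
        return PROMPT_TEMPLATE.format(context=context, question=question)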
Prompt execution/inference
Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference, whether a proprietary model API or an open-source or self-trained model. Some developers also add operational systems such as logging, caching and validation at this stage.
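The sketch below shows what this stage could look like with simple caching and logging wrapped around the model call; call_llm() again stands in for a proprietary API or a self-hosted model, and the in-memory cache is purely illustrative.

    # Stage 3: send the compiled prompt to a pre-trained LLM, with basic
    # caching and logging around the call.
    import hashlib
    import logging

    logging.basicConfig(level=logging.INFO)
    _cache: dict[str, str] = {}

    def run_inference(prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in _cache:                       # cache hit: skip the model call
            return _cache[key]
        logging.info("LLM call, prompt length=%d chars", len(prompt))
        answer = call_llm(prompt)               # proprietary API or self-hosted model
        _cache[key] = answer
        return answer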
All this may sound like heavy work. It is, however, much easier than the alternative: training or fine-tuning the LLM itself.
In-context learning can be accomplished without a team of specialized ML engineers. No infrastructure needs to be hosted, and there is no need to buy a dedicated instance from OpenAI or Google.
The pattern reduces an AI problem to a data engineering problem. For small datasets it works well, and can even outperform fine-tuning.
The bigger question is: what happens if the context window is expanded? That is possible, and it is an active area of research. There are, however, trade-offs of cost and time, since both tend to grow with the length of the prompt.