In-context learning (ICL) refers to the ability of attention-based neural networks such as transformers to predict the response to a query by learning from illustrative examples presented to them in context. For instance, given a few English-to-Marathi translation examples in the prompt, a transformer can then translate a new input on its own, even though it was never explicitly trained for that task. An MIT study found that GPT-3 can learn a new task from a few examples without any additional training: the hidden layers effectively contain smaller linear models, which are trained to complete the new task using simple learning algorithms. In other words, the model infers from its inputs without updating its weights, tackling problems it never encountered during training. Given a short prompt of tokens from an unseen task, it formulates relevant per-token and next-token predictions, performing well by relating the exemplar-label pairs presented in context to the new query.
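To make the idea concrete, here is a minimal sketch of how a few-shot ICL prompt is assembled for the translation example above. The exemplar pairs and the helper function name are illustrative placeholders, not from any real dataset or library; the key point is that no weights are updated anywhere, since the "learning" comes entirely from the text placed in the prompt.

```python
# Sketch: assembling a few-shot in-context prompt.
# No gradient updates occur -- the task is specified purely via exemplars.

def build_icl_prompt(exemplars, query):
    """Concatenate input-output exemplar pairs followed by the new query."""
    lines = [f"English: {src}\nMarathi: {tgt}" for src, tgt in exemplars]
    # The query is left open-ended so the model completes the translation.
    lines.append(f"English: {query}\nMarathi:")
    return "\n\n".join(lines)

exemplars = [
    ("water", "paani"),    # hypothetical exemplar pairs for illustration
    ("book", "pustak"),
]
prompt = build_icl_prompt(exemplars, "house")
print(prompt)
```

Feeding such a prompt to a pretrained autoregressive model lets it infer the task (translation) from the exemplars alone.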
GPT-3 demonstrated ICL after being trained auto-regressively. Research indicates that ICL emerges in transformers and is influenced by characteristics of the linguistic training data, such as burstiness and a skewed class distribution. When transformers are trained on data lacking these characteristics, they instead rely on in-weights learning (IWL), in which the knowledge stored in the model's weights is used.
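The two data regimes mentioned above can be contrasted with a toy sketch: a bursty stream with a skewed (Zipf-like) class distribution, of the kind associated with emergent ICL, versus a uniform, non-bursty stream associated with IWL. All parameters and function names here are illustrative assumptions, not from any published training setup.

```python
import random

def skewed_bursty_stream(n_classes, length, burst_len, seed=0):
    """Sample classes from a Zipf-like distribution, emitting each in a burst."""
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) for k in range(n_classes)]  # skewed: class 0 is most frequent
    stream = []
    while len(stream) < length:
        (cls,) = rng.choices(range(n_classes), weights=weights)
        stream.extend([cls] * burst_len)  # the chosen class appears in a cluster
    return stream[:length]

def uniform_stream(n_classes, length, seed=0):
    """Sample classes uniformly and independently: no burstiness, no skew."""
    rng = random.Random(seed)
    return [rng.randrange(n_classes) for _ in range(length)]

print(skewed_bursty_stream(50, 20, 4))
print(uniform_stream(50, 20))
```

Comparing the two printed streams shows the repeated runs and heavy head classes of the bursty regime against the flat, scattered uniform regime.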
Investigating ICL becomes necessary when the training data is bursty, meaning objects appear in clusters, and contains a large number of tokens or classes. ICL capability arises as training loss keeps declining. Research also indicates that, although ICL is an emergent phenomenon, it may only last temporarily: with continued training it can fade away, a behavior called transience.
ICL as a concept is different from the context window, which we have already explained: the context window is a fixed-size window that slides over the input sequence to capture the context around each token.
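The sliding-window idea can be sketched as follows; the window size and token list are illustrative, and real transformers attend over the window rather than merely slicing it, but the fixed-size, sliding nature is the same.

```python
# Sketch: a fixed-size context window sliding over an input sequence.
# At each position, only the last `window_size` tokens are visible.

def sliding_windows(tokens, window_size):
    """Yield the fixed-size context visible at each token position."""
    for i in range(len(tokens)):
        start = max(0, i + 1 - window_size)
        yield tokens[start : i + 1]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
for ctx in sliding_windows(tokens, 3):
    print(ctx)
```

This makes the contrast with ICL clear: the context window is a structural limit on what the model can see, while ICL is a learned behavior exercised within that window.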