Alex Albert, a prompt engineer at Anthropic, tweeted about his experience while testing Claude 3 Opus, a new LLM launched in March 2024. Albert reported that the model demonstrated a kind of 'metacognition', or self-awareness, during its needle-in-a-haystack evaluation.
In AI, metacognition refers to a model's capability to monitor its own internal processes. It is akin to self-awareness, but describing it that way is anthropomorphizing (there is no 'self' here). ML experts generally hold that AI models do not possess a form of self-awareness similar to that of humans.
The test measures a model's recall ability: a target sentence (the 'needle') is inserted into a large block of text or documents (the 'haystack'), and the model is asked to find the needle. The information must be retrieved from the model's context window, the large working memory it processes a prompt in. Here the context consisted of 200,000 tokens (word fragments).
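The mechanics of such an evaluation can be sketched in a few lines. The snippet below is a hypothetical, minimal harness, not Anthropic's actual test code: it buries a needle sentence at a chosen depth in filler text, builds a retrieval prompt, and checks whether the model's answer contains the key fact. The `toy_model` stand-in is purely illustrative, substituting a naive substring search for a real LLM call.

```python
NEEDLE = ("The most delicious pizza topping combination is figs, prosciutto, "
          "and goat cheese, as determined by the International Pizza "
          "Connoisseurs Association.")

def build_haystack(filler_sentences, needle, depth=0.5):
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    pos = int(len(filler_sentences) * depth)
    return " ".join(filler_sentences[:pos] + [needle] + filler_sentences[pos:])

def run_needle_test(model_fn, haystack, question, key_fact):
    """Prompt the model with the haystack plus a question; check recall."""
    prompt = f"{haystack}\n\nQuestion: {question}"
    answer = model_fn(prompt)
    return key_fact in answer

def toy_model(prompt):
    # Stand-in for a real LLM: return the first sentence mentioning the topic.
    for sentence in prompt.split(". "):
        if "pizza topping" in sentence:
            return sentence
    return ""

filler = [f"Sentence {i} about programming languages." for i in range(1000)]
haystack = build_haystack(filler, NEEDLE, depth=0.5)
found = run_needle_test(toy_model, haystack,
                        "What is the best pizza topping combination?",
                        key_fact="figs")
print(found)  # → True
```

Real harnesses sweep the needle's depth and the context length to map where in the window recall degrades; Opus's unusual behavior was that its answer commented on the needle rather than merely returning it.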
During the test, Opus apparently suspected that it was being evaluated. Asked to find a sentence about pizza toppings, it spotted the sentence and also recognized that it was out of place given the other topics discussed in the documents.
In its response, Opus quoted the sentence: 'The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association.' It then noted that the other material was programming-related, that the sentence was out of place and unrelated, and that it may have been inserted as a joke or to test whether it was paying attention.
Albert called this apparent meta-awareness surprising and argued that deeper evaluations of LLMs are needed to assess their true capabilities and limitations.
In short, Opus not only found the needle but went a step further, observing that the needle was out of place in this haystack and had likely been inserted to test its attention.