A Test for AGI

There is a well-known test for AGI created by Francois Chollet in 2019. It is called ARC-AGI, short for Abstraction and Reasoning Corpus for AGI. It is designed to assess whether an AI system can efficiently acquire new skills outside the data it was trained on, and so to indicate how close we are to general intelligence. To date, the best-performing AI systems have solved only about a third of the tasks in ARC-AGI. This is widely attributed to the field's excessive focus on LLMs, which are known to lack genuine 'reasoning' ability.
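To make the test concrete: each ARC task gives a few demonstration pairs of small colored grids, and the solver must infer the transformation and apply it to a held-out test grid. Below is a minimal Python sketch of that task shape; the toy task and the rotate-180 rule are invented for illustration and are not drawn from the actual corpus.

```python
# A toy task in the ARC style: "train" holds demonstration pairs, "test"
# holds a held-out pair; grids are small 2-D arrays of integers 0-9.
# (This example task is illustrative, not from the real dataset.)
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 0], [0, 1]]},
        {"input": [[0, 2], [0, 0]], "output": [[0, 0], [2, 0]]},
    ],
    "test": [
        {"input": [[0, 0], [3, 0]], "output": [[0, 3], [0, 0]]},
    ],
}

def rotate_180(grid):
    """Candidate rule: rotate the grid by 180 degrees."""
    return [row[::-1] for row in grid[::-1]]

# A rule counts only if it reproduces every demonstration pair...
assert all(rotate_180(p["input"]) == p["output"] for p in task["train"])
# ...and then also generalizes to the held-out test grid.
assert rotate_180(task["test"][0]["input"]) == task["test"][0]["output"]
print("toy rule solves the toy task")
```

The point of the format is that each task demands a freshly inferred rule from only a handful of examples, which is exactly the skill-acquisition ability the benchmark is probing.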

LLMs find it difficult to generalize because they are driven by memorization. They fail at tasks unrelated to their training data: they memorize reasoning patterns rather than generating new reasoning.

Chollet and Mike Knoop organized a competition to build open-source AI that scores highly on ARC-AGI. About 17,800 submissions were received. The best recorded a 55.5% score, falling short of the 85% human-level threshold. Many submissions simply brute-forced their way to a solution, and genuine puzzle-solving performance remained poor.
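What "brute force" looks like here is roughly this: enumerate compositions of primitive grid transforms until one happens to fit the demonstration pairs. The sketch below illustrates that style of search; the tiny DSL of primitives, the depth limit, and the toy training pair are all assumptions made for the example, not the method of any particular entry.

```python
from itertools import product

# A hand-picked toy DSL of grid transforms (illustrative assumption).
PRIMITIVES = {
    "identity":  lambda g: g,
    "flip_h":    lambda g: [row[::-1] for row in g],
    "flip_v":    lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def search(train_pairs, max_depth=3):
    """Enumerate every composition of primitives up to max_depth and
    return the first one that reproduces all demonstration pairs."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(grid, names=names):
                for name in names:
                    grid = PRIMITIVES[name](grid)
                return grid
            if all(program(p["input"]) == p["output"] for p in train_pairs):
                return names  # first program consistent with the demos
    return None

train = [
    {"input": [[1, 2], [3, 4]], "output": [[3, 1], [4, 2]]},  # a 90° rotation
]
print(search(train))  # finds ('flip_v', 'transpose'), i.e. rotate clockwise
```

Exhaustive search of this kind can fit the demonstration pairs of simple tasks, but the search space explodes with program depth, and a program that merely fits the demos often fails the held-out test grid. That gap between fitting and generalizing is one reading of why such entries stalled well below the human threshold.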

Some argue it is not appropriate to set ARC-AGI as the benchmark, since the very definition of AGI is not yet settled. If AGI is defined as AI 'better than most humans at most tasks,' then it has already been reached.

In 2025, Knoop and Chollet plan to run another competition with a new benchmark. However, defining intelligence for AI will remain as controversial as it has been for human beings.
