OpenAI has already released its o1 reasoning model. It has skipped the o2 name entirely and unveiled its new o3 model on the last day of its '12 Days of OpenAI' event.
Since it was unveiled with such a significant benchmark result, it raises the question of whether it has achieved the industry's holy grail: AGI. This is the most sought-after milestone, where AI performs at least as well as humans across a wide range of tasks.
We have already discussed Francois Chollet, the researcher from Google who created the ARC-AGI benchmark. He finds the o3 model very impressive and considers it a big milestone on the way to AGI. The threshold score on Chollet's ARC-AGI benchmark is 85 per cent, and o3 has scored 87.5 per cent on it. It is the first model to do so. Yet there are many simple tasks o3 cannot solve. Still, it breaks new ground and deserves scientific attention.
o3 shows an increase in AI capabilities and a degree of adaptation to novel tasks not seen so far.
Previously, AI capabilities increased by scaling up models in terms of size, dataset and compute power. However, these gains slow down over time. Even so, o3 shows that the limits have not yet been reached. Though it is not AGI, the new models could lead to chatbots that can handle more complex queries and solve problems step by step. It shows deep semantic understanding.
o3-mini has also been unveiled. It is a smaller, distilled version that is fine-tuned for particular tasks.
The models are now available to safety researchers, and OpenAI has not yet announced a public release date.
There are reasons to believe that OpenAI will continue to release successors to o1 and o3 in this category of reasoning models.