Ilya Sutskever, co-founder of OpenAI and now head of Safe Superintelligence, shared his views on today's pretraining-based AI systems at NeurIPS 2024 in Vancouver, Canada. According to him, pretraining as we know it is unquestionably going to end. LLMs learn patterns from vast amounts of unlabelled data sourced from the internet, books, and elsewhere, and the amount of such data has already peaked: there is no more new data to draw on. He compared the situation to fossil fuels: just as oil is a finite resource, the internet contains a finite amount of human-generated content.
Existing data can still take AI development further, but the industry is running out of new data to train on. This will force a shift in how models are trained.
He also noted that the more a system reasons, the more unpredictable it becomes. Chess-playing AI systems, for example, are unpredictable even to the best human chess players.
Sutskever is optimistic about agentic models, which can understand things from limited data. Beyond being agentic, future systems will also be able to reason. Today's AI is mostly pattern matching based on what the model has seen during training; future systems will work things out step by step, more like the way we think.
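To make that distinction concrete, here is a minimal, self-contained Python sketch; it is not from the talk, and the toy train problem, the memorised_answers table, and both functions are illustrative assumptions, contrasting lookup-style pattern matching with step-by-step derivation:

```python
# A toy contrast between "pattern matching" (recalling a seen answer)
# and "reasoning" (deriving the answer through intermediate steps).
# Problem: when does a train arrive, given departure time, distance, speed?

# Pattern matching: an answer exists only if this exact case was seen
# before (a stand-in for associations memorised during training).
memorised_answers = {
    ("3 pm", 120, 60): "5 pm",  # one previously seen (departure, km, km/h) case
}

def pattern_match(departure: str, distance_km: int, speed_kmh: int):
    """Return a memorised answer, or None for unseen inputs."""
    return memorised_answers.get((departure, distance_km, speed_kmh))

def reason_step_by_step(departure_hour: int, distance_km: float, speed_kmh: float) -> str:
    """Derive the answer from intermediate steps, so novel inputs work too."""
    hours_taken = distance_km / speed_kmh        # step 1: time = distance / speed
    arrival_hour = departure_hour + hours_taken  # step 2: add travel time to departure
    return f"{arrival_hour:.0f}:00"              # step 3: state the arrival time

print(pattern_match("3 pm", 120, 60))    # seen before -> '5 pm'
print(pattern_match("3 pm", 180, 60))    # never seen  -> None
print(reason_step_by_step(15, 180, 60))  # derived     -> '18:00'
```

The pattern matcher answers only cases it has already seen, while the step-by-step solver generalises to new inputs because it derives the answer rather than recalling it, which is the essence of the shift Sutskever describes.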
Sutskever also feels that future models will depend not only on text but on multi-modal data, including computer vision.