The whole field of AI and natural language processing essentially grew out of a proposal put forward by Alan Turing in his landmark 1950 paper, Computing Machinery and Intelligence: that machines could learn and generate language like humans. He proposed the Turing Test, in which a machine would be considered intelligent if it could carry on a conversation indistinguishable from that of a human. Turing did not himself develop language models, but he anticipated that language generation was key to machine intelligence.
In the 1950s and 1960s, Noam Chomsky developed formal models of syntax, showing how complex sentences could be derived from a small set of rules. His work inspired early rule-based natural language processing.
In 1966, Joseph Weizenbaum created ELIZA, one of the first programs to simulate human conversation using pattern matching.
From the 1950s to the 1970s, John McCarthy and Marvin Minsky founded the field of artificial intelligence, advocating machine learning and reasoning systems that could be applied to language.
The modern shift to the idea of machines learning language patterns from data happened between the 1990s and the 2000s. In the early 1990s, a statistical machine translation group emerged at IBM.
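To make the statistical idea concrete, the sketch below is a toy illustration only (it is not IBM's translation system, which used far more sophisticated alignment models): a bigram model in Python that learns which word tends to follow which simply by counting occurrences in a corpus, then samples new text from those counts.

    # Toy bigram language model: "learning" word patterns purely from counts.
    from collections import defaultdict
    import random

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Count how often each word follows each preceding word.
    bigram_counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(corpus, corpus[1:]):
        bigram_counts[prev][nxt] += 1

    def next_word(prev):
        # Sample the next word in proportion to how often it followed `prev`.
        words, counts = zip(*bigram_counts[prev].items())
        return random.choices(words, weights=counts, k=1)[0]

    # Generate a short sequence starting from "the".
    word, output = "the", ["the"]
    for _ in range(6):
        word = next_word(word)
        output.append(word)
    print(" ".join(output))

Crude as it is, this is the core of the statistical turn: no grammar rules are written down; the patterns come entirely from the data.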
Later, neural networks and deep learning took over: RNN-based models were developed around 2013-2017, followed by Transformers such as GPT from 2018 onwards. These realized the idea of machines generating human-like language.
It was surprising how well modern AI models could generate human-like language; the degree of fluency, coherence and creativity shown by LLMs such as GPT exceeded expectations.
GPT models were trained purely on statistical patterns in text, without real-world understanding. Many thought this would be a limitation and that the output would be shallow and simple. Yet what they generate is grammatically correct, contextually appropriate and even insightful language.
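The training signal behind that claim can be sketched in a few lines. The toy Python example below illustrates the general next-token-prediction objective (it is not GPT's actual code): the model's scores for possible next tokens are turned into probabilities, and the model is penalised when the token that actually appears in the text gets a low probability. Minimising that penalty over vast amounts of text is, in essence, the whole training recipe.

    import math

    # Pretend model scores (logits) for the next token after the context "the cat".
    logits = {"the": 0.1, "cat": 0.2, "sat": 2.5, "mat": 0.3, "dog": 0.4}

    # Softmax: turn the scores into a probability distribution over the vocabulary.
    total = sum(math.exp(v) for v in logits.values())
    probs = {w: math.exp(v) / total for w, v in logits.items()}

    true_next = "sat"                   # the token that actually appears in the text
    loss = -math.log(probs[true_next])  # cross-entropy loss for this single prediction

    print(f"P('{true_next}' | 'the cat') = {probs[true_next]:.3f}   loss = {loss:.3f}")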
They have displayed emergent behaviours, capabilities that were not explicitly programmed, e.g. reasoning through multiple steps, generating code, summarizing complex documents and creative writing. These emerged from training on text alone.
Models like GPT can answer questions and solve problems they have never encountered before, based on pattern extrapolation, despite lacking understanding or consciousness. This generalization beyond the training data surprises researchers, especially in zero-shot settings.
Researchers broadly agree on scaling: more compute and more data lead to better performance. But even they did not fully predict the qualitative leap in fluency and versatility.