Generative AI chatbots are built on large language models. However, bigger and more capable models require enormous processing power, which only Big Tech can readily afford, so the field risks being dominated by a handful of companies. Such processing power is difficult for researchers to access, and very few of them ever get it.
Only a small number of industry labs with vast resources can afford to train models with billions of parameters on trillions of words. We should therefore think about more efficient language models: functional models trained on datasets far smaller than those behind today's advanced large language models, yet nearly as capable. A small model, or mini-model, that can do much of the same work is what is required. This is the goal of the BabyLM project, which caps training data at roughly the amount of language a child is exposed to while growing up, on the order of 100 million words.
LLMs are trained to predict the next word in a given sequence of words. Training uses a corpus drawn from websites, novels and newspapers: the model guesses the next word in example phrases, then adjusts itself according to how close its guess was to the correct answer, and the process is repeated over and over. In this way the model gradually maps how words relate to one another. The surrounding sequence of words gives the model context, and more context helps it pin down what each word means; broadly, the bigger the corpus it learns from, the better it becomes.
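To make that guess-and-adjust loop concrete, here is a minimal sketch in Python using PyTorch. It is only an illustration, not any lab's actual training code: the toy corpus, the tiny model and the training settings are placeholders, and real systems use far larger transformer networks trained on vastly bigger corpora.

```python
# Minimal sketch of next-word prediction: the model guesses the next word,
# is scored against the real next word, and its parameters are nudged to
# do better. Corpus, sizes and settings are toy placeholders.
import torch
import torch.nn as nn

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}           # word -> integer id
ids = torch.tensor([stoi[w] for w in corpus])

inputs, targets = ids[:-1], ids[1:]                  # each word predicts the next one

class TinyLM(nn.Module):
    """One embedding layer and one linear layer: a bigram-style predictor."""
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        return self.out(self.embed(x))               # scores over the whole vocabulary

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.1)

for step in range(200):                              # "repeat the process over and over"
    logits = model(inputs)                           # the model's guesses
    loss = nn.functional.cross_entropy(logits, targets)  # how far off the guesses were
    opt.zero_grad()
    loss.backward()                                  # adjust the parameters toward better guesses
    opt.step()

# After training, the model gives high probability to words it has seen follow "the".
probs = torch.softmax(model(torch.tensor([stoi["the"]])), dim=-1)
print({w: round(probs[0, stoi[w]].item(), 2) for w in vocab})
```

Even this toy version shows the principle the article describes: the only signal the model ever receives is how well it predicted the next word, yet from that signal alone it builds up a map of which words go together.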
OpenAI’s GPT-3 (2020) was trained on roughly 200 billion words; DeepMind’s Chinchilla (2022) on about a trillion.
Humans learn language differently. Our anatomy gives us sensory mechanisms: we feel pain, sarcasm and insult; we smell flowers and the wet earth; we taste chocolate and oranges; we feel a caressing touch. In our early years we learn a great many words through these sensory experiences, long before we ever meet them in written form. Language models could be brought closer to this kind of human understanding. The brain's cognitive capacity rests on cells called neurons, and the early models of AI were patterned after the human brain, with the artificial 'neuron' as their basic unit.
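The artificial 'neuron' that these brain-inspired models are built from is just a small piece of arithmetic. The sketch below, with made-up input values and weights chosen purely for illustration, shows the idea: inputs are weighted, summed, and passed through an activation function.

```python
# Minimal sketch of an artificial "neuron": it weights its inputs, sums them,
# and squashes the result with an activation function. The inputs, weights
# and bias below are arbitrary illustrative values.
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus a bias, passed through a sigmoid activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))   # sigmoid: output between 0 and 1

# Example: three input signals with hand-picked weights.
print(neuron([0.5, 0.1, 0.9], weights=[0.8, -0.4, 0.3], bias=0.1))
```

Modern language models stack millions or billions of such units, but the resemblance to biological neurons is loose: the unit borrows the name, not the biology.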
Later, computer scientists realised that it was easier to train language models on huge amounts of data than to force them into psychologically informed structures. Computers now generate human-like text, of course, but the way they learn language remains very different from the way we do.
The realisation now is that we have to make AI smaller and smarter.