A large language model (LLM) is, first of all, an AI algorithm. It employs deep learning techniques and massive datasets to understand, summarize, generate and predict new content. A closely related term is generative AI, a type of AI architected to generate text-based content.
As we know, language is a means of communication. Spoken languages, which have evolved over several millennia, underpin both human and technological communication. Language is chiefly a system of syntax; it uses words and grammar to convey ideas and concepts. An AI language model serves the same purpose.
AI language models date back to Eliza, which debuted at MIT in 1966. All language models are trained on datasets and deploy various techniques to infer relationships, which they can then use to generate new content based on their training data. Language models are used for natural language processing (NLP): a user poses a query in natural language (the input) and receives a response in natural language (the output).
An LLM is the evolution of the language model: it expands the data base used for training and, with it, the model's capability to infer. There is no agreement on how large the dataset must be, but a typical LLM has at least one billion parameters. A parameter is a variable present in the trained model that the model can use to infer new content.
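To make the idea of a parameter concrete, here is a minimal sketch in Python using PyTorch; the layer sizes are illustrative assumptions, and a real LLM stacks hundreds of far larger layers.

```python
import torch.nn as nn

# A toy network; the sizes below are illustrative, not from any real LLM.
model = nn.Sequential(
    nn.Embedding(50_000, 768),  # token embedding table: 50,000 x 768 weights
    nn.Linear(768, 3072),       # feed-forward layer: weights plus biases
    nn.ReLU(),
    nn.Linear(3072, 768),
)

# Every learnable weight and bias counts as a parameter.
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # roughly 43 million; an LLM has billions
```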
The genesis of modern LLMs goes back to 2017, when transformer models emerged as a new class of neural network. LLMs thus combine a large number of parameters with the transformer architecture, and they have applications across many different domains.
LLMs are also referred to as foundation models, a term coined by the Stanford Institute for Human-Centered AI in 2021. A foundation model is large and impactful, and it serves as a foundation for further optimizations and specific use cases. AI and ML are techniques to enhance an organisation's efficiency (its input/output ratio), effectiveness (more output per unit of input), experience and evolution. It is because of these benefits that businesses are inclined to invest in this technology.
How Language Models Work
As we know by now, an LLM is trained on a large volume of data called a corpus, generally measured in petabytes. Training proceeds in steps. First comes unsupervised learning, in which the model learns from unstructured, unlabelled data; its advantage is the vast amount of such data available. At this stage the model acquires the capability to derive relationships between words and concepts.
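As a minimal sketch of how unlabelled text alone can reveal such relationships, the toy Python example below counts which words follow which (a bigram model); real LLMs learn far richer statistics, but likewise need no labels at this stage.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, which words follow it in the raw text.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# The counts alone reveal relationships, with no labels involved:
print(following["the"].most_common())  # [('cat', 2), ('mat', 1), ('fish', 1)]
print(following["cat"].most_common())  # [('sat', 1), ('ate', 1)]
```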
As a next step, LLMs are fine-tuned through self-supervised learning. Here some of the data is labelled, which helps the model identify concepts more accurately.
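The sketch below shows what such labelled examples might look like; the task and labels are illustrative assumptions, not a fixed standard.

```python
# Hypothetical labelled examples for a sentiment task.
labelled_examples = [
    {"text": "The battery lasts all day.",    "label": "positive"},
    {"text": "The screen cracked in a week.", "label": "negative"},
]

# During fine-tuning, the model's prediction for each "text" is compared
# against its "label", and the parameters are adjusted to reduce the mismatch.
for example in labelled_examples:
    print(example["text"], "->", example["label"])
```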
Further, LLMs undertake deep learning through the transformer neural network. This enables the model to grasp the relationships and connections between words and concepts via a self-attention mechanism: a score, commonly called a weight, is assigned to each item or token in order to determine its relationship to the others.
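The following is a minimal NumPy sketch of the scaled dot-product self-attention at the heart of the transformer; real models add learned query/key/value projections and many attention heads, which are omitted here.

```python
import numpy as np

def self_attention(x):
    """x: (tokens, dim). Each token scores every other token, then takes
    a weighted average of all tokens according to those scores."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise relevance scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ x                              # blend tokens by weight

tokens = np.random.randn(4, 8)       # 4 tokens with 8-dimensional embeddings
print(self_attention(tokens).shape)  # (4, 8): one updated vector per token
```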
Once the training process is over, a base exists on which the AI can be used. Given a prompt as a query, the model generates a response: say, an answer to a question, some new text, summarised text or a sentiment analysis.
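As a minimal sketch of this prompt-and-response loop, the example below uses the Hugging Face transformers library, with the small GPT-2 model standing in for a full-scale LLM; it assumes the library and model weights are available.

```python
from transformers import pipeline

# GPT-2 is a small stand-in here; a production system would use a larger LLM.
generator = pipeline("text-generation", model="gpt2")

result = generator("A large language model is", max_new_tokens=30)
print(result[0]["generated_text"])  # the prompt continued by the model
```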
Uses of LLMs
LLMs are used for text generation, translation, content summarization, rewriting content, classification and categorization, sentiment analysis, and conversational AI and chatbots.
Challenges in Building LLMs
Development costs are huge, since expensive graphics processing units and massive datasets are required, and operational costs are also high. Bias can sneak into the training data. The model can be a black box, leaving us unable to explain how it arrived at a particular decision. There are inaccuracies, or hallucinations. LLMs are complex models with billions of parameters. Maliciously designed prompts, known as glitch tokens, can cause an LLM to malfunction.
Types of LLMs
There are generalised zero-shot models, and there are fine-tuned, domain-specific models. There is the BERT model, or Bidirectional Encoder Representations from Transformers. Lastly, there are multimodal models that handle both text and images, such as GPT-4.
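As a minimal sketch of zero-shot use, the example below asks a model to classify text against labels it was never explicitly trained on, via the Hugging Face transformers library; the model name is one common choice, not the only option.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier("The keyboard stopped working after two days.",
                    candidate_labels=["complaint", "praise", "question"])
print(result["labels"][0])  # the most likely label, e.g. "complaint"
```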
Future
Models could acquire artificial general intelligence or even become sentient. Models already use techniques such as reinforcement learning from human feedback (RLHF). Google uses REALM (Retrieval-Augmented Language Model), which lets the model draw on retrieved documents when generating a response.
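As a toy sketch of that retrieval-augmented idea: fetch relevant text first, then place it in the prompt sent to the LLM. The keyword-overlap retriever below is an illustrative stand-in; real systems such as REALM use learned retrieval.

```python
documents = [
    "Eliza, an early language model, debuted at MIT in 1966.",
    "Transformer models emerged in 2017.",
]

def retrieve(query):
    # Toy retriever: pick the document sharing the most words with the query.
    query_words = set(query.lower().split())
    return max(documents,
               key=lambda d: len(query_words & set(d.lower().split())))

query = "when did transformer models emerge"
prompt = f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer:"
print(prompt)  # this augmented prompt would then be sent to the LLM
```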