A Large Language Model (LLM) is an artificial intelligence model specifically designed to understand, interpret and generate human-like text based on vast amounts of input data.
LLMs are trained on large datasets. The model learns to predict the next token by analysing the context of the tokens that came before it, a method called autoregressive language modelling. It helps the model learn the ins and outs of syntax, semantics and context.
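The autoregressive idea can be illustrated without a neural network at all. The sketch below is a minimal, assumed stand-in: it counts bigrams (word pairs) in a toy corpus and predicts the most frequent next word given the previous one. A real LLM replaces the count table with a learned model over token contexts.

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM trains on billions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count transitions: for each word, how often each next word follows it.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word given the previous word."""
    counts = transitions[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

A neural language model generalises this by conditioning on the whole preceding context rather than one word, and by sharing statistical strength across similar contexts instead of memorising exact counts.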
Many LLMs have been developed in recent years. GPT-3, or Generative Pre-trained Transformer 3, is one of the largest LLMs, developed by OpenAI. XLNet is another LLM, developed by Carnegie Mellon University and Google.
One of the earliest examples of a language model was ELIZA, a programme developed by Joseph Weizenbaum at MIT in 1966. Deep learning arrived in the 1990s, using neural networks to process data, and paved the way for more advanced language models.
One key milestone in the development of LLMs was the use of unsupervised learning, which allows models to learn from large amounts of unstructured data.
Geoffrey Hinton, Yann LeCun and Yoshua Bengio are called the ‘godfathers of AI’ and were awarded the $1 million Turing Award for their contributions to developing the sub-field of AI called deep learning. They worked both independently and jointly to develop the conceptual foundations of the field, identified surprising phenomena through experiments, and contributed engineering advances that demonstrated the practical advantages of deep neural networks.
A deep neural network (DNN) is an artificial neural network (ANN) with multiple hidden layers between the input and output layers. Just like shallow ANNs, DNNs can model complex non-linear relationships. The main purpose is to receive a set of inputs, perform complex calculations on them, and produce output that solves real-world problems such as image recognition.
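The layered structure described above can be sketched as a forward pass in plain NumPy. The layer sizes and random weights below are arbitrary illustrative choices, not a trained model: each hidden layer applies an affine transform followed by a non-linearity (ReLU here), which is what lets the stack model non-linear relationships.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A tiny DNN: 4 inputs -> two hidden layers of 8 units -> 3 outputs.
sizes = [4, 8, 8, 3]
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Pass the input through each hidden layer (affine + ReLU),
    then a final affine layer producing raw output scores."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]

out = forward(rng.standard_normal(4))
print(out.shape)  # one score per output unit: (3,)
```

Training would adjust `weights` and `biases` by backpropagation; only the inference-time computation is shown here.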
An LLM is an AI algorithm that uses deep learning techniques and massively large data sets to understand, summarize, generate and predict new content.
An LLM can perform a variety of natural language processing (NLP) tasks, e.g. generating and classifying text, answering questions in a colloquial style and translating text from one language to another.
An LLM has many parameters, typically billions of weights or more, and is trained on large quantities of unlabelled text using self-supervised learning. LLMs emerged around 2018 and perform well at a wide variety of tasks.
LLMs power many apps such as AI chatbots and AI search engines.
BERT, or Bidirectional Encoder Representations from Transformers, is an LLM developed by Google. It is pre-trained on massive amounts of text using self-supervised learning to build rich representations of language. It is a transformer-based model, meaning it uses attention mechanisms to process text data and can therefore handle long-range dependencies. It performs well on a wide range of NLP tasks, including sentiment analysis and named entity recognition.
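The attention mechanism at the heart of transformer models can be sketched in a few lines. This is a minimal NumPy version of scaled dot-product self-attention with made-up dimensions: every position computes a similarity score against every other position, so information can flow between distant tokens in one step rather than through many recurrent steps.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each position attends to all
    positions, which is how long-range dependencies are handled."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # pairwise similarity of positions
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.standard_normal((seq_len, d))
out = attention(X, X, X)  # self-attention: Q, K, V from the same sequence
print(out.shape)  # (5, 8): one mixed vector per input position
```

In a real transformer, Q, K and V are produced by learned linear projections of the input, and several such attention "heads" run in parallel.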
BERT’s limitation is that it requires large amounts of computational resources to train and run.
XLNet is also a transformer-based model. It uses a novel training technique called permutation language modelling.
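The core idea of permutation language modelling can be shown without any model at all. The sketch below is an assumed illustration, not XLNet's implementation: it samples one random factorization order over a toy sentence and lists, for each target token, the set of tokens it is allowed to condition on, i.e. those appearing earlier in the sampled order. Averaging over many such orders lets a model see bidirectional context while remaining autoregressive.

```python
import random

tokens = ["New", "York", "is", "a", "city"]

def permutation_targets(tokens, seed=0):
    """For one sampled factorization order, pair each target token
    with the tokens that precede it in that order (its context)."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)  # one sampled factorization order
    examples = []
    for i, pos in enumerate(order):
        context = sorted(order[:i])  # positions seen earlier in this order
        examples.append((tokens[pos], [tokens[c] for c in context]))
    return examples

for target, context in permutation_targets(tokens):
    print(f"predict {target!r} given {context}")
```

Note how, unlike the strict left-to-right order of standard autoregressive training, a token here may be predicted from tokens on either side of it, depending on the sampled order.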