Before GPT models, text output was typically produced by rearranging or extracting words from the input itself.
Generative models, by contrast, can produce coherent, human-like text. They do so by using a probability distribution to predict the most likely next word or phrase.
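To make this concrete, here is a minimal toy sketch (the vocabulary and scores are invented purely for illustration) of how a softmax turns a model's raw scores into a probability distribution over the next word:

```python
import numpy as np

# Toy illustration: a language model scores every word in its vocabulary,
# then a softmax turns those scores into a probability distribution over
# the next token. (The vocabulary and logits here are made up.)
vocab = ["cat", "mat", "sat", "hat"]
logits = np.array([2.1, 0.3, 3.5, 0.7])   # raw scores from the model

probs = np.exp(logits - logits.max())
probs /= probs.sum()                       # softmax -> probabilities

next_word = vocab[int(np.argmax(probs))]   # pick the most likely word
print(dict(zip(vocab, probs.round(3))), "->", next_word)
```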
A pre-trained model is one that has already been trained on a substantial dataset before being applied to any specific task.
GPT models are trained with an unsupervised learning strategy on a substantial corpus of text data.
By learning from this unstructured data, the model acquires a broad understanding of the structure and characteristics of a language. It then applies that understanding to tasks such as answering queries and summarizing text.
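As a rough illustration of using such a pre-trained model, assuming the Hugging Face transformers package and the openly released GPT-2 checkpoint as a stand-in for larger GPT models, text generation can look like this:

```python
# Sketch of using a pre-trained generative model, assuming the Hugging Face
# `transformers` package and the openly available "gpt2" checkpoint as a
# stand-in for larger GPT models.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "A pre-trained language model can be used to"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```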
The transformer is a neural network architecture designed to handle text sequences of varying lengths. It was popularized by the 2017 paper ‘Attention Is All You Need’, and GPT models use a decoder-only variant of it. The transformer’s ‘self-attention’ mechanism captures the relationship between each word and every other word in the same phrase; a minimal sketch of this computation follows.
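The NumPy sketch below shows scaled dot-product self-attention, with random matrices standing in for a real model's learned projection weights; it illustrates how each token's output becomes a weighted mix of every token in the sequence:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X.

    Each row of X is one token's embedding; the output mixes information
    from every token according to the query/key/value projections.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted sum of values

# Tiny example: 4 tokens, embedding size 8 (random weights stand in for
# the learned parameters of a real transformer layer).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
```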
Evolution of GPT Models
GPT-1
The first GPT model was trained on the BookCorpus dataset and produced strong results on language-modelling benchmarks. It retains context only for relatively short phrases or documents: each request is limited to a context window of 512 tokens, or roughly 380 words.
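To illustrate what a fixed 512-token context window means in practice, the sketch below keeps only the most recent 512 tokens of an over-long input; it uses the GPT-2 tokenizer from the transformers package purely as a convenient stand-in:

```python
# Illustration of a fixed context window: only the most recent 512 tokens
# of a long input can be attended to. The GPT-2 tokenizer from the
# `transformers` package is used here purely as a convenient stand-in.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
long_text = "word " * 1000                 # deliberately too long

token_ids = tokenizer.encode(long_text)
window = token_ids[-512:]                  # keep only the last 512 tokens
print(len(token_ids), "tokens in,", len(window), "tokens fit in the window")
```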
GPT-2
GPT-2 descends from GPT-1 and uses the same architecture, but it was trained on a much larger corpus of text data (roughly 40 GB of web text) and has 1.5 billion parameters, giving it considerably more analytical power. It achieved state-of-the-art results on language-modelling benchmarks such as LAMBADA.
GPT-3
GPT-3 improves on GPT-2: it has 175 billion parameters and was trained on a still larger corpus of text data.
GPT-3.5
GPT-3.5 models are trained with Reinforcement Learning from Human Feedback (RLHF), a method that instils human preferences and values into the model. This reduces toxicity, puts a premium on veracity, and aligns the model’s output with the user’s intent: a step towards making AI ethical and responsible.
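A common ingredient of RLHF is a reward model trained on human preference comparisons. The sketch below shows the standard pairwise preference loss; the scores are invented for illustration:

```python
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss used when training an RLHF reward model: it is minimised
    when the response humans preferred receives the higher reward score."""
    return -np.log(1.0 / (1.0 + np.exp(-(reward_chosen - reward_rejected))))

# Illustrative scores a reward model might assign to two candidate answers.
good_answer_score, bad_answer_score = 1.8, -0.4
print(preference_loss(good_answer_score, bad_answer_score))   # small loss

# If the model ranked them the wrong way round, the loss would be larger,
# pushing the reward model (and, later, the policy) toward human preferences.
print(preference_loss(bad_answer_score, good_answer_score))
```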
GPT-4
GPT-4 accepts both text and images as input and produces text as output. It is estimated to have close to 1 trillion parameters. As with its predecessors, the underlying objective is to predict the next word, given a sequence of words.
Its training used a substantial amount of publicly available data, and it inherits the use of RLHF from GPT-3.5. It performs well on adversarial factuality tests.