An LLM uses a neural network to learn the probability of each word that could follow a given sequence of words. The network is trained on a massive dataset of text, where it learns patterns in how words are used together: for example, the article ‘an’ precedes a noun beginning with a vowel sound, the article ‘the’ is followed by a noun, and the word ‘to’ is often followed by a verb. To predict the next word, the LLM examines the current sequence of words and outputs a probability distribution over all possible words that could follow. The word with the highest probability is the most likely next word.
To illustrate, let us take a sequence of words:
‘People wished her a happy __’. The output probability distribution is as follows.
birthday 0.35
Deepavali 0.40
Holi 0.15
X’mas 0.05
The word ‘Deepavali’ has the highest probability, and so it is the most likely next word.
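Choosing the most likely next word from such a distribution can be sketched in a few lines of Python. The words and probability values below are the illustrative ones from the example, not output from a real model.

```python
# Toy probability distribution for the context ‘People wished her a happy __’.
# The values are illustrative; a real model assigns a probability to every
# word in its vocabulary.
distribution = {
    "birthday": 0.35,
    "Deepavali": 0.40,
    "Holi": 0.15,
    "X'mas": 0.05,
}

# The most likely next word is the one with the highest probability.
next_word = max(distribution, key=distribution.get)
print(next_word)  # → Deepavali
```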
The process continues until the response is complete.
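This word-by-word loop can be sketched as follows. Here `next_word_probs` is a hypothetical stand-in for the model: it returns a hand-written probability table rather than real model output, and generation stops at an assumed end-of-sequence token.

```python
# A minimal sketch of autoregressive generation. `next_word_probs` is a
# stand-in for the model: it maps a context to a toy probability table.
def next_word_probs(context):
    table = {
        ("People", "wished", "her", "a"): {"happy": 0.9, "merry": 0.1},
        ("People", "wished", "her", "a", "happy"): {"Deepavali": 0.40, "birthday": 0.35},
    }
    return table.get(tuple(context), {"<eos>": 1.0})

def generate(prompt, max_words=10):
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs(words)
        word = max(probs, key=probs.get)   # greedy: pick the most likely word
        if word == "<eos>":                # stop when the response is complete
            break
        words.append(word)
    return " ".join(words)

print(generate("People wished her a"))
# → People wished her a happy Deepavali
```

Real systems often sample from the distribution instead of always taking the single most likely word, which makes the output less repetitive.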
The model actually represents words as vectors (embeddings) and uses these to learn the associations and patterns. The accuracy of the prediction depends on the size and quality of the training dataset.
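The idea that words used in similar contexts get similar vectors can be illustrated with cosine similarity. The three-dimensional vectors below are made up for illustration; learned embeddings typically have hundreds of dimensions.

```python
import math

# Toy 3-dimensional word vectors. The values are invented for illustration;
# real embeddings are learned from data and much higher-dimensional.
vectors = {
    "birthday":  [0.9, 0.1, 0.3],
    "Deepavali": [0.8, 0.2, 0.4],
    "running":   [0.1, 0.9, 0.2],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Words that appear in similar contexts (‘birthday’, ‘Deepavali’) end up
# closer together than unrelated words.
print(cosine(vectors["birthday"], vectors["Deepavali"]) >
      cosine(vectors["birthday"], vectors["running"]))  # → True
```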