To generate text, a language model predicts the next word in a sequence using a probability distribution. Next-word prediction is a fundamental task in natural language processing (NLP).
The first step is tokenization: the input text is split into tokens, each of which is a unit of language such as a word or a sub-word.
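As a rough illustration, the sketch below maps a short sentence to token IDs using a small hypothetical vocabulary and whitespace splitting; real systems use learned sub-word tokenizers (such as BPE), which are not shown here.

```python
# Minimal tokenization sketch with a toy, hypothetical vocabulary.
text = "the cat sat on the mat"

# Hypothetical vocabulary mapping each token to an integer ID.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}

# Split on whitespace and look up each token's ID, falling back to <unk>.
token_ids = [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]
print(token_ids)  # [0, 1, 2, 3, 0, 4]
```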
The language model itself is a neural network that takes the sequence of tokens as input. It is trained on a vast corpus of text, from which it learns the relationships between words and the contexts in which they appear.
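The text does not specify an architecture, so the following sketch assumes a small PyTorch model with an embedding layer, a GRU context encoder, and a linear output head; the class name, layer sizes, and hyperparameters are illustrative only.

```python
import torch
import torch.nn as nn

class NextWordModel(nn.Module):
    """Sketch of a next-word prediction model: embed tokens, encode the
    context, and project to one score per vocabulary word."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)            # token IDs -> vectors
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)  # context encoder
        self.output = nn.Linear(hidden_dim, vocab_size)                 # scores over vocabulary

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.encoder(embedded)    # (batch, seq_len, hidden_dim)
        logits = self.output(outputs)          # (batch, seq_len, vocab_size)
        return logits

model = NextWordModel(vocab_size=6)  # size of the toy vocabulary above
```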
Context representation involves processing the input sequence up to the current token to build a representation of the context, capturing information about the preceding words in the sequence.
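Continuing the hypothetical model above, one way to view the context representation is the encoder's output at the last position, which summarizes all preceding tokens:

```python
# Continuing the sketch above: the GRU's output at each position serves as the
# context representation, summarizing the tokens seen so far.
input_ids = torch.tensor([[0, 1, 2, 3, 0]])  # "the cat sat on the"
embedded = model.embedding(input_ids)
outputs, _ = model.encoder(embedded)         # (1, 5, hidden_dim)
context = outputs[:, -1, :]                  # representation of the full prefix
```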
From this context representation, the model computes a probability distribution over the vocabulary for the next word: each word in the vocabulary is assigned a probability indicating how likely it is to be the next word given the context.
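Under the same assumptions, the sketch below projects the context vector to one score per vocabulary word and applies softmax to obtain that distribution:

```python
# Project the context representation to one raw score (logit) per vocabulary
# word, then apply softmax to turn the scores into probabilities.
logits = model.output(context)          # (1, vocab_size)
probs = torch.softmax(logits, dim=-1)   # sums to 1 over the vocabulary
print(probs)                            # likelihood of each candidate next word
```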
Finally, the model can either sample from this probability distribution to pick the next word stochastically, or simply choose the word with the highest probability (greedy decoding).
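Both decoding strategies take only a line each, again assuming the probs tensor from the previous snippet:

```python
# Greedy decoding: pick the single most probable word.
greedy_id = torch.argmax(probs, dim=-1)

# Stochastic decoding: sample a word in proportion to its probability.
sampled_id = torch.multinomial(probs, num_samples=1)
```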
The probability distribution is computed by applying a softmax activation over the model's output layer, which converts raw scores (logits) into probabilities. During training, the model is optimized to maximize the probability of the correct next word for each sequence in the training data.
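A common way to express this objective is cross-entropy loss, which is the negative log-probability of the correct next word, so minimizing it maximizes that probability. The sketch below assumes the toy model and vocabulary from earlier, with targets formed by shifting the input one position:

```python
# Training sketch: targets are the input sequence shifted by one position.
input_ids = torch.tensor([[0, 1, 2, 3, 0]])  # "the cat sat on the"
targets = torch.tensor([[1, 2, 3, 0, 4]])    # "cat sat on the mat"

logits = model(input_ids)                    # (1, 5, vocab_size)
loss = nn.functional.cross_entropy(
    logits.view(-1, logits.size(-1)),        # (5, vocab_size)
    targets.view(-1),                        # (5,)
)
loss.backward()                              # gradients used to update the model
```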
In this way, the model leverages the relationships it has learned between words and their contexts to generate coherent, contextually relevant predictions for the next word in a given sequence.