GPT: How It Makes an Inference

GPT is an autoregressive language model: given a sequence of tokens, it predicts the next token in the sequence. The model is trained on a huge dataset of text and learns to predict the next token from the context provided by the previous tokens.
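To make this concrete, here is a minimal sketch of an autoregressive decoding loop. The toy vocabulary and the stub next_token_probs function are placeholders standing in for a real trained GPT; only the generation loop itself is the point.

```python
import numpy as np

# Toy vocabulary and a stub "model": given the tokens so far, it returns a
# probability distribution over the next token. A real GPT would run the
# tokens through its transformer layers here; this stub just draws a random
# distribution for illustration and ignores its input.
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(0)

def next_token_probs(tokens):
    logits = rng.normal(size=len(VOCAB))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)   # distribution over the next token
        next_id = int(np.argmax(probs))    # greedy choice
        tokens.append(next_id)             # the prediction becomes new context
        if VOCAB[next_id] == "<eos>":      # stop at end-of-sequence
            break
    return tokens

print([VOCAB[i] for i in generate([1, 2])])  # start from "the cat"
```

Each generated token is appended to the context and fed back in, which is exactly what "autoregressive" means in practice.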

The GPT architecture is built on the Transformer. Its self-attention mechanism lets the model learn long-range dependencies between tokens, which is vital for capturing the context of a text sequence.
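At the heart of the Transformer is scaled dot-product attention, where every token's new representation is a weighted mix of all tokens in the sequence. The NumPy sketch below shows the mechanism in its simplest form; the random input and the reuse of the same matrix for queries, keys, and values are simplifications for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Every position attends to every other position, so information can flow
    # between tokens that are far apart in the sequence.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over positions
    return weights @ V                                   # weighted mix of values

# 4 tokens, embedding size 8; in a real model Q, K, V come from learned projections.
x = np.random.default_rng(1).normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8): one updated vector per token
```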

In training a GPT model, causal (autoregressive) masking is used. Attention from each position to later positions in the sequence is masked out, and the model is asked to predict each next token. This teaches the model the relationships between the tokens in a sequence without letting it peek at the answer.
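The sketch below adds that causal mask to the attention scores from the previous example: entries above the diagonal are set to negative infinity so each token can only attend to earlier tokens. This is an illustrative NumPy version, not GPT's actual implementation.

```python
import numpy as np

def causal_attention(Q, K, V):
    n, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)
    # Causal mask: position i may only attend to positions <= i, so the model
    # cannot see the tokens it is being trained to predict.
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

x = np.random.default_rng(2).normal(size=(5, 8))
out = causal_attention(x, x, x)
print(out.shape)  # (5, 8)
```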

A trained GPT model is used for inference: it can generate text, translate between languages, or answer questions.

It can also be fine-tuned to perform specific language tasks.
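As an example of inference with a pretrained GPT-style model, the snippet below uses the Hugging Face transformers library and the public gpt2 checkpoint; neither is mentioned above, they are simply assumptions that make the sketch runnable.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Transformer architecture is"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 20 new tokens autoregressively, one token at a time.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```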

The GPT model's transformer layers process the sequence of tokens. Each token is a piece of text, such as a word or part of a word. The model assigns each token a numerical vector called an embedding, which represents its meaning and context. These embeddings pass through the stack of transformer layers, and the final layer's output is used to produce the output sequence.
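The end-to-end flow is: token IDs → embeddings → transformer layers → output scores over the vocabulary. The toy NumPy sketch below mimics that flow with random weights and a placeholder layer, just to show the shapes involved; the sizes and the 4-layer stack are arbitrary choices, not GPT's real configuration.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab_size, d_model, seq_len = 100, 16, 6

# Embedding table: one learned vector per token in the vocabulary.
embedding = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([12, 7, 55, 3, 81, 9])   # the tokenised input
x = embedding[token_ids]                      # (seq_len, d_model) embeddings

# Stand-in for a transformer layer: it maps the sequence of vectors to a new
# sequence of vectors of the same shape. Real layers apply attention and MLPs.
def transformer_layer(h):
    return h + rng.normal(scale=0.01, size=h.shape)

for _ in range(4):                            # a small stack of layers
    x = transformer_layer(x)

# Project back to the vocabulary: one score (logit) per possible next token,
# read off from the final position in the sequence.
logits = x @ embedding.T                      # (seq_len, vocab_size)
next_token_id = int(np.argmax(logits[-1]))
print(next_token_id)
```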
