The words or parts of words (say the plural marker 's' or the prefix 'un') are stored in the LLM as tokens. Each token is represented not as a sequence of letters but as a vector, which is a sequence of numbers.
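A minimal sketch of the idea, using a made-up five-token vocabulary and NumPy; the token IDs, the vectors and the 8-dimension size below are invented for illustration, not taken from any real model:

```python
import numpy as np

# Toy vocabulary: whole words and word pieces (like the plural 's' or the
# prefix 'un') each get an integer token ID.
vocab = {"un": 0, "happy": 1, "sad": 2, "dog": 3, "s": 4}

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))   # one 8-number vector per token (toy size)

def lookup(word_pieces):
    """Map word pieces to token IDs, then to their vectors."""
    ids = [vocab[p] for p in word_pieces]
    return ids, embeddings[ids]

ids, vectors = lookup(["un", "happy"])   # 'unhappy' stored as two tokens
print(ids)             # [0, 1]
print(vectors.shape)   # (2, 8): each token is a sequence of numbers, not letters
```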
The vectors assigned to the word tokens place them in a 'space'; they encode how 'close together' the tokens are in that space.
The distances between words are expressed across hundreds of dimensions. These dimensions encode substitutability: on one dimension 'happy' could be close to 'sad', though on some other dimension they could be far apart.
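One common way to measure that closeness is cosine similarity. The sketch below uses made-up 300-dimensional vectors (random numbers, not vectors from a trained model) to show how two words can be far apart on one dimension yet still close overall:

```python
import numpy as np

def cosine_similarity(u, v):
    """How closely two word vectors point in the same direction (1.0 = identical)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
dims = 300                            # hundreds of dimensions, as the text describes
happy = rng.normal(size=dims)
sad = happy.copy()
sad[0] = -happy[0]                    # flip one dimension: far apart on that axis
print(cosine_similarity(happy, sad))  # yet still close overall, near 1.0
```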
The LLM has to predict which word, or sequence of words, is most likely to come next.
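In practice 'most likely' means the model produces a score for every token in its vocabulary and turns those scores into probabilities. A toy version, with invented candidate words and scores for a prompt like "The cat sat on the ...":

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

candidates = ["mat", "moon", "dog", "idea"]   # hypothetical next words
scores = np.array([4.1, 0.3, 1.2, -2.0])      # made-up scores for illustration
probs = softmax(scores)
for word, p in zip(candidates, probs):
    print(f"{word}: {p:.3f}")
print("most probable next word:", candidates[int(np.argmax(probs))])   # mat
```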
Two things facilitate this prediction: a transformer and attention.
A transformer is a mathematical process that recalculates the vectors for each token. In other words, it assigns new distances between each pair of tokens, depending on what the other tokens are. The LLM gets its first few words by rearranging the question into the start of a response; after that it only has to find the most probable next word.
The LLM weights all the relationships between the words it knows (across thousands of dimensions, based on its corpus of training data). Then it looks at which words have come before and reweights those associations. The reweighting step is what LLM engineers call a transformer. The re-evaluation of the weights, based on the salience given to earlier bits of the text, is called attention.
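A stripped-down sketch of that reweighting (real transformers add learned 'query', 'key' and 'value' projections and stack many such layers, all omitted here): each token's vector is recalculated as a probability-weighted mix of the vectors of the tokens that precede it.

```python
import numpy as np

def attention(x):
    """Recompute each token's vector as a weighted mix of the tokens before it."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                  # how strongly each pair of tokens relates
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[mask] = -np.inf                         # a token only attends to what precedes it
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # each row becomes a set of probabilities
    return weights, weights @ x                    # salience weights and recalculated vectors

rng = np.random.default_rng(2)
tokens = rng.normal(size=(5, 16))                  # 5 tokens, 16 numbers each (toy sizes)
weights, new_vectors = attention(tokens)
print(weights.round(2))    # row i: how much salience token i gives tokens 0..i
print(new_vectors.shape)   # (5, 16): same tokens, recalculated vectors
```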
These steps are applied to every part of the conversation. Attention is a breakthrough development in natural language AI.