Sampling Techniques in Generating Next Word

LLMs use sampling techniques to generate the next word. Some common sampling techniques used are:

Greedy Sampling: Here the word with the highest probability is chosen. Though very straightforward, it tends to produce repetitive and less diverse output.
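A minimal sketch of greedy sampling in Python. The vocabulary and probabilities here are made up for illustration; a real model would produce them from its softmax layer.

```python
# Hypothetical next-word distribution produced by a model for some context.
probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}

def greedy_sample(probs):
    # Greedy decoding: always pick the single most probable word (argmax).
    return max(probs, key=probs.get)

print(greedy_sample(probs))  # -> cat
```

Because the argmax is deterministic, the same context always yields the same next word, which is exactly why greedy decoding becomes repetitive.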

Top-k Sampling: Here the word is selected at random from the k most likely words. This keeps the selection restricted to higher-probability words while still allowing variation.

First, the probabilities of all possible words in the vocabulary are calculated based on the context. The probabilities are sorted, and the top k words with the highest probabilities are selected. From this reduced set of k words, the model randomly selects one word to be the next word in the generated sequence.

There is a balance between selecting highly probable words (coherence) and some randomness (diversity). The value of k determines how many words are considered in this selection process.
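The steps above can be sketched as follows. The toy distribution is an assumption for illustration; the sample is drawn in proportion to the probabilities of the k surviving words, which is the common convention (a uniform draw over the top k would also fit the description above).

```python
import random

# Hypothetical next-word distribution for some context.
probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}

def top_k_sample(probs, k=2):
    # 1. Sort words by probability, descending, and keep the top k.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words, weights = zip(*top)
    # 2. Sample one word from the reduced set, weighted by probability.
    return random.choices(words, weights=weights)[0]

print(top_k_sample(probs, k=2))  # -> "cat" or "dog"
```

With k=2, only "cat" and "dog" can ever be chosen; raising k admits less likely words and increases diversity.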

Top-p (Nucleus) Sampling: The word is selected from the smallest set of words whose cumulative probability exceeds a threshold p. The size of this set adjusts dynamically as the probabilities change, which helps maintain diversity.

The smallest set of words is determined dynamically based on a cumulative probability threshold (denoted as p).

First, the probabilities of all possible words are determined based on the context. They are sorted in descending order. Starting from the word with the highest probability, the model accumulates probability while iterating through the sorted list.

Once the cumulative probability exceeds the threshold p, the model stops considering additional words. It then makes a random selection from this subset, weighting each word in proportion to its original probability (renormalized within the subset).
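The procedure above can be sketched as follows, again with a made-up distribution. With p = 0.8, the nucleus here is {"cat", "dog"}, since 0.5 + 0.3 reaches the threshold.

```python
import random

# Hypothetical next-word distribution for some context.
probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}

def top_p_sample(probs, p=0.8):
    # 1. Sort words by probability, descending.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # 2. Accumulate probability until the threshold p is reached.
    nucleus, cumulative = [], 0.0
    for word, prob in ranked:
        nucleus.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # 3. Sample from the nucleus, weighted by the original probabilities.
    words, weights = zip(*nucleus)
    return random.choices(words, weights=weights)[0]

print(top_p_sample(probs, p=0.8))  # -> "cat" or "dog"
```

Unlike top-k, the number of candidate words is not fixed: a sharply peaked distribution yields a small nucleus, a flat one a large nucleus.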

Temperature Scaling: The softmax probabilities are adjusted before sampling to control the randomness of the generated text. Lower temperatures lead to more deterministic outputs; higher temperatures promote more randomness.
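A minimal sketch of temperature scaling, assuming the model exposes raw logits (pre-softmax scores): each logit is divided by the temperature before the softmax, so temperatures below 1 sharpen the distribution and temperatures above 1 flatten it.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature before applying softmax.
    scaled = [logit / temperature for logit in logits]
    # Subtract the max for numerical stability; softmax is shift-invariant.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw model scores
print(softmax_with_temperature(logits, temperature=0.5))  # sharper (more deterministic)
print(softmax_with_temperature(logits, temperature=2.0))  # flatter (more random)
```

At a low temperature the top word dominates and sampling approaches greedy decoding; at a high temperature the probabilities flatten toward uniform.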

These techniques achieve a balance between coherent responses and diversity in generated text.
