In NLP tasks such as machine translation and language generation, the beam search algorithm is commonly used as an extension of the greedy search algorithm (which selects the most probable next token at each step of sequence generation).
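For contrast, here is a minimal sketch of greedy decoding. `next_token_logprobs` is a hypothetical stand-in for a language model: it maps a token prefix to a dict of `{token: log-probability}` over the vocabulary.

```python
# Greedy decoding sketch, assuming a hypothetical `next_token_logprobs`
# callable that returns {token: log-probability} for a given prefix.
def greedy_decode(next_token_logprobs, start_token, end_token, max_len):
    sequence = [start_token]
    while len(sequence) < max_len:
        logprobs = next_token_logprobs(sequence)
        token = max(logprobs, key=logprobs.get)  # always take the single best token
        sequence.append(token)
        if token == end_token:
            break
    return sequence
```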
Instead of selecting only the top-scoring token at each step, beam search tracks a fixed number of candidate sequences (the beam width or beam size). At each step, the algorithm expands each candidate with every possible next token and keeps only the top candidates according to their combined probability scores. This allows the algorithm to consider multiple potential sequences in parallel.
In short, the steps are:

1. Initialization: start with a single candidate sequence containing just the start token.
2. Expansion: generate the next set of candidates by considering all possible next tokens for each existing candidate.
3. Scoring: compute a score for each candidate based on its cumulative probability.
4. Pruning: retain only the top candidates, as determined by the beam width.
5. Termination: repeat the expansion and pruning steps until a termination condition is met, such as reaching the maximum length or generating an end token.
6. Selection: choose the final candidate based on some criterion, such as the highest overall score or satisfying a specific termination condition.
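The numbered steps can be sketched roughly as follows, reusing the hypothetical `next_token_logprobs` interface from above. This is an illustrative implementation under those assumptions, not a production decoder.

```python
# Minimal beam search sketch following the numbered steps above, assuming
# `next_token_logprobs(prefix)` returns {token: log-probability}.
def beam_search(next_token_logprobs, start_token, end_token, max_len, beam_width):
    # Step 1, initialization: one candidate holding just the start token,
    # with score 0.0 (the log of probability 1).
    beams = [([start_token], 0.0)]
    completed = []
    for _ in range(max_len - 1):
        candidates = []
        for seq, score in beams:
            # Step 2, expansion: consider every possible next token.
            for token, logp in next_token_logprobs(seq).items():
                # Step 3, scoring: sum log-probabilities, which corresponds
                # to the product of probabilities along the sequence.
                candidates.append((seq + [token], score + logp))
        # Step 4, pruning: keep only the top `beam_width` candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Step 5, termination: candidates ending in the end token
            # leave the beam and are set aside as complete.
            if seq[-1] == end_token:
                completed.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Step 6, selection: return the highest-scoring candidate, preferring
    # completed sequences when any exist.
    best = max(completed or beams, key=lambda c: c[1])
    return best[0]
```

Note that scores are kept as sums of log-probabilities rather than products of raw probabilities; this is numerically safer and makes pruning a simple sort.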
The technique suffers from flaws such as repetitive or overly generic output. Other techniques, such as length normalization, diverse beam search, and nucleus sampling, exist to mitigate these issues.
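As one example, a simple (assumed) form of length normalization divides a candidate's summed log-probability by its length raised to a penalty exponent, so that longer hypotheses are not unfairly penalized for accumulating more negative log-probability terms. The `alpha` parameter here is a hypothetical tuning knob.

```python
# Length-normalized score sketch: alpha=0 recovers the raw summed
# log-probability, alpha=1 averages log-probability per token, and
# values in between interpolate.
def length_normalized_score(logprob_sum, length, alpha=0.7):
    return logprob_sum / (length ** alpha)
```

In practice this rescoring would replace the raw sum in the selection step (step 6) of the beam search sketch above.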