In a transformer model, we take a sequence of text and process it one token at a time. In ML and NLP there are alternative approaches that deal with larger units of text, such as sentences, paragraphs, or even entire documents. Current LLMs and the chatbots built on them, like ChatGPT, still operate fundamentally on tokens to compute embeddings and predictions, but other methods and research directions treat text at a higher level.
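As a hedged illustration of what token-level processing looks like in practice, the sketch below assumes the Hugging Face transformers library and the public gpt2 checkpoint (both assumptions for illustration, not the internals of any particular chatbot) and shows how raw text is split into subword tokens before a model ever sees it.

```python
# Minimal sketch of token-level input, assuming the Hugging Face
# "transformers" library and the public "gpt2" tokenizer are available.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Transformers process text one token at a time."
tokens = tokenizer.tokenize(text)    # subword pieces, not whole words or sentences
token_ids = tokenizer.encode(text)   # integer IDs the model actually consumes

print(tokens)
print(token_ids)
```

Everything downstream, embeddings and next-token predictions alike, is computed over these IDs rather than over sentences or paragraphs.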
Hierarchical models, for example, can operate at the word, sentence, or document level.
Long-range dependencies can be handled by transformer variants such as Longformer, BigBird, and Reformer. Transformers have also been modified to summarize sentence embeddings into higher-level representations for broader contextual understanding.
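The sketch below is a minimal example of feeding a long input to one of these variants. It assumes the Hugging Face transformers library and the allenai/longformer-base-4096 checkpoint; BigBird and Reformer expose broadly similar APIs, but the exact checkpoint names and limits here are assumptions for illustration.

```python
# Hedged sketch of long-input handling with Longformer, assuming the
# Hugging Face "transformers" library and the "allenai/longformer-base-4096"
# checkpoint, which accepts sequences up to 4096 tokens.
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = " ".join(["A long document made of many sentences."] * 200)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    outputs = model(**inputs)

# Token embeddings for a sequence far longer than the usual 512-token limit.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```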
The Sentence-BERT (SBERT) model focuses on producing embeddings for entire sentences or paragraphs.
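A minimal sketch of sentence-level embeddings with SBERT follows, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (both assumptions for illustration; any SBERT-style checkpoint would behave similarly).

```python
# Hedged sketch of sentence-level embeddings with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Transformers usually operate on tokens.",
    "Some models embed whole sentences instead.",
    "The weather is nice today.",
]
embeddings = model.encode(sentences)  # one fixed-size vector per sentence

# Sentences can be compared directly by cosine similarity,
# without any token-level alignment.
print(util.cos_sim(embeddings[0], embeddings[1]))
```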
Some systems re-aggregate token representations into larger units through additional processing, such as pooling; a sketch follows below. Others are trained directly on chunked text, and dynamic context windows let models attend to larger or smaller chunks of input.
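The sketch below shows one common way of re-aggregating token embeddings into a single sentence-level vector: mean pooling over a transformer's token outputs. The bert-base-uncased checkpoint and the pooling choice are assumptions for illustration, not a specific system's method.

```python
# Hedged sketch: pool token embeddings into one sentence-level vector.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Token embeddings can be pooled into one sentence vector."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (1, num_tokens, hidden)

# Mask out padding positions (none in this single-sentence case)
# and average over tokens to obtain one fixed-size representation.
mask = inputs["attention_mask"].unsqueeze(-1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # (1, hidden_size)
```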
Still, most of these methods rely on underlying token representations while optimizing for tasks where sentence-level understanding is crucial. Research in this area is directed at enhancing a model's ability to handle and understand text holistically, rather than just token by token.