Transformer Model

In deep learning, a transformer is a model that adopts the self-attention mechanism: it differentially weights the significance of each part of the input data.

The original transformer consists of an encoder and a decoder. The encoder processes the input sequence and generates a sequence of hidden states; the decoder then takes these hidden states as input and generates the output sequence.
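As a minimal sketch of this encoder-decoder structure, PyTorch ships a built-in transformer module (assuming PyTorch is installed; the shapes and hyperparameters below are illustrative, not from the text):

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (not specified in the original text)
d_model, nhead = 512, 8

# PyTorch's built-in encoder-decoder transformer
model = nn.Transformer(d_model=d_model, nhead=nhead)

# Dummy sequences, shaped (sequence_length, batch_size, d_model)
src = torch.rand(10, 32, d_model)  # input sequence fed to the encoder
tgt = torch.rand(20, 32, d_model)  # target sequence fed to the decoder

# The encoder turns src into hidden states; the decoder attends to
# them while producing an output with the same shape as tgt.
out = model(src, tgt)
print(out.shape)  # torch.Size([20, 32, 512])
```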

Self-attention is a mechanism that allows the model to attend to different positions of the input sequence. In other words, while computing the representation of each position, the model weighs the importance of every other position in the sequence.
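A minimal NumPy sketch of single-head scaled dot-product self-attention (the variable names and dimensions are illustrative):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # position-vs-position relevance
    # Softmax over each row: how much each position attends to the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```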

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based machine-learning technique used for NLP. GPT-3 is an autoregressive language model that uses deep learning to produce human-like text. T5 (Text-to-Text Transfer Transformer) is a transformer-based model that can be fine-tuned for a variety of NLP tasks.
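As one possible illustration, a pretrained BERT can be queried in a few lines via the Hugging Face transformers library (assuming the library is installed and the checkpoint can be downloaded; "bert-base-uncased" is one common checkpoint, not one the text specifies):

```python
from transformers import pipeline

# Masked-language-model pipeline backed by a pretrained BERT checkpoint
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token using both left and right context
for pred in fill_mask("The transformer model uses [MASK] to weigh input positions."):
    print(pred["token_str"], round(pred["score"], 3))
```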

Transformers are primarily used for NLP and computer vision (CV).

In the transformer model, a positional embedding is added to each data embedding vector. Positional embeddings add positional information to the word embeddings, tagging the data elements coming into and out of the network. Attention units follow these tags, computing a kind of algebraic map of how each element relates to the others. Attention queries are typically executed in parallel by batching them into matrix operations, in what is called multi-head attention.
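A sketch of the sinusoidal positional encoding from the original transformer paper, added to the embedding vectors (NumPy; the dimensions are illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal encodings: PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
                             PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]     # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dimensions
    pe[:, 1::2] = np.cos(angles)                 # odd dimensions
    return pe

seq_len, d_model = 6, 8
embeddings = np.random.default_rng(1).normal(size=(seq_len, d_model))
tagged = embeddings + positional_encoding(seq_len, d_model)  # position-tagged inputs
print(tagged.shape)  # (6, 8)
```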
