Feedforward

In a transformer model, feedforward refers to a neural network layer that processes each position in a sequence independently of others.

This layer consists of two linear transformations with a non-linear activation function in between (typically ReLU, the rectified linear unit). This structure is needed to capture complex patterns and relationships within the input sequence.

The feedforward layer in a transformer is a multi-layer perceptron (MLP) that acts on the output of the self-attention layer. It introduces non-linearity into the model, enabling it to capture complex relationships between the input elements.

Breaking feedforward down: it takes the output from the self-attention layer and passes it through a linear layer, then applies a non-linear transformation (using a ReLU activation function). The transformed output is fed through another linear layer to create a new representation of the input sequence. It is this enriched representation that is better suited for the next self-attention layer or the final output layer of the model.

Essentially, the feedforward layer enhances the contextual information extracted by the self-attention layer.
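To make this concrete, here is a minimal sketch of such a position-wise feedforward block in plain NumPy. The function name feed_forward and the dimensions (d_model = 4, d_ff = 8) are illustrative assumptions for this example, not taken from any specific library.

import numpy as np

def relu(x):
    # ReLU: keep positive values, zero out negative ones
    return np.maximum(0, x)

def feed_forward(x, W1, b1, W2, b2):
    # Position-wise feedforward block: linear -> ReLU -> linear.
    # x has shape (sequence_length, d_model); each position is
    # transformed independently of the others.
    hidden = relu(x @ W1 + b1)   # first linear layer + non-linearity
    return hidden @ W2 + b2      # second linear layer back to d_model

# Illustrative dimensions and random weights
d_model, d_ff, seq_len = 4, 8, 3
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

attention_output = rng.normal(size=(seq_len, d_model))  # stands in for the self-attention output
print(feed_forward(attention_output, W1, b1, W2, b2).shape)  # (3, 4)

Note that the block maps each position back to the same dimensionality it came in with, which is what allows it to feed directly into the next layer of the model.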

We have referred to ReLU as the activation function that introduces non-linearity, a critical property that allows neural networks to learn complex patterns in data.

If the input to the ReLU function is positive, the output remains unchanged.

If the input is negative, the output becomes zero.

This simple rule makes ReLU computationally efficient and helps address the vanishing gradient problem (which can hinder training in deep neural networks).

Mathematically, ReLU(x) = max(0, x), which means it returns 0 for negative input values and the input value itself for positive input values.
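A one-line implementation makes the definition concrete (the sample input values below are arbitrary):

import numpy as np

def relu(x):
    # max(0, x): negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]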

Illustration

An input vector has three features — x1, x2 and x3. There is a hidden layer with two neurons. We require a single output. Each connection between neurons has a weight associated with it, and each neuron has a bias term.


Input Layer          Hidden Layer          Output Layer
x1, x2, x3  ----->   h1, h2      ----->    Output
     |                    |                     |
Weights (W1)         Weights (W2)          Weights (W3)
Bias [B1]            Bias [B2]             Bias [B3]

The weighted sum of inputs plus bias is calculated for each neuron in the hidden layer. It is then passed through an activation function like ReLU.

The output of each neuron in the hidden layer is connected to the output neuron in the output layer, again with associated weights and a bias. The weighted sum of these inputs plus the bias is calculated for the output neuron, and this result is the final output of the network.

This is the process of one feedforward pass (through the neural network).
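The sketch below walks through one such pass for the 3-2-1 network described above. All weight and bias values here are made-up illustrative assumptions, not taken from the text.

import numpy as np

def relu(x):
    return np.maximum(0, x)

# Input vector with three features x1, x2, x3
x = np.array([1.0, 2.0, 3.0])

# Input -> hidden: weight matrix (3x2) and one bias per hidden neuron
W1 = np.array([[0.2, -0.1],
               [0.4,  0.3],
               [-0.5, 0.6]])
B1 = np.array([0.1, -0.2])

# Hidden -> output: weight matrix (2x1) and output bias
W2 = np.array([[0.7],
               [-0.3]])
B2 = np.array([0.05])

# Hidden layer: weighted sum of inputs plus bias, passed through ReLU
h = relu(x @ W1 + B1)     # shape (2,) -> activations of h1 and h2

# Output layer: weighted sum of hidden activations plus bias
output = h @ W2 + B2      # shape (1,) -> final output of the network
print(h, output)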
