Backpropagation and Gradient Descent

In a neural network, each neuron takes its inputs, multiplies them by weights, and adds a bias value. The result is then run through an activation function.

Each neuron generates an output, which becomes the input for other neurons. The neurons in the last layer produce the output of the network. This is the feed-forward pass.

A neural network has input neurons, hidden neurons, and output neurons.
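
To make the feed-forward pass concrete, here is a minimal sketch in Python with NumPy. The sigmoid activation and the 3-4-1 layer sizes are assumptions chosen for the example, not anything prescribed:

```python
import numpy as np

def sigmoid(x):
    # Activation function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, weights, biases):
    """Run one feed-forward pass through the network.

    Each layer multiplies its input by a weight matrix, adds a bias
    vector, and applies the activation function. The result becomes
    the input to the next layer.
    """
    activation = x
    for W, b in zip(weights, biases):
        activation = sigmoid(W @ activation + b)
    return activation

# Hypothetical network: 3 input neurons, 4 hidden neurons, 1 output neuron.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [np.zeros(4), np.zeros(1)]

output = feed_forward(np.array([0.5, -0.2, 0.1]), weights, biases)
print(output)  # the network's output for this input
```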

Let us consider the idea of backpropagation with gradient descent. The whole network is treated as a multi-variable function. A loss function calculates a number that denotes how well the network performs (the network's output is compared to known good results).

The set of input data coupled with the desired good results is called the training set. The loss function is designed to produce a larger number as the network's behaviour moves further away from correct.
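
One common loss function with this property is mean squared error. A minimal sketch, with a made-up prediction and its desired result:

```python
import numpy as np

def mse_loss(predicted, desired):
    # Mean squared error: grows as predictions move further from
    # the known good results, and is zero for a perfect match.
    return np.mean((predicted - desired) ** 2)

# Hypothetical training pair: the network's output vs. the desired result.
predicted = np.array([0.8, 0.3])
desired = np.array([1.0, 0.0])
print(mse_loss(predicted, desired))  # 0.065
```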

Gradient descent algorithms take the loss function and use partial derivatives to determine how much each variable (the weights and biases) in the network contributes to the loss value. The algorithm then works backwards through the network, visiting each variable and adjusting it to decrease the loss value.
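
Here is a rough sketch of that adjustment step. Backpropagation computes the partial derivatives analytically via the chain rule; this sketch estimates them numerically instead, and the two-parameter loss is a made-up stand-in for a real network's loss:

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-6):
    """Estimate the partial derivative of the loss with respect to
    each parameter by nudging it slightly and measuring the change."""
    grad = np.zeros_like(params)
    for i in range(len(params)):
        nudged = params.copy()
        nudged[i] += eps
        grad[i] = (loss_fn(nudged) - loss_fn(params)) / eps
    return grad

# Hypothetical loss over two parameters (a weight and a bias):
# for input 2.0 the desired output is 1.0.
def loss(params):
    w, b = params
    return (w * 2.0 + b - 1.0) ** 2

params = np.array([0.5, 0.5])
grad = numerical_gradient(loss, params)
params -= 0.1 * grad  # adjust each variable to decrease the loss
print(loss(np.array([0.5, 0.5])), loss(params))  # loss drops after the step
```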

Calculus of Gradient Descent

Two concepts from calculus are necessary to understand gradient descent.

The first is the derivative. It gives the slope (or rate of change) of a function at a single point. In other words, the derivative of a function tells us how quickly its output changes at a given input.

The second is the partial derivative, which applies to a multi-dimensional or multi-variable function. It isolates one of the variables, holding the others fixed, to find the slope along that one dimension.

What is the rate of change (slope) of a function at a specific point? Derivatives answer this question. Given an equation with multiple input variables, what is the rate of change with respect to just one of them? Partial derivatives answer this question.
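
Both questions can be made concrete with a finite-difference approximation: nudge the input a tiny amount and measure how much the output moves. A sketch, with made-up example functions:

```python
def derivative(f, x, h=1e-6):
    # Rate of change of f at point x: rise over a very small run.
    return (f(x + h) - f(x)) / h

def partial_derivative(f, point, var_index, h=1e-6):
    # Same idea for a multi-variable function: nudge only one
    # variable, holding the others fixed.
    nudged = list(point)
    nudged[var_index] += h
    return (f(*nudged) - f(*point)) / h

# f(x) = x^2 has derivative 2x, so the slope at x = 3 is about 6.
print(derivative(lambda x: x**2, 3.0))

# g(x, y) = x^2 + 3y: the partial with respect to x at (3, 5) is
# about 6, and the partial with respect to y is 3 everywhere.
print(partial_derivative(lambda x, y: x**2 + 3*y, (3.0, 5.0), 0))
print(partial_derivative(lambda x, y: x**2 + 3*y, (3.0, 5.0), 1))
```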

Gradient descent puts these ideas to work. Each variable of the equation is visited and adjusted to decrease the equation's output; this is our training goal. If the loss function is plotted graphically, each adjustment moves us incrementally towards a minimum of the function. Ideally, we want to find the global minimum.

The size of each increment is known as the learning rate in machine learning.
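
Putting it together, here is a minimal sketch of the descent loop, using a simple one-variable function f(x) = (x - 2)^2 whose derivative 2(x - 2) we supply by hand:

```python
def gradient_descent(grad_fn, start, learning_rate=0.1, steps=50):
    """Repeatedly step against the gradient; the learning rate sets
    the size of each increment."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_fn(x)
    return x

# f(x) = (x - 2)^2 has its minimum at x = 2; its derivative is 2(x - 2).
minimum = gradient_descent(lambda x: 2 * (x - 2), start=10.0)
print(minimum)  # converges close to 2.0
```

If the learning rate is too large, each step can overshoot the minimum; if it is too small, convergence is slow.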
