Gradient descent is an optimisation algorithm used to train ML models and neural networks. Its aim is to minimise the model's cost function by adjusting the weights of the neurons.
Training begins with randomly assigned weights. The input data is fed forward through the network to produce an output, which is compared with the desired output to calculate an error. This error is propagated backwards through the network using backpropagation, and the weight of each neuron is adjusted according to its contribution to the error. These steps are repeated until the cost function reaches a minimum.
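As a minimal sketch of this loop, here is a single-layer linear model trained with gradient descent. The dataset, layer size, learning rate and number of epochs are made-up values chosen for illustration, not taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dataset: 4 examples with 3 input features and 1 target each.
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# Step 1: start from random weights.
w = rng.normal(size=(3, 1))
learning_rate = 0.1

for epoch in range(100):
    # Step 2: feed the input forward to produce an output.
    y_pred = X @ w

    # Step 3: compare with the desired output to get an error (mean squared error).
    error = y_pred - y
    cost = np.mean(error ** 2)

    # Step 4: backpropagate to get the gradient of the cost w.r.t. each weight.
    grad = 2 * X.T @ error / len(X)

    # Step 5: adjust each weight according to its contribution to the error.
    w -= learning_rate * grad
```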
There are three main types of gradient descent: batch, stochastic and mini-batch gradient descent.
Stochastic gradient descent (SGD) updates the weights after each individual training example is processed. Each update is fast to compute, but noisy, because it is based on a single example.
Batch gradient descent updates the weights only after processing all training examples. Each update is slower to compute, but more accurate, because the gradient is averaged over the whole dataset.
Mini-batch gradient descent updates the weights after processing a small batch of training examples. It is a compromise between stochastic and batch gradient descent.
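The three variants differ only in how many examples are used to compute the gradient before each weight update. The sketch below illustrates that difference for the same linear model; the data, learning rate and batch size of 2 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=(8, 1))  # made-up data
w = rng.normal(size=(3, 1))
learning_rate = 0.1

def gradient(X_part, y_part, w):
    """Gradient of the mean squared error over the given examples."""
    error = X_part @ w - y_part
    return 2 * X_part.T @ error / len(X_part)

# Batch gradient descent: one update per pass, computed over all examples.
w -= learning_rate * gradient(X, y, w)

# Stochastic gradient descent: one update per training example.
for i in range(len(X)):
    w -= learning_rate * gradient(X[i:i + 1], y[i:i + 1], w)

# Mini-batch gradient descent: one update per small batch (batch size 2 here).
batch_size = 2
for start in range(0, len(X), batch_size):
    w -= learning_rate * gradient(X[start:start + batch_size],
                                  y[start:start + batch_size], w)
```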
There are also other optimisers built on gradient descent, such as Adam, Adagrad, RMSprop, Adadelta and Nadam.
In SGD, the weights are adjusted after each training example is processed.
Update rule:
w = w − learning rate * gradient
Where w is the weight, the learning rate is a hyperparameter that controls the step size of the update, and the gradient is the gradient of the loss function with respect to the weight. The gradient tells us how the loss changes as the weight changes, so moving the weight against the gradient reduces the loss.
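As a small worked example with made-up numbers: if the current weight is 0.5, the learning rate is 0.1 and the gradient of the loss with respect to that weight is 2.0, the update gives 0.5 − 0.1 * 2.0 = 0.3.

```python
w = 0.5              # current weight (made-up value)
learning_rate = 0.1  # hyperparameter controlling the step size
gradient = 2.0       # gradient of the loss w.r.t. this weight (made-up value)

w = w - learning_rate * gradient
print(w)  # 0.3 -- the weight moves against the gradient to reduce the loss
```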