At the start of training, the weights in a neural network are typically assigned at random. More precisely, the randomness is structured: the weights are randomly initialized, but from carefully chosen distributions.
Random Initialization
Random initialization breaks symmetry. If all weights were initialized to the same value (say zero), every neuron in a layer would receive the same gradient and learn the same thing, which makes training ineffective. Random weights ensure that neurons process inputs differently, allowing gradients to flow and useful features to emerge, as the sketch below illustrates.
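As a minimal sketch (illustrative, not from the article), the PyTorch snippet below shows why identical weights fail: every row of the gradient comes out the same, so the neurons can never learn different features.

```python
import torch
import torch.nn as nn

# Illustration of the symmetry problem: if every weight starts at the same
# value, every neuron in the layer receives an identical gradient.
torch.manual_seed(0)
layer = nn.Linear(4, 3, bias=False)
nn.init.constant_(layer.weight, 0.5)   # same value for every weight

x = torch.randn(8, 4)
loss = torch.tanh(layer(x)).sum()
loss.backward()
print(layer.weight.grad)               # all three rows are identical
```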
Initialization Methods
Uniform or Normal Random Initialization
Weights are drawn from a uniform or normal distribution with a fixed range or variance. This is not ideal for deep networks: because the scale ignores the layer size, activations and gradients can vanish or explode as depth increases.
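A sketch of this kind of plain random initialization in PyTorch; the standard deviation and range below are arbitrary illustrative values, not recommendations:

```python
import torch.nn as nn

# Fixed-scale random initialization, independent of layer size.
layer = nn.Linear(256, 256)
nn.init.normal_(layer.weight, mean=0.0, std=0.01)   # normal with fixed std
# nn.init.uniform_(layer.weight, a=-0.1, b=0.1)     # or a fixed-range uniform
```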
Xavier Initialization
Designed for sigmoid and tanh activations: weights are scaled according to the number of inputs and outputs of a layer so that the variance of activations stays roughly stable across layers.
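Xavier (Glorot) initialization sets the weight variance to 2 / (fan_in + fan_out). PyTorch provides helpers for both the normal and uniform variants; a sketch:

```python
import torch.nn as nn

# Xavier/Glorot initialization: variance 2 / (fan_in + fan_out),
# keeping activation variance roughly constant for sigmoid/tanh units.
layer = nn.Linear(256, 128)
nn.init.xavier_normal_(layer.weight)     # normal variant
# nn.init.xavier_uniform_(layer.weight)  # uniform variant
```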
He Initialization
Designed for ReLU and its variants: weights are drawn with a larger variance, scaled by the fan-in, to maintain the variance of activations and gradients even though ReLU zeroes out roughly half of the units.
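He initialization uses variance 2 / fan_in. In PyTorch it is exposed as Kaiming initialization; a sketch:

```python
import torch.nn as nn

# He (Kaiming) initialization: variance 2 / fan_in compensates for
# ReLU discarding about half of the activations.
layer = nn.Linear(256, 128)
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
# nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')  # uniform variant
```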
Bias Initialization
Biases are typically initialized to zero, since this does not reintroduce the symmetry problem: the random weights already break it.
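A sketch of initializing a full layer this way: random weights plus zero biases.

```python
import torch.nn as nn

# Random weights break symmetry; biases can safely start at zero.
layer = nn.Linear(256, 128)
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
nn.init.zeros_(layer.bias)
```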
In short, weights are initialized randomly, but not arbitrarily: they are drawn from carefully scaled distributions that ensure effective learning from the start.
Frameworks such as PyTorch and TensorFlow/Keras apply sensible default initializations automatically, and also expose initializers such as GlorotNormal (Xavier) and HeNormal that can be chosen to match the activation function.
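As a sketch, Keras lets the initializer be selected per layer by name (GlorotUniform is already the default for Dense layers, so it is shown here only for clarity):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Choosing initializers explicitly to match each layer's activation.
model = keras.Sequential([
    layers.Dense(128, activation='relu',
                 kernel_initializer='he_normal',       # He for ReLU
                 bias_initializer='zeros'),
    layers.Dense(64, activation='tanh',
                 kernel_initializer='glorot_normal'),  # Xavier for tanh
    layers.Dense(10, activation='softmax'),
])
```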