Weights are the learning elements of the model; they form the core of the learning process. They are real-valued parameters, usually initialized to small random values, that transform input data within the network's hidden layers.
As you know, a neural network is a series of nodes, or neurons. Each node holds a set of inputs, weights and a bias value. When an input enters the node, it is multiplied by its weight value. The resulting output is either observed directly or passed on to the next layer.
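A minimal Python sketch of this node computation; the input and weight values here are purely illustrative:

    # One node: multiply each input by its weight and sum the results.
    inputs  = [0.5, -1.2, 3.0]   # illustrative input values
    weights = [0.8, 0.1, -0.4]   # illustrative weight values
    output  = sum(x * w for x, w in zip(inputs, weights))  # observed or passed to the next layer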
Note that these network weights are different from ensemble "model weights": when several models are combined, the model weights are small positive values that sum to one and indicate the relative trust in, or expected performance of, each model. Within a single network, the weights are unconstrained real numbers learned during training.
Initialization of Weights
Weights are randomly initialized to begin with, and the model learns them: the gradients computed in the backward pass are used to update them gradually. A Gaussian distribution is widely used to initialize the parameters. In deep networks, a heuristic can be used to set the scale of the initialization, and the right choice depends on the nonlinear activation function.
W is drawn from a normal distribution with zero mean and variance k/n, where n is the number of inputs feeding into the layer and k depends on the activation function (for example, k = 2 for ReLU, known as He initialization, and k = 1 for tanh, a Xavier-style initialization).
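A short NumPy sketch of this rule; the function name is hypothetical, and the k values per activation follow the common He and Xavier heuristics mentioned above:

    import numpy as np

    def init_layer_weights(n_in, n_out, activation="relu", seed=0):
        """Draw W from a zero-mean normal distribution with variance k / n_in."""
        rng = np.random.default_rng(seed)
        k = 2.0 if activation == "relu" else 1.0   # k = 2 for ReLU (He), k = 1 for tanh/sigmoid (Xavier-style)
        std = np.sqrt(k / n_in)
        W = rng.normal(0.0, std, size=(n_in, n_out))
        b = np.zeros(n_out)                        # biases are commonly started at zero
        return W, b

    W, b = init_layer_weights(784, 128, activation="relu")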
Activation Function
An activation function introduces non-linearity into a neuron's output. Without activation functions, a neural network reduces to a linear regression model. Several activation functions are commonly used: sigmoid, tanh, ReLU, Leaky ReLU, ELU, parametric ReLU, Softmax and Swish.
The choice of activation function in the hidden layer controls how well the network learns from the data set.
The choice of activation function in the output layer defines the type of predictions the model can make.
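As a small NumPy sketch of a few of these choices, with ReLU as a typical hidden-layer activation and sigmoid or softmax as typical output-layer activations:

    import numpy as np

    def relu(z):                       # common default for hidden layers
        return np.maximum(0.0, z)

    def sigmoid(z):                    # output layer for binary classification
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):                    # output layer for multi-class classification
        e = np.exp(z - np.max(z))      # subtract the max for numerical stability
        return e / e.sum()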
Weights and Biases
Weights and biases together transform inputs into outputs. As described above, weights are the learning elements of the model: a neural network is a series of nodes (neurons), and each node holds a set of inputs, weights and a bias value. When an input enters the node, it is multiplied by its weight value, and the resulting output is either observed directly or passed on to the next layer.
Biases are added to the weighted inputs before they are passed through the activation function. They shift the activation function, which lets the model fit the data better.
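Putting weights, bias and activation together, here is a minimal NumPy sketch of one dense layer; the function name, shapes and the choice of ReLU are illustrative assumptions:

    import numpy as np

    def dense_forward(x, W, b):
        """One dense layer: weight the inputs, add the bias, apply the activation."""
        z = x @ W + b                  # weighted inputs shifted by the bias
        return np.maximum(0.0, z)      # ReLU activation, an illustrative choice

    x = np.array([0.5, -1.2, 3.0])         # example input vector
    W = 0.01 * np.random.randn(3, 2)       # small random weights: 3 inputs -> 2 units
    b = np.zeros(2)
    y = dense_forward(x, W, b)             # output of shape (2,)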