Parallelization in AI chips (Nvidia GPUs, for example) refers to their ability to perform many calculations simultaneously rather than one after another, as traditional CPUs largely do. This is what accelerates data processing in AI and ML workloads.
First of all, parallelization is a great help in training neural networks and, more broadly, in executing AI tasks. These workloads involve massive matrices, repetitive arithmetic (additions and multiplications), and independent computations over large datasets. Such tasks can be broken into smaller pieces that are executed at the same time.
Chips of this kind contain thousands of mini-processors, or cores. Each core handles a small task, so many tasks are processed in parallel, as the sketch below illustrates.
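To make the idea concrete, here is a minimal Python sketch of that split-and-run-concurrently pattern using ordinary CPU worker processes. Everything in it (the worker function, the chunk count) is illustrative rather than from the text; a GPU applies the same principle, just with thousands of far simpler cores:

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for the small task each core handles: sum one slice of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Break the job into 8 smaller pieces, one per worker,
    # mirroring how a chip hands independent sub-tasks to its cores.
    chunks = [data[i::8] for i in range(8)]
    with Pool(processes=8) as pool:
        partial_results = pool.map(process_chunk, chunks)  # chunks run concurrently
    total = sum(partial_results)  # combine the partial results
    print(total)
```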
A CPU processes only a handful of tasks at a time, while an AI chip (a GPU) can process thousands of operations simultaneously.
In training a neural network, matrix multiplications run over batches of data, e.g. 1000 images. Each image can be processed independently, so the batch is split across many cores, with each core handling one image or a part of it. The result is faster training and better scalability.
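As a rough sketch of what that looks like in practice, the PyTorch snippet below multiplies a batch of 1000 flattened images by one layer's weight matrix in a single call. PyTorch and the random stand-in data are assumptions here, not part of the text; the point is that this one matmul is what the hardware spreads across its cores, and the same line runs on a GPU when one is available:

```python
import torch

# A batch of 1000 flattened 28x28 images (1000 x 784); random values stand in for real data.
images = torch.randn(1000, 784)
weights = torch.randn(784, 128)  # one dense layer's weights (batched row-vector convention)

# Use the GPU when present; on a CPU this call runs on a handful of cores,
# on a GPU it fans out across thousands.
device = "cuda" if torch.cuda.is_available() else "cpu"
activations = images.to(device) @ weights.to(device)  # shape: (1000, 128)
print(activations.shape)
```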
Matrices in neural networks are made of floating-point numbers that represent both the data and the model parameters. An input image is represented as a matrix of pixel values (say a 28×28 matrix of numbers ranging from 0, black, to 255, white); when fed to the model, it is flattened into a 784×1 vector. Neural networks learn by adjusting weights, which are stored in weight matrices. Suppose the 784 input pixels are connected to 128 neurons in the next layer: the weight matrix W is then 128×784, and the result z = Wx is a 128×1 vector, each value being the output of one neuron in the hidden layer. A non-linear activation function such as ReLU or sigmoid is then applied, as the sketch below shows.
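The layer described above can be written out in a few lines of NumPy. This is a sketch with made-up values: the random weights are placeholders for learned parameters, and the bias term b is standard in a dense layer even though the paragraph omits it:

```python
import numpy as np

x = np.random.rand(784, 1)            # flattened 28x28 image, pixel values scaled to [0, 1]
W = np.random.randn(128, 784) * 0.01  # weight matrix: 128 neurons x 784 inputs
b = np.zeros((128, 1))                # per-neuron biases (assumed; not mentioned in the text)

z = W @ x + b          # (128x784) @ (784x1) -> (128x1) pre-activations, one per hidden neuron
a = np.maximum(0, z)   # ReLU activation applied elementwise
```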
In short, matrix multiplication combines input matrices and weight matrices to produce activations and predictions.