There is a dog in the ___. The options could be kennel, courtyard, or street. How will the AI predict the token?
First, we embed the words as vectors, then form a weight matrix, and then do a matrix multiplication.
Words are first converted into vectors. Here we have a vocabulary of 6 words:
[dog, in, the, kennel, courtyard, street]
Let us assign each word a 3D vector.
Word: Embedding (3D vector)
dog: [1.0, 0.5, 0.2]
in: [0.3, 0.8, 0.1]
the: [0.2, 0.4, 0.9]
These are taken from a pre-trained embedding matrix, learned from large text data.
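The lookup step can be sketched in Python as a plain dictionary. This is only an illustration: the names `EMBEDDINGS` and `embed` are assumptions, and the numbers are the toy values from this example rather than real pre-trained weights.

```python
# Toy embedding table using the three input words from the example.
# Real models store embeddings as one large learned matrix indexed by token id.
EMBEDDINGS = {
    "dog": [1.0, 0.5, 0.2],
    "in":  [0.3, 0.8, 0.1],
    "the": [0.2, 0.4, 0.9],
}


def embed(words):
    """Return the embedding vector of each word, in order."""
    return [EMBEDDINGS[w] for w in words]


vectors = embed(["dog", "in", "the"])
for word, vec in zip(["dog", "in", "the"], vectors):
    print(word, vec)
```

A word that is missing from the table raises a `KeyError` here; how unknown words are handled in practice depends on the model and is outside this sketch.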
In step 2, we combine the input embeddings.
The input is ‘dog in the’. We will average the embeddings for simplicity (alternatively, we could concatenate them).
dog [1.0, 0.5, 0.2]
in [0.3, 0.8, 0.1]
the [0.2, 0.4, 0.9]
Average = [(1.0+0.3+0.2)/3, (0.5+0.8+0.4)/3, (0.2+0.1+0.9)/3]
= [0.5, 0.566, 0.4], the input vector **x**
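The averaging can be checked with a few lines of Python. The helper name `average` is an assumption made for this sketch; it simply takes the element-wise mean of the three vectors.

```python
def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    # zip(*vectors) groups the first components together, then the second, etc.
    return [sum(column) / n for column in zip(*vectors)]


context = [
    [1.0, 0.5, 0.2],  # dog
    [0.3, 0.8, 0.1],  # in
    [0.2, 0.4, 0.9],  # the
]

x = average(context)
print([round(v, 4) for v in x])
```

The middle component comes out as 0.5666..., which the text writes as 0.566.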
In step 3, we use a weight matrix W to transform the input vector into logits (one score for each candidate word). W has one row per candidate:
W =
[0.2, 0.6, -0.3] (kennel)
[-0.1, 0.9, 0.4] (courtyard)
[0.7, -0.2, 0.5] (street)
We multiply the input vector x = [0.5, 0.566, 0.4] by the transpose of W, which gives one score per row of W.
Each score:
1. kennel
0.2 × 0.5 + 0.6 × 0.566 + (-0.3) × 0.4 = 0.10 + 0.34 - 0.12
= **0.32**
2. courtyard
(-0.1) × 0.5 + 0.9 × 0.566 + 0.4 × 0.4 = -0.05 + 0.51 + 0.16
= **0.62**
3. street
0.7 × 0.5 + (-0.2) × 0.566 + 0.5 × 0.4
= 0.35 - 0.113 + 0.2
= **0.437**
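As a cross-check, the multiplication can be sketched in plain Python. The helper name `matvec` is an assumption for illustration, and the weight rows are the ones that reproduce the scores 0.32, 0.62 and 0.437 used in the softmax step.

```python
CANDIDATES = ["kennel", "courtyard", "street"]

# One row of weights per candidate word (assumed to match the scores
# 0.32, 0.62 and 0.437 that the softmax step uses).
W = [
    [0.2, 0.6, -0.3],   # kennel
    [-0.1, 0.9, 0.4],   # courtyard
    [0.7, -0.2, 0.5],   # street
]

x = [0.5, 0.566, 0.4]  # averaged input vector


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector: one dot product per row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


logits = matvec(W, x)
for word, score in zip(CANDIDATES, logits):
    print(f"{word}: {score:.3f}")
```

Each logit is just the dot product of one row of W with x, so adding another candidate word would mean adding another row.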
These logits are converted to probabilities in the next step (softmax):
$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$, where $x_i$ is the logit of candidate $i$.
Use logits [0.32, 0.62, 0.437].
Compute exponentials:
e^0.32 = 1.377
e^0.62 = 1.859
e^0.437 = 1.548
Sum = 4.784
Probabilities:
kennel 1.377/4.784 = 28.8%
courtyard 1.859/4.784 = 38.9%
street 1.548/4.784 = 32.4%
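The softmax step can be sketched with Python's standard `math` module. The function name `softmax` and the dictionary layout are choices made for this sketch; subtracting the largest logit before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = {"kennel": 0.32, "courtyard": 0.62, "street": 0.437}
probs = dict(zip(logits, softmax(list(logits.values()))))

for word, p in probs.items():
    print(f"{word}: {p:.1%}")
```

The result agrees with the hand calculation up to rounding in the last digit, with courtyard at about 38.9%.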
The highest probability is that of courtyard (38.9%), so the AI chooses courtyard.
There is a dog in the courtyard.
To recapitulate:
1. Embed words into vectors.
2. Merge the input vectors.
3. Multiply to get scores for each output token.
4. Apply softmax.
5. Pick the most likely word.
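The five steps can be put together in one short Python sketch. Everything here is a toy: the function name `predict_next` is an assumption, the embeddings are the example values, and the weight rows are the ones that reproduce the scores 0.32, 0.62 and 0.437. A real language model is far larger and does not simply average its inputs.

```python
import math

EMBEDDINGS = {
    "dog": [1.0, 0.5, 0.2],
    "in":  [0.3, 0.8, 0.1],
    "the": [0.2, 0.4, 0.9],
}
CANDIDATES = ["kennel", "courtyard", "street"]
W = [
    [0.2, 0.6, -0.3],   # kennel
    [-0.1, 0.9, 0.4],   # courtyard
    [0.7, -0.2, 0.5],   # street
]


def predict_next(context):
    """Return (most likely candidate, probabilities) for a list of context words."""
    # 1. Embed words into vectors.
    vectors = [EMBEDDINGS[w] for w in context]
    # 2. Merge the input vectors by averaging.
    x = [sum(col) / len(vectors) for col in zip(*vectors)]
    # 3. Multiply to get one score (logit) per candidate.
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    # 4. Softmax: turn scores into probabilities.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 5. Pick the most likely word.
    best = probs.index(max(probs))
    return CANDIDATES[best], probs


word, probs = predict_next(["dog", "in", "the"])
print(f"There is a dog in the {word}.")
```

Always picking the highest-probability word is the simplest selection rule; the same probabilities could also be used to sample a word at random.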