Linear algebra, a branch of applied mathematics, is a prerequisite for machine learning. Neural Language Models (NLMs) address the n-gram data-sparsity problem by parameterizing words as vectors, called word embeddings, which are then used as inputs to a neural network. The embedding parameters are learned as part of the training process.
A machine learning model's output can be viewed in terms of decision boundaries. Linear models separate classes with a straight line (more generally, a hyperplane), while non-linear models have curved decision boundaries.
Neural networks follow the same mathematics. A network with no hidden layer and a sigmoid output is simply logistic regression, but networks with hidden layers classify more powerfully. Even a minimal neural network with a single hidden layer can represent any function that logistic regression can represent, and more besides.
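To make that relationship concrete, here is a minimal numpy sketch (the variable names and toy data are illustrative, not from any particular library) comparing a logistic-regression forward pass with a one-hidden-layer network: the network is just logistic regression applied to a learned non-linear transformation of the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(x, w, b):
    # Logistic regression: a single linear layer followed by a sigmoid.
    return sigmoid(x @ w + b)

def one_hidden_layer_net(x, W1, b1, w2, b2):
    # Hidden layer with a non-linearity, then the same logistic output on top.
    h = np.tanh(x @ W1 + b1)       # non-linear feature transformation
    return sigmoid(h @ w2 + b2)    # logistic regression on the hidden features

# Toy example: 4 inputs with 3 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
print(logistic_regression(x, rng.normal(size=3), 0.0))
print(one_hidden_layer_net(x, rng.normal(size=(3, 5)), np.zeros(5),
                           rng.normal(size=5), 0.0))
```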
Word embeddings are useful for finding nearest neighbours in the embedding space, and they can serve as inputs to supervised learning tasks. An embedding maps a discrete variable, such as a word, to a vector of continuous values, which also helps combat the curse of dimensionality.
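The nearest-neighbour idea needs nothing more than cosine similarity in the embedding space. The tiny vocabulary and hand-made vectors below are invented purely for illustration:

```python
import numpy as np

# Hypothetical 4-word vocabulary with 3-dimensional embeddings.
vocab = ["king", "queen", "apple", "orange"]
embeddings = np.array([
    [0.90, 0.80, 0.10],
    [0.85, 0.82, 0.15],
    [0.10, 0.20, 0.90],
    [0.12, 0.25, 0.88],
])

def nearest_neighbour(word):
    # Cosine similarity between the query vector and every embedding.
    q = embeddings[vocab.index(word)]
    sims = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    sims[vocab.index(word)] = -np.inf   # exclude the word itself
    return vocab[int(np.argmax(sims))]

print(nearest_neighbour("king"))    # -> "queen" with these toy vectors
```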
Word2Vec creates word embeddings using one of two model architectures: Continuous Bag of Words (CBOW) and Skip-Gram.
The CBOW model learns to predict a word from its context: it tries to maximize the probability of the target word given the surrounding words.
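One common way to write the CBOW training objective (following the original Word2Vec papers) is the average log-probability of each target word given the c words on either side of it, over a corpus of T words:

\[
\frac{1}{T}\sum_{t=1}^{T}\log p\!\left(w_t \mid w_{t-c},\ldots,w_{t-1},w_{t+1},\ldots,w_{t+c}\right)
\]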
The skip-gram model is the reverse: it is designed to predict the context words from a given target word. Both models are shallow neural networks with an input layer, a hidden layer whose weights become the word embeddings, and an output layer. Skip-gram training is more expensive, because each target word is used to predict every word in its context window.
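A minimal sketch of training both variants, assuming the gensim library (version 4.x parameter names) and a toy corpus; sg=0 selects CBOW and sg=1 selects skip-gram:

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["people", "eat", "apples", "and", "oranges"],
]

# CBOW (sg=0): predict the target word from its context.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Skip-gram (sg=1): predict the context words from the target word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# The learned embeddings live in model.wv; look up a vector and its neighbours.
print(skipgram.wv["king"][:5])
print(skipgram.wv.most_similar("king", topn=2))
```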
GloVe, or Global Vectors for Word Representation, is an unsupervised learning algorithm for obtaining vector representations of words. Word2Vec trains a shallow neural network to predict words from local context windows, whereas GloVe factorizes a global word co-occurrence matrix. In practice, Word2Vec scales naturally to very large corpora because it streams over the text, while GloVe is often faster to train once the co-occurrence counts have been collected.
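Pretrained GloVe vectors are distributed as plain text files, one word per line followed by its vector components. The sketch below loads such a file; the filename glove.6B.50d.txt is only an example of the files published by the Stanford GloVe project, so adjust the path to whatever you have downloaded:

```python
import numpy as np

def load_glove(path):
    # Each line in the GloVe text format is: word followed by its vector components.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove("glove.6B.50d.txt")   # path is an assumption; adjust as needed
print(glove["king"][:5])
```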