The first large language model especially suited for natural language processing (NLP), transformer-based and trained on massive amounts of data, was GPT-1, released by OpenAI in 2018. GPT stands for Generative Pre-trained Transformer. GPT-1 had 117 million parameters. It was pretrained on a large text corpus and then fine-tuned on specific downstream tasks. The transformer architecture itself was introduced in the 2017 paper by Vaswani et al., 'Attention Is All You Need.'
Earlier NLP systems included ELIZA (1966), statistical models such as Hidden Markov models and n-gram models, and Word2Vec (2013), which represented word meanings as vectors. Early deep learning models included ELMo (2018), which used LSTMs, and BERT (2018), a bidirectional transformer from Google. However, they were trained differently from GPT.
GPT is the first true LLM used for NLP.
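For readers who want to try GPT-1 directly, the original 117-million-parameter checkpoint is still available. Below is a minimal sketch, assuming the Hugging Face `transformers` library and its `openai-gpt` model ID, that loads the model and generates a short continuation; the prompt text is just an illustrative example.

```python
# Minimal sketch: load the original GPT-1 checkpoint ("openai-gpt", ~117M parameters)
# with Hugging Face transformers and generate a short continuation.
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

# Encode an example prompt and greedily generate 20 new tokens
inputs = tokenizer("natural language processing is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This mirrors GPT-1's original usage pattern: the pretrained causal language model is loaded as-is, and for downstream tasks it would then be fine-tuned on task-specific labeled data.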