Phi-1: A Small Language Model from Microsoft

In the world of AI language models, there are both large and small language models. Microsoft recently revealed Phi-1, a model with 1.3 billion parameters. Traditionally, larger models have been assumed to be superior. Microsoft, however, focused on the quality of the data used to train the model: Phi-1 was trained on a curated, textbook-quality dataset. Despite its size, it has already outperformed GPT-3.5, a model with 175 billion parameters.

Phi-1 uses the transformer architecture. The crucial part is its training on textbook-quality data. Training was completed on 8 Nvidia A100 GPUs in just four days. Instead of parameter count, the focus was the quality of the training data. Phi-1's accuracy score of 50.6% surpasses GPT-3.5's 47%, even though GPT-3.5 runs with 175 billion parameters.

Microsoft plans to open-source Phi-1 on Hugging Face. This is not the first time Microsoft has worked with a small language model: it previously introduced Orca, a 13-billion-parameter model trained on synthetic data generated with GPT-4. Orca, too, has surpassed ChatGPT.
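
If Phi-1 does land on Hugging Face, loading it should work like any other causal language model in the transformers library. The sketch below is illustrative only: the "microsoft/phi-1" model identifier, the prompt, and the generation settings are assumptions, not details confirmed by this article.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model identifier; check the actual release name.
model_id = "microsoft/phi-1"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Phi-1 was evaluated on code generation, so prompt it with a function stub.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Because the model has only 1.3 billion parameters, a sketch like this can plausibly run on a single consumer GPU or even a CPU, which is part of the appeal of small, data-curated models.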

The belief that ever-larger model size is essential for better performance has been dismissed. High-quality data can give a smaller model competitive accuracy.
