DeepSeek has emerged as a disruptor in the technology industry. OpenAI has accused it of using a technique called distillation, which allows a new model to learn from a pretrained one: the pretrained model is repeatedly queried, and its responses are used to train the new model. OpenAI suspects that DeepSeek may have inappropriately distilled its models.
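As a rough illustration of the general idea (a minimal toy sketch in PyTorch, not the actual pipeline of either company), a frozen "teacher" model is queried for its output distribution and a smaller "student" is trained to imitate it:

```python
# Toy sketch of knowledge distillation; model sizes and data are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(16, 8)   # stands in for a large pretrained model
student = nn.Linear(16, 8)   # new model being distilled
teacher.eval()

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0            # softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 16)                      # the "questions" sent to the teacher
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # KL divergence pulls the student's predictions toward the teacher's answers
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```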
There are doubts about this distillation claim. DeepSeek R1 could instead have been built with reinforcement learning (RL), and DeepSeek has published a whole paper on this topic. The team also used supervised fine-tuning (SFT), which added domain knowledge using carefully rejection-sampled data, while RL let the model learn reasoning largely from scratch. RL here is the paradigm shift: it is what adds the reasoning skills.
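The rejection-sampling step can be pictured as follows (a hedged sketch with placeholder functions, not DeepSeek's published code): several candidate answers are drawn for each prompt, only the ones that pass a correctness check are kept, and the kept pairs become fine-tuning data.

```python
# Illustrative rejection sampling for building an SFT dataset.
import random

def sample_answers(prompt: str, k: int = 4) -> list[str]:
    # Placeholder for drawing k candidate answers from the current model.
    return [f"{prompt} -> candidate {i} ({random.random():.2f})" for i in range(k)]

def is_correct(answer: str) -> bool:
    # Placeholder verifier, e.g. an exact-match check against a known solution.
    return float(answer.split("(")[-1].rstrip(")")) > 0.5

def build_sft_dataset(prompts: list[str]) -> list[tuple[str, str]]:
    dataset = []
    for prompt in prompts:
        # Keep only candidates that pass the check; rejected samples are discarded.
        accepted = [a for a in sample_answers(prompt) if is_correct(a)]
        dataset.extend((prompt, a) for a in accepted)
    return dataset

print(build_sft_dataset(["What is 2 + 2?", "Factor x^2 - 1."]))
```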
SFT is a machine-learning technique in which a pre-trained model is further trained on a labelled dataset specific to a particular task. The model has already acquired general knowledge during its pre-training phase; SFT leverages and adapts that knowledge so the model performs well on more specialized tasks. According to the summary DeepSeek attached to its GitHub page, it applied RL directly, without relying on supervised fine-tuning. This allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems. DeepSeek thus validated that the reasoning capabilities of LLMs can be developed through RL alone, without the need for SFT, which is a breakthrough.
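In code terms, SFT amounts to continuing training on labelled task examples (again a minimal toy sketch; the shapes and data below are assumptions for illustration, not DeepSeek's setup):

```python
# Toy sketch of supervised fine-tuning on a small labelled, task-specific dataset.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
# Imagine the weights above were loaded from a pretraining checkpoint.

task_inputs = torch.randn(64, 16)            # labelled, task-specific examples
task_labels = torch.randint(0, 4, (64,))     # target classes for the new task

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    logits = model(task_inputs)
    loss = loss_fn(logits, task_labels)      # supervised signal from the labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```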
It also uses a mixture-of-experts technique to assign different parts of the task to specialized sub-networks, or experts, within the model.
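A tiny routing layer captures the idea (illustrative only; DeepSeek's actual routing is more elaborate): a router scores the experts for each input and only the chosen expert processes it.

```python
# Toy top-1 mixture-of-experts layer: only one expert runs per input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopOneMoE(nn.Module):
    def __init__(self, dim=16, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):
        scores = F.softmax(self.router(x), dim=-1)   # how suited each expert is
        best = scores.argmax(dim=-1)                 # pick one expert per input
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = best == i
            if mask.any():
                # Route only the matching inputs through this expert
                out[mask] = expert(x[mask]) * scores[mask, i].unsqueeze(-1)
        return out

layer = TopOneMoE()
print(layer(torch.randn(8, 16)).shape)   # torch.Size([8, 16])
```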
It also makes the system more efficient through optimization techniques that find and process information without using much memory, and it predicts two tokens (words) at a time instead of one.
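The two-at-a-time idea can be sketched as a model with two prediction heads reading the same hidden state (sizes here are arbitrary assumptions, shown only to illustrate the concept, not DeepSeek's architecture):

```python
# Sketch of multi-token prediction: two heads predict the next two tokens.
import torch
import torch.nn as nn

vocab, hidden = 1000, 32
backbone = nn.Embedding(vocab, hidden)       # stands in for the transformer trunk
head_next = nn.Linear(hidden, vocab)         # predicts token t+1
head_after = nn.Linear(hidden, vocab)        # predicts token t+2

tokens = torch.randint(0, vocab, (4, 10))    # a batch of token ids
h = backbone(tokens)                         # hidden states per position
logits_t1 = head_next(h)                     # distribution over the next token
logits_t2 = head_after(h)                    # distribution over the token after that
print(logits_t1.shape, logits_t2.shape)      # both: (4, 10, 1000)
```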