In a matter of weeks in January 2025, DeepSeek became the cynosure of the tech industry. More than two years after ChatGPT launched in November 2022, the industry realized that the Chinese company DeepSeek had managed to train a foundation model that rivals ChatGPT and Llama at a much lower cost by using fewer GPUs. That dampened expectations of ever-higher demand for the advanced chips that American models rely on.
DeepSeek grew out of a side project that hedge fund manager Liang Wenfeng started in 2021, when he purchased Nvidia chips while running his trading fund. He wanted to use AI to identify patterns that could affect stock prices. Only later was the project converted into a standalone AI venture. During training, the model used rounding, that is, lower-precision numbers, to make calculations cheaper. The cloud capacity was also reconfigured so that the model split tasks across multiple chips more efficiently.
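The rounding idea can be illustrated with a small sketch. It stores a handful of made-up weights in a 16-bit floating-point format and compares the result of a simple calculation against the full-precision version. This is only an illustration: the post does not say which number format DeepSeek used, the 16-bit format is chosen here simply because Python's standard library supports it, and the helper names and sample values are hypothetical.

```python
import struct


def round_to_half(x: float) -> float:
    """Round a 64-bit Python float to the nearest 16-bit float value."""
    # The "e" format character packs a number as an IEEE 754 half-precision float.
    return struct.unpack("e", struct.pack("e", x))[0]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Made-up numbers, purely for illustration.
weights = [0.1234567, -0.7654321, 0.3333333, 0.9999999]
inputs = [1.5, 2.25, -0.75, 0.5]

exact = dot(weights, inputs)
approx = dot([round_to_half(w) for w in weights], inputs)
relative_error = abs(exact - approx) / abs(exact)

print(f"full precision: {exact:.6f}")
print(f"rounded:        {approx:.6f}")
print(f"relative error: {relative_error:.2e}")
```

The rounded weights take a quarter of the memory of 64-bit numbers, while the answer changes only slightly. Real training systems apply the same trade-off at a vastly larger scale and with far more care about where the lost precision matters.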
DeepSeek has made most of its training results public and made the model open source.
Tech stocks fell across markets from New York to Tokyo. That may be an overreaction to the new model, which has the potential to lower the cost of developing an LLM. DeepSeek also achieved inference at a much lower cost. This lowers the entry barrier for nations that want to develop their own models.
Nandan Nilekani is not in favour of India focusing on building LLMs. Aravind Srinivas of Perplexity AI does not agree. He expects India to do both: build its own LLMs and use existing LLMs to build wrappers on top of them.
US President Trump has urged the US tech industry to treat DeepSeek as a wake-up call.
One thing is certain. So far, the Chinese have emphasized only making money. They ignored innovation. Innovation comes from a deep sense of curiosity and a desire to create. It is not always business-driven or profit-driven. DeepSeek is taking China in the right direction.