Author: Shabbir Chunawalla

  • DeepSeek: Copying Charges

DeepSeek has emerged as a disruptor in the technology industry. OpenAI accuses it of using a technique called distillation, which allows a new model to learn from a pretrained one: the pretrained "teacher" model is repeatedly queried, and its responses are used to train the new "student" model. OpenAI suspects that DeepSeek may have inappropriately distilled its models.
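As a rough illustration (not DeepSeek's or OpenAI's actual code), the core of a distillation objective can be sketched in plain Python: the teacher's temperature-softened output distribution serves as the training target for the student.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # "dark knowledge" about which wrong answers are almost right.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the teacher's soft targets and the
    # student's temperature-scaled predictions.
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

# A student whose logits resemble the teacher's incurs a lower loss.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.2, 3.9, 1.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In a real pipeline the student's weights would be updated by gradient descent on this loss; the numbers above are toy values.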

There are doubts about the distillation charge. DeepSeek-R1 could have relied on RL (reinforcement learning); DeepSeek has published an entire paper on the topic. The team also used SFT, or supervised fine-tuning, which added domain knowledge together with careful rejection sampling, learning much of this behaviour from scratch. RL is a paradigm shift here: it is what adds the reasoning skills.

SFT is an ML technique in which a pre-trained model is further trained on a labelled dataset specific to a particular task. The general knowledge the model has already acquired during its pre-training phase is leveraged and adapted so that it performs well on more specialized tasks. According to the summary DeepSeek has attached on its GitHub page, it applied RL without relying on supervised fine-tuning. This allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems. DeepSeek has thus validated that the reasoning capabilities of LLMs can emerge through RL, without the need for SFT. It is a breakthrough.
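To illustrate the rejection-sampling idea mentioned above, here is a minimal sketch. The `reward` scorer is purely hypothetical (a simple word count stands in for a real reward model); only candidates that clear the bar become labelled SFT examples.

```python
def reward(response):
    # Hypothetical scorer: rewards longer, more explanatory answers.
    # A real pipeline would use a trained reward model or verifier.
    return len(response.split())

def rejection_sample(candidates, threshold):
    # Keep only candidate responses whose reward clears the threshold;
    # the survivors become labelled examples for the SFT stage.
    return [c for c in candidates if reward(c) >= threshold]

candidates = [
    "42",
    "The answer is 42 because 6 times 7 equals 42.",
    "Multiplying 6 by 7 gives 42, so the answer is 42.",
]
sft_data = rejection_sample(candidates, threshold=5)
# The bare "42" is rejected; the two explained answers survive.
```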

It also uses a mixture-of-experts (MoE) technique to assign different parts of the training task to specialized sub-networks, or experts, within the model.
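A minimal sketch of mixture-of-experts routing, with toy functions standing in for the expert sub-networks. The gating scheme shown (top-k softmax) is a common MoE choice, not necessarily DeepSeek's exact design.

```python
import math

def gate(scores, top_k=2):
    # Pick the top-k experts by router score and renormalize
    # their softmax weights so they sum to 1.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    exps = {i: math.exp(scores[i]) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Toy experts: each is just a function of the input here.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

def moe_forward(x, router_scores):
    # Only the selected experts run; the rest stay idle, saving compute.
    weights = gate(router_scores)
    return sum(w * experts[i](x) for i, w in weights.items())

out = moe_forward(10.0, router_scores=[0.1, 2.0, -1.0])
# Experts 1 and 0 are chosen; expert 2 never executes.
```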

It makes the system more efficient by using optimization techniques that find and process information without consuming much memory. It also predicts two tokens at a time instead of one (multi-token prediction).
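The two-tokens-at-a-time idea can be caricatured with toy lookup tables standing in for the model's two prediction heads; a real model would compute both predictions from one shared forward pass.

```python
# Toy tables standing in for two prediction heads:
# NEXT predicts the next token, NEXT_NEXT the one after it.
NEXT = {"the": "cat", "cat": "sat", "sat": "down"}
NEXT_NEXT = {"the": "sat", "cat": "down"}

def predict_two(token):
    # One "forward pass" yields two tokens instead of one.
    return NEXT[token], NEXT_NEXT[token]

def generate(start, steps):
    out = [start]
    for _ in range(steps):
        t1, t2 = predict_two(out[-1])
        out += [t1, t2]   # the sequence grows by two tokens per pass
    return out

seq = generate("the", steps=1)  # ["the", "cat", "sat"]
```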

  • Humanoid Robots

OpenAI is exploring the humanoid robot market, which is expected to touch $38 billion by 2035, according to Goldman Sachs. In India, a robotics company proposes to launch an advanced AI agent capable of processing vast volumes of multi-modal data.

The company, Addverb, is backed by Reliance. The humanoid robots, yet to be named, will be launched by the end of 2025. The idea is to market military-grade robots.

Addverb has a robotics facility, Bot-Verse, in Noida: a massive black building surrounded by green infrastructure. It makes products ranging from autonomous mobile robots to sorting robots, built to clients' demands. Humanoid robots are not yet among them, as those are still at the prototype stage. The facility, a 6 lac sq. ft complex, has a production capacity of 1 lac robots annually. The robots integrate AI, IoT, 5G and edge computing.

    Addverb will collaborate with Reliance to develop humanoid robots. It has already introduced India’s first quadruped, Trakr. It also unveiled Heal, a medical cobot.

  • DeepSeek and India’s AI Infra

    Domestic AI infrastructure and data center providers are hopeful that there will be greater business potential with the availability of low-cost open source DeepSeek model.

The very fact that DeepSeek has been built at a lower cost inspires Indian startups and companies to build LLMs on similar lines using far fewer GPUs.

    In addition, DeepSeek is open source. Startups can leverage its APIs at significantly lower costs.

There could be an increase in take-up of the GPU-as-a-service model, in which compute providers lease out GPUs at an hourly rate.

    The open source models could be 10 times cheaper than working with the closed source models.

According to experts, DeepSeek follows a balanced loading approach while tackling a prompt. Models such as ChatGPT traverse all their knowledge and bring all their capabilities to bear in one go, which increases the compute requirement. DeepSeek, on the other hand, loads capabilities only when required.

DeepSeek does not actually activate all its capabilities at one go while interacting with users. It first understands the language, then determines whether the question is biological, medical, business-related or mathematical, and only then loads the necessary knowledge. The approach is effective, requires less compute and is cost-effective. ChatGPT, on the other hand, answers every question by consulting its complete repository, as it is trained on a single super-knowledge base. DeepSeek uses a two-geared approach.
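The two-geared, load-on-demand behaviour described above can be caricatured as lazy loading. The keyword classifier and module names here are entirely hypothetical; they only illustrate the classify-then-load pattern.

```python
# Hypothetical knowledge modules, loaded lazily on first use.
MODULES = {}

def load_module(domain):
    if domain not in MODULES:          # load only on demand
        MODULES[domain] = f"<{domain} knowledge base>"
    return MODULES[domain]

def classify(question):
    # Gear 1: a cheap classifier decides the domain of the question.
    keywords = {"cell": "biology", "profit": "business", "integral": "maths"}
    for word, domain in keywords.items():
        if word in question.lower():
            return domain
    return "general"

def answer(question):
    # Gear 2: only the relevant module is loaded and consulted.
    module = load_module(classify(question))
    return f"answered using {module}"

answer("What is the integral of x?")  # loads only the maths module
```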

Open source models lower entry barriers but at the same time increase the demand for infrastructure capable of supporting large-scale inferencing and deployment. Thus, data centers play a critical role in the AI revolution.

Such models require cutting-edge algorithms and optimizations which enhance performance while keeping costs under control.

DeepSeek's architecture paves the way for more distributed and energy-efficient data centers, which should provide flexible GPU leasing.

  • AI Education in the USA

    The top universities teaching AI in the USA turn out professionals who are in demand in the IT and technology sector. There are opportunities in the field of computer science and artificial intelligence.

    The US News and World Report has ranked some universities for AI education.

Carnegie Mellon University, Pittsburgh, Pennsylvania

The number one institute for AI education, it teaches AI-related concepts in detail.

    Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts

It is a world-class institute that teaches a variety of subjects including AI, and is known for its research quality. Graduates have bright prospects at top companies after completing AI education here.

    University of California, Berkeley

    It is one of the best institutions to learn AI. It inculcates in the students critical thinking and problem solving skills. It also teaches programming languages.

Stanford University, California

    It is a private research institution. It attracts students from all over the world. It provides good opportunities in the field of AI.

    Georgia Institute of Technology, Atlanta, Georgia

It is a public research and technology institution. There are 7 colleges and 31 departments under it, known for science and technology education.

  • Beyond Token-by-Token

In a transformer model, we take a sequence of words and deal with one word or token at a time. In ML and NLP, there are alternative approaches too: deal with larger units of text, say sentences, paragraphs or even entire documents. The current LLMs, and chatbots based on them like ChatGPT, operate basically on tokens to compute embeddings and predictions. There are other methods and research concepts that treat text at a higher level.

    In fact, hierarchical models could be word-level, sentence-level or document-level.

Long-range dependencies can be handled by transformer variants such as Longformer, BigBird and Reformer. Transformers are also modified to summarize sentence embeddings into higher-level representations for broader contextual understanding.
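The trick behind variants such as Longformer can be sketched as a sliding-window attention mask: each token attends only to nearby tokens, so cost grows linearly with sequence length instead of quadratically. This toy version builds only the boolean mask, not the attention itself.

```python
def local_attention_mask(seq_len, window):
    # mask[i][j] is True when token i may attend to token j,
    # i.e. when j lies within `window` positions of i.
    return [
        [abs(i - j) <= window for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = local_attention_mask(seq_len=6, window=1)
# Token 0 sees positions {0, 1}; token 3 sees {2, 3, 4}; and so on.
allowed = sum(sum(row) for row in mask)  # 16 allowed pairs vs 36 for full attention
```

Real Longformer additionally lets a few "global" tokens attend everywhere; that refinement is omitted here.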

The Sentence-BERT (SBERT) model focuses on producing embeddings for entire sentences or paragraphs rather than for individual tokens.
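A common SBERT-style recipe is to mean-pool token embeddings into one sentence vector and compare sentences by cosine similarity. The toy vectors below are made up; in practice they come from a BERT encoder.

```python
import math

def mean_pool(token_embeddings):
    # Average the token vectors into one fixed-size sentence vector.
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(vec[d] for vec in token_embeddings) / n for d in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy token embeddings for two similar sentences.
sent_a = [[1.0, 0.0], [0.8, 0.2]]
sent_b = [[0.9, 0.1], [1.0, 0.0]]
similarity = cosine(mean_pool(sent_a), mean_pool(sent_b))
# Similar sentences yield a cosine close to 1.
```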

Some systems re-aggregate tokens into larger units through pooling or other post-processing. Some models are trained on chunked text. Dynamic context windows allow models to focus on larger or smaller chunks of input.
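Chunk-based processing can be as simple as splitting the token stream into overlapping windows, so that each chunk carries some context from its neighbour. A minimal sketch:

```python
def chunk_tokens(tokens, chunk_size, overlap):
    # Slide a window of `chunk_size` tokens, stepping so that
    # consecutive chunks share `overlap` tokens of context.
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

tokens = list("abcdefgh")
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
# [['a','b','c','d'], ['d','e','f','g'], ['g','h']]
```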

Still, the fact remains that all these methods rely on underlying token representations in most cases, but they optimize for tasks where sentence-level understanding is crucial. Research in this area is directed at enhancing a model's ability to handle and understand text holistically, rather than just token-by-token.

  • AI Videos

We have already created a blog on Sora, the AI video generator from OpenAI. Chinese companies too intend to compete with Silicon Valley in this area. Shengshu Technology, a Beijing-based Chinese company, has released Vidu 2.0, a revamped video generator. These video generators convert images into short videos.

Vidu 2.0 comes immediately after the release of DeepSeek-V3, an LLM that meets global benchmarks despite being trained at a remarkably low cost.

Vidu 2.0 offers an English-language interface. However, it is infused with Chinese values: it prevents the images of certain political figures from being manipulated. Images of President Trump can be manipulated, but not those of Xi. The same political views have been baked into DeepSeek as well, which is all praise for the human rights record of the Chinese state.

Chinese AI tools could be used to create social media content. However, there are international concerns surrounding AI-generated videos. Deepfakes could be misused to target, deceive and harass individuals, or to generate non-consensual pornography.

Technically, the video-generating models have to improve further. Some facial expressions can be wonky, and limb movements can defy the laws of physics. There are other indications that these clips are AI-generated. What impresses, however, is the speed of generation: in a matter of seconds, a fake clip of President Trump crying or embracing Elon Musk can be produced. The technology has the potential to cut costs, at less than half the cost of industry-generated footage.

    There should be guardrails in place on Vidu to prevent its misuse.

It should be noted that this model can generate only eight-second clips. Creating longer clips or hyper-realistic deepfakes is still a tedious and time-consuming process.

    The economically cheaper video generating products put people on notice far beyond Silicon Valley.

  • AI Infrastructure: India

Reliance proposes to build in India what may become the world's biggest data center by capacity. It will leverage the rising demand for AI services.

This data center will be powered by Nvidia's semiconductors and will be set up in the town of Jamnagar, Gujarat. It will have a total capacity of three gigawatts, and the project will be completed in a record 24 months. The largest data centers operating today are under 1 gigawatt (DC Byte).

Reliance is joining the big league of companies such as Microsoft, Alphabet and Amazon that are pouring billions of dollars into data centers to deliver AI capabilities to customers worldwide. We have already written about the Stargate project of OpenAI, SoftBank and Oracle, which proposes to put around $100 billion to $500 billion into such infrastructure.

Every time a user puts a query to a model like ChatGPT, OpenAI has to pay for the computing resources used to process it. When a pretrained model makes predictions or generates output, it is called inferencing; GPU time is consumed, and there is growing demand for it. Inferencing costs can be onerous. Before that come the training costs of preparing the LLM: training is energy-intensive and takes time. There are also the hosting costs of maintaining GPU servers. The whole Reliance project may involve an investment of $20 billion to $30 billion.
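A back-of-the-envelope sketch of per-query inferencing cost. Every figure below (GPU hourly rate, throughput, query volume) is hypothetical and only illustrates why inferencing at scale becomes onerous.

```python
# Hypothetical figures for illustration only; real rates vary widely.
gpu_cost_per_hour = 2.50       # $ per leased GPU-hour
tokens_per_second = 50         # throughput of one GPU serving the model
avg_tokens_per_query = 500     # prompt + generated tokens

seconds_per_query = avg_tokens_per_query / tokens_per_second   # 10 s of GPU time
cost_per_query = gpu_cost_per_hour / 3600 * seconds_per_query  # ~$0.007

queries_per_day = 1_000_000
daily_cost = cost_per_query * queries_per_day                  # ~$6,944 per day
```

Even fractions of a cent per query add up quickly at millions of queries a day, which is why cheaper inference (as claimed for DeepSeek) matters so much.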

In the US, Facebook plans to invest $60 billion-$65 billion to build AI infrastructure, including a data center of more than 2 gigawatts, large enough to cover a significant part of Manhattan. Facebook plans to end 2025 with more than 1.3 million GPUs.

  • India: Wake up from AI Slumber

India's National Programme on AI, initiated in 2018, has not made sufficient progress. India's AI mission, backed by a budget of $1.2 billion, is strikingly low when compared with international benchmarks. Under the CHIPS and Science Act, the US has committed over $280 billion, besides private AI ventures. The Chinese government has invested over $208 billion in AI startups. India is spending only a fraction of this. The US has also proposed the $500 billion Stargate programme.

The US wants to strengthen its technological position in the 21st century, especially in emerging technologies. In fact, the US restricts the export of advanced AI chips and model weights, which can hamper India's efforts to build advanced AI capabilities. Additionally, access to large pre-trained AI models and their weights is crucial for applications; restrictions could force India to rely on outdated models or to spend heavily to build its own models from scratch.

The US restrictions extend to cloud computing and critical frameworks such as TensorFlow and PyTorch. This restricts India's R&D efforts.

India's STEM manpower has a quality deficit: many of India's engineers lack the skills required for advanced AI R&D. Indian academia is outdated, and Indian industry operates in silos. India has become a consumer of AI technologies developed abroad. This environment is not conducive to innovation.

India should be favourable to industries from abroad, e.g. Tesla and Starlink. India should be flexible on e-commerce, and can improve its defence ties. But the Indian market alone will not compensate for India's inability to produce intellectual property, foster talent and create infrastructure.

    India cannot afford to be complacent. It should wake up from AI slumber.

  • AI Infrastructure Project: the USA

The USA has announced an ambitious $500 billion investment in AI infrastructure. It will be a private-sector investment aiming to take a lead over rival nations in this business-critical technology.

    It will be a joint venture of OpenAI, SoftBank and Oracle. The venture will be called Stargate.

The venture will build data-center infrastructure and create more than 1 lac jobs in the US. The first project is under construction in Texas.

Masayoshi Son, Sam Altman and Larry Ellison launched the project from the White House in the presence of President Trump.

    The project will have 20 such infrastructure data centers spanning half a million square feet each.

    The ultimate aim is to build artificial general intelligence (AGI). Microsoft has already collaborated with OpenAI for a $100 billion data center with an AI supercomputer also called Stargate.

AI does require heavy capital expenditure in computing power (pushing demand for specialized data centers) to link thousands of chips together in clusters. These centers also need a lot of electricity, with captive power plants set up if necessary.

Stargate is a part of the US-China tech battle. India should also look at foundational AI development. India's reliance on US technology is evident, but it will never adopt Chinese AI application programming interfaces.

  • Luo Fuli, Star behind the Star Performer: DeepSeek

DeepSeek is the pathbreaking Chinese model that rivals the performance of ChatGPT, Gemini and Claude AI. It tops Apple's App Store charts and has sent ripples through the stock market. It has been developed by a young and talented team.

One member of the team deserves special mention: Luo Fuli, a 29-year-old hailed as an AI prodigy in China. She is known for her pioneering contributions to natural language processing (NLP). She studied computer science at Beijing Normal University and later earned a place at the Institute of Computational Linguistics, Peking University, where she published 8 papers. She has worked with Alibaba and Xiaomi, and developed VECO, a multilingual pretraining model. She joined DeepSeek in 2022 and played a major role in developing DeepSeek-V2, which rivals ChatGPT.

Lei Jun, Xiaomi's founder, offered her a compensation package of 10 million yuan per annum.