Old Pals Turn Rivals in AI Race

Demis Hassabis and Mustafa Suleyman both grew up in London and now compete with each other from rival companies in AI: one is at Google, the other at Microsoft.

Mustafa Suleyman is the son of a Syrian immigrant taxi driver and a nurse. At the age of 11, he was accepted as a student at Queen Elizabeth's School. The family had moved from one of the roughest areas of London to a safer locality in north London.

Demis Hassabis and Mustafa Suleyman came to know each other because Demis's younger brother was a friend of Suleyman's. In those days, Demis was 20, a chess player and a video game designer. His parents ran a toy shop in London.

Dr. Hassabis, now 47, is the Chief Executive of Google DeepMind, Google's division devoted to AI. Suleyman, now 39, has been appointed the Chief Executive of Microsoft AI.

Their path from London to Big Tech was quite an unusual one. In 2010, the two co-founded DeepMind, a seminal AI research lab. Their paths diverged after Google acquired DeepMind in 2014 for $650 million.

When OpenAI's ChatGPT arrived in late 2022, it kicked off an AI race. Google put Dr. Hassabis in charge of its AI research. Suleyman, meanwhile, had established another startup, Inflection AI, which struggled to gain traction. Unexpectedly, Microsoft hired him along with most of his team.

Google had already been rattled by Microsoft-backed OpenAI's release of ChatGPT. Microsoft and Google are now direct rivals in the field of AI, and Suleyman is the head of Microsoft's AI division.

Dr. Hassabis claimed in an interview that what Suleyman has learnt about AI over the years is a result of his association with DeepMind.

Hassabis studied for a computer science degree, while Suleyman read philosophy and theology at Oxford. Suleyman dropped out to set up a help line for Muslim teenagers and later worked as a human rights officer for the mayor of London. Dr. Hassabis went on to a PhD in neuroscience.

In 2010, they discussed how they could change the world.

Dr. Hassabis was on the verge of completing his post-doctoral work at the Gatsby Computational Neuroscience Unit, a University College London lab that combined neuroscience with AI. A hard-working young man, he invited Suleyman to build a startup with him, and the AI researcher Shane Legg joined them. The three met at an Italian restaurant and shared the belief that AI could change the world.

They obtained funding from Peter Thiel, a venture capitalist from Silicon Valley, and set up DeepMind by the end of 2010. Its stated mission was to develop artificial general intelligence (AGI).

Dr. Hassabis and Dr. Legg pursued intelligent machines, while Suleyman's task was to build products. As competition became fierce and their staff were being poached, they decided to sell DeepMind to Google.

Suleyman's leadership style was seen as aggressive, and he was placed on leave in 2019; he also needed a break after ten years of hectic work. He moved to Google's California office but was not happy there, and left to set up Inflection AI. The startup remained independent for some time, but in March 2024 Inflection AI was absorbed into Microsoft, with Suleyman put in charge of a new Microsoft AI business.

Suleyman now divides his time between Silicon Valley and London and is officially a rival to DeepMind.

Suleyman and Hassabis still text each other and occasionally meet over dinner. Dr. Hassabis says he is not much worried about any rivals.

Small Language Models (SLMs)

In April 2024, Microsoft released Phi-3 mini, the first of three small language models (SLMs) the company plans to release.

SLMs are compact versions of LLMs. Phi-3 mini is an SLM with 3.8 billion parameters, trained on a much smaller dataset than GPT-4.

It supports a context window of up to 1.28 lakh (128,000) tokens.

The company will add more models, including Phi-3 small and Phi-3 medium, to the Phi-3 family.

These models are cost-effective to operate and run well on smaller devices such as laptops and smartphones.

SLMs are fine-tuned and customized for specific tasks. Their targeted training consumes less computing power and energy.

Inference latency is the time a model takes to make predictions after receiving a prompt. Being smaller, SLMs process input quickly, which makes them more responsive and suitable for real-time applications.
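As a rough illustration, inference latency can be measured by timing calls to any model function. The `model_fn` below is a hypothetical placeholder (a string operation standing in for a real model), not an actual LLM API:

```python
import time

def measure_latency(model_fn, prompt, runs=5):
    """Average the wall-clock time model_fn takes to answer a prompt."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        model_fn(prompt)                      # the "inference" call
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Placeholder model: instant, so latency is near zero.
avg = measure_latency(lambda p: p.upper(), "Hello")
```

A real SLM would be called the same way; the smaller the model, the lower this average tends to be.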

According to Microsoft, Phi-3 mini outperforms models of the same size and the next size up across a variety of benchmarks covering language, reasoning, coding and math.

With strong reasoning and logic capabilities, Phi-3 mini is well suited to analytical tasks.

Speculation about GPT-5

OpenAI continues to work on its GPT series and is currently building GPT-5. OpenAI is a comparatively small organization when pitted against tech giants such as Facebook, Google, Apple and Amazon, yet it stole a march on them by releasing ChatGPT in late November 2022. Its tie-up with Microsoft does not rob it of that achievement. Many have jumped on the AI bandwagon since then, and OpenAI is striving to maintain its lead. Perhaps GPT-5 is the answer.

Undoubtedly, GPT-5 will have many new features that its predecessors lacked, though what those features will be remains a matter of speculation. GPT-5 has been in training since 2023 and may have more parameters or a new architecture.

GPT-5 is likely to be multi-modal, handling text, images, voice and video, and it may have internet access by default. (GPT-3's knowledge was limited to a training-data cutoff date, with no live internet access.)

GPT-5 could be a smart agent with new capabilities, performing some tasks autonomously, say ordering things or taking calls.

Sam Altman is not sure about the timing of the launch. However, he confirms the organization will launch an amazing new model this year. It is not clear whether it will be an upgrade of the existing model, say GPT-4.5, or something new, GPT-5 itself.

One thing is certain. It will pave the way for more versatile and capable AI models.

Key Terminology of LLMs

We shall try to understand the key words associated with large language models.

LLM (Large Language Model): A neural network, also called a foundation model, that understands and generates human-like, contextually relevant text. Examples of LLMs include the GPT series, Gemini, Claude and Llama.

Training: An LLM is trained on a vast corpus of text. The model learns to predict the next word in a sequence, and its accuracy is improved by adjusting its parameters.

Fine-tuning: A pre-trained model performs a broad range of tasks. It is fine-tuned to perform specific tasks or operate in a specific domain by training it further on specialized data not covered in the original training set.

Parameter: A variable part of the model's architecture, e.g. the weights in a neural network. Parameters are adjusted during training to minimize the difference between the predicted and actual output.

Vector: In ML, a vector is an array of numbers representing data in a form that algorithms can process. In LLMs, words or phrases are converted into vectors (called embeddings) which capture semantic meaning.

Embeddings: These are dense vector representations of text in which words with similar meanings have similar representations in vector space. Embeddings capture context and semantic similarity between words, which is useful in machine translation and text summarization.
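Semantic similarity between embeddings is typically measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings (hypothetical values, not from a real model).
king   = [0.9, 0.8, 0.1]
queen  = [0.85, 0.75, 0.2]
banana = [0.1, 0.2, 0.9]
```

Here `cosine_similarity(king, queen)` comes out far higher than `cosine_similarity(king, banana)`, mirroring how related words cluster together in embedding space.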

Tokenization: Text is split into tokens, which may be words, sub-words or characters.

Transformers: This neural network architecture relies on self-attention to weigh the influence of different parts of the input differently. It is useful in NLP tasks and is at the core of modern LLMs.

Attention: The attention mechanism enables models to focus on different segments of the input sequence while generating a response, keeping the response contextual and coherent.

Inference: The trained model makes predictions, generating text from input data using the knowledge gained during training.

Temperature: A hyperparameter that controls the randomness of predictions. A higher temperature produces more random output, while a lower temperature makes the output more deterministic. The logits are divided by the temperature before applying softmax.
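The effect of temperature can be shown directly: dividing the logits by the temperature before softmax sharpens or flattens the resulting distribution. The logit values below are arbitrary illustrations:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more deterministic
flat  = softmax_with_temperature(logits, temperature=2.0)  # more random
```

With temperature 0.5 the top token takes almost all the probability mass; with temperature 2.0 the mass spreads more evenly across tokens, which is why sampling at high temperature looks more "creative".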

Frequency penalty: The probability of a token is adjusted based on how often it has already occurred, balancing the generation of common and less common words.

Sampling: When generating text, the next word is picked at random according to its probability distribution. This makes the output varied and creative.

Top-k sampling: The choice of next word is limited to the k most likely candidates. This reduces the randomness of text generation while maintaining variability in the output.
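A minimal sketch of top-k sampling, using a hypothetical probability table in place of real model output:

```python
import random

def top_k_sample(probs, k, rng=random):
    """Keep only the k most probable tokens, renormalize, then sample one."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    tokens = [t for t, _ in top]
    weights = [p / total for _, p in top]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Illustrative next-token distribution (not from a real model).
probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}
word = top_k_sample(probs, k=2)   # only "cat" or "dog" can ever be chosen
```

With k=2, the low-probability tail ("bird", "fish") is cut off entirely, which is exactly how top-k trades randomness for coherence.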

RLHF (Reinforcement Learning from Human Feedback): The model is fine-tuned based on human feedback about its outputs.

Decoding strategies: These determine how output sequences are chosen. In greedy decoding, the most likely next word is chosen at each step. Beam search extends this by keeping multiple candidate sequences in play at the same time. The choice of strategy affects the diversity and coherence of the output.
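Greedy decoding is the simplest strategy to sketch. The per-step distributions below are hypothetical stand-ins for what a real model would emit at each generation step:

```python
def greedy_decode(step_probs):
    """Greedy decoding: at every step pick the single most likely token."""
    return [max(probs, key=probs.get) for probs in step_probs]

# One illustrative {token: probability} dict per generation step.
steps = [
    {"the": 0.6, "a": 0.4},
    {"cat": 0.5, "dog": 0.3, "car": 0.2},
    {"sat": 0.7, "ran": 0.3},
]
sequence = greedy_decode(steps)   # ['the', 'cat', 'sat']
```

Beam search would instead carry, say, the top 3 partial sequences forward at each step and keep the highest-scoring complete one, which can find sequences greedy decoding misses.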

Prompting: Inputs, or prompts, are designed to guide the model to generate specific outputs.

Transformer-XL: An extension of the transformer architecture that lets the model learn dependencies beyond a fixed length without compromising coherence. It is useful for long documents or sequences.

Masked Language Modelling (MLM): Certain segments of the input are masked during training, and the model is expected to predict the concealed words. It is used in BERT to enhance effectiveness.

Sequence-to-sequence Models (Seq2Seq): These convert sequences from one domain to another, for example translating one language into another, or converting questions into answers. Both an encoder and a decoder are involved.

Generative Pre-trained Transformer (GPT): A family of auto-regressive LLMs developed by OpenAI.

Multi-Head Attention: A component of the transformer model in which the model attends to several representation perspectives simultaneously.

Contextual Embeddings: Embeddings that take the context of a word into account. They are dynamic and change based on the surrounding text.

Auto-regressive Models: These models predict the next word based on the previous words in a sequence; each output word becomes part of the next input. Used in GPTs, this facilitates coherent long-form generation.
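The feedback loop of auto-regressive generation can be sketched with a toy "model" (a simple lookup table standing in for a neural network's next-word prediction):

```python
def generate(prompt, next_word_fn, max_words=5):
    """Auto-regressive loop: each predicted word is appended to the
    context and the whole context is fed back for the next prediction."""
    words = prompt.split()
    for _ in range(max_words):
        words.append(next_word_fn(words))
    return " ".join(words)

# Toy predictor: maps the last word to a next word (hypothetical data).
table = {"the": "cat", "cat": "sat", "sat": "down", "down": "."}
result = generate("the", lambda ws: table.get(ws[-1], "."))
```

An actual GPT-style model replaces the lookup table with a transformer scoring every vocabulary token, but the append-and-feed-back loop is the same.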

Saudi to be AI Superpower

In March 2024, tech executives, engineers and sales representatives were stuck in a traffic jam while heading for a conference some 50 miles outside Riyadh in Saudi Arabia. The main attraction was billions of dollars of investment to build a tech industry in Saudi Arabia to complement its oil industry.

Some 2 lakh (200,000) plus people attended the event. Amazon announced an investment in data centers and AI in Saudi Arabia. IBM talked about close collaboration with the kingdom. Huawei and other firms made speeches. All were ready to invest.

Saudi Arabia has decided to become a dominant player in AI and is pouring in big money. It created a $100 billion fund in 2024 to invest in AI and other technology, and wants to build a domestic tech industry. Its neighbour, the UAE, is also investing in tech and AI. Together, the two could create a new power center in the global tech industry.

The US may have misgivings about Saudi collaboration with China in this area. To counter it, the US has already brokered a Microsoft deal with G42.

The scorching heat of the desert (reaching 110 degrees F in summer), the not-so-liberal environment and the attitude towards the LGBTQ+ community are all issues. But a strong will to overcome them will be very helpful.

Foundation Models

The term "foundation model" gained currency in August 2021, when it was coined by the Stanford Institute for Human-Centered AI (HAI) Center for Research on Foundation Models (CRFM). These models are trained on a broad spectrum of generalized, unlabelled data and are capable of performing a wide variety of tasks, such as understanding language, generating text and images, and conversing in natural language.

As the original model provides a base or foundation on which other things are built, these are called foundation models.

These models are adaptable across various modalities — text, images, audio and video.

These models laid the groundwork for chatbots, which process user inputs and retrieve relevant information.

Large language models (LLMs) fall into this category of foundation models. The GPT-n series is an example: being trained on a broad corpus of unlabelled data, these models are adaptable to many tasks. This earns GPT-n the title of foundation model.

AI’s Exciting Journey

AI symbolizes the human urge to evolve. It characterizes human ingenuity and the thirst for knowledge.

We are reminded of Alan Turing's 'universal machine' and the Turing Test, which helped spark the digital revolution. Since then there has been a steady evolution from basic computers to today's sophisticated AI systems.

The 1990s were remarkable years. Google, founded in 1998, set out to organize the world's information, foreshadowing the AI advances we are witnessing today.

In the early 2000s, we saw advances in neural networks, culminating in the ImageNet competition of 2012.

India's AI sector has grown by leaps and bounds and has been at the forefront of healthcare innovation, where AI has matched human diagnostic accuracy.

In 2012, GPUs brought a significant boost in computing power for AI, accelerating deep learning and real-time decision-making.

In 2020, a landmark large language model (LLM) appeared: GPT-3. It could generate fluent text.

AI’s Usefulness

India stands on the cusp of an AI revolution and can benefit from AI in the following areas :

1. Healthcare: AI makes remote diagnostics and disease prediction possible. In underserved regions, it improves access to care at lower cost.

2. Agriculture: AI can promote precision farming techniques to increase yield and improve sustainability.

3. Education: Personalized learning platforms can enhance learning experiences. This is especially useful for rural children.

4. Urban planning: AI can predict traffic and optimize traffic flows. It can help with public transport and infrastructure management.

5. Financial inclusion: AI can facilitate the extension of credit facilities to the unbanked.

6. Indian platforms and IP: India must develop useful platforms and intellectual property, owning its own technology stacks or pharmaceutical molecules, and put AI to use for societal benefit.

Commoditization of AI

In the IT field, it does not matter which brand of PC, laptop or smartphone one uses; the devices are easily swapped. Even databases and cloud systems are not unique. Could AI go the same way?

Commoditization occurs when products or services become interchangeable because the offerings are not distinctive enough.

AI tools and platforms are extensively used, and no single firm gains a significant advantage when AI is available to all. Is AI in danger of being commoditized?

AI will cease to be a buzzword or a novelty. It will be taken for granted.

Some corporates will leverage AI faster and better than their competitors, but the advantage will be short-lived: the same tools and techniques will soon be used by others.

Business edge will come from an innovative culture and forward-looking thinking. Anyone can use a smartphone to make movies, but using a smartphone does not make one a Spielberg.

MLOps and LLMOps

MLOps (Machine Learning Operations) and LLMOps are both concerned with managing the life cycles of ML models, but their areas of focus differ.

MLOps is the broader term, covering the operational processes for all types of ML models: their efficient development, deployment and monitoring.

LLMOps is specifically designed for LLMs. These models are used for NLP, and LLMOps addresses the unique challenges associated with the lifecycle of such complex models.

Both concepts share common goals of efficiency, reliability and fairness, but they have distinct considerations. MLOps relies on metrics such as accuracy, precision and recall; LLMOps uses more nuanced metrics such as BLEU and ROUGE to assess language fluency and coherence. LLMOps, in addition, puts a premium on interpretability, fairness and bias mitigation.
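The classic MLOps metrics mentioned above, precision and recall, are straightforward to compute for a binary classifier. The labels below are made-up sample data for illustration:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN) for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 0, 1, 1, 0, 1]   # hypothetical ground-truth labels
y_pred = [1, 1, 1, 0, 0, 1]   # hypothetical model predictions
p, r = precision_recall(y_true, y_pred)
```

BLEU and ROUGE, by contrast, compare generated text against reference text using n-gram overlap, which is why they suit LLMOps better than classification metrics like these.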

MLOps is adaptable across various ML domains; LLMOps is specialized.