Key Terminology of LLMs

We shall try to understand the key terms associated with large language models.

LLM (Large Language Model): It is a neural network, also called a foundation model, that understands and generates human-like text. The text generated is contextually relevant. Examples of LLMs are the GPT series, Gemini, Claude and Llama.

Training: An LLM is trained on a vast corpus of data. The model learns to predict the next word in a sequence. Its accuracy is improved by adjusting its parameters.

Fine-tuning: A pre-trained model performs a broad range of tasks. It is fine-tuned to perform certain specific tasks or to operate in a specific domain. The model is further trained on specific data not covered in the original training data.

Parameter: A parameter is a variable part of the model’s architecture, e.g. the weights in neural networks. Parameters are adjusted during training to minimize the difference between the predicted and actual output.
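
As a minimal sketch of this adjustment, the toy example below fits a single weight by gradient descent; the data point, learning rate and loss are made up for illustration:

```python
# Minimal sketch: adjusting one parameter (a weight) to reduce
# the squared difference between predicted and actual output.
x, y_true = 2.0, 10.0   # made-up training example
w = 0.5                 # initial parameter value
lr = 0.05               # learning rate (illustrative)

for step in range(100):
    y_pred = w * x                     # model prediction
    grad = 2 * (y_pred - y_true) * x   # gradient of squared error w.r.t. w
    w -= lr * grad                     # update the parameter

print(round(w, 3))  # approaches 5.0, since 5.0 * 2.0 == 10.0
```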

Vector: In ML, a vector is an array of numbers representing data, in a form that algorithms can process. In LLMs, words or phrases are converted into vectors (called embeddings) which capture semantic meaning.

Embeddings: These are dense vector representations of text, in which similar words have similar representations in vector space. Embeddings capture context and semantic similarity between words. The technique is useful in machine translation and text summarization.
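
A toy sketch of how embeddings support similarity comparisons; the four-dimensional vectors below are invented for illustration (real embeddings have hundreds of dimensions), but cosine similarity is a standard way to compare them:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 4-dimensional embeddings for three words.
king  = np.array([0.8, 0.65, 0.1, 0.05])
queen = np.array([0.75, 0.7, 0.15, 0.1])
apple = np.array([0.05, 0.1, 0.9, 0.8])

print(cosine_similarity(king, queen))  # high: related words
print(cosine_similarity(king, apple))  # low: unrelated words
```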

Tokenization: Text is split into tokens — words, sub-words or characters.
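
As an illustration, the snippet below tokenizes a sentence into sub-words, assuming the tiktoken library is installed; any sub-word tokenizer behaves similarly:

```python
# Sketch of sub-word tokenization using OpenAI's tiktoken library
# (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("Tokenization splits text into tokens.")
print(token_ids)                              # integer token IDs
print([enc.decode([t]) for t in token_ids])   # the sub-word pieces
```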

Transformers: This neural network architecture relies on self-attention to weigh the influence of different parts of the input data. It is useful in NLP tasks and is at the core of modern LLMs.

Attention: The attention mechanism enables models to focus on different segments of the input sequence while generating a response, making the response contextual and coherent.
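
A minimal numpy sketch of scaled dot-product attention, the mechanism at the core of transformers, using toy sizes and random data:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted mix of V's rows; the weights show
    how much each position attends to every other position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of queries and keys
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 per row
    return weights @ V, weights

# Three positions, 4-dimensional representations (random toy data).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(w.round(2))  # each row sums to 1
```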

Inference: Here the trained model makes predictions — it generates text based on input data, using knowledge gained during training.

Temperature: It is a hyperparameter that controls the randomness of predictions. A higher temperature produces more random outputs, while a lower temperature makes the output more deterministic. The logits are divided by the temperature before applying softmax.
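
A short sketch of temperature scaling with made-up logits:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])  # made-up scores for four tokens

for temperature in (0.5, 1.0, 2.0):
    probs = softmax(logits / temperature)  # divide logits by T before softmax
    print(temperature, probs.round(3))
# Low T sharpens the distribution (more deterministic);
# high T flattens it (more random).
```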

Frequency penalty: The probability of a token is adjusted based on how frequently it has already occurred in the generated text. Here we can balance the generation of common and less common words.
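
One common formulation is a penalty subtracted from the logits, as in the sketch below; the penalty value and scores are made up for illustration:

```python
import numpy as np

def apply_frequency_penalty(logits, generated_ids, penalty=0.7):
    """Lower the scores of tokens in proportion to how often
    they have already appeared in the generated text."""
    logits = logits.copy()
    for token_id in set(generated_ids):
        logits[token_id] -= penalty * generated_ids.count(token_id)
    return logits

logits = np.array([3.0, 2.0, 1.0])   # made-up scores for a 3-token vocab
generated = [0, 0, 0, 1]             # token 0 already used three times
print(apply_frequency_penalty(logits, generated))  # token 0 now scores lower
```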

Sampling: In generating text, the next words are picked at random based on the model’s probability distribution. It makes the output varied and creative.
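
A toy sketch of sampling from a made-up next-word distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "bird", "fish"]
probs = np.array([0.5, 0.3, 0.15, 0.05])  # made-up next-word distribution

# Sampling: pick the next word at random, weighted by probability.
print([str(rng.choice(vocab, p=probs)) for _ in range(5)])  # varied outputs
# Greedy choice, by contrast, would always return "cat".
```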

Top-K sampling: The choice of the next word is limited to the K most likely candidates. It reduces the randomness of text generation, while variability in the output is maintained.
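
A minimal sketch of top-K sampling over made-up logits:

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Keep only the k highest-scoring tokens, renormalize, then sample."""
    top = np.argsort(logits)[-k:]             # indices of the k best tokens
    e = np.exp(logits[top] - logits[top].max())
    probs = e / e.sum()                       # softmax over the survivors
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.5, 0.3, -1.0, -2.5])  # made-up 5-token vocab scores
print([top_k_sample(logits, k=2, rng=rng) for _ in range(5)])
# Only tokens 0 and 1 can ever appear: randomness is kept, but bounded.
```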

RLHF – Reinforcement Learning from Human Feedback: Here the model is fine-tuned based on human feedback, so that its responses better align with human preferences.

Decoding strategies: In decoding, output sequences are chosen. In greedy decoding, the most likely next word is chosen at each step. Beam search expands this by tracking multiple candidate sequences at the same time. The choice of strategy affects the diversity and coherence of the output.
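
A toy sketch contrasting greedy decoding with beam search; the "model" here is a stub that returns fixed probabilities:

```python
import numpy as np

# Toy "model": fixed next-token log-probabilities, regardless of context.
VOCAB = ["a", "b", "<end>"]
def next_log_probs(sequence):
    return np.log(np.array([0.5, 0.4, 0.1]))  # made-up distribution

def greedy_decode(steps=3):
    seq = []
    for _ in range(steps):
        seq.append(VOCAB[int(np.argmax(next_log_probs(seq)))])
    return seq

def beam_search(steps=3, beam_width=2):
    beams = [([], 0.0)]                        # (sequence, total log-prob)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            lp = next_log_probs(seq)
            for i, tok in enumerate(VOCAB):
                candidates.append((seq + [tok], score + lp[i]))
        # Keep only the beam_width highest-scoring sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(greedy_decode())   # always the single most likely token at each step
print(beam_search())     # several candidate sequences with scores
```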

Prompting: Here inputs or prompts are designed to guide the model to generate specific outputs.

Transformer-XL: This extends the transformer architecture so that the model can learn dependencies beyond a fixed length without compromising coherence. It is useful for long documents and sequences.

Masked Language Modelling (MLM): Certain input segments are masked during training, and the model is expected to predict the concealed words. It is used in BERT to enhance the effectiveness of pre-training.
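
A short sketch of MLM in action, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint:

```python
# Sketch of masked language modelling with Hugging Face transformers
# (assumes `pip install transformers torch`; model downloads on first run).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK].")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))
```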

Sequence-to-sequence Models (Seq2Seq): Here sequences are converted from one domain to another, for example translating from one language to another, or converting questions to answers. Both an encoder and a decoder are involved.
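
A minimal sketch of a Seq2Seq task (translation), assuming the transformers library and the public t5-small checkpoint:

```python
# Sketch of a sequence-to-sequence task (translation) with transformers
# (assumes `pip install transformers torch sentencepiece`).
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("How are you?")[0]["translation_text"])
```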

Generative Pre-trained Transformer (GPT): A family of LLMs developed by OpenAI.

Multi-Head Attention: This is a component of the transformer model in which the model attends to the input from several representation perspectives simultaneously.

Contextual Embeddings: Here the context of the word is considered. These embeddings are dynamic and change based on the surrounding text.

Auto-regressive Models: These models predict the next word based on the previous words in a sequence, as in the GPTs. Each output word becomes part of the next input, which facilitates coherent long-form generation.
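
A sketch of the auto-regressive loop; the model here is a stub returning made-up probabilities, but the append-and-predict structure is the point:

```python
# Sketch of auto-regressive generation: each sampled token is appended
# to the input before predicting the next one.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "cat", "sat", "down", "."]

def predict_next(token_ids):
    """Stand-in for an LLM forward pass: returns made-up probabilities."""
    logits = rng.normal(size=len(VOCAB))
    e = np.exp(logits - logits.max())
    return e / e.sum()

tokens = [0]                       # start with "the"
for _ in range(4):
    probs = predict_next(tokens)   # condition on everything generated so far
    tokens.append(int(rng.choice(len(VOCAB), p=probs)))

print(" ".join(VOCAB[t] for t in tokens))
```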

Saudi to be AI Superpower

In March 2024, tech executives, engineers and sales representatives were stuck in a traffic jam while heading to a conference some 50 miles outside Riyadh in Saudi Arabia. The main attraction was investments of billions of dollars to build a tech industry in Saudi Arabia to complement its oil industry.

Some two lakh (200,000) plus people attended the event. Amazon announced an investment in data centers and AI in Saudi Arabia. IBM talked about close collaboration with the kingdom. Huawei and other firms made speeches. All were ready to invest in Saudi Arabia.

Saudi Arabia has decided to become a dominant player in AI and is pouring in big money. It created a $100 billion fund in 2024 to invest in AI and other technology, and it wants to build a domestic tech industry. Even its neighbour the UAE is investing in tech and AI. Together, the two will create a new power centre in the global tech industry.

The US may have misgivings about Saudi-Chinese collaboration in this area. To counter it, the US has already brokered a Microsoft deal with G42.

The scorching heat of the desert (reaching 110 degrees F in summer), the not-so-liberal environment, the attitude towards the LGBTQ group — all these are issues. But a strong will to overcome them will be very helpful.

Foundation Models

Foundation models as a term got currency in August 2021. It was coined by the Stanford Institute for Human-Centred AI (HAI) Center for Research on Foundation Models (CRFM). These models are trained on a broad spectrum of generalized and unlabelled data. They are capable of performing a wide variety of tasks such as understanding language, generating text and images, and conversing in natural language.

As the original model provides a base or foundation on which other things are built, these are called foundation models.

These models are adaptable across various modalities — text, images, audio and video.

These models laid the groundwork for chatbots that process user inputs and retrieve relevant information.

Large language models (LLMs) fall into this category of foundation models. The GPT-n series is an example. Being trained on a broad corpus of unlabelled data, these models are adaptable to many tasks. This earns GPT-n the title of foundation model.

AI’s Exciting Journey

AI symbolizes human urge to evolve. It characterizes human ingenuity and thirst for knowledge.

We are reminded of Alan Turing’s ‘universal machine’ and the Turing Test, which led to a digital revolution. There has been a constant evolution from basic computers to today’s sophisticated AI systems.

The 1990s were remarkable years. Google, founded in 1998, set out to organize the world’s information. It foreshadowed the advancements of AI we are witnessing today.

In the early 2000s, we saw advancements in neural networks, which peaked with the ImageNet competition of 2012.

The Indian AI sector has grown by leaps and bounds and is at the forefront of healthcare innovations, where AI has matched human diagnostic accuracy.

In 2012, there was a significant boost in computing power for AI on account of GPUs. It accelerated deep learning and real-time decision-making.

In 2020, the large language model GPT-3 appeared. It generated human-like text.

AI’s Usefulness

India stands on the cusp of an AI revolution and can benefit from AI in the following areas:

1. Healthcare: AI makes remote diagnostics and disease prediction possible. In underserved regions, AI improves access to care at lower costs.

2. Agriculture: AI can promote precision farming techniques to increase yield and improve sustainability.

3. Education: There could be personalized learning platforms that enhance learning experiences. This is especially useful for rural children.

4. Urban planning: AI can predict traffic and optimize traffic flows. It can help public transport and infrastructure management.

5. Financial inclusion: AI can facilitate the extension of credit facilities to the unbanked.

6. Indian platforms and IP: India must develop various useful platforms and IP. India should own technology stacks or pharmaceutical molecules. AI can be used for societal benefits.

Commoditization of AI

In the IT field, it does not matter which brand of PC, laptop or smartphone one uses. All these devices can be easily swapped. Even databases and cloud systems are not unique. Could AI also go the same way?

Commoditization occurs when products or services are interchangeable and the offerings are not distinctive enough.

AI tools and platforms are extensively used. There may be no significant advantage to any one firm, as AI is available to all. Is AI in danger of being commoditized?

AI will cease to be a buzzword or a novelty. It will be taken for granted.

Some corporates will leverage AI faster and better than their competitors. However, this advantage is going to be short-lived. The same tools and techniques will soon be used by others.

Business edge will result from an innovative culture and forward-looking thinking. One can use a smartphone to make movies, but one cannot become a Spielberg by using a smartphone.

MLOps and LLMOps

MLOps (Machine Learning Operations) and LLMOps are concepts concerned with managing the life cycles of ML models. The areas of focus differ.

MLOps is a broader term that covers the operations processes for all types of ML models — efficient development, deployment and monitoring of these models.

LLMOps is specifically designed for LLMs. These models are used for NLP, and LLMOps addresses the unique challenges associated with the lifecycle of these complex models.

Both these concepts have some common goals — efficiency, reliability and fairness. They also have distinct considerations. The metrics relied upon for MLOps are accuracy, precision and recall. LLMOps uses more nuanced metrics such as BLEU and ROUGE to assess language fluency and coherence. LLMOps, in addition, puts a premium on interpretability, fairness and bias mitigation.
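
As an illustration of such metrics, the sketch below computes a sentence-level BLEU score, assuming the nltk library is installed; ROUGE is computed similarly with other libraries:

```python
# Sketch of an LLMOps-style evaluation: sentence-level BLEU
# (assumes `pip install nltk`).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the cat sat on the mat".split()]   # reference translation(s)
candidate = "the cat is on the mat".split()      # model output

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))  # closer to 1.0 means closer to the reference
```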

MLOps is adaptable across various ML domains; LLMOps is specialized.

Frameworks for LLMs

To interact with LLMs and make them more accessible for various applications, there are frameworks such as LangChain, LlamaIndex and frameworks for LLM serving.

LangChain provides a standardized interface for interacting with multiple LLMs. It offers tools for building apps with LLMs.
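
A minimal sketch of that interface; LangChain’s API evolves quickly, so this assumes a recent version, the langchain-openai package, an OPENAI_API_KEY in the environment, and an illustrative model name:

```python
# Sketch of LangChain's standardized interface
# (assumes `pip install langchain langchain-openai`).
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Summarize in one line: {text}")
llm = ChatOpenAI(model="gpt-4o-mini")   # illustrative model name

chain = prompt | llm                    # chain prompt and model together
result = chain.invoke({"text": "LangChain offers tools for building LLM apps."})
print(result.content)
```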

LlamaIndex helps to organize and curate data sources for LLMs.

LLM serving frameworks are designed to optimize the process of deploying LLMs in production environments. They handle tasks such as model loading, inference and routing requests.
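
A sketch of calling a served model; many serving frameworks expose an OpenAI-compatible HTTP endpoint, and the URL and model name below are assumptions for illustration:

```python
# Sketch of querying a locally served LLM over an OpenAI-compatible API
# (assumes `pip install requests` and a server running at this address).
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",   # assumed local endpoint
    json={
        "model": "my-model",                  # hypothetical deployed model
        "prompt": "Serving frameworks handle",
        "max_tokens": 32,
    },
    timeout=30,
)
print(response.json()["choices"][0]["text"])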

Essentially, LLM frameworks are toolkits that help developers to interact with and leverage the LLMs more effectively.

There are standardized interfaces to different LLMs (irrespective of their architecture or API). There is prompt engineering to get the desired output from an LLM, with tools and libraries to help developers design and optimize prompts. There is performance optimization, with frameworks providing tools to optimize LLM inference for better performance.

Some frameworks enable chaining multiple LLMs together to create more complex workflows and apps. Frameworks also integrate with other development tools and libraries.

Apart from LangChain and LlamaIndex, we have the OpenLLM and Ray Serve frameworks.

Ries Passed Away

A well-known name in the marketing field for his work on positioning, Al Ries passed away on October 7, 2022 at his home in Atlanta at the age of 95. Al Ries and his colleague Jack Trout (at Trout and Ries, Manhattan) proposed to their clients that creative advertising was not enough to persuade consumers to buy. They advocated positioning — to find a slot in the mind and be the first to occupy that slot. As IBM did by owning the slot of computers. As Volvo did with safety. As FedEx did with overnight delivery. Burger King’s burgers were broiled, not fried.

In 2005, Ad Age ranked the most important marketing ideas of the last 75 years. Positioning stood at No. 56. In 2009, Ad Age conducted a survey on the best books on marketing. The No. 1 ranked book was Ries and Trout’s ‘Positioning: The Battle for Your Mind’ (1981).

Mr. Ries was inducted into the American Marketing Association’s Marketing Hall of Fame in 2016. Positioning as a concept is a milestone in the evolution of modern marketing. It has influenced a whole generation of marketers.

Ries had started an advertising firm in 1963. Trout joined in 1967. They implemented positioning as a concept. In essence, it was a position in the mind.

Trout passed away in 2017 at the age of 82.

In 1979, the firm was renamed Trout and Ries. They converted it to a consulting firm in 1989. They moved to Greenwich, Conn.

Ries is survived by his wife, Mary Lou Ries; his daughter Laura; two other daughters, Dorothy and Barbara; and a son, Charles.

Ries and Trout separated in 1994. Both set up separate firms.

Laura added emphasis on visual imagery while extending her father’s concept of positioning. She referred to her father’s contention that one should own a word in the mind, but says mere words are not enough — a visual is much more powerful.

Patenting of Medicines

In healthcare, about 50 per cent of the cost is that of the medicines used. Some medicines are priced exorbitantly high on account of patents. These days, generic medicines of certain pharma companies compete with patented medicines, bringing their costs down to some extent. Generics make medicines affordable.

In early 2024, the Indian patent rules were modified — objections to patents at the pre-grant stage have become difficult. That has made patenting easier, and that has increased the prices of drugs.

There are provisions in the Indian Patents Act to oppose patenting. A successful opposition results in generic companies being allowed to produce the same drug, creating competition. A patent can be opposed even after it has been granted.

In the early 1970s, the Patents Act was changed, making drugs affordable. India granted process patents, not product patents. Any Indian company could thus produce a patented drug by using an alternative process. This made generics available, which were also exported. India emerged as a leading generic exporter by the late 1980s and a leading generic producer by the 1990s.

In 1995 came the TRIPS Agreement (Trade-Related Aspects of Intellectual Property Rights). It reintroduced product patents for drugs that are novel and inventive. In 2005, the Patents Act was amended in the light of TRIPS. Most of the drugs patented in the US and Europe, however, were only new forms of existing drugs (me-too drugs), with no significant increase in therapeutic benefit. India risked slipping back to the pre-1970s situation.

Political parties and civil society here introduced an amendment to the Patents Act — Section 3(d). It was to ensure that an old drug in a new form would not be patented unless its therapeutic efficacy is significantly better.

The Indian Patents Act was amended to allow opposition to a patent at the pre-grant and post-grant stages, and patents could be revoked. Rules were framed to accommodate this. The government can also issue licences to other companies without the consent of the patent holder. The flexibilities of TRIPS were thus leveraged.

The rules are now sought to be altered because of the pressure of big pharma. There is a demand for the repeal of Section 3(d). This demand is being made while negotiating FTAs with the US, UK and EU.

PGOs (pre-grant oppositions) come from civil society and patients’ groups. Generic companies are reluctant to file PGOs. A PGO is replied to by the applicant, the opponent files a rejoinder, and the patent controller then decides. The modified rules allow the patent controller to decide whether a PGO is maintainable at all; it could be dismissed outright. This is an arbitrary power. In the past, opposition scuttled many patents that were frivolous. The amendments allow non-deserving patents through. The opponent also has to pay fees, which is a financial burden.

The patent holder was required to report to the patent controller every year on how the patent was being worked. This now has to be done only every three years. The working of patents is one of the bases for seeking a compulsory licence. The public is in the dark, and that makes compulsory licensing difficult.