Blog

  • Fine Tuning an LLM

    LLMs, as we know, are pretrained for natural language processing (NLP). Such a pre-trained model carries learned weights, including embeddings for its tokens. Fine tuning trains the model further to improve its performance on a new or specific task, or to adapt it to a new domain.

    LLMs are fine tuned on a new dataset to improve them for a specific task, such as translation, summarization or question answering. The pre-trained model has been trained on a vast dataset, whereas for fine tuning we use a smaller dataset relevant to the specific task, say question answering.

    There are various techniques of fine tuning. The most common is supervised learning, where the model is trained on a labelled dataset. For question answering, the dataset consists of pairs of questions and answers, and the model is trained to predict the correct answer to each question.
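
    A minimal sketch of this kind of supervised fine tuning, assuming the Hugging Face transformers and datasets libraries; the gpt2 checkpoint and the question-answer pairs are purely illustrative:

    ```python
    # Hypothetical example: supervised fine tuning of a small causal LM
    # on question-answer pairs. Model name and data are illustrative.
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    pairs = [
        {"question": "What is fine tuning?",
         "answer": "Further training of a pretrained model on task-specific data."},
        {"question": "Why use a smaller dataset?",
         "answer": "The task corpus is far smaller than the pretraining corpus."},
    ]

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def to_features(example):
        # Format each pair as a single training string.
        text = f"Question: {example['question']}\nAnswer: {example['answer']}"
        return tokenizer(text, truncation=True, max_length=128)

    train_ds = Dataset.from_list(pairs).map(
        to_features, remove_columns=["question", "answer"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="qa-finetune", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=train_ds,
        # mlm=False gives the standard next-token (causal) objective.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    ```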

    A model can be fine tuned by updating all of its weights, but this is time-consuming and computationally expensive. Alternatively, some of the layers can be frozen, so their weights are not revised; this reduces the compute required and helps prevent overfitting. In partial fine tuning, only a subset of parameters is adjusted.
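
    A minimal sketch of freezing layers for partial fine tuning, assuming a GPT-2 style checkpoint loaded via transformers; the choice of which blocks to leave trainable is illustrative:

    ```python
    # Hypothetical example of partial fine tuning: freeze the lower layers
    # so only the top transformer blocks (and the head) are updated.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Freeze everything first.
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze only the last two transformer blocks and the LM head
    # (note: gpt2 ties lm_head to the input embeddings, so those are
    # unfrozen as well).
    for block in model.transformer.h[-2:]:
        for param in block.parameters():
            param.requires_grad = True
    for param in model.lm_head.parameters():
        param.requires_grad = True

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters: {trainable:,}")
    ```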

    Fine tuning is an effective way to make a model better at a task. However, it can also lead to overfitting: the model learns the specifics of the training data too well and loses the capacity to generalize to new data. To prevent overfitting, we can use a smaller learning rate, regularization and early stopping.
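
    The same three counter-measures expressed as Hugging Face TrainingArguments, a sketch with illustrative values (in newer transformers releases the evaluation argument may be named eval_strategy instead of evaluation_strategy):

    ```python
    # Hypothetical settings to limit overfitting during fine tuning.
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(
        output_dir="qa-finetune",
        learning_rate=2e-5,            # smaller learning rate
        weight_decay=0.01,             # L2-style regularization
        num_train_epochs=10,
        evaluation_strategy="epoch",   # evaluate every epoch...
        save_strategy="epoch",
        load_best_model_at_end=True,   # ...and keep the best checkpoint
    )
    # Stop if the validation metric fails to improve for two evaluations;
    # pass args and callbacks=[early_stop] (plus an eval_dataset) to the Trainer.
    early_stop = EarlyStoppingCallback(early_stopping_patience=2)
    ```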

  • Large Language Models (LLMs)

    A large language model (LLM) is, first of all, an AI algorithm. It employs deep learning techniques and massive datasets to understand, summarize, generate and predict new content. A closely related term is generative AI, a type of AI architected to generate text-based content.

    As we know, language is a means of communication. Spoken languages are based on communication, both human and technological, and have evolved over several millennia. Language is chiefly a system of syntax that uses words and grammar to convey ideas and concepts. An AI language model serves the same purpose.

    AI language models date back to Eliza, which debuted at MIT in 1966. All language models are trained on datasets and deploy various techniques to infer relationships; they can then generate new content based on their training data. Language models are used for natural language processing (NLP): a user puts a query in natural language (the input) and gets a response in natural language (the output).

    An LLM is an evolution of the language model: it expands both the data used for training and the model's capability to infer. There is no agreed threshold for how large the model must be, but a typical LLM has at least one billion parameters. A parameter here means a variable learnt during training, which the model uses to infer new content.

    The genesis of modern LLMs goes back to 2017, when transformer models emerged as a new neural network architecture. LLMs thus combine a large number of parameters with the transformer architecture, and they have applications across many different domains.

    LLMs are also referred to as foundation models, a term coined by the Stanford Institute for Human-Centered AI in 2021. A foundation model is large and impactful, and serves as a foundation for further optimization and specific use cases. AI and ML are techniques to enhance an organisation's efficiency (more output per unit of input), effectiveness, experience and evolution. It is because of these benefits that businesses are inclined to invest in this technology.

    How Language Models Work

    As we know by now, an LLM is trained on a large volume of data called a corpus, generally measured in petabytes. Training proceeds in steps. First there is unsupervised learning, where the model learns from unstructured, unlabelled data; its advantage is the sheer volume of data available. Here the model gains the capability to derive relationships between words and concepts.

    As a next step, LLMs undergo self-supervised learning, in which the labels are derived from the data itself. This helps the model identify concepts more accurately.
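
    A minimal sketch of this self-supervised objective, assuming the transformers library and using gpt2 only as an illustration: the text itself supplies the labels, so no human annotation is needed.

    ```python
    # Hypothetical example of the next-token (self-supervised) objective.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    batch = tokenizer("Language models learn relationships between words.",
                      return_tensors="pt")
    with torch.no_grad():
        # The labels are just the input ids; the model shifts them internally
        # so each position predicts the next token.
        out = model(**batch, labels=batch["input_ids"])
    print(f"Next-token prediction loss: {out.loss.item():.3f}")
    ```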

    Further, LLMs undertake deep learning through the transformer neural network. It enables the model to grasp the relationships and connections between words and concepts using the self-attention mechanism: a score (commonly called a weight) is assigned to each token to decide how strongly it relates to the others.
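
    A minimal sketch of scaled dot-product self-attention in plain NumPy; the dimensions and random weights are illustrative:

    ```python
    # Every token scores every other token, and the resulting weights decide
    # how much each token contributes to the new representation.
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv          # queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise relevance scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax per row
        return weights @ V                        # weighted mix of values

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))                   # 4 tokens, 8-dim embeddings
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)
    ```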

    After training is over, the model serves as a base for AI applications. Given a prompt as a query, it generates a response, say an answer to a question, new text, a summary or a sentiment analysis.
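
    A minimal sketch of prompting a trained model, assuming the transformers text-generation pipeline and using gpt2 only as a stand-in for a much larger model:

    ```python
    # Hypothetical example: prompt in, generated text out.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    prompt = "Question: What is a large language model?\nAnswer:"
    result = generator(prompt, max_new_tokens=40, do_sample=False)
    print(result[0]["generated_text"])
    ```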

    Uses of LLMs

    LLMs are used for text generation, translation, content summarization, rewriting content, classification and categorization, sentiment analysis, and conversational AI and chatbots.

    Challenges in Building LLMs

    Development costs are huge, since building an LLM requires expensive graphics processing units and massive data. Operational costs are also high. Bias can sneak into the data. The model can be a black box, leaving us unable to explain how it arrived at a particular decision. There are inaccuracies or hallucinations. LLMs are complex models with billions of parameters, and malicious prompts or glitch tokens can cause an LLM to malfunction.

    Types of LLMs

    There are generalised zero-shot models and fine-tuned, domain-specific models. There are BERT-style models (bidirectional encoder representations from transformers). Lastly, there are multi-modal models that handle both text and images, such as GPT-4.

    Future

    Whether models can acquire artificial general intelligence or become sentient remains speculative. Models already use techniques such as reinforcement learning from human feedback (RLHF). Google uses REALM (Retrieval-Augmented Language Model), which grounds generation in retrieved documents.

  • Translation Models

    Translation technologies serve as a bridge among the various languages. Translation software has been around since the 1950s, largely used by researchers and professionals. Google Translate, launched in 2006, made instant translation possible. With the advent of AI and generative AI, prompt-driven tools such as ChatGPT have brought about a translation revolution.

    We can now translate a whole lot of documents with these tools very quickly, and with few errors. Many organisations are making use of such tools.

    Google Translate is literal. Generative AI introduces context and grammar to the translations. Chatbots are conversational and can engage with the user.

    The European Union (EU) has 24 official languages and a translation department whose output runs into millions of pages per annum. AI-based tools already translate its press releases.

    There is a huge market for language translation. India's Bhashini project brings various languages together on one platform. IRCTC has launched a chatbot, AskDISHA, built using Bhashini.

    Soon a new layer of voice and listening tools can be added to AI-powered translation models.

  • UPI

    India's Unified Payments Interface (UPI) is the world's largest payment network and has digitized the economy. Introduced in 2016, it has become a game changer in the payments system, already overtaking both credit card and debit card transactions. UPI's user base has grown to 260 million.

    In early 2023, UPI was linked to Singapore's PayNow, enabling Indian users to remit up to SGD 1,000 per day. The focus is on NRIs abroad. The US too has a large Indian diaspora, but rival payment networks such as Visa and Mastercard lobby against UPI.

    Despite digitization, India still sees a lot of cash transactions: people still opt for cash-on-delivery (CoD) in India's e-commerce market, worth $30 billion. Cash payment is preferred in Tier-I cities and beyond, though there has been a rise in digital payments led by UPI.

    As Mastercard's CFO says, though UPI is driving a digital shift in India, it is still not a money maker.

  • Changes in the Computer Industry

    In the last half-century, the computing industry has witnessed significant changes. The first was the mainframe era, the second was the shift to PC-server computing, and the third was the rise of the Internet, mobile and cloud.

    Currently, we are on the brink of a fourth transition — data-centric computing driven by AI.

    Microsoft has introduced the Azure AI platform, which spans vision, speech, language, decision-making and custom ML. In collaboration with OpenAI, Microsoft aims to provide customers with a choice that suits their needs. The offering born of this partnership is called Azure OpenAI Service; it brings OpenAI's models, such as ChatGPT, to Azure customers along with enterprise benefits of Azure such as enhanced security, compliance and responsible AI.

    With ChatGPT in preview in Azure OpenAI Service, developers can integrate AI-powered experiences directly into their own applications: bots that handle unexpected questions, faster customer support, automated claims processing and so on.
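
    A minimal sketch of such an integration, assuming the openai Python package (v1 or later); the endpoint, deployment name, API version, environment variable and the support query are placeholders:

    ```python
    # Hypothetical example: calling a ChatGPT-style deployment through the
    # Azure OpenAI Service from an application.
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",
        api_key=os.environ["AZURE_OPENAI_API_KEY"],   # placeholder env var
        api_version="2024-02-01",                     # illustrative version
    )

    response = client.chat.completions.create(
        model="<your-chat-deployment>",   # the deployment name, not the raw model id
        messages=[
            {"role": "system", "content": "You are a customer-support assistant."},
            {"role": "user", "content": "How do I track the status of my claim?"},
        ],
    )
    print(response.choices[0].message.content)
    ```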

  • R&D Focus

    India seeks to develop its economy at a faster pace. To achieve this, India must focus on R&D and innovation. R&D and innovation lie at the heart of the new global economy.

    An IIT-Bombay research paper has shown a decline in PhD applications across the engineering streams and across all IITs since 2021.

    All IITs now have full faculty strength, there is enough funding for core research areas, and the demand for cutting-edge research is high. And still there is disinterest in pursuing a PhD.

    Attractive employment opportunities for graduates abroad could be one reason. Within the country itself, both the public and private sectors offer opportunities to students quite early in their academic careers. As it is, fewer students enrol for Masters programmes, and it is Masters students who apply for PhDs. In addition, a PhD makes one a specialist in a niche area. It is a cause for concern.

    In certain sectors, employment opportunities progressively decrease as specialisation increases. Indian organisations, too, are not much interested in increasing their R&D expenditure. All this has to change.

    There are fewer centres in India to incubate research and commercialize it, which makes research unattractive for students.

    PhD programmes must be improved both qualitatively and quantitatively.

  • Google Exploring AI Partnership in India

    Google is in a position to commercialise generative AI in India. It expects to serve SMBs (small and medium businesses), the public sector and even consumers. The company has moved past the trial or proof-of-concept stage. It is working with HDFC to build a centre of excellence around generative AI in the insurance sector. Apollo Hospitals will use generative AI to help doctors decide the next best course of action after examining patients.

    The government is wedded to 'Make AI in India' and 'Make AI work for India.' Google is ready to partner with the government to realise these goals: it can help with model training and provide secure data architectures. It uses the Vertex AI product, which offers Google's own models, third-party models and open-source models. Models can be trained in local languages by partnering with Bhashini, the native AI-based language platform.
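
    A minimal sketch of calling a model through Vertex AI, assuming the google-cloud-aiplatform SDK; the project id, region, model name and prompt are placeholders:

    ```python
    # Hypothetical example: generating text with a Google model on Vertex AI.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project="my-gcp-project", location="asia-south1")

    model = GenerativeModel("gemini-1.0-pro")      # model name is illustrative
    response = model.generate_content(
        "Summarise the benefits of digital payments for small merchants.")
    print(response.text)
    ```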

    Google expects to train 1 million professionals and students in generative AI and related ML concepts. It has announced a chair at IIT Bombay to streamline Google Cloud and Google AI. It would also like to help the government in cyber defence and is training 1,000 government employees in that area.

    Google has infused AI into all its products. It is also helping the ONDC project as a technology service provider (TSP).

  • AI and Data Science Courses in IIT-B

    IIT Bombay already has a centre for ML and data science that runs post-graduate courses and a doctoral programme. It has now decided to introduce AI and Data Science as a mandatory course for B.Tech and BS programmes at the undergraduate level from this academic year (2023). Right now IIT-B runs BS courses in Economics, Chemistry and Mathematics.

    AI-DS will be introduced as a mandatory credit in the first semester of the second year. Interested candidates can take up advanced courses later. The idea is to introduce a foundation course in these areas to make students future-ready.

    Earlier it was thought that only computer science and electrical engineering students would use AI. It is now realised that other branches of engineering will also use AI, for different reasons. AI is the present and future of industry, and students should be equipped for it.

  • Future of Big Tech

    The bigness of technology firms such as Alphabet, Amazon, Apple, Meta and Microsoft stares us in the eye. A point of inflection has been reached with the arrival of generative AI. Sustaining bigness on all parameters, such as revenue, profitability, consumer engagement, all-pervasiveness and community, is challenging: a company has to combine technology, brand and business successfully. Being big and remaining big is itself a challenge.

    Google's YouTube and Workspace productivity tools have more than 2 billion users per month. It has formidable offerings such as Google Search, the Chrome browser, the Android operating system, the Google Play Store, Google Maps, Google Translate, Gmail and more. Collectively, human beings spend 22 billion hours a day on Alphabet's platforms (The Economist).

    Google has benefited most from digital advertising, with revenue of about $300 billion per annum and an average annual revenue growth rate of 28 per cent since its IPO in 2004. Online ads account for 80 per cent of total inflows. Its share price has soared 50-fold, and it is the world's fourth most valuable company. However, its bread-and-butter digital advertising business is maturing. The world is now data driven, and data is the new oil; the Internet landscape is the refinery. There are no new oil wells left for Google to exploit, and the ban on cookies restricts growth.

    Investors now want cost efficiency and capital discipline. The growth rate of digital ads is expected to hover around 10 per cent over the next few years, a steep decline from the 20 per cent witnessed in the past decade.

    Google's total pie is $700 billion. Any further gain in share could invite scrutiny from regulators; there is already an anti-trust suit in the US. Search is a lucrative business that attracts targeted ads. However, there are parallel searches on YouTube, TikTok and Instagram, and videos are not as monetisable.

    Google has been active in generative AI, but Microsoft, in collaboration with OpenAI, has also moved fast in this space. Apple too is a valuable company.

    Apple has metamorphosed from a desktop company into an iPhone company. Microsoft has transformed itself from a Windows company into a cloud-computing company.

    Facebook has bet big on the metaverse. Though Gartner is still hopeful about the metaverse, these are early days.

    The year 2022 brought AI into the mainstream. It could make Big Tech bigger, or it may challenge them. Big Tech has been growing in revenues and profits for the last 10 years, but there are seeds of challenge here: Big Tech may run out of ways to sustain its bigness. At the slightest sign of a slowdown, the market punishes Big Tech disproportionately. Layoffs are a tell-tale sign.

    Big Tech also attempts to purchase growth, e.g. Microsoft's attempt to acquire Activision.

    Upstarts will bring new technologies to the market. Margins will be under stress and growth will be stunted. Big Tech may shrink to make room for new players, or else it will reinvent itself.

  • Painful Addiction

    Modern medicine aims to add to the quality of human life. As we know, diseases bring pain with them. Therefore, an incidental aim of modern medicine is to manage pain and make life as pain-free as possible.

    There are many painkiller medicines used by doctors. The most commonly used is acetylsalicylic acid, or aspirin. As it causes acidity, many more products were developed; paracetamol (acetaminophen) is a preferred painkiller.

    In some conditions, such as after surgery, the pain is far more intense. There are also injuries that cause immense pain, say a soldier hit by bullets, and cancer patients who experience unbearable pain.

    For such severe pain, morphine was used as a painkiller in the past. These days morphine derivatives such as pethidine and tramadol are used. These are very potent medicines and are CNS depressants: an overdose depresses the vaso-motor centre, and the patient may die. Besides, these painkillers can be taken habitually and cause addiction.

    There were 6.45 lakh (645,000) deaths in the US due to overdoses of pain medications between 1999 and 2021. It has been called the opioid epidemic.

    The media covered this tragedy in great detail, highlighting the role of the Sackler family, who ran a pharma company called Purdue Pharma that marketed OxyContin, a painkiller based on oxycodone. There were three Sackler brothers: Arthur, Mortimer and Raymond. Arthur, the oldest, pioneered medical advertising and roped in physicians to prescribe Purdue products. He died in 1987, and his stake in the company passed to the surviving two brothers.

    OxyContin was heavily promoted and became a key drug contributing to the opioid epidemic.

    Many states filed suits against the Sacklers for their involvement in the opioid crisis.

    The Sacklers were not alone in causing the epidemic. John Kapoor introduced a painkiller more powerful than OxyContin: fentanyl, a synthetic opioid. It was cleared by the FDA for cancer patients in hospital settings. It was not a new molecule, but its novel sub-lingual administration was patented: it was introduced as a spray to be used under the tongue, sold as Subsys. Pliable doctors prescribed it for all sorts of patients.

    By the time Mr. Kapoor and his team were caught and prosecuted, many lives had been lost or destroyed. All members of Kapoor's team, and Kapoor himself, were punished.