Training LLMs

We have already learnt that an LLM is trained on vast amounts of data consisting of mountains of text, and that it learns to predict the next word. Each prediction prompts small adjustments that improve its chances of getting the next prediction right. All this training gives an LLM a statistical understanding of language, and all of it is part of pre-training. However, a pre-trained LLM fumbles when asked, say, to crack a joke to lighten the mood. This is where reinforcement learning from human feedback (RLHF) helps. OpenAI introduced this technique in March 2022. (As you know, ChatGPT was released in November 2022, eight months later.)

There are three steps in RLHF. First, for a given prompt, human volunteers are asked to choose between two potential LLM responses; this is repeated thousands of times. Second, the resulting preference data is used to train a second LLM, called the reward model, which assigns higher scores to responses a human would like (and lower scores to everything else). Third, the knobs and levers of the original LLM are tweaked to reinforce the behaviours that earn it a reward.
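
A minimal sketch of the second step, training the reward model, assuming PyTorch and a reward model that has already mapped each response to a scalar score; the pairwise objective below is the standard Bradley-Terry style loss used in RLHF work, not any particular lab's code.

```python
import torch
import torch.nn.functional as F

def reward_pairwise_loss(score_chosen: torch.Tensor,
                         score_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the score of the response a
    # human preferred above the score of the rejected one.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy usage: scalar scores the reward model assigned to three
# (chosen, rejected) response pairs for the same prompts.
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.5, 1.0])
print(reward_pairwise_loss(chosen, rejected).item())
```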

RLHF takes time and is cumbersome. The same results can be achieved with less effort through Direct Preference Optimization (DPO). Archit Sharma and Eric Mitchell presented DPO in December 2023.

The key observation is this: for every reward model, there is a specific theoretical LLM that would score full marks, which means every LLM conceals an implicit reward model that researchers can work with directly. Instead of learning from another LLM, the LLM can learn directly from the preference data. The intermediary is removed, and the process becomes more efficient. DPO is being used extensively by leading LLMs: Meta (Facebook) has integrated DPO into its models, and the French company Mistral uses DPO as well.
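
A minimal sketch of the DPO objective, assuming PyTorch: the `logp_*` arguments are summed log-probabilities of whole responses under the policy being tuned and under a frozen reference copy of the pre-trained model, and `beta` controls how far the policy may drift from that reference. This follows the published DPO loss, not any specific production implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward of a response = beta * (policy logprob - reference logprob).
    # The loss asks the chosen response's implicit reward to beat the rejected one's.
    policy_margin = logp_chosen - logp_rejected
    reference_margin = ref_logp_chosen - ref_logp_rejected
    return -F.logsigmoid(beta * (policy_margin - reference_margin)).mean()

# Toy usage with made-up summed log-probabilities for three preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5, -20.0]),
                torch.tensor([-14.0, -9.0, -25.0]),
                torch.tensor([-13.0, -10.0, -21.0]),
                torch.tensor([-13.5, -9.8, -24.0]))
print(loss.item())
```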

Architectures of Decoder-Only (GPT) and Encoder-Only (BERT) Models

The debut of the Transformer in 2017 stimulated a race to produce new models. OpenAI took the first initiative in June 2018 and created GPT: a decoder-only model that excelled in Natural Language Generation (NLG) and ultimately powered ChatGPT. Google responded by releasing BERT in October 2018, four months later. BERT is an encoder-only model designed for Natural Language Understanding (NLU).

Decoder-Only Models

The decoder block in the Transformer generates an output sequence based on the input provided to the encoder. Decoder-only models eliminate the encoder block entirely; instead, multiple decoders are stacked together in a single model. These models accept a prompt as input and generate a response by predicting the next most probable word (or, rather, token) one at a time, a task called Next Token Prediction (NTP). Decoder-only models thus excel at NLG tasks such as conversational chatbots, machine translation and code generation. As ChatGPT is widely used, the public is familiar with such models: ChatGPT is powered by decoder-only models such as GPT-3.5 and GPT-4.
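
A minimal sketch of next-token prediction as a generation loop, assuming the Hugging Face transformers library and the small, openly available gpt2 checkpoint as a stand-in for a decoder-only model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is used here only as a small, openly available decoder-only model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Transformer architecture"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy next-token prediction, one token at a time.
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits           # scores for every position
    next_id = logits[:, -1, :].argmax(dim=-1)      # most probable next token
    input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```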

Encoder-Only Models

The encoder block in the Transformer accepts an input sequence and creates a vector representation for each word (or token). Encoder-only models eliminate the decoder and stack multiple encoders in a single model. These models do not accept prompts; rather, they accept an input sequence on which a prediction is made, for example about a missing word in the sequence. Encoder-only models lack the capacity to generate new words, so they are not used for chatbot applications. Instead, they are used for NLU tasks such as Named Entity Recognition (NER) and Sentiment Analysis. The vector representations give BERT models a deep understanding of the input text. Though it is technically possible to generate text with BERT, that is not what the architecture is meant for, and the results are not as good as those of decoder-only models.
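
A minimal sketch of the masked-word prediction just described, assuming the transformers library and the publicly available bert-base-uncased checkpoint.

```python
from transformers import pipeline

# fill-mask is the masked-word prediction task encoder-only models are trained on.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the token hidden behind [MASK] from context on both sides.
for candidate in unmasker("SQL is a [MASK] language for databases."):
    print(candidate["token_str"], round(candidate["score"], 3))
```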

Thus, the Transformer model has both encoders and decoders, GPT models are decoder-only models and BERT models are encoder-only models.

It is the GPT model that made Transformer pre-training popular. Pre-training imparts a broad understanding of language nuances (word usage and grammatical patterns) and produces a task-agnostic foundational model. After pre-training, a foundational model can be fine-tuned for a specific task. Fine-tuning involves training only the linear layer (a small feedforward neural network); the weights and biases of the rest of the model, the foundational portion, remain unchanged.
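
A minimal sketch of that kind of fine-tuning, assuming PyTorch and the pre-trained bert-base-uncased encoder: the foundational weights are frozen and only a small linear classification head is trained.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
base = AutoModel.from_pretrained("bert-base-uncased")

# Freeze the foundational portion: its weights and biases stay unchanged.
for param in base.parameters():
    param.requires_grad = False

# Only this linear head (here, a 2-class sentiment classifier) is trained.
head = nn.Linear(base.config.hidden_size, 2)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

batch = tokenizer(["a delightful read", "a waste of time"],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

with torch.no_grad():                                  # no gradients through the frozen base
    hidden = base(**batch).last_hidden_state[:, 0, :]  # [CLS] vectors
loss = nn.functional.cross_entropy(head(hidden), labels)
loss.backward()
optimizer.step()
```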

GPT-4o and Prafulla Dhariwala

We have already noted the release of GPT-4o by OpenAI on May 13, 2024. Two days later, CEO Sam Altman attributed this release to the efforts of Prafulla Dhariwala, a Pune resident who now works as a research scientist heading the Omni team at OpenAI.

Altman said GPT-4o would not have seen the light of day had it not been for Prafulla's vision, talent, conviction and determination over a long period of time, and that it will be hailed as a revolution in the way we use computers.

The Omni team's first contribution is GPT-4o, OpenAI's first natively multi-modal model. On X, Prafulla mentioned that it was a huge organization-wide effort and the result of hard work done by his team.

Prafulla joined OpenAI in 2016 as a research intern and rose through the ranks to become a research scientist working on generative AI models and unsupervised learning. In 2009, he won the National Talent Search Scholarship (Government of India). He won a gold medal at the Astronomy Olympiad in China, another at the International Mathematical Olympiad in 2012 and one at the Physics Olympiad in 2013.

His PCM score in Class 12 was 295 out of 300. He excelled at entrance examinations, scoring 330 out of 360 in the JEE Mains. However, instead of an IIT, he joined MIT, where he completed his Bachelor's in Computer Science in 2017 with a perfect GPA of 5.0/5.0.

SQL Turns 50

In May 1974, Donald Chamberlin and Raymond Boyce published a paper on SEQUEL, a structured query language that could be used to manage and retrieve data. Since the name SEQUEL was already trademarked by another company, it was renamed Structured Query Language (SQL). Database companies such as Oracle adopted it along with their relational database products in the 1970s. The rest is history.

SQL is now 50 years old. It was designed and adopted around databases; it lets us manage and interact with data. It ranks third among the most popular languages used by programmers and helps them find jobs. Some other equally old languages, such as COBOL (1959) and FORTRAN (1958), have become legacy languages, yet SQL is still in use, even for AI and analytics.

Why has it survived so long? It is not an easy language. It has a peculiar syntax. Database vendors must support SQL, and each vendor has its own quirks and nuances in implementing that support, so the approach for one database may differ from that for another. Mistakes in SQL can have disastrous consequences.

SQL is based on strong mathematical theory (relational algebra). It is effective and supports the use cases it was designed for. SQL combined with relational databases maps well onto the data. It is reliable, it is scalable, and it works.

It returns multiple rows per single request, which makes it easier to learn what is happening within a dataset, within the business and within its apps.

SQL makes it easy to compartmentalize and segregate information into a number of tables, and tabulation makes it easy to use the data for different tasks.
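
A minimal sketch of those two points, assuming only Python's built-in sqlite3 module: data is segregated into two tables, and a single declarative request returns multiple rows joined across them. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 90.0), (3, 2, 40.0);
""")

# One declarative request returns multiple rows, joined across tables.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
""").fetchall()
print(rows)   # e.g. [('Asha', 340.0), ('Ravi', 40.0)]
```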

SQL remains contemporary by moving with the times. It has added support for geographic information system (GIS) data, and it can be combined with vector data so that vector searches can be conducted for generative AI.
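
A hedged sketch of what such a vector search can look like, assuming a running PostgreSQL database with the pgvector extension installed and reachable via psycopg2; the connection string, table and column names are illustrative, and real embeddings would be far wider than three dimensions.

```python
import psycopg2

# Placeholder connection details for an existing PostgreSQL instance.
conn = psycopg2.connect("dbname=demo user=demo")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(3)
    );
""")

# Nearest-neighbour search: order stored embeddings by distance to a query vector.
cur.execute(
    "SELECT body FROM documents ORDER BY embedding <-> %s::vector LIMIT 5;",
    ("[0.1, 0.2, 0.3]",),
)
print(cur.fetchall())
```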

There were attempts to replace SQL. NoSQL databases were developed to replace relational databases, but instead of replacing SQL, such databases added their own SQL-like languages replicating some of its features.

NLP advocates called for doing away with SQL's standardized but clunky approach, yet such attempts led to methods that were just as clunky as what they tried to replace. Generative AI may now take on the task of writing SQL for developers; LLMs have already been exposed to large quantities of SQL code during training.

SQL may move behind the curtain, but it will continue to play a crucial role in how we interact with and use data. SQL is here to stay.

Alternatives to GPUs

For an AI model with 15-30 billion parameters, infrastructure with GPUs is needed. However, users can start with a CPU and then switch over to GPUs. In the meantime, they can consider an accelerator like Gaudi, which gives similar performance at lower cost and lower power.

There is a waiting period of 16 weeks to procure GPUs. In the meanwhile, the existing infrastructure can run the models; it is necessary to evaluate what we are trying to run, the parameters involved and the use cases. There is also the Intel Developer Cloud, where customers can come, try out and run their models.

A Xeon costs less than a GPU, and the cost is lower still when using accelerators. Intel has recently announced Gaudi 3, which it claims is 50 per cent better on inference performance and 40 per cent lower in power consumption.

There are alternatives, then, from a Xeon and a Gaudi to a GPU.

GitHub

GitHub is a Microsoft-owned community of developers, a software collaboration and innovation platform that allows developers to create, store, manage and share their code. It has 13.2 million developers associated with it. It is heartening to see the contributions of Indian developers to generative AI projects on GitHub: Indians have the second highest number of generative AI projects on the platform, and the US, India and Japan are its major contributors.

Internationally, over 50,000 organizations are using GitHub Copilot, which has 1.8 million paid subscribers. Infosys has embraced the GitHub platform. Cognizant's 35,000 developers have been trained on GitHub Copilot, and 40,000 more are waiting for training. MNCs have increased their usage of GitHub.

Generative AI has changed the developer landscape in the past two years. It is a tool embedded inside the development environment.

In 2022, GitHub Copilot Chat was released. It unlocked the power of natural language in coding, debugging and testing, allowing developers to converse with their code in real time.

Copilot allows a new way of building software with natural language. It is expressly designed to unlock developer creativity, making work faster and easier. Developers can act as systems thinkers, and such tools lower the barrier to entry into software.

Getting started on a task is the most challenging aspect for developers, and Copilot reduces that cognitive burden. Copilot Workspace then serves as an AI thought partner. GitHub has made coding a lot easier.

Google’s AI-Powered Search

OpenAI and other competitors are in a race with Google to bring generative AI to the search engine. Googling, as it is popularly known, will be supercharged with Gemini. This was announced at Google's annual developer conference in 2024 at Mountain View, California. Sundar Pichai called it a fully revamped, new search experience. It is being rolled out to US users this week and will come to other countries soon.

There will be a major change: some searches will return AI Overviews, a more narrative response that spares people the task of clicking through various links.

Underneath the search bar, an AI-powered panel will appear, presenting summaries drawn from Google search results.

There will also be an AI-organized page that groups results by theme or presents, say, a day-to-day plan for people turning to Google for specific tasks.

Google has ruled the market ever since its founding in 1998 because of its superior algorithm. It surpassed Yahoo! and dominated the market, inviting anti-trust suits.

These days online search is fundamentally changing. Rivals are encroaching on Google's turf. The ChatGPT and Claude chatbots are easier to use and have been welcomed with open arms. They are threats to Google's position and could affect its entire business.

On May 13, 2024, OpenAI announced GPT-4o to power its chatbot. People can ask verbally, or show an image, and the chatbot will respond in milliseconds. On May 14, 2024, Google released its new search engine. It now has to protect its search advertising business and yet show that it has stood its ground, trying to differentiate itself from the rivals. Google is trying to translate AI innovations into profitable products and services at scale.

Last year (2023), Google's search advertising business was worth $175 billion. Generative AI-powered searches require more computing power than producing a list of links, which will affect the margins, so Google is trying to bring down the cost of generative AI search.

Searching also takes a lot of hard work on the user's part, and the company wants to reduce that work. AI-powered Google search will be able to process billions of queries.

However, if AI Overviews fully address users' queries, people may click on fewer ads. It is like rocking the boat too hard: websites rely on the search giant to draw traffic, and there could be fewer visitors on account of the changes.

Google is surfacing information directly in search results, producing the so-called 'zero click' effect: users obtain the information they are seeking without clicking through to the source. It is a blow to web publishers. In competing with AI tools such as ChatGPT, Google deprives publishers of traffic; they get deprioritized.

Google disagrees, saying AI Overviews include links that will get more clicks. Google will monitor the effect of AI Overviews on traffic.

Just now the internet has become a mess because of the scramble for Google rankings. There are tricks to give content its best shot at rising to the top, and they are deemed necessary for survival. AI Overviews will clean up this mess.

New ChatGPT Based On GPT-4o

OpenAI was to hold a livestream conference on Monday, 13th May 2024 at 1700 GMT to demo some updates to ChatGPT and GPT-4. This was announced on X. OpenAI was planning to do this before Google's I/O developer conference.

However, OpenAI decided to delay this. As we know, ChatGPT wowed the world with its human-like written content and top-notch software code. Soon after its advent in late November 2022, it became the fastest application to reach 100 million active users. However, traffic to ChatGPT fluctuates and is only now returning to its May 2023 peak. OpenAI has to expand the user base of ChatGPT, its flagship product.

GPT-4 will now be called GPT-4o, where the 'o' stands for 'omni', indicating that it is multi-modal.

OpenAI's ChatGPT can receive and respond to voice commands, images and videos. GPT-4o juggles audio, images and video faster than previous versions of the technology.

The app is available, free of charge, for both smartphones and desktop computers. It is a glimpse of the future of interaction between humans and machines.

It is a way to combine chatbots with voice assistants, e.g. the Gemini chatbot with Google Assistant. OpenAI will share the technology with users over the coming weeks.

The new app cannot generate video but can generate still images that represent frames of video.

ChatGPT has already demonstrated that machines can handle requests more like people. Previously, voice interaction was a patchwork of three different AI technologies: one that converted voice to text, one that generated a text response and one that converted this text into a synthetic voice. The new app, based on the single AI technology GPT-4o, can do all of this. It is a more efficient technology, so the company can afford to offer it to users for free.
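
A minimal sketch of that difference, using entirely hypothetical function stubs (none of them real OpenAI APIs) to show the shape of the old three-model patchwork versus a single end-to-end multi-modal model; it illustrates the pipeline structure only, not OpenAI's actual implementation.

```python
# Hypothetical stand-ins; none of these are real OpenAI APIs.
def speech_to_text(audio: bytes) -> str:      # stand-in for a speech recognizer
    return "what is the weather"

def text_llm(prompt: str) -> str:             # stand-in for a text-only LLM
    return "It looks sunny today."

def text_to_speech(text: str) -> bytes:       # stand-in for a voice synthesizer
    return text.encode()

def omni_model(audio: bytes) -> bytes:        # stand-in for one multi-modal model
    return b"It looks sunny today."

def old_voice_pipeline(audio: bytes) -> bytes:
    # Patchwork of three separate models, each hand-off adding latency.
    return text_to_speech(text_llm(speech_to_text(audio)))

def omni_voice_pipeline(audio: bytes) -> bytes:
    # A single natively multi-modal model handles audio end to end.
    return omni_model(audio)

print(old_voice_pipeline(b"..."), omni_voice_pipeline(b"..."))
```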

Generative AI in Education

LLMs are in a position to provide answers to research questions on varied subjects and to create images if necessary. ChatGPT and its counterparts have fared well in various examinations. They are wonderful tutors. They can change the way syllabi are designed and the way students learn and are assessed.

The big question, however, is whether generative AI can change the way education has been imparted for so many years. Though it can, we have to wait for this to happen: there are many rough edges to iron out, and guardrails must be set up.

Gen AI is being used to create content, to learn about any topic and to stimulate creativity. Generative AI assists in brainstorming and offers feedback. These tools help teachers personalize education.

Generative AI poses certain issues: there can be a lack of coherence, accuracy and reliability, so it cannot be used for critical tasks. The models do hallucinate while giving answers confidently. There are also issues of model bias, copyright and privacy.

Besides, LLMs are expensive. They require GPUs for processing large volumes of data, and they require intensive training. It remains to be seen how academia will be able to bear the costs.

There could be a division between digital haves and have-nots; it is an issue of access.

It is necessary to tread cautiously.

AI Implications

Madhumita Murgia has written a book, Code Dependent, exploring the grey and murkier areas of AI. She tells the story of data labelers in Kenya who contribute their labour to train the algorithms of self-driving cars. There is the story of a mother whose child is added to a list of potential criminals, not by a human agency but by a machine that uses facial recognition technology with all its racial bias. The book goes beyond Silicon Valley and points out the global north-south divide.

The Big Tech elite in Silicon Valley are paid hefty sums of money while the mundane, monotonous, repetitive tasks are outsourced to the poor populace of the developing world. This is referred to as data colonialism, a term first used by academics Nick Couldry and Ulises A. Mejias.

AI jobs in Kenya and the Philippines are handled by data labelers who, out of their earnings, can afford school education for their children for the first time, or healthcare for their parents. They are doing digital jobs (maybe better than physical jobs in not-so-conducive conditions), and governments see these as employment generation. AI should bring prosperity to everyone, but no such thing is happening in the global south. AI work is still a minimum-wage job. The workers sign an NDA, a non-disclosure agreement: they cannot talk about any of this to anyone, cannot unionize, and at times do not know who their employer is.

There is hype around AI as companies feel FOMO — fear of missing out.

First, people should understand the implications of this technology. Second, they should understand the dynamics of change in the tech industry. Third, they should understand how AI affects other industries.