Creativity, Data and Technology

Creativity was once considered the sine qua non of advertising. The advent of digital advertising changed that perspective. The idea now is to keep the message rolling across media so as to have as wide a reach as possible. This puts creativity in the back seat: acceptable creativity is now average creativity. However, one must try to achieve an effective amalgam of creativity, data and technology. These three put together will produce impactful and real campaigns. The work created must solve real problems and stir a whole mass of people. This unlocks the true power of creativity.

Gemini — A Giant Stride or a Leap of Faith

On December 6, 2023, Google launched its most powerful AI model, Gemini, calling it a huge leap forward in AI models.

Gemini has three versions: Pro, Ultra and Nano. Nano, as the name itself indicates, is the lightest version. It runs natively and offline on Android devices. Gemini Pro is the heavier version that is expected to power Google’s AI services. It will be the backbone of Bard, Google’s chatbot. Gemini Ultra is the most advanced LLM, designed for data centers and enterprise applications. Though all these models currently support only English, they will soon support other languages too.

Gemini will be integrated with Google’s search engine and Chrome browser, and other Google products.

In comparison to GPT-4, Gemini scored 90 per cent on the massive multi-task language understanding (MMLU) test. It surpassed human experts, who score 89.8 per cent, while GPT-4 scored 86.4 per cent.

However, the testing methods used are different: Google used chain-of-thought prompting with 32 samples, whereas GPT-4’s score was obtained with the 5-shot prompting technique.

Bard will benefit a great deal from Gemini Pro by gaining advanced reasoning capabilities. Bard Advanced, a second version of Bard, will follow in 2024. It will have access to Gemini Ultra.

While doing all this, Google will have to take care of changing tech regulations and AI ethics and tackle problems such as LLM hallucinations.

Generative AI: What Next?

It was reported that, when Sam Altman was fired as CEO of OpenAI, the company was on the brink of a breakthrough: a new algorithm, Q*, which could solve math problems of high-school standard with great accuracy, whereas GPT-4 could do this with only about 70 per cent accuracy. Q*’s near-perfect scores suggested logical reasoning, a departure from the mere identification and replication of patterns learnt during training.

If true, we are one step closer to what is being described as AGI, or artificial general intelligence. Such a model would not only absorb, decipher and replicate the patterns learnt in the training phase, but would also have reasoning ability. This power could improve in subsequent iterations. AGI could then be equated with high intelligence.

AI, as we know it today, is narrow. Its algorithms are designed to perform a narrow range of tasks, though LLMs are more versatile. Generative AI is good at writing and language translation. It works by statistically predicting the next likely word, and by logging the contextual association of words to each other. Even while solving math or writing a code, they are working through statistical association. In order to solve novel problems of math, they must have greater reasoning capabilities.

Real AGI will perform a lot of tasks and tackle problems far better than humans can. By definition, AGI will perform new tasks without instructions. Such a model could be self-aware or conscious. It may possess traits such as curiosity, self-will or a desire for self-preservation, all traits which we associate with living beings.

Could such a model be ethical or altruistic? Such concepts vary across cultures. However, AI that is not aligned with goals good for humans could be dangerous.

A day before he was fired, Sam Altman said he wanted to ‘push the veil of ignorance back and the frontier of discovery forward’. Was he hinting at Q*? Many such rumours float around.

Vinod Khosla on AI

We know investor and venture capitalist Vinod Khosla (68). He has invested in AI startups, including OpenAI, into which he put $50 million in 2019. He has also poured funds into other startups such as Replit. He believes that, on account of AI, we will have access to free professional services (medical, legal, etc.) and human-like robots.

He believes that in the next 10 years, the world will have free doctors, free tutors for everybody and free lawyers to access the legal system.

In the next 25 years, say by 2048, we will have whole teams of robots that stand upright the way we do: bipedal robots. They will form a large industry, just like the automobile industry of today.

LLMs will have capabilities that have not so far been seen. We have not yet reached the limits of AI capability. There is a fear about AI turning sentient or robots turning conscious. All this is nonsensical. Instead, think positively about how AI will benefit humanity. Why should we focus on the dystopian angle, a one per cent probability of something untoward happening?

The path to making lives better for the seven billion plus people on earth runs through AI.

Perplexity AI

Several former AI researchers, including Andy Konwinski, Aravind Srinivas, Denis Yarats and Johnny Ho, have founded the startup Perplexity AI, which is responsible for the chatbot Perplexity Copilot. It has the potential to be a market leader in web search by combining a web index with a chatbot interface, and it could affect the position of the current market leader, Google. Perplexity has also released its own LLMs, pplx-7b-online and pplx-70b-online, where the digits indicate their parameter sizes. They are fine-tuned and augmented versions of open-source models from Mistral and Meta. Parameters refer to the number of connections between a model’s artificial neurons; they indicate how ‘intelligent’ and powerful the models are. The higher the number of parameters, the more knowledgeable the model.

They offer a Perplexity API so that others can use the models and build their own apps.

They also offer helpful, factual and up-to-date information.

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent is an optimization technique, used mostly in ML and other fields, to minimize an objective function, say an error function, with respect to its parameters. The method is iterative: it starts with an initial guess for the parameters and iteratively improves upon them till the minimum is reached.

Each iteration calculates the gradient of the objective function with respect to the parameters. The gradient indicates the direction of steepest ascent, so SGD updates the parameters in the opposite direction.

To update the parameters, a scaled version of the gradient is subtracted from their current values. The scaling factor, called the learning rate, determines the size of the step taken along the descent direction.

The above two steps are repeated till the objective function converges to a minimum or a stopping criterion is met.
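The two steps above can be sketched in a few lines of Python. This is a minimal illustration on a least-squares line-fitting problem; the data, learning rate and batch size are made up for the example:

```python
import random

random.seed(0)
# Synthetic, noise-free data from the line y = 2x + 1 (the "true" parameters).
data = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]

w, b = 0.0, 0.0        # initial guess for the parameters
learning_rate = 0.1    # scaling factor for each step
batch_size = 8

for step in range(2000):
    batch = random.sample(data, batch_size)   # a mini-batch, not the whole dataset
    # Gradient of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / batch_size
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / batch_size
    # Subtract the scaled gradient: a step in the steepest-descent direction.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # converges towards w = 2, b = 1
```

Note that each update touches only eight data points, which is why SGD scales to large datasets.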

SGD is faster than other optimization algorithms, especially when dealing with large datasets, because the calculation is done only for a small subset of data points, a mini-batch, in each iteration, and not for the whole dataset.

It is scalable since it can be extended to large and complex models with multiple parameters.

It is less sensitive to noise in the data.

SGD’s limitation is that it can converge to a local minimum of the objective function instead of the global minimum. Another limitation is tuning: choosing the optimal learning rate and other hyperparameters requires a lot of experimentation.

To escape local minima, the momentum variant of SGD is used. The Adagrad variant adjusts the learning rate for each parameter individually to improve convergence.
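The momentum variant keeps a running ‘velocity’, an exponentially decaying average of past gradients, so the iterate can roll through shallow local minima. A minimal sketch; the function name and settings here are illustrative, not a standard API:

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum update (illustrative helper).

    velocity accumulates a decaying average of past gradients, which
    damps oscillations and helps carry the iterate past shallow minima.
    """
    new_velocity = [beta * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity

# One step on gradient [1.0] from parameter [0.5], starting with zero velocity.
params, velocity = sgd_momentum_step([0.5], [1.0], [0.0])
print([round(p, 4) for p in params], velocity)  # [0.49] [1.0]
```

With beta set to 0, the update reduces to plain SGD; larger beta values give past gradients more influence.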

SGD is used in machine learning (ML) and neural networks and in optimization of signal processing and finance.

SGD is a variant of the gradient descent algorithm. Unlike standard gradient descent, SGD uses only a small batch of data points (a mini-batch) to estimate the gradient, instead of the entire dataset. It is stochastic and hence more efficient and scalable.

It can perform regularization to prevent overfitting.

It is called stochastic because the mini-batches are chosen at random: it uses an estimate of the true gradient rather than the exact gradient. It typically uses a learning rate schedule in which the rate decreases over time.

This randomness allows it to explore the parameter space more effectively; otherwise, the algorithm follows a fixed set of rules.

AI in Consultancy and Law Firms

Big consultancy firms and law firms hire interns who do taxing, repetitive and time-consuming tasks in the first few years of their jobs. At PwC, these juniors spend time preparing meeting documents for clients. Junior lawyers keep interpreting the complex contracts which their seniors handle.

Three or four years later, they reach the prestigious partner-level status. Artificial intelligence speeds up the time it takes to reach the partner level. It is good to learn by preparing some of these documents, but should you do it for two or three years? Probably not. Do it two or three times and you are comfortable.

The use of AI will bring about a seismic shift for professional services firms, which subject their juniors to years of tedious work before making them partners.

The partner title entitles you to bigger client assignments and lucrative emoluments.

GPT-4 and Radiology

GPT-4 has one important application: the processing of medical images, ranging from X-rays to MRIs. GPT-4 prepares a summary of the reports, and some of these summaries are preferable to those written by expert radiologists.

GPT-4, as we know, is multi-modal. The application of image reading is now available for both Android and iOS operating systems.

The reports are first scanned, and the GPT-4 algorithm is used to interpret them. The generated report carries a summary, a diagnosis and medication suggestions.

Medical practitioners can use ChatGPT to get these services through API.

GPT-4 structures the reports automatically. Most of the documents, such as the clinical history of the patient and radiological interpretations of medical images, remain unstructured, which makes interpretation difficult.

If these reports are organized, they become easily searchable. This helps gather real-world data (RWD) and real-world evidence (RWE), and facilitates clinical trials.

Microsoft is one company that uses generative AI in radiology. Other companies too are joining.

GPT-4 can serve as a valuable assistant to radiologists. It does not supplant human judgement but supplements it. Radiologists must verify these reports.

There is a dearth of radiologists: there are 20,000 radiologists for a population of 1.4 billion people, i.e. roughly one radiologist per 70,000 people. It is a below-par ratio. GPT-4 is thus a boon for the field of radiology.

AI Models with Planning and Strategizing Capacity

As we have observed, Google launched its Gemini multi-modal model in December 2023. Google had already created a model that could beat champion Go players. Gemini will not only generate text and images but will also be able to do some planning and strategizing, and it will use some of those skills for problem solving.

On the other hand, we have heard about OpenAI’s Q*. Gemini will compete with ChatGPT. Q* can, it is said, perform grade-school math. OpenAI is thus pushing ChatGPT in the direction of Gemini: combining math capabilities with software that can generate text and images. This is unique, and the process resembles the thinking and problem-solving process of human beings.

Such models can be asked to perform tasks like marketing research for a new product, and they will come back with a market analysis and additional ideas. They may require some hand-holding, but they carry out their responsibility and do not remain limited to a single task. These models are thus capable of performing broader tasks, rather than just single tasks, and this does affect the job market.

Companies have just a handful of foundational models to choose from: OpenAI’s ChatGPT, Google’s Bard or Gemini, or Amazon’s models.

There is one disturbing factor: these models have entrenched biases against people with disabilities and racial minorities. Some of their operations are inscrutable; they are a black box.

According to Forrester Research, Microsoft 365 Copilot will be used by 7 million knowledge workers, and Google’s Duet AI by 3 billion users of the enterprise platform Workspace.

We now know the way things are moving. It could bring about disruption. Models with planning and strategizing capacity should be used with caution.

Training Large Language Models (LLMs)

LLMs are trained on a massive amount of data in pre-training. This is unlabelled text data, e.g. web pages, articles and books, and the training is unsupervised. The idea is to make the model learn the statistical patterns and structure of language.

The most common pre-training task is prediction of the next word. Here the LLM is given a sequence of words and is asked to predict the next word in the sequence. This teaches the LLM the relationships between words and how they are used in different contexts. Alternatively, certain words in the sequence are masked, and the LLM has to predict these words.
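As a toy illustration of the next-word prediction task, a simple bigram count model can predict the statistically most likely next word. Real LLMs use neural networks rather than raw counts, and the corpus below is made up:

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real pre-training uses billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (a bigram model).
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word):
    # Return the statistically most likely next word.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat", since it follows "the" most often here
```

An LLM learns a far richer version of the same idea: a probability distribution over the next word conditioned on the whole preceding context, not just the previous word.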

After pre-training, LLMs are fine-tuned for a specific task, e.g. translation from one language to another, or answering questions. Here labelled data specific to the task is used to train the LLM; this is supervised learning. The LLM learns task-specific patterns and relationships between words.

In both pre-training and fine-tuning, a forward pass and backpropagation are used. In the forward pass, the input data is fed into the LLM and the output is computed. The data passes through layers of neurons; weights and an activation function are applied at each neuron. The output of the forward pass is the LLM’s prediction.

Backpropagation is the process of using the error between the LLM’s prediction and the true label to adjust the LLM’s weights. This is accomplished by computing the gradient of the error with respect to the weights, and then using the gradient to update the weights in a way that reduces the error.

In an LLM’s training, both the forward pass and backpropagation are important. The forward pass allows it to make predictions, and backpropagation allows it to learn from its mistakes and improve its predictions over time. In other words, it improves its accuracy.

This training process is iterative. In each iteration the LLM receives a batch of data, the forward pass and backpropagation are applied, and the weights are updated in the light of the backpropagation results. Then the next iteration begins.

First, the training data is prepared: the data is cleaned, noise is removed, and the text is tokenized. The model weights are initialized, either randomly or by using pretrained weights from another model. A batch of data is then fed to the model. In the forward pass, it passes through the layers of neurons; the LLM’s weights and activation functions are applied at each neuron. The output of the forward pass is the LLM’s prediction for the input data. At this stage, the loss between the LLM’s prediction and the true label (the desired output) is calculated. This loss indicates how bad the LLM’s prediction was.

In backpropagation, the loss is used to compute the gradient of the loss with respect to the LLM’s weights. The gradient is then used to update the weights so as to reduce the loss.

The process of forward pass, loss calculation and backpropagation is repeated till the LLM has learned to make accurate predictions on the training data. Later it is fine-tuned for specific tasks on a smaller amount of task-specific data.
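This cycle of forward pass, loss calculation and backpropagation can be illustrated with a single sigmoid neuron trained on one example. It is a toy stand-in, not an actual LLM; the initial weights, input and learning rate are arbitrary:

```python
import math

# One sigmoid neuron trained on a single (input, true label) pair.
w, b = 0.5, 0.0        # initial weights
x, target = 1.0, 1.0   # input and true label
lr = 1.0               # learning rate

for step in range(100):
    # Forward pass: weight, bias and activation function applied to the input.
    z = w * x + b
    pred = 1 / (1 + math.exp(-z))      # sigmoid activation
    loss = (pred - target) ** 2        # squared-error loss

    # Backpropagation: gradient of the loss with respect to w and b,
    # obtained via the chain rule through the loss and the sigmoid.
    dloss_dpred = 2 * (pred - target)
    dpred_dz = pred * (1 - pred)
    grad_w = dloss_dpred * dpred_dz * x
    grad_b = dloss_dpred * dpred_dz

    # Update the weights in the direction that reduces the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(round(pred, 3))  # the prediction approaches the true label 1.0
```

An LLM repeats the same three steps, but over billions of weights and batches of token sequences rather than one neuron and one example.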

This process is continuous, since the predictions are constantly updated as new input data is received. In further iterations, the LLM uses the predicted word and the current context to generate the next prediction. It learns long-range dependencies in the language and makes more accurate predictions.

Geoffrey Hinton, a British computer scientist, is known for his work on developing backpropagation. Yann LeCun, a French computer scientist, pioneered CNNs, which are well suited for image recognition and for NLP tasks including machine translation and question answering. Yoshua Bengio, a Canadian scientist, made significant contributions to the training of neural networks. Nitish Srivastava, a Canadian scientist, is known for his regularization technique, dropout, which prevents overfitting. Tomas Mikolov, a Czech scientist, is known for his work on word2vec. Kyunghyun Cho, a Korean scientist working at New York University, is known for his work on RNNs, which are well suited for sequential data and are used for NLP tasks including machine translation and speech recognition.