LLEMMA — LLM for Maths

University researchers and EleutherAI have introduced LLEMMA, an open-source LLM for problems in mathematics. It surpasses Google’s Minerva in performance.

It is based on Code Llama, itself derived from Facebook’s Llama 2 model and further fine-tuned. There are two versions of the model — one with 7 billion parameters and the other with 34 billion parameters. The models are fine-tuned on Proof-Pile-2, a mixture of scientific papers, web data featuring maths, and mathematical code.

LLEMMA is pre-trained on diverse maths data. It can use tools and can prove theorems without additional fine-tuning.

It can leverage a Python interpreter and formal theorem provers to solve maths problems.
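As an illustration of this kind of tool use (not Llemma’s actual harness — the helper name and the “model output” below are hypothetical), a model-generated Python snippet can be executed by an interpreter and its result read back:

```python
# Minimal sketch of program-aided maths solving: the model emits Python,
# an interpreter runs it, and the `answer` variable holds the result.
# `solve_with_interpreter` is a hypothetical helper, not a real API.

def solve_with_interpreter(generated_code: str):
    """Execute model-generated Python and return its `answer` variable."""
    namespace = {}
    exec(generated_code, namespace)  # in practice, run in a sandbox
    return namespace.get("answer")

# Example "model output" for: "What is the sum of the first 100 squares?"
generated = "answer = sum(k * k for k in range(1, 101))"
print(solve_with_interpreter(generated))  # 338350
```

In a real system the generated code would be sandboxed and its output fed back into the model’s reasoning loop.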

Google’s Minerva, by contrast, is not an open-source model. It is based on the PaLM model.

This is a subject-specific model and not a general model.

Whether LLMs are suitable for problem solving is a matter of debate. Some scientists argue that LLMs are stochastic in nature and therefore unsuited to maths. Another concern is that training data can include benchmark examples, inflating measured performance. There are efforts towards enhancing the reasoning and planning capabilities of language models. Maybe they are not the ultimate tools, but they are a first step towards further research on other types of models.

QR Codes

Masahiro Hara, a Japanese engineer, invented the QR (quick response) code while working for Denso, a Japanese automotive technology company. To begin with, the company used the invention for inventory management. Although Hara retained the patent for the code, he did not exercise his rights as a patent owner: he wanted the code to be used by as many people as possible, and its use did not attract any charges. The code thus became popular all over the world.

In 2002, the code gained acceptance across industries in Japan, and since then it has spread all over the world.

The information carried by the code was initially read by dedicated electronic readers. Later, apps downloaded on devices could read it, and reading became child’s play once mobile cameras could scan and decode the code directly. During the pandemic, it provided a convenient mode of payment, and post-pandemic its use increased manyfold as a convenient, contactless payment mode.

The QR code is now being combined with AI, which makes it more useful and easier to adopt. AI improves security and image quality, and can personalise codes.

Smart computer vision (CV) algorithms help to identify and locate a QR code within a larger image.
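As a sketch of one classic localisation technique (drawn from the QR code’s finder patterns as defined in the QR specification, not from any particular library’s code): each of the three corner finder patterns produces dark and light pixel runs in the ratio 1:1:3:1:1 along any scanline through its centre, and a detector can search rows of a binarised image for that signature.

```python
# Illustrative sketch: locate candidate QR finder patterns by scanning a
# binarised row (1 = dark, 0 = light) for five consecutive runs whose
# widths are approximately in the ratio 1:1:3:1:1, starting on a dark run.

def run_lengths(row):
    """Collapse a row of 0/1 pixels into [value, length] runs."""
    runs = []
    for px in row:
        if runs and runs[-1][0] == px:
            runs[-1][1] += 1
        else:
            runs.append([px, 1])
    return runs

def has_finder_ratio(row, tol=0.5):
    """True if any five consecutive runs match ~1:1:3:1:1 (dark first)."""
    runs = run_lengths(row)
    for i in range(len(runs) - 4):
        window = runs[i:i + 5]
        if window[0][0] != 1:          # pattern starts (and ends) on dark
            continue
        unit = sum(r[1] for r in window) / 7.0   # total width = 7 units
        if all(abs(r[1] - e * unit) <= tol * unit
               for r, e in zip(window, [1, 1, 3, 1, 1])):
            return True
    return False

# A scanline crossing a finder pattern: dark, light, 3x dark, light, dark
row = [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
print(has_finder_ratio(row))  # True
```

Production detectors (such as the one in OpenCV) combine this scan across rows and columns, then verify candidates geometrically before decoding.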

In retail, healthcare and consumer goods, it has emerged as a convenient mode of payment.

In India, the National Payments Corporation of India (NPCI), in collaboration with International Card Schemes (ICS), has developed a common standard to launch Bharat QR, a digital payment mechanism used by merchants and e-commerce portals. The QR code is likely to evolve further. Security aspects will be addressed; there could be micro-codes in future not visible to the human eye, and colours could be added to accommodate more data.

Machines with Souls: A Reality or an Oxymoron?

It is believed that generative AI will soon lead to artificial general intelligence (AGI). Future artificial intelligence models will need less data for training; instead, they will focus more on reasoning abilities. In other words, AI will resemble human intellect based on logic and intuition. The changed AI will have adaptability and common sense as components. Sam Altman calls it a system that can generalize across many domains.

As we advance in research, the gap between what AI and humans can do is narrowing.

LLMs are not the last word; they are just stepping stones. They will acquire intuitive understanding in future. AI will master abstraction and will hold opinions about men and matters.

It has so far been thought that intelligence must be backed by a soul, by consciousness. Still, the chatbots show sparks of intelligence.

Since data privacy is valued in the modern age, the new AI platforms trained on massive data are under scrutiny.

Still, the potential of AI raises hopes. AGI will create a symbiotic world of humans and machines; it will be the ultimate tool created by humanity. Some researchers are optimistic, while others, such as Paul Christiano and Geoffrey Hinton, take a dismal view.

As AI advances by leaps and bounds, ethical and philosophical issues arise. Altman expects AI to be a collective endeavour with the participation of various stakeholders. The philosophical issues are weighty enough, because these machines might overshadow biological intelligence. Machines may make human minds obsolete.

GPT-4 has cognition resembling that of human beings. The real issue is the ‘ghost’ inhabiting the machine and the way machines are becoming soulful.

Happy Vijaya Dashami, 2023.

AI Factories

As we know, Apple’s iPhones are manufactured in China by a company called Foxconn. We also know that the chips that power generative AI are made by the American company Nvidia. Foxconn and Nvidia would like to tie up to create ‘AI factories’: powerful data processing centres that will drive next-generation products such as electric cars. These centres will facilitate faster movement into the new AI era, including digitalisation of manufacturing and inspection workflows, AI-powered electric vehicles and robotics. They will also facilitate language-based generative AI services.

Adobe’s New AI Tools

Adobe, famous for its Photoshop, entered the AI race early in 2023 by introducing a new family of generative AI tools.

Firefly, Adobe’s AI tool, creates images, text effects, audio, vectors and 3D content for users. Like Midjourney, DALL-E and Stable Diffusion, it is a generative model; Adobe has integrated this AI into its mainline products, ensuring that the integration fuels creativity and delivers power and precision.

People have generated more than 3 billion images through Adobe Firefly since it was announced early in 2023. Adobe is taking measured steps to roll out AI features across its popular products and is engaging actively with the creative community with beta releases, hoping to get feedback to make the future versions better.

Firefly is trained on content within Adobe’s stock library, openly licensed content, and public-domain content whose copyright has lapsed.

Adobe Firefly is a generative AI imaging tool. Adobe has also added features to Illustrator and Adobe Express, and has improved Photoshop’s text-to-image capabilities.

Adobe announced all this at the annual MAX event in Los Angeles, California. The keynote was held in the Peacock Theatre.

Co-pilot and Microsoft

Microsoft approached the emerging technology of AI through the lens of how people really want to interact with it, and arrived at the concept of a co-pilot. The first was GitHub Copilot, a programming partner that helps developers write code more efficiently and effectively. It should be noted that a co-pilot is not an auto-pilot: it keeps humans at the centre.

Microsoft then set out to provide a co-pilot for the web. Microsoft 365 brings a co-pilot into Teams and Word; there is a co-pilot in PowerPoint and Excel, and there could be a co-pilot in security.

The intent is to have co-pilots in products across the spectrum, whether in Windows or Microsoft 365.

Microsoft had an AI layer with Cortana for a few years; the present rendezvous with AI is not sudden but the result of more than ten years of work and research. There was AI in Word as autocorrect, and Designer in PowerPoint. However, the LLMs are more powerful and have given us a new interface, enhancing our ability to talk to the computer. Such question-and-answer sessions were not possible in the past.

Humans are well-versed in asking questions, but computers have not been great at answering them. Search results make you hop from one result to another. With the new interface, however, we can ask specific questions and, if the answer does not satisfy us, ask follow-up questions. The same is possible with a Word document. It has become a natural process.

Microsoft is a tool company. It is a platform company. Microsoft has a security co-pilot allowing security researchers to move at machine speed.

Microsoft has evolved and published two versions of responsible AI standard. Technology is both a tool and a weapon.

These are the views of Frank X. Shaw, chief communications officer at Microsoft, paraphrased here for the benefit of readers.

Biased Bots

In recruitment and selection, employers in countries such as the US use some form of AI to screen and rank candidates for hiring. Several Black candidates observed a bias against them in the AI algorithms. Bias was also seen against disabled candidates and those over 40. One algorithm discriminated against CVs in which the word ‘women’s’ occurred.

Many of these AI tools have been proven to be unduly invasive of the workers’ privacy and discriminate against women, people with disabilities and people of colour.

Federal agencies are looking at potential discrimination arising from the datasets that train AI systems, and at the opaque ‘black box’ models that make it difficult to exercise anti-bias diligence.

Is this ‘responsible AI’? Can we indulge in automation in the recruitment and selection market without any restriction? The issue is how to regulate the use of AI in hiring and guard against algorithmic bias.

Implications of AI

As we already know, three AI scientists won the Turing Award in 2019: Geoffrey Hinton, Yann LeCun and Yoshua Bengio. The award recognised their outstanding work in deep learning and neural networks.

Despite their collaboration, Bengio and LeCun hold different opinions on AI’s potential risks.

In October 2023, there was a debate about the potential risks of AI between Yann LeCun and Yoshua Bengio. LeCun, as we know, is Facebook’s chief AI scientist. He opened the debate on his Facebook page, addressing the silent majority of AI scientists and inviting them to express their opinions about the reliability of AI. It gave rise to a lively discussion, eliciting comments from the respected AI community.

Bengio, of the University of Montreal, responded to LeCun’s post. He did not agree with LeCun’s perspective on AI safety and advised prudence in designing AI systems. He was not in favour of open-source AI systems, comparing them to the free distribution of dangerous weapons.

LeCun focused on building safe systems while avoiding catastrophic scenarios. He feels there is enough funding to make AI safe and reliable. He does not accept the comparison of open AI systems with the free distribution of dangerous weapons: AI, he argues, is meant to enhance human intelligence, not to cause harm.

Eisner from Microsoft also contributed to the debate, supporting Bengio’s weaponry analogy. It was agreed that though there cannot be a zero-risk situation, access could be restricted to minimize harms.

The AI debate has not remained restricted to academics; it has invited the attention of thinkers and policy makers. With the field advancing so fast, there is a need for fruitful debate about the implications of AI.

Important Concepts of Transformer Architecture

Add and Normalize

In transformer architecture, we come across the term ‘add and normalize’. The first step is ‘add’: a residual connection that adds the input of a sublayer (such as self-attention or the feedforward network) to its output. This mitigates the vanishing-gradient problem and makes it possible for the model to learn deeper representations. The second step is ‘normalize’: layer normalization of the sublayer output across the feature dimension. It stabilizes the training process and reduces dependency on initialization.
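A minimal numpy sketch of this step, assuming inputs of shape (sequence length, model dimension); layer normalization’s learnable scale and shift parameters are omitted for brevity:

```python
# 'Add and normalize': residual connection followed by layer
# normalization over the feature (last) dimension.
import numpy as np

def add_and_norm(x, sublayer_out, eps=1e-5):
    """x, sublayer_out: (seq_len, d_model). Returns LayerNorm(x + sublayer_out)."""
    y = x + sublayer_out                      # 'add': residual connection
    mean = y.mean(axis=-1, keepdims=True)     # per-position statistics
    std = y.std(axis=-1, keepdims=True)
    return (y - mean) / (std + eps)           # 'normalize' (gamma/beta omitted)

x = np.random.randn(4, 8)                     # 4 positions, 8 features
out = add_and_norm(x, np.random.randn(4, 8))
print(out.mean(axis=-1))                      # each entry ~0 after normalization
```

Each position now has roughly zero mean and unit variance across its features, which is what keeps training stable regardless of how the sublayer’s outputs are scaled.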

Multi-head Attention

Multi-head attention enables a neural network to learn different aspects of the input sequence by applying multiple attention functions in parallel. The idea is that different queries, keys and values can capture different semantic information from the same input. To illustrate, one attention head can focus on the syntactic structure of a sentence while another focuses on the semantics of the words.

There are four steps in multi-head attention.

1. To begin with, the input queries, keys and values are projected into h subspaces using linear transformations, where h is the number of attention heads. Each subspace has a lower dimension than the original input space.

2. Secondly, each projected query, key and value is fed into a scaled dot-product attention function, which computes the attention weights and outputs for each subspace independently.

3. Thirdly, the outputs of the h attention heads are concatenated and linearly transformed into the final output dimension.

4. Lastly, the final output is optionally passed through layer normalization and a feedforward network.

Multi-head attention has several advantages. It can learn more complex and diverse patterns from the input sequence by combining several attention functions. It is cost-effective and improves memory usage by reducing the dimensionality of each subspace, and it makes the model more robust by introducing more parameters. Multi-head attention can be implemented from scratch in TensorFlow and Keras.
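The four steps can also be sketched from scratch in plain numpy; here random matrices stand in for the learned projection weights, and the shapes are illustrative only:

```python
# From-scratch sketch of multi-head attention (steps 1-3; the optional
# layer norm and feedforward of step 4 are omitted).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(Q, K, V, h, rng):
    seq, d_model = Q.shape
    d_k = d_model // h
    heads = []
    for _ in range(h):
        # Step 1: project Q, K, V into a lower-dimensional subspace
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        q, k, v = Q @ Wq, K @ Wk, V @ Wv
        # Step 2: scaled dot-product attention within each subspace
        weights = softmax(q @ k.T / np.sqrt(d_k))
        heads.append(weights @ v)
    # Step 3: concatenate the h heads and project to the output dimension
    Wo = rng.standard_normal((h * d_k, d_model))
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))                    # 5 tokens, d_model = 16
out = multi_head_attention(X, X, X, h=4, rng=rng)   # self-attention: Q = K = V
print(out.shape)  # (5, 16)
```

Passing the same sequence as queries, keys and values, as in the last line, is exactly the self-attention case discussed next.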

Multi-head attention and Self-attention

These two are related concepts, and yet distinct in transformer architecture.

Attention is the ability of the network to attend to different parts of another sequence while making predictions.

Self-attention is the ability of the network to attend to different parts of the same sequence while making predictions.

Multi-head attention makes it possible for the neural network to learn different aspects of the input or output sequence by applying multiple attention functions in parallel.

Self-attention can capture long-range dependencies and contextual information from the input sequence. It can be combined with multi-head attention. It can be regularized by applying dropout or other methods to the attention weights. It reduces overfitting.
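The dropout regularization mentioned above can be sketched as follows (a standard inverted-dropout formulation; the function name is ours, not from any particular library):

```python
# Dropout applied to post-softmax attention weights: randomly zero some
# weights and rescale the survivors so the expected value is unchanged.
import numpy as np

def dropout_attention_weights(weights, p, rng):
    """weights: post-softmax attention matrix; p: drop probability."""
    mask = rng.random(weights.shape) >= p
    return weights * mask / (1.0 - p)   # inverted dropout rescaling

rng = np.random.default_rng(42)
w = np.full((4, 4), 0.25)               # uniform attention over 4 positions
dropped = dropout_attention_weights(w, p=0.1, rng=rng)
print(dropped.shape)  # (4, 4)
```

At inference time dropout is disabled and the weights are used as-is; the rescaling during training is what keeps the two regimes consistent.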


Private Large Language Models (LLMs) in Indian Banking

LLMs run generative AI applications such as ChatGPT. They facilitate communication and provide clarity of information, but they are cost- and time-intensive to develop.

Banking leadership cannot be achieved solely on the basis of deposit mobilisation and treasury operations. Technology is a vital ingredient that helps build non-replicable customer relationships, which in turn build the coveted competitive advantage.

HDFC Bank and its rival Axis Bank are contemplating the adoption of private LLMs trained on their internal data. Private LLMs would bring generative AI to customers, offering a healthy customer interface and an intuitive experience.

HDFC Bank will launch a private LLM-powered website in the next six months; currently, the site is in beta. LLMs would provide the ability to drive purchase conversions using a large number of data points. Through simple prompts, a customer could quickly access the information he is seeking about any product, and ultimately the details of his own bank account.

A private LLM model will be leveraged to write credit assessment reports, business requirement documents and so on.

Axis Bank is contemplating generative AI-based virtual assistants for customers. In operations, inference capabilities would be used to automate processes. The bank plans to use private LLMs for specific use cases by the end of 2024, and is engaging with cloud service providers (CSPs) and software-as-a-service (SaaS) providers to explore various options.