Transformer Alternative: State-Space Model (SSM)

Since 2017, when Vaswani et al. of Google published the paper Attention Is All You Need, the transformer architecture has been used in large language models (LLMs).

Of late, non-attention architectures for language modelling have emerged, e.g. Mamba, which shows promising results in various experiments.

In fact, Mamba belongs to the family of state-space models (SSMs), mathematical models used to describe the evolution of a system over a period of time.

The key concepts of SSMs are: state variables (x), which represent the internal state of the system; the state equation, which shows how the state variables change over time, in both continuous and discrete time; and the output equation, which relates the observed outputs of the system to its internal state. Matrices A, B, C and D are the parameters of the SSM: A represents the system dynamics, B is the input matrix, C the output matrix and D the feed-forward matrix.

Classical SSMs are designed as linear time-invariant (LTI) systems, where A, B, C and D are constants and the system’s behaviour is linear.

SSMs can be formulated both in continuous time (using differential equations) and in discrete time (using difference equations).
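To make the state and output equations concrete, here is a minimal numerical sketch of a discrete-time LTI state-space model. The matrices and input sequence below are illustrative toy values, not the learned parameters of Mamba or of any trained model.

```python
import numpy as np

# Discrete-time linear time-invariant SSM:
#   x[k+1] = A x[k] + B u[k]   (state equation)
#   y[k]   = C x[k] + D u[k]   (output equation)

def run_ssm(A, B, C, D, inputs, x0=None):
    """Run an LTI state-space model over a sequence of inputs."""
    x = np.zeros(A.shape[0]) if x0 is None else x0
    outputs = []
    for u in inputs:
        y = C @ x + D @ u          # observe the current state
        x = A @ x + B @ u          # evolve the hidden state
        outputs.append(y)
    return np.array(outputs)

# Toy example: a 2-dimensional hidden state driven by a scalar input sequence.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0],
              [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

u_seq = [np.array([1.0]), np.array([0.0]), np.array([0.0])]
print(run_ssm(A, B, C, D, u_seq))
```

Because the state x summarizes everything the model has seen so far, each new output depends only on the current state and input, which is the property the following paragraphs build on.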

Mamba serves as a versatile sequence-model backbone. The Mamba-3B model surpasses similarly sized transformers and matches transformers twice its size.

SSMs offer a different lens on sequence modelling. Because SSMs focus on an internal state evolving over time (hidden dynamics), they capture long-range dependencies and context effectively.

As we know, attention-based LLMs behave largely as black boxes. SSMs, by contrast, provide a structured representation of the system, which offers greater interpretability.

As an SSM, Mamba is more computationally efficient than transformers: its cost scales linearly with sequence length, whereas attention scales quadratically.

The most impressive thing about transformers is their expressive power. They excel at capturing intricate relationships and can generate diverse outputs.

SSMs may require more training data than transformers, since they have to master both the state-transition and the observation equations.

SSMs are theoretically appealing, but their implementation and optimization may be more complex than those of transformers.

Though a promising development, SSMs require further exploration and comparison with transformers. The two architectures may well evolve and co-exist, serving different needs and domains.

Young AI Professionals

If an IT or AI company employs young people under 24, it has a good opportunity to mould them the way it wants. They can be trained in the sunrise areas that are in demand. They can be made to imbibe the organization culture (OC) and values. In practice, however, organizations hire experienced candidates from higher age groups, who could be in their thirties, forties or even fifties. Even when youngsters are hired, they sit five or six levels below the CEO. Being so far removed from top management, they do not learn about its thinking or the road map it has in mind for the organization. They do not get mentored by seniors. They remain fuzzy about organizational policies and the rationale behind them. They may not get to meet any of the top or middle management personnel. When this happens, it is necessary to rethink the organization structure.

Meta to Join OpenAI and Google for AGI

It is reported that Meta will join OpenAI and Google in open-source AI projects to accelerate the march towards AGI or artificial general intelligence. It is very encouraging to know this.

This collaboration will pool resources, talent and data, facilitating rapid strides towards the goal. There could be faster breakthroughs. It will foster public trust in AI development. Open sourcing will democratize the effort; even individuals and startups could benefit from it.

Of course, there are issues around the loss of intellectual property (IP). There are further issues of coordination, as three titans are coming together. There could also be ethical issues, such as misuse of powerful AI technologies.

NYT Suit and Sam Altman

At an event organised by Bloomberg at Davos (2024), Sam Altman says he was surprised by the NYT suit against Microsoft and OpenAI for copyright violation, which is based on examples where the model regurgitates verbatim copyrighted material that could have been used to train it.

He says that any single source of training data does not move the needle for OpenAI that much.

Regurgitation, he says, is rare, and OpenAI keeps working to bring it down to zero.

Training, he says, falls under fair use, and OpenAI collaborates with news organisations and creates opportunities for them to monetize their content. In fact, the NYT was in negotiations with OpenAI before filing the lawsuit.

GitHub Copilot: Over a Million Paid Users

Microsoft’s search engine Bing is integrated with OpenAI’s ChatGPT, and that has resulted in more than 1.9 billion chats. Microsoft expects big growth in its generative AI business.

Usage of GitHub Copilot is going up; it now has over one million paid users. A recent addition is Copilot Chat, which is used by digital natives as well as by leading enterprises to improve the productivity of their developers.

Microsoft’s Edge browser is also doing well. It is supported by DALL-E 3 to provide more relevant answers and more realistic images.

Microsoft’s $10 billion investment in OpenAI is expected to rise in worth to $100 billion.

The company’s cloud business — Intelligent Cloud — is also doing well.

MIMO (Multiple Input, Multiple Output)

MIMO (multiple input, multiple output) technology for wireless communication uses multiple antennas at both the source (transmitter) and the destination (receiver). These antennas at both ends of the circuit minimize errors, optimize data speed and improve the capacity of radio transmissions by enabling data to travel over many signal paths at the same time.
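As a rough illustration of why multiple antennas help, the sketch below estimates the Shannon capacity of an idealized flat-fading MIMO link, which grows roughly with the smaller of the transmit and receive antenna counts. The channel model, SNR and antenna counts are illustrative assumptions, not figures from Prof. Paulraj's work.

```python
import numpy as np

# Idealized flat-fading MIMO capacity:
#   C = log2 det(I + (SNR / n_tx) * H H^H)  bits/s/Hz,
# averaged over random Rayleigh channel realizations.

def mimo_capacity(n_tx, n_rx, snr_linear, trials=2000, seed=0):
    """Average capacity of an n_tx x n_rx Rayleigh-fading MIMO link."""
    rng = np.random.default_rng(seed)
    caps = []
    for _ in range(trials):
        # Independent complex Gaussian gains between each antenna pair.
        H = (rng.standard_normal((n_rx, n_tx)) +
             1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
        M = np.eye(n_rx) + (snr_linear / n_tx) * H @ H.conj().T
        caps.append(np.log2(np.linalg.det(M).real))
    return float(np.mean(caps))

snr = 10 ** (10 / 10)  # 10 dB signal-to-noise ratio
for n in (1, 2, 4):
    print(f"{n}x{n} MIMO: ~{mimo_capacity(n, n, snr):.1f} bits/s/Hz")
```

Running the sketch shows capacity climbing as antennas are added at both ends, without any extra bandwidth or transmit power, which is the point made in the paragraphs that follow.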

Prof. Arogyaswami Paulraj, a former Indian Navy technologist who moved to Stanford University, pioneered the MIMO concept in 1992. The technology gained traction in 1998 and attracted further research and development.

Prior to MIMO, data rates could be increased only with more spectrum bandwidth or higher transmit power. Both are constrained resources, and MIMO overcame these limits.

Prof. Paulraj has been awarded this year’s Faraday Medal. In the past, he also received the Alexander Graham Bell Medal (2011) and the Marconi Prize (2014).

The Indian government awarded him the Padma Bhushan in 2010.

Several billion mobile phones the world over use MIMO.

History of Artificial Intelligence (AI)

In ancient times, craftsmen imagined artificial objects endowed with intelligence. The genesis of artificial intelligence can be traced to the philosophical idea that intelligence emerges from the mechanical manipulation of symbols. This led to the invention of the digital computer in the 1940s, which was programmable and based on abstract mathematical reasoning. Many with fertile imagination, inspired by the computer, thought of developing an electronic brain in the future.

Alan Turing took the lead in doing research in the field. He called this Machine Intelligence. It was in 1956 that the term artificial intelligence was first used at a workshop held at Dartmouth College, US. It inspired artificial intelligence research for decades. There was a prediction that a machine that equals human intelligence would emerge soon, and more money was poured into research.

The project was not so easy. There were funding problems in the 1970s; those years were called the AI winter. There was a silver lining in the early 1980s when the Japanese government inspired other governments to fund artificial intelligence. However, by the late 1980s, funding again dried up.

AI bloomed in the 2020s after machine learning showed its potential to be useful in many fields. New methods, powerful hardware and the availability of big data were all conducive to the development of artificial intelligence. In the 21st century, the study of mathematical logic provided the necessary breakthroughs to make AI a reality.

Recent research has been inspired by neurology, which has shown that the human brain is an electrical network of neurons firing in pulses. In 1943, McCulloch and Pitts described networks of idealized artificial neurons and showed how they could perform certain logical functions; they were in fact the first to describe a neural network. Inspired by McCulloch and Pitts, Marvin Minsky built the first neural network machine in 1951.

Turing used the term ‘machine intelligence’. The same was later called ‘artificial intelligence’ after his death (1954). In 1955, Allen Newell and Herbert Simon created the Logic Theorist. Simon worked on the mind-body problem and claimed to have solved it: a system consisting of matter can acquire the properties of mind.

The term artificial intelligence (AI) itself was formally introduced by John McCarthy in 1956 during the Dartmouth workshop.

Inspired by the McCulloch and Pitts (1943) paper, neural networks were translated into hardware. Perceptron machines (1957-1962) were built. MINOS was built by Alfred Brain in 1960. Though multi-layered neural networks emerged, most had only one layer of adjustable weights.

Backpropagation emerged as a method for neural network training in the 1980s.

AI research led to communication with computers in natural language. Joseph Weizenbaum’s ELIZA could carry on a conversation as if one were interacting with a human being.

Corporates jumped on the AI bandwagon in the 1980s. Many expert systems that could answer questions about a specific domain of knowledge were developed.

In the early eighties, Geoffrey Hinton and David Rumelhart popularized a method for training neural networks called backpropagation. In 1986, Rumelhart and McClelland published Parallel Distributed Processing, which provided new momentum to neural network research.

Between 1993 and 2011, AI became established. In 1997, the Deep Blue computer defeated Garry Kasparov in chess. Computer speed and capacity increased in the 1990s. The concept of the intelligent agent, which perceives its environment and acts to maximize its chances of success, came into vogue. Probability and decision theory were brought into AI by Judea Pearl. Mathematical concepts like Markov models, stochastic modelling and classical optimization became handy for the field.

AI was found useful in robotics, logistics, speech recognition, banking software, medical diagnosis and search engines.

However, much of this success was attributed to advances in computer science rather than to AI itself.

At the beginning of the new millennium, big data emerged, along with faster computing and advanced machine learning techniques.

By 2016, AI came to be recognized as a distinct market. There were advances in deep learning, especially convolutional neural networks and recurrent neural networks (CNNs and RNNs).

In 2017, Google researchers (Vaswani et al.) proposed the transformer architecture, which exploits the attention mechanism. This led to large language models (LLMs).

Foundation models, i.e. large models trained on vast quantities of unlabeled data, emerged around 2018.

OpenAI released GPT-3 in 2020. In 2023, Microsoft researchers tested GPT-4 and argued that it could be viewed as an early version of artificial general intelligence.

Ferret: Multi-modal LLM from Apple

In October 2023, Apple, in collaboration with researchers at Columbia University, released an open-source multi-modal LLM called Ferret. It was released on 30 October 2023 under a non-commercial licence.

Apple is usually secretive and protective of its proprietary systems, so releasing an open-source model is notable. The model can run on small devices such as iPhones and iPads and, as noted, is released under a non-commercial licence.

The model was trained using 8 Nvidia A100 chips and on the GRIT dataset. Whether it can compete with larger models such as GPT-4 remains to be seen.

However, this introduction marks a paradigm shift in Apple’s AI strategy. Open source invites collaboration and innovation, a departure from its traditional closed-door approach.

Apple is in the early stages of its generative AI journey with Ferret. Mobile handsets have a limited capacity to run models, say models of up to 10 billion parameters. Apple researchers have made a breakthrough: a smartphone’s limited RAM can be supplemented with onboard flash storage that holds the LLM data, with RAM serving as a small cache.
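To illustrate the general idea of RAM acting as a small cache in front of flash storage, here is a purely hypothetical sketch, not Apple's implementation: model weights live in a memory-mapped file (standing in for flash), and only the layers currently in use are kept resident in RAM, evicted least-recently-used first. File names, sizes and the cache policy are all illustrative assumptions.

```python
import collections
import numpy as np

N_LAYERS, LAYER_SHAPE = 32, (1024, 1024)

# Pretend this .npy file on flash holds all the layer weights (toy sizes).
weights = np.lib.format.open_memmap(
    "weights.npy", mode="w+", dtype=np.float16, shape=(N_LAYERS, *LAYER_SHAPE))

class FlashBackedCache:
    """Keep at most `capacity` layers resident in RAM; reload others from flash."""
    def __init__(self, memmap, capacity=4):
        self.memmap, self.capacity = memmap, capacity
        self.cache = collections.OrderedDict()

    def layer(self, idx):
        if idx in self.cache:
            self.cache.move_to_end(idx)         # mark as recently used
        else:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least-recently-used layer
            self.cache[idx] = np.asarray(self.memmap[idx])  # copy flash -> RAM
        return self.cache[idx]

cache = FlashBackedCache(weights, capacity=4)
x = np.ones(LAYER_SHAPE[0], dtype=np.float16)
for i in range(N_LAYERS):
    x = cache.layer(i) @ x   # only a handful of layers are ever in RAM at once
print(x.shape)
```

The point of the sketch is simply that the working set held in RAM stays small while the full model remains available on flash.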

Ferret identifies elements within an image and, beyond that, answers user queries about them. There are possibilities of image search. It is spatially aware and capable of analyzing and interpreting images and text together, much like a smart assistant that can look at pictures and read descriptions.

Open sourcing is a strategic move: Apple can tap global AI talent and accelerate Ferret’s capabilities to keep pace with rivals such as Google and Microsoft. Of course, the challenge is to remain protective of its IP.

AI Pioneers on X-Risk of AI

There are online discussions amongst the pioneers of AI (Geoffrey Hinton, Andrew Ng, Yann LeCun and Yoshua Bengio) about the existential risks, also called x-risks, posed by AI. Both Hinton and Bengio are concerned about AI’s existential risks, whereas Ng and LeCun do not take them as seriously.

When the deep learning revolution began in 2012, there was unity among the researchers. Later, after leaving Google, Hinton became outspoken about the risks of AI. Ng did pioneering work in image recognition after co-founding Google Brain in 2011; though aware of AI risks, he felt they were over-hyped. LeCun and Ng say tech leaders are exaggerating existential risks.

Hinton, Bengio and LeCun joined 22 other academics and experts to propose a framework for policy and governance to address the growing risks of AI.

These pioneers are not in agreement about AI risks, but that has not loosened their friendship a bit. People can disagree and still remain friends.

Rollout of Generative AI

The year 2024 will be the ‘year of AI and Generative AI’ and will put even 2023 in the shade. Big players in AI such as OpenAI, Microsoft, Google and Meta, and well-funded startups such as Anthropic and Inflection, will put out newer AI models. There could be GPT-4.5 or GPT-5. Large language models will become large multi-modal models (LMMs). At the same time, there will be SLMs, or small language models, for specific tasks, such as Gemini Nano or Microsoft Orca.

The year will mark the beginning of wider AI adoption. Early adoption can be seen in customer care, software development and marketing campaigns. The biggest opportunity is in banking, with early adoption also expected in insurance and capital markets. Gen AI will also be implemented in the retail, travel, health and energy sectors.

Generative AI is driving a technological revolution similar to the internet or mobile communications revolution. In 2024, there will be evidence that generative AI can transform knowledge work, business processes and supply chains. It will reshape industries and enlarge human capabilities and creativity.

Telecommunications, e-commerce and public services will adopt generative AI. There will be changes in enterprise resource planning, human capital management, advertising and infrastructure.

In edtech, generative AI can produce content, and administrative work can be automated, e.g. student enrolment, verification and record accuracy.

In food tech, generative AI can build personalized menus, recommend food choices, forecast demand and predict food wastage.

Early adopters will leverage cloud technology for both operational efficiency and strategic growth.

If 2023 was the year of experimentation, 2024 will be the year of rollout and implementation.

Gen AI poses certain challenges: AI bias, IP rights, risks inherited from training datasets and AI hallucinations, among others. These will have to be addressed while deploying generative AI.