TensorFlow

TensorFlow is an open-source ML platform developed by Google, used for building and training many types of ML models, including deep learning models. Its flexible architecture allows computation to run on a variety of platforms (desktops, servers, mobile and edge devices). It is used by novices and professionals alike, and it has rich resources and documentation.

To begin with, TensorFlow had a static computation graph. Since TensorFlow 2.0, however, it has used a dynamic (eager) execution model like PyTorch, which makes it easier to use and debug. Even so, PyTorch is generally considered more user-friendly, with a simpler API, while TensorFlow has a steeper learning curve owing to its more complex API and abstraction layers.
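
As a small, hedged illustration of this eager (dynamic) behaviour, assuming TensorFlow 2.x is installed:

    import tensorflow as tf

    # Operations run eagerly: results are computed immediately, with no session
    # or pre-built static graph, so values can be inspected and debugged directly.
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)
    print(y.numpy())  # the result is available right away as a NumPy array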

TensorFlow, of course, has a wider community of users in both industry and academia, since it is Google-backed. PyTorch is catching up.

TensorFlow has a mature ecosystem for deploying models in production, with tools such as TensorFlow Serving and TensorFlow Lite that target different environments (servers as well as mobile and embedded devices). PyTorch is catching up here too.

Choosing between TensorFlow and PyTorch is, in the end, largely a matter of personal preference.

Translation Platforms

In the past, we had Google Translate to translate text from one language to another. Of course, the translation output was not as good as it is today, now that AI is used for translation. That translation was based on statistical techniques that detected patterns between the two languages.

With the arrival of neural networks, translation has become neural. Input sequences are encoded, and these encodings are then decoded into output sequences in the target language. The decoding works because the model has been trained on vast amounts of data in multiple languages, and the model uses an attention mechanism as well. The model learns the nuances and dependencies of these languages and is able to translate even idiomatic sequences from the source language to the target language.
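
As a rough illustration of such an encoder-decoder translator (using the Hugging Face transformers library and the public t5-small model purely as one example, not any platform discussed here):

    from transformers import pipeline

    # A pretrained encoder-decoder model translates an English sentence to German;
    # the model name and language pair are illustrative choices only.
    translator = pipeline("translation_en_to_de", model="t5-small")
    result = translator("The weather is lovely today.")
    print(result[0]["translation_text"])  # decoded output in the target language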

Many language models can translate between 100-plus languages. Neural translation has supplanted statistical translation.

India has set up Bhashini, a language translation and database platform. It is running proof-of-concepts for multilingual call centres and IVR setups, and it will shortly offer real-time language translation services on a paid basis.

An app will be launched to demo real-time translation in an open format.

Bhashini has been developed by the Digital India Bhashini division, set up under the Digital India Corporation as a section 8 company. It can do text-to-text translation in 22 languages. Its capabilities include automatic speech recognition, text-to-speech synthesis, OCR, video translation, document translation, language detection and voice-based payments, among others.

It also provides API (application programming interface) integration to startups.

At present, it handles 40 million inferences per month, that is, translations carried out on the platform by users across its different features.

It also wants to facilitate e-commerce through ONDC.

Bhashini collects datasets through a crowdsourcing model called Bhashadaan.

Medical Innovations

As we know, the premier technical institutes in the country, such as the IITs, promote innovation through incubator programmes. On similar lines, the country's leading medical institutions, such as AIIMS, have joined hands with young entrepreneurs to develop health-related products and software. These products could be useful to both doctors and patients, and they should be scalable and feasible. AIIMS shares its infrastructure (samples and patients) to encourage these startups.

At present, 10 such projects are progressing at AIIMS. The initiative started in 2021. Some projects are awaiting validation through clinical trials; some are awaiting regulatory approval (from the CDSCO). The startups are mentored by AIIMS faculty, and there are training programmes and bootcamps to help them develop patient-friendly products.

One such project targets weight loss. The team is developing a clinically validated app (Zeigen ObesityRx) for people struggling to lose weight. Current weight-loss apps do not address a user's psychology and focus instead on workouts and nutrition plans.

A stem-cell-based product is being developed for treating traumatic injuries and burn wounds. It will promote tissue regeneration (without reconstruction or plastic surgery). Conventional stem-cell products are preserved through freezing and storage, which is not user-friendly. The AIIMS products are derived from stem-cell by-products (double-membrane vesicles measuring less than 200 nanometres). These are an alternative to stem cells, have similar properties and can be developed at a lower cost.

The product is in the form of a powder that can be sprinkled over a wound to speed up its healing and regeneration.

They are also working on sprays and gels of stem cells.

The powder form can be reconstituted into an injectable liquid, which can be introduced into the knee joint to treat arthritis. It treats the underlying cause of the disease. It has been tested on animals such as pigs and is awaiting human clinical trials.

A gut-microbiome-based product is being tested to boost immunity and protect heart and brain health.

Foundational Models

Foundational models refer to large language models (such as the GPT series), BERT and others that are trained on a vast corpus of data and serve as the foundation for various NLP tasks (text generation, translation, sentiment analysis and more). They are the starting point for building specialized models for specific applications.

Some prominent scientists who contributed to NLP and ML research are:

Geoffrey Hinton: His work laid the foundation for modern deep learning techniques, including those used in foundational models.

Yoshua Bengio: His research has advanced our understanding of neural networks and their applications in NLP tasks.

Yann LeCun: His work on convolutional neural networks (CNNs) is well known and has wide applications in computer vision (CV). His contributions have been useful in the development of foundational models.

Google Team, OpenAI Team: These teams played crucial roles in the development and advancement of foundational models such as GPT and BERT.

Gen AI: Biggest Human Invention

Jeff Maggioncalda, CEO of Coursera, feels that generative AI is at the pinnacle of human inventions. As far as its impact on humanity is concerned, he rates it as high as language, the alphabet or writing. Just as the ability to speak changed the course of human history, generative AI will do so too, provided humans master it.

If you know how to use it to your advantage, you will stand out. It is a tech disruption. Coursera runs a course for CXOs, Navigating Generative AI, and there is also an umbrella course on AI. These are massive open online courses (MOOCs). They are free, but if you want a certificate, you have to pay a nominal amount (a couple of thousand rupees). MOOCs can count as credit towards college degrees.

Coursera has used AI to translate courses into Hindi.

Learn from Silicon Valley’s Gurus

Organizations these days appoint Chief Experience Officers (CXOs) who manage the overall customer experience of an organization and encourage positive customer interactions. They typically have a background in operations, marketing, sales or customer service, and are often MBAs or hold some other master's degree.

As a CXO, you can never stop learning. Learning is not restricted to classrooms; it is much more. CXOs must build international connections with elite professionals, learn from industry leaders to push boundaries in their sectors, and acquaint themselves with new business environments.

High-quality education providers in India and the APAC region would like to facilitate CXO education by starting a unique Global Executive Immersion programme for Indian CXOs in Silicon Valley. It is a 6-day, 7-night programme.

The cost of the programme is $15,000; airfare and visa fees are borne by the participant. Silicon Valley is an ideal landscape for learning: it is the epicenter of technological innovation, serving as an incubator of cutting-edge ideas and breakthrough advancements, and it has the highest concentration of tech companies and Fortune 500 firms.

Participants will arrive in Silicon Valley on April 26. There will be a welcome reception.

On April 27, there will be a panel discussion on AI. Participants will learn about investment trends. They will identify future growth areas.

On April 28, they will visit the Stanford University campus and learn about its legacy in Silicon Valley and the tech industry. There will be a classroom session on Design Thinking by Barry Katz.

On April 29, participants will converse with an OpenAI mentor/advisor. There will be a tour of the Berkeley campus and a tour of the Intel Museum.

In fact, Silicon Valley has been shaped by technological prowess, and the Intel Museum pays tribute to it. Intel's journey is depicted here, from the pioneering Intel 4004 to present-day processors.

On April 30, there will be a tour of a search giant's office, followed by a visit to the Apple Park visitor center.

The next two days will be spent on a guided tour of the world's first chip manufacturer and conversations with venture capitalists. Finally, there will be a conversation with OpenAI's Zack Kass, former head of commercialization, and a tour of the Computer History Museum.

On May 2, the programme will end with a gala evening of reflection, connection and global insights, with a lavish dinner party set against the backdrop of the Bay Area.

The programme can be attended by CEOs and CXOs, MDs, presidents, founders, co-founders and partners with a minimum of 20 years of work experience.

Vector Representation of Words

Consider three words: cat, dog and bird. Each word can be represented by a numerical vector in a high-dimensional space. For illustration, the vector here captures three dimensions: x, y and z.

Cat could be represented by [0.8, 0.2, 0.5]

Dog could be represented by [0.7, 0.3, 0.6]

Bird could be represented by [0.3, 0.9, 0.2]

Here, x, y and z could represent size, animal type and habitat, each dimension capturing a different aspect of the word's meaning or usage.
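
As a toy sketch of how such vectors can be compared, here is a Python example using the three vectors above (the notion of closeness here is cosine similarity):

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between two vectors: closer to 1 means more similar.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    cat = np.array([0.8, 0.2, 0.5])
    dog = np.array([0.7, 0.3, 0.6])
    bird = np.array([0.3, 0.9, 0.2])

    print(cosine_similarity(cat, dog))   # high: cat and dog lie close together
    print(cosine_similarity(cat, bird))  # lower: bird lies farther from cat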

Algorithms analyze large amounts of text data and construct these word embeddings. These encode semantic and syntactic information about words.

These representations are standardized to a certain extent, though there is no single standard. Word embeddings (Word2Vec, GloVe and FastText) are popular approaches for generating vector representations of words. These vectors are of fixed length, which facilitates standardization across words and models. What varies are the specific dimensions and values within these vectors, depending on the algorithm and the training data used.
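
A minimal Word2Vec sketch with the gensim library is shown below; the tiny corpus and parameter values are invented purely for illustration.

    from gensim.models import Word2Vec

    # A toy corpus of tokenized sentences; real embeddings need far more data.
    sentences = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "bird", "flew"]]
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=10)

    print(model.wv["cat"])                    # a fixed-length dense vector for "cat"
    print(model.wv.similarity("cat", "dog"))  # cosine similarity between two words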

Even without these algorithms, words can be converted into vectors. The approach is called one-hot encoding. Here, each word in the vocabulary is represented as a vector in which all elements are 0 except for the element corresponding to the index of that word in the vocabulary, which is 1. Let us consider a small vocabulary with three words: cat, dog and bird.

Cat could be represented as [1, 0, 0]

Dog could be represented as [0, 1, 0]

Bird could be represented as [0, 0, 1]

The vectors created are sparse vectors, where most elements are zero. However, one-hot encodings do not capture semantic relationships between words (as embeddings do), and they can result in very high-dimensional representations for large vocabularies.

The index here refers to a word's position in a pre-defined vocabulary. Each word has a unique index. The element corresponding to the word's index is set to 1; all other elements are set to 0.

In the three-word vocabulary of cat, dog and bird, let us consider the indices assigned.

Cat → index 0

Dog → index 1

Bird → index 2

The vector for cat has three elements. The element at index 0 (corresponding to cat) would be set to 1.

The other elements would be set to 0.

For dog and bird, the 1 moves to their respective index. A one-hot vector is thus a binary vector with a single 1, whose position encodes the word's place in the vocabulary.
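
The same one-hot scheme can be written out in a few lines of Python; this is only a toy sketch for the three-word vocabulary, not a production encoder.

    vocab = {"cat": 0, "dog": 1, "bird": 2}  # each word gets a unique index

    def one_hot(word, vocab):
        vec = [0] * len(vocab)  # start with all zeros
        vec[vocab[word]] = 1    # set the element at the word's index to 1
        return vec

    print(one_hot("cat", vocab))   # [1, 0, 0]
    print(one_hot("dog", vocab))   # [0, 1, 0]
    print(one_hot("bird", vocab))  # [0, 0, 1]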

We now know what a sparse vector is. Let us now consider a dense vector, where most of the elements are non-zero. In practice, dense vectors are typically used rather than sparse vectors.

Dense vectors are often used in word embeddings. Each word is represented by a vector of real numbers (floats) in a continuous vector space. The real numbers capture nuanced relationships between words.

Each dimension of the vector might represent a different aspect of the word's meaning or context. Dense vectors are generally lower-dimensional than one-hot encodings, computationally more efficient, and able to capture subtle semantic relationships between words.
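
As one possible illustration, assuming PyTorch as the framework, a dense embedding lookup might look like this; the vocabulary size and dimensionality are arbitrary.

    import torch
    import torch.nn as nn

    # An embedding table for a 3-word vocabulary, mapping each word to a
    # 5-dimensional dense vector (initialized randomly, learned during training).
    embedding = nn.Embedding(num_embeddings=3, embedding_dim=5)
    word_index = torch.tensor([0])  # index of "cat" in the toy vocabulary
    print(embedding(word_index))    # a dense vector of 5 real numbers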

Though the data stored in the hardware is in the form of vectors (word embeddings), the answers to our prompts come back as natural text.

This involves decoding these representations back into natural language.

The input prompt is processed and converted into corresponding embeddings. These embeddings go into the model, where they are processed by layers (RNNs, transformers or other architectures). The model learns to generate text based on the input embeddings and the context provided.

The output is a sequence of tokens or word embeddings, which are decoded back into text. This involves selecting the most probable word for each position in the sequence (based on the probabilities the model learned in training). It can also use techniques such as beam search or sampling.
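
As a hedged sketch of this round trip, using GPT-2 via the Hugging Face transformers library purely as one public example:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The weather today is", return_tensors="pt")  # prompt -> token ids
    output_ids = model.generate(**inputs, max_new_tokens=10)          # model emits token ids
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # ids decoded back to text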

In post-processing, the coherence and readability of the output are checked. Duplicate phrases are removed, grammatical mistakes are corrected, and the style is adjusted to match the input prompt or context.

Sampling Techniques in Generating Next Word

LLMs use sampling techniques to generate the next word. Some common sampling techniques used are:

Greedy Sampling Here the word with the highest probability is chosen. Though very straightforward, it tends to produce repetitive and less diverse output.
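
A minimal greedy-sampling sketch in Python (the vocabulary and probabilities are invented for illustration):

    import numpy as np

    vocab = ["cat", "dog", "bird"]
    probs = np.array([0.2, 0.7, 0.1])  # hypothetical next-word probabilities

    next_word = vocab[int(np.argmax(probs))]  # always pick the most probable word
    print(next_word)  # "dog"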

Top-k Sampling Here the word is selected at random from the top k most likely words, which ensures that words with higher probability are favoured.

First, the probabilities of all possible words in the vocabulary are calculated based on the context. The probabilities are sorted, and the top k words with the highest probabilities are selected. From this reduced set of top k words, the model randomly selects one word to be the next word in the generated sequence.

There is a balance between selecting highly probable words (coherence) and introducing some randomness (diversity). The value of k determines how many words are considered in this selection process.
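
A small sketch of top-k sampling (toy vocabulary and probabilities, invented for illustration):

    import numpy as np

    def top_k_sample(probs, k, rng=np.random.default_rng()):
        top_indices = np.argsort(probs)[-k:]         # indices of the k most likely words
        top_probs = probs[top_indices]
        top_probs = top_probs / top_probs.sum()      # renormalize over the reduced set
        return rng.choice(top_indices, p=top_probs)  # random pick within the top k

    vocab = ["cat", "dog", "bird", "fish", "cow"]
    probs = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
    print(vocab[top_k_sample(probs, k=3)])  # one of the three most likely words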

Top-p (Nucleus) Sampling The word is selected from the smallest set of words whose cumulative probability exceeds a threshold p. This dynamically adjusts the set of candidate words to maintain diversity (based on changing probabilities).

The smallest set of words is determined dynamically based on a cumulative probability threshold (denoted as p).

First, the probabilities of all possible words are determined based on the context and sorted in descending order. Starting from the word with the highest probability, the model calculates the cumulative probability while iterating through the list of sorted words.

Once the cumulative probability exceeds the threshold p, the model stops considering additional words. The model then selects from this subset of words whose cumulative probability exceeds p; it is a random selection based on the original probabilities of these words.
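
A corresponding sketch of top-p (nucleus) sampling on the same toy distribution:

    import numpy as np

    def top_p_sample(probs, p, rng=np.random.default_rng()):
        order = np.argsort(probs)[::-1]                           # words sorted by probability, descending
        cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1  # smallest set whose cumulative prob exceeds p
        nucleus = order[:cutoff]
        nucleus_probs = probs[nucleus] / probs[nucleus].sum()     # renormalize within the nucleus
        return rng.choice(nucleus, p=nucleus_probs)

    vocab = ["cat", "dog", "bird", "fish", "cow"]
    probs = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
    print(vocab[top_p_sample(probs, p=0.8)])  # picked from {cat, dog, bird}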

Temperature Scaling The softmax probabilities are adjusted before sampling to control the randomness of the generated text. Lower temperatures lead to more deterministic outputs; higher temperatures promote more randomness.
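
A short sketch of temperature scaling applied to raw logits before the softmax (the logit values are invented for illustration):

    import numpy as np

    def softmax_with_temperature(logits, temperature):
        scaled = logits / temperature          # lower T sharpens, higher T flattens
        exp = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
        return exp / exp.sum()

    logits = np.array([2.0, 1.0, 0.5])
    print(softmax_with_temperature(logits, 0.5))  # peaked, more deterministic distribution
    print(softmax_with_temperature(logits, 2.0))  # flatter, more random distribution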

These techniques achieve a balance between coherent responses and diversity in generated text.

Optimization of an LLM

A large language model’s efficiency, performance and scalability can be improved by using a suitable combination of the following strategies.

  1. Algorithmic improvements One can research and implement novel algorithms specially customized for optimizing LLMs.
  2. Architecture optimization A model's architecture should be refined from time to time to improve its performance and efficiency: experiment with different architectures, layer configurations, activation functions, etc.
  3. Hardware optimization Use customized or specialized hardware architectures that are optimized for deep learning tasks.
  4. Parameter tuning Hyperparameters such as the learning rate, batch size and optimizer choice can be fine-tuned, which improves training efficiency and convergence speed.
  5. Quantization One can reduce the precision of the model's weights and activations to decrease memory usage and speed up inference without sacrificing much performance (see the sketch after this list).
  6. Data augmentation A model can be trained on synthetic training data, or one can apply techniques like dropout and regularization, which prevent overfitting and improve generalization.
  7. Knowledge distillation A larger model is used to distill knowledge into a smaller model, which reduces computational complexity.
  8. Pruning One can remove redundant or less important connections in the model to shrink its size and computational cost, while preserving its performance.
  9. Parallelization Distributed computing frameworks are leveraged and hardware accelerators such as GPUs and TPUs are used to parallelize training and inference tasks, reducing execution time.
  10. Model compression Techniques such as low-rank factorization, weight sharing or parameter tying are used to compress the model's parameters and reduce its memory footprint.
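
As one concrete, hedged example of the quantization strategy above, here is a minimal post-training dynamic quantization sketch with PyTorch; the tiny model is invented purely for illustration.

    import torch
    import torch.nn as nn

    # A toy model standing in for a much larger network.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    # Convert the Linear layers to 8-bit integer weights; activations are quantized
    # dynamically at inference time, cutting memory use and often speeding up CPU inference.
    quantized_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
    print(quantized_model)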

Google in the AI Race

As we know, ChatGPT was launched by OpenAI in late November 2022. It was a pathbreaking event. Google had already been testing generative AI for several months by then, which led to various models emerging from different divisions within the company, though none was good enough to surpass GPT-4. Google deferred its plans to launch a rival model while sorting out the scattered research work. In the meanwhile, it released a chatbot called Bard, which was, however, considered less sophisticated than ChatGPT.

A year later, Gemini was ready, but some flaws were detected in its image generation. The release was delayed, and the company could not seize the opportunity to be a leader in generative AI, a technology that took shape in Google's own labs (ref. 'Attention Is All You Need' by Vaswani et al., 2017).

Google enjoyed a leadership position in the internet revolution with its search engine, Google Search (late 1990s and early 2000s). Later, Google diversified into mapping, email and more to become the most valuable company in 2016.

ChatGPT arrived roughly 25 years after Google's launch. It was a tool to navigate online information more creatively. Microsoft was determined to take advantage while Google stumbled: it tied up with OpenAI, funded its research and embedded AI into its existing products. By doing so, it has become the most valuable company in the world.

After initial hiccups, Google is steadying itself. Gemini has become acceptable in tech circles, and Google is considering adding paid generative AI services to its search engine. However, Google has so far earned most of its revenue through advertising and is still struggling to find success in the generative AI space.

There are several flaws in Google's strategy: there is no clear corporate plan for rolling out generative AI, the organization structure is fragmented, and there could be simmering inter-departmental tensions. Google has to master the execution of its strategies. These are, however, early days, and Google is well positioned to move ahead.

At present, the formidable image Google has in the search engine space brings outsized attention to its minor flaws. Google is addressing cultural and organizational issues. The sheer size of Google also causes certain problems.

Google has merged its two research divisions, DeepMind and Google Brain, into Google DeepMind. There should now be better coordination between the research teams.

The new technology could cannibalize its traditional search business: generative AI gives direct answers, whereas a display of search links requires further effort from the user. Google has to protect its cash-cow product.

Google cannot afford to ignore generative AI. The longer it takes to adopt it fully, the greater the risk of consumers switching over to rival companies.