How a Chatbot Answers a Query

Generative AI models are trained on massive amounts of data. Since computers do not understand text, the models do not take text as it is but take it in a numerical format. These numerical representations of data are called embeddings. All inputs to LLMs (large language models) and outputs from them pass through embeddings. Generating these embeddings each time they are needed is time-consuming. Therefore, they are stored in vector databases, from which they can be retrieved.

Embeddings, or vector embeddings, can represent data of many kinds: text, images, audio, video and so on. Each item is represented as a point in an n-dimensional space, that is, as a numerical vector. Word2Vec, developed by Google, is a model that converts words to vectors. LLMs have their own embedding models to create embeddings.

This way the vectors can be compared to each other. A computer cannot compare the meanings of two words directly, but it can compare two vectors. We can therefore create clusters of words with similar embeddings, e.g. ball, bat, wickets and pitch will appear in one cluster as they are all related to cricket.
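The comparison of vectors described above can be sketched with cosine similarity, one common similarity measure. The three-dimensional vectors below are hand-made illustrative values, not the output of any real embedding model, and the function names are assumptions made for this sketch.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (illustrative values only, not model output).
toy_embeddings = {
    "bat":    [0.9, 0.8, 0.1],
    "wicket": [0.8, 0.9, 0.2],
    "piano":  [0.1, 0.2, 0.9],
}

def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word`."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))
```

With these toy values, most_similar("bat", toy_embeddings) picks "wicket" rather than "piano", which mirrors the cricket cluster mentioned above.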

Embeddings make it possible to find words similar to a given word. The same idea extends to sentences: a sentence can be used as input to retrieve related sentences from the stored data. This is the basis of semantic search, sentence similarity, anomaly detection and chatbots.

Chatbots answer questions about a given PDF or Word document by making use of embeddings.

LLM-based applications commonly use this approach to fetch content related to the queries given to them.

Suppose a chatbot built on a PDF is asked a query. The content of the PDF is already stored as vector embeddings, and the query is converted to an embedding as well. The vector store runs a similarity search, using search algorithms, to find the parts of the data whose embeddings are closest to the query embedding, and fetches the relevant passages. These are passed to the chatbot, which generates a final answer for the user.
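The retrieval step can be sketched as follows. This is a minimal illustration under stated assumptions: embed() uses simple word counts as a stand-in for a learned embedding model, ToyVectorStore is a hypothetical in-memory class rather than a real vector database, and build_prompt() only assembles the text that would be handed to an LLM; no model is actually called.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: word counts. A real chatbot would use a learned model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Hypothetical in-memory store: keeps each chunk next to its embedding."""
    def __init__(self, chunks):
        self.items = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, store, k=2):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(store.search(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

store = ToyVectorStore([
    "the warranty lasts two years",
    "shipping takes five days",
    "returns are accepted within thirty days",
])
```

Calling store.search("how long is the warranty", k=1) returns the warranty chunk, since it is the only one sharing words with the query. A production system would differ mainly in the embedding model and in using an approximate nearest-neighbour index instead of sorting every item.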

Chatbots create vector embeddings by using ML algorithms which are trained on massive amounts of data to learn how to represent words or phrases as vectors of numbers. One of the best-known algorithms is Google’s Word2Vec, introduced in 2013. Word2Vec takes a word and outputs an n-dimensional coordinate (or vector) so that when these word vectors are plotted in space, synonyms and related words cluster together.

Industry 4.0

Industry 4.0 is a convergence of several technologies, both physical and digital: the Internet of Things (IoT), artificial intelligence (AI), drones, robots, autonomous vehicles and other interconnected technologies that have the potential to communicate, analyse and act.

Industrial IoT (IIoT) refers to the industrial subset of IoT, used in manufacturing industries. It means the use of smart sensors and actuators to improve manufacturing and industrial processes.

It leverages the power of smart machines and real-time analytics to capitalise on data generated by these machines.

Digital twinning is the process of recreating a physical object on a virtual interface to improve the overall business process. Digital twins can be used in various ways.

Digital twins traditionally focused on anomaly detection and remote maintenance. As technologies such as IoT, AI and ML have emerged, the whole organisation can be connected, instead of just one asset. Digital twin software can thus create a holistic digital experience.

Merely buying the technologies is not enough. People are the backbone of a company’s success. People should be ready to adapt to new technologies.

Akashvani

Prasar Bharati is India’s public broadcaster. Its radio services will no longer be referred to as All India Radio (AIR) but as ‘Akashvani’. The Government took this decision long ago, but it is only now being put into effect. The Prasar Bharati (Broadcasting Corporation of India) Act, 1990, which came into force in 1997, refers to the service as Akashvani. The name change is to be complied with immediately. Rabindranath Tagore, while inaugurating the Calcutta shortwave service in 1939, wrote a poem in which AIR was referred to as Akashvani. Akashvani Mysore, a private radio station, was set up in 1935.

AI as a Transformative Tool

Organisations are undergoing digital transformation. It makes them faster and more agile. AI, especially generative AI, accelerates growth. It helps new concepts to be tested, processes to be optimised and new solutions to be discovered. However, AI alone cannot drive digital transformation. AI must be backed up by strategic management, good product management, excellent engineering and data management. Businesses must take this holistic approach to gain maximum growth from digital transformation.

There are obstacles. Legacy systems resist change, the implementation cost of new technologies is high, and the shift from traditional systems to new technologies is itself a major change. Despite these obstacles, the benefits of AI outweigh them.

When AI is adopted as a transformative tool, it should be fair and free of bias. Data patterns and algorithms must be examined for unfairness. Data should be preprocessed to reduce bias, and diverse perspectives should be brought in. There should be ethical guidelines and oversight of AI.

New Crypto: Worldcoin

Sam Altman of OpenAI is a co-founder of Worldcoin, a cryptocurrency project. Trading in the currency commenced on Monday, 24 July 2023. The currency recorded an initial price of $1.70 and stood at $2.52 at noon in London. By then, $145 million worth of tokens had been traded. On the world’s largest exchange, Binance, the token hit a peak of $5.29 and saw a trading volume of $25.1 million.

The core offering of this project is its World ID. It is described as a ‘digital passport’ and proves that the holder is a real human, and not an AI bot. To get a World ID, a person undergoes an iris scan using an ‘orb’, a silver ball approximately the size of a bowling ball. Once the orb’s iris scan verifies that the person is a real human, a World ID is created.

The supply of the crypto is capped at 10 billion tokens. The initial launch consisted of 143 million Worldcoins, out of which 100 million were loaned to market makers. The remaining 43 million were allocated to users who had been verified by an orb.
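The launch figures above can be checked with a few lines of arithmetic. The variable names are chosen for this sketch only; the numbers are the ones quoted in the text.

```python
# Figures quoted in the text; names are illustrative.
MAX_SUPPLY = 10_000_000_000        # cap on total tokens
LAUNCH_SUPPLY = 143_000_000        # tokens in circulation at launch
LOANED_TO_MARKET_MAKERS = 100_000_000

# Tokens left at launch after the loan to market makers.
remaining_at_launch = LAUNCH_SUPPLY - LOANED_TO_MARKET_MAKERS

# Fraction of the capped supply that was in circulation at launch.
launch_share_of_cap = LAUNCH_SUPPLY / MAX_SUPPLY

print(f"Remaining at launch: {remaining_at_launch:,}")
print(f"Launch supply as share of cap: {launch_share_of_cap:.2%}")
```

On these figures, 43 million tokens remain after the loan, and the launch supply is 1.43 per cent of the cap.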

Since the launch, the orb operation is being scaled up to 35 cities in 20 countries.

A blockchain can store the World IDs in a way that preserves privacy and cannot be controlled or shut down by any single entity.

Sam Altman believes in the concept of universal basic income (UBI) to reduce inequality. Since World IDs are held only by real people, they could be used in implementing UBI. Worldcoin aims to lay the groundwork for the UBI concept to become a reality.

People around the world are getting their eyeballs scanned in exchange for a digital ID and the promise of free cryptocurrency. Each verified user will receive 25 free Worldcoin tokens.

During a trial period covering the last two years, the company issued IDs to more than two million people in 120 countries.

In London, the Worldcoin representatives showed a stream of people how to download the app and get scanned, handing out free t-shirts and stickers saying ‘verified human’.

Worldcoin tokens were trading around $2.30 on Binance on Tuesday, July 25, 2023.

AI in Assisted Fertility

AI assists doctors in selecting the ideal embryo in IVF and is increasingly used in fertility treatment. AIVF, a reproductive technology company based in Tel Aviv (Israel), has developed AI-assisted software (called EMA) that processes vast amounts of data to facilitate the embryo selection process.

An Orissa-based startup, Santaan, offers services using AI to select embryos for transfer to the womb. This selection is crucial for the success of an IVF cycle. AI algorithms analyse the images of embryos to predict which have the highest probability of leading to a successful pregnancy. AI helps avoid unnecessary transfers of embryos to the uterus. It also minimises the risk of multiple pregnancies (twins, triplets, quadruplets and so on).

Indira IVF clinic, Delhi too uses AI to improve embryo selection.

Machine learning (ML) is used to select oocytes (female germ cells involved in reproduction) and to monitor their behaviour during intracytoplasmic sperm injection (ICSI), an assisted reproductive technology (ART).

AI is used in analysing embryo development. It is also used in semen analysis and in assessing DNA integrity. With its help, embryologists can identify suitable sperm cells from males who suffer from infertility.

Cryopreservation, the technique used to maintain and preserve cells, tissues and other biological samples, is also assisted by AI. AI analyses datasets of frozen embryo outcomes to identify the patterns and factors that influence the viability of thawed embryos. AI can be used to develop protocols for cryopreservation.

AI can be used to assess the suitability of the reproductive organs (uterus and ovaries) and identify anomalies.

AI in ART is useful, but it requires precise data. AI makes predictions. These have to be validated by comparing them with clinical outcomes. Thus we can refine the algorithms.

There are ethical issues. ART processes involve highly sensitive data (personal information). There should be stringent data protection. There should not be unauthorized access.

AI-driven ART processes are expensive. IIT Hyderabad is working on an indigenous and affordable solution in this field.

Chip Ecosystem

A fab plant manufactures integrated circuits (ICs) by working on raw wafers through a complex process. The manufacturing involves 500 machines. There are some 700-1,500 process steps, some of which require heating to 1,100 degrees Celsius, at times as many as 27 times. The process requires more than 300 gases and chemicals such as acetone. Many of the inputs have to be imported as they are not made in India. A fab plant requires a minimum investment of $3-4 billion.

The raw material most commonly used is silicon. Some materials are combinations of elements, known as compound semiconductors; examples are gallium nitride and silicon carbide. These are more heat-resistant and compact. Plants for these materials can be set up with an investment below $500 million.

There are different kinds of chip makers. Some are called integrated device manufacturers (IDMs): they both design and manufacture chips, e.g. Samsung and Intel. Some manufacturers are foundries: they make chips under contract for others, e.g. TSMC. Some organisations are fabless firms: they only design the chips, e.g. Qualcomm, MediaTek and Nvidia. These designers then get the chips manufactured by foundries.

Stages of Making a Chip

To begin with, wafers are sliced from a salami-shaped bar of 99.99 per cent pure silicon. These are polished to ensure a smooth texture. Films of conducting materials are then deposited on the wafer.

The second step is to cover the wafers with a light-sensitive coating (photo resist). Those areas which are exposed to UV light change their structure. Thus they become ready for etching.

As a third step, the wafers are put through a lithography machine. Its resolution decides just how small the transistors on a chip can be.

As a fourth step, the wafer is etched. It is then baked to reveal the 3D pattern of open channels. This creates cavities of exact depth in the wafer. Lastly, the wafer is bombarded with ions (ion implantation) to control the flow of electricity.

There are two models after this. Foundries or IDMs might carry out the few remaining processes themselves, or they can outsource them to third parties to reduce the investment.

The third parties involved are outsourced semiconductor assembly and test (OSAT) firms, which do the assembly, packaging and testing of ICs for others. There are also companies such as Micron which can set up an assembly, testing, marking and packaging (ATMP) operation. Micron can use wafers produced in its own fabs abroad. The chips are then cut out of the wafer, sliced and diced with a diamond saw, and become individual chips.

Wafers can contain a few chips or thousands of chips.

IC packaging, which includes wire bonding and laser marking, is also done. The wafers and ICs are tested.

Types of Chips

First, there are logic chips. They process information to complete a task. They are used to optimize visual display and to run ML and deep learning applications. They also act as the central processors (CPUs) of computers.

Secondly, there are memory chips, which store information while machines are switched on and retain data when they are off.

Thirdly, there are application-specific ICs (ASICs). They are single-purpose chips used for repetitive, routine processing.

Lastly, there are systems on a chip (SoCs). These integrate many chips and circuits into one chip, e.g. camera, video and Wi-Fi functions.

Chip Designers

Companies such as Qualcomm and MediaTek design chips and get them made by foundries.

Some Comments

Initially, there will be limited buyers for the wafers of any fab plant in India. There are hardly any companies in India which can place orders, as India lacks fabless design companies such as Qualcomm. India must have a foreign partner who can enter into a buy-back or offtake agreement with the fab company.

Fab plants require complex technology. However, it is not enough to pay for the technology and drawings needed to build a fab plant. The plant must also achieve desirable yields, that is, the ratio of usable chips produced to the maximum chip count on one wafer. Ideally this should be 90-95 per cent.
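The yield ratio described above can be written as a small calculation. This is a minimal sketch: the function names and the example counts (540 usable chips out of a possible 600) are made up for illustration, and only the 90 per cent lower bound of the target band comes from the text.

```python
def wafer_yield(usable_chips, max_chips_per_wafer):
    """Yield = usable chips / maximum chip count on one wafer."""
    if max_chips_per_wafer <= 0:
        raise ValueError("max_chips_per_wafer must be positive")
    if not 0 <= usable_chips <= max_chips_per_wafer:
        raise ValueError("usable_chips must be between 0 and the maximum")
    return usable_chips / max_chips_per_wafer

def meets_target(yield_fraction, target=0.90):
    """True if the yield reaches the target (0.90 is the low end of 90-95 per cent)."""
    return yield_fraction >= target

# Hypothetical wafer: 600 chip sites, 540 of them usable.
example_yield = wafer_yield(540, 600)
print(f"Yield: {example_yield:.0%}, meets target: {meets_target(example_yield)}")
```

A wafer with 480 usable chips out of 600 would give 80 per cent and fall short of the band.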

Fabless design companies are ready to buy provided the prices are competitive and the quality is on par with other foreign firms.

Some IDM players can make their own design and fabricate the chips in their own fab plants. However, big IDMs have not shown much interest in India.

Micron is ready to shift its ATMP functions to India to test wafers. The wafers will be imported from its global plants at an agreed transfer price.

Some experts suggest India should have compound semiconductor plants (based on gallium nitride or silicon carbide, rather than pure silicon). These require investments of $100 million-$500 million. But these can be built quickly and have a growing market in automobiles, telecom and power electronics.

Google Bard-ChatGPT

A new version of Bard is giving tough competition to ChatGPT. Some of the features introduced are not available on ChatGPT. Bard is now available in EU countries and Brazil too. Interaction with Bard is possible in 40 languages. Images can be uploaded to Bard. Bard can read out its answers. It can store conversation history, and conversations can be pinned for easy retrieval or shared through links. There is now a ‘modify response’ feature. Code can be exported directly to Replit. These are notable additions.

LLMs and Biology

LLMs are likely to be used in the field of biology, by learning the language of biology. In the previous century, there was considerable research in molecular biology, biochemistry and genetics. This research showed that biology is decipherable and even programmable. DNA has four basic components, the bases adenine (A), cytosine (C), guanine (G) and thymine (T). Computers depend on the binary system of 0s and 1s. Biology depends on the quaternary system of A, C, G and T. Here there is a conceptual overlap. Proteins are made of amino acids, from a few dozen to several thousand per protein. There are 20 amino acids to choose from. Thus this too is amenable to computerization.

Demis Hassabis of DeepMind treats biology as an information processing system. Physics depends on maths as its primary language. In the same way, biology could turn out to be a natural domain for AI.

LLMs work optimally in the presence of massive, signal-rich data. LLMs infer patterns and structures. They generate novel output by comprehending the topic.
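The overlap between the binary and quaternary systems can be made concrete: each of the four bases fits in exactly two bits. The sketch below is illustrative only; the particular bit assignment (A=00, C=01, G=10, T=11) and the function names are assumptions, not a standard taken from the text. The last function counts how many distinct amino-acid chains of a given length are possible with a 20-letter alphabet.

```python
# Assumed 2-bit code per base; any one-to-one assignment would work.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"  # index i holds the base whose code is i

def encode_dna(seq):
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value

def decode_dna(value, length):
    """Unpack `length` bases from an integer produced by encode_dna."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))

def possible_proteins(length, alphabet_size=20):
    """Number of distinct amino-acid chains of the given length."""
    return alphabet_size ** length
```

For example, encode_dna("ACGT") packs the four bases into the single byte 0b00011011, and decode_dna recovers the original string. A chain of only three amino acids already has 20 × 20 × 20 = 8,000 possible sequences, which hints at how vast the space of full-length proteins is.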

By ingesting the whole Internet, ChatGPT has become conversational.

If LLMs are trained on biological data, they could learn the language of life.

In early applications, they could be used to design proteins, the building blocks of life. A protein’s shape determines its function. Antibody proteins target foreign bodies, the antigens, just as a key fits into a lock. Enzymes are proteins which bind with certain molecules and accelerate biological reactions. Understanding how these shapes fit together tells us how life functions.

AlphaFold, an AI system, predicts a protein’s 3D structure from its one-dimensional amino-acid sequence. AlphaFold was not developed using LLMs. It relies on multiple sequence alignment (MSA), a technique from bioinformatics. But this has limitations. It is slow and compute-intensive. It cannot be used for ‘orphan’ proteins with no analogues. Such proteins constitute about 20 per cent of all proteins. LLMs offer another way to predict protein structure.

LLMs can be trained on protein sequences, instead of the English language. They can then be used to predict protein structure efficiently. This line of work started in 2019. In 2022, Meta (Facebook) put forward ESM-2 and ESMFold, two powerful protein models. The largest ESM-2 model has 15 billion parameters. The prediction can also be run in reverse, from a desired structure to a sequence, thus paving the way to generate novel proteins.

AI can be used to invent new proteins. The vast uncharted protein space can be explored. It is a nascent field.
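Training a language model on protein sequences starts by turning each amino acid into a token, much as words are tokenised in English text. The sketch below is a simplified illustration, not the actual tokenizer or training code of ESM-2 or any other model: the token numbering, the MASK value and the function names are assumptions. It shows masked-token examples, one common training objective, in which the model sees a sequence with one position hidden and must predict the hidden amino acid.

```python
# The 20 standard amino acids as one-letter codes; ids are assigned by position.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MASK = -1  # assumed placeholder id for a hidden position

def tokenize(sequence):
    """Map a protein sequence (one-letter codes) to a list of integer ids."""
    return [TOKEN_IDS[aa] for aa in sequence.upper()]

def masked_examples(token_ids):
    """One (masked sequence, hidden token) pair per position."""
    examples = []
    for i, target in enumerate(token_ids):
        masked = token_ids[:i] + [MASK] + token_ids[i + 1:]
        examples.append((masked, target))
    return examples

tokens = tokenize("ACD")
pairs = masked_examples(tokens)
```

Here tokenize("ACD") gives [0, 1, 2], and the second training pair is ([0, -1, 2], 1): the middle residue is hidden and its id is the target. A real model would learn from many millions of such sequences.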

LLMs can be used to generate biomolecules such as nucleic acids.

The ultimate aim is to go beyond modelling single molecules. We have to study the interactions of proteins with other molecules, cells, tissues and organs so as to cover the whole living organism.

The 20th century was dominated by physics. It is expected that the 21st century will be dominated by biology.

Safeguarding AI Content

July 2023. Seven leading AI firms (Amazon, Anthropic, Google, Inflection, Meta, Microsoft and OpenAI) are meeting President Biden to agree on commitments to make new AI systems transparent and secure.

The AI systems will be subjected to testing, both internal and external, before their release. The systems will be probed for security flaws and discriminatory tendencies.

These firms will make new commitments to share information with governments, civil society and academics in order to improve risk mitigation. They will report vulnerabilities as they emerge. Leading AI companies will incorporate watermarks into the material they generate. However, this system is yet to be developed. It may prove difficult to stamp content in a way that cannot be effaced by malicious actors.

There is immense public interest in the emerging technologies, and concern over the societal risks.

The meeting will be an important first step in ensuring responsible guardrails for AI.

AI is expected to benefit society at large, and therefore should be built and deployed responsibly.

Before the present meeting, the executives had already met Vice President Kamala Harris.