Blog

  • Film Making Challenged by AI

    These days Hollywood is on strike. For the first time in the last 60 years, writers and actors have gone on strike together.

    Writers are fighting the disruption to residual payments. Actors are agitated over the use of their digital likenesses without their consent.

    Both the writers and the actors feel threatened by AI. AI can generate watchable screenplays, threatening the writers' livelihood and reputation. Actors are worried about their digital replicas. In the Netflix series Black Mirror, Salma Hayek's character signs away the rights to her digital likeness, which is then made to act out scenes unwittingly. The results are disturbing.

    Can we say that the likeness of a human being does not belong to that human being? Are we reducing human beings to zeros and ones? In future, AI tools could let anybody recreate a person's likeness.

    There is a connection between humans and moving images.

    AI has the potential to eliminate the whole process of film making. The technology is already here, and it is developing at a rapid pace. Background performers or junior artists would be engaged for a single day, their work would be scanned, and future shots would be produced from their likenesses forever.

    Of course, AI-generated content will be tested by the audience, who will vote with their wallets, and their vote will be for great works with human connections.

  • AI in Advertising

    1. AI is helpful in making hyper-personalised ads by using data on user behaviour and purchase history. Such ads lead to higher engagement and conversion rates. Tata Motors used AI to launch personalised ads for the Nexon for different audience segments; the AI algorithms analyze customer data and preferences to design the ads.
    2. AI is helpful in segmenting the audience. At present, advertising uses broad and vague audience groups. AI can apply advanced data analytics, reducing this ambiguity: highly specific audience segments can be carved out based on nuanced parameters such as unique behaviour patterns and interests (a minimal segmentation sketch follows this list).
    3. AI, as we have observed, generates personalised ads. This helps automate the various stages of an ad campaign. The campaign's narrative can be adjusted on the basis of real-time data, which improves campaign performance. The campaign is tracked in terms of impressions, clicks and conversions, so advertisers can make timely adjustments.
    4. AI is ultimately helpful in budget re-allocation, since it assists in tracking ad performance.
    5. AI can be used to generate ad copy from descriptions using natural language processing. Myntra does so.
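
    As a minimal sketch of the segmentation idea mentioned in point 2, the snippet below clusters a handful of hypothetical customers on made-up features; the feature names, numbers and segment count are purely illustrative assumptions, not drawn from any real campaign.

        # Illustrative only: carve hypothetical customers into audience segments with k-means.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        # Hypothetical features: [monthly_spend, visits_per_month, days_since_last_purchase]
        customers = np.array([
            [120.0, 8, 3], [15.0, 1, 60], [300.0, 12, 1],
            [40.0, 2, 30], [250.0, 10, 2], [10.0, 1, 90],
        ])

        scaled = StandardScaler().fit_transform(customers)   # put features on a comparable scale
        segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
        print(segments)   # one segment label per customer, e.g. high-value vs. low-engagement
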
  • Deep Learning Hardware

    Google claims that its Tensor Processing Units (TPUs) are 1.7 times faster than the A100 chips from Nvidia, which power most AI applications, and 1.9 times more energy efficient. Thus Google's AI processing is greener.

    Nvidia's A100 Tensor Core GPU is based on the Nvidia Ampere GPU architecture. It adds many new features and delivers faster performance for HPC, AI and data analytics workloads.

    Google's TPUs are application-specific integrated circuits (ASICs) designed especially to accelerate AI. They are liquid cooled and designed to slot into server racks. They deliver up to 100 petaflops of compute and power Google products like Google Search, Google Photos, Google Translate, Google Assistant, Gmail and Google Cloud AI APIs.

    CPUs are central processing units with a general-purpose architecture. GPUs are graphics processing units, originally built to enhance graphical performance; they offer flexibility and a range of precision options. TPUs are optimized for tensor operations. GPUs have greater memory bandwidth than TPUs but higher power consumption, whereas TPUs are more energy and performance efficient.

  • Checking the Fact Checkers

    There are concerns about misinformation campaigns and fake news. Such communication affects not only individuals and groups but also governments and big corporates. It can also affect elections, company valuations, supply chains and individual reputations.

    It is extremely necessary to check the facts: a piece of information can be true or false, and it is branded as such on a question of fact. This gives rise to fact-checking individuals and organisations. Fact checkers aim to get closer to the truth, but they can have biases, which shroud the very truth they seek to check.

    Even false videos can be 'verified' as true. There are 'unverified' videos showing Tesla cars catching fire. There could be misleading financial evaluations of companies by vested interests, say short sellers, bringing a run on the company. Fact checking is adjudication, and public interest is involved. Can it be outsourced to private parties, or should only the government regulate it? It is also an issue of morality. Since public interest is involved, legally only a judicial or quasi-judicial authority should decide it. The only requirement is that the regulation governing a fact-checking authority must be fair, and the findings of a fact-checking authority are subject to challenge in a court of law.

    Decisions in this area have to respect freedom of speech and personal liberty.

    Private players do fact checking. Here the issue is: who will check the checkers? Some of them build a reputation over a period of time and become trustworthy.

    This is an era where it is both very easy and very difficult to verify. There are predatory attacks on individual and corporate reputations. This has not changed since the era of the Mahabharat, where Yudhishthir whispers 'naro va kunjaro va' when the news of Ashwatthama's death is spread.

    Validation of responsible private checkers is required.

  • Neurotechnology on Steroids

    Brain signals can be manipulated: this is neurotechnology, and it is taking rapid strides. As it affects human rights, it requires global regulation (UNESCO).

    These days, researchers interface computers with human brains, using artificial intelligence (AI) in the process. The aim is to analyze neural activity.

    AI-assisted neurotechnology means putting neurotechnology on steroids (Mariagrazia, Report on Innovation in Neurotechnology).

    The lives of people living with disabilities are improved by AI-assisted neurotechnology. It is used to treat cerebral ailments and to diagnose brain-related disorders, at times with the use of implants.

    If the technology is abused, it affects human rights and freedom. It can affect our identity, autonomy, privacy, sentiments, behaviours and overall well-being.

    We have reached a point where the very essence of being a human can be changed.

    The field is attracting substantial private investment, including Elon Musk's Neuralink. There have been many scientific papers and patents in this area of late.

    These days, even non-invasive devices are being used to decode information from the brain. There is a need to protect mental privacy.

    Corporates take ownership of the data collected during such studies. In one experiment, implants in the cortex of mice made them see things they were not really seeing, i.e. hallucinations. What could be possible in future should be discussed now. Here, part of the cerebral activity happens outside the brain.

    No one objects to neurotechnology. It has the potential to reduce death and disabilities. At the same time, a globally coordinated effort should be made to regulate neurotechnology.

  • How LLMs work

    ‘The quick brown fox’ is the input sequence. It is vectorized, and the vectors are used to calculate the attention weights. The attention weights are used to create a weighted sum of the encoder's hidden states, which is then passed to the decoder. The attention mechanism focuses on the words ‘quick’ and ‘brown’. The output vocabulary is used to generate a probability distribution over the possible next words, and the word ‘jumps’ is predicted as the next word. While deciding the probability of each word, the output of the attention mechanism is used. The decoder repeats this process: the predicted word is appended to the input sequence and used for the next iteration. The process continues until the decoder predicts the end of the sequence.
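
    A minimal NumPy sketch of the attention step described above, for the toy sequence ‘The quick brown fox’. The embedding size and the vectors themselves are random stand-ins for a real model's learned values, so the printed weights are only illustrative.

        # Toy scaled dot-product attention over 'The quick brown fox' (illustrative vectors).
        import numpy as np

        tokens = ["The", "quick", "brown", "fox"]
        d = 4                                   # tiny embedding size, for illustration only
        rng = np.random.default_rng(0)
        H = rng.normal(size=(len(tokens), d))   # stand-in for the encoder hidden states

        query = H[-1]                           # decoder state asking "what comes next?"
        scores = H @ query / np.sqrt(d)         # dot-product score against each token
        weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
        context = weights @ H                   # weighted sum of the hidden states

        for t, w in zip(tokens, weights):
            print(f"{t:>6}: {w:.2f}")           # attention weight per token (random here)
        print("context vector:", context)       # this is what gets passed to the decoder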

    During training, the decoder can be subjected to masked language modelling: some words in the input sequence are masked out, and the decoder predicts the masked words. This helps the decoder focus on the context of the current word while predicting the next word.
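
    A small sketch of what the masking itself looks like. The mask rate here is exaggerated for a nine-word demo (BERT-style models typically mask around 15% of tokens), and the sentence is just an example.

        # Illustrative masking: hide some tokens; the model must predict them back.
        import random

        random.seed(0)
        tokens = "the quick brown fox jumps over the lazy dog".split()
        masked, targets = [], {}
        for i, tok in enumerate(tokens):
            if random.random() < 0.3:           # mask rate exaggerated for this tiny demo
                masked.append("[MASK]")
                targets[i] = tok                # training target: the original word
            else:
                masked.append(tok)
        print(" ".join(masked))
        print(targets)                          # e.g. {3: 'fox'} -> loss is computed on these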

    Since Vaswani et al.'s paper on the attention mechanism (2017), the transformer model has been used. It has since evolved into the decoder-only transformer (2019). This transformer is less accurate, and is used where accuracy is not as important. The decoder-only transformer takes the previous words in the sequence as input and produces a sequence of hidden states, which are used to predict the next word in the sequence, conditioned on the previous words. First, a score is computed for each token in the input sequence; the score is based on how well the token matches the current state of the decoder. The tokens with the highest scores are then used to generate the next token in the output sequence. The most common scoring function is the dot product.
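
    A sketch of the dot-product scoring step for next-token prediction. The vocabulary is made up and random vectors stand in for learned embeddings and the decoder state, so the predicted token is illustrative only.

        # Dot-product scoring for next-token prediction (toy numbers, not a real model).
        import numpy as np

        vocab = ["jumps", "runs", "sleeps", "the"]
        d = 4
        rng = np.random.default_rng(1)
        E = rng.normal(size=(len(vocab), d))    # stand-in for learned token embeddings
        h = rng.normal(size=d)                  # decoder's current hidden state

        scores = E @ h                          # dot product: how well each token matches
        probs = np.exp(scores) / np.exp(scores).sum()   # softmax over the vocabulary
        print(dict(zip(vocab, probs.round(2))))
        print("predicted next token:", vocab[int(np.argmax(probs))])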

    The attention weights are calculated for the hidden states. They indicate how much attention is to be paid to individual words in the sequence. The attention weights are used to combine the hidden states into a single representation, which is used to predict the next word in the sequence.

    LLMs use deep learning techniques to analyze and learn from vast amounts of text data, using it to learn the relationships between words and phrases. When a pre-trained model is later adapted to a specific task, it is called ‘transfer learning’. In training, LLMs process vast amounts of text, learning its structure and meaning, and are trained to identify meanings and relationships between words. They deal with large swaths of text to understand context better.

    Vectorised models can use a distributed representation, where different words with similar meanings have similar representations and are close together in vector space.
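
    A toy illustration of this idea: the 3-dimensional vectors below are hand-made (purely invented, far smaller than real embeddings), and cosine similarity shows that the related words land closer together.

        # Hand-crafted toy vectors: 'king' and 'queen' sit closer to each other than to 'apple'.
        import numpy as np

        vectors = {
            "king":  np.array([0.90, 0.80, 0.10]),
            "queen": np.array([0.85, 0.75, 0.15]),
            "apple": np.array([0.10, 0.20, 0.90]),
        }

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        print("king~queen:", round(cosine(vectors["king"], vectors["queen"]), 3))  # high
        print("king~apple:", round(cosine(vectors["king"], vectors["apple"]), 3))  # low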

    A unigram model treats each word in a sentence independently. A bigram model estimates the probability of each word in a phrase conditioned on the previous word. A trigram model considers the two previous words. An n-gram model considers n-1 words of previous context.
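
    A minimal bigram model over a toy corpus: raw counts of adjacent word pairs become conditional probabilities P(word | previous word). The corpus is invented for the demo.

        # Bigram model: P(next | previous) estimated from raw counts in a toy corpus.
        from collections import Counter, defaultdict

        corpus = "the quick brown fox jumps over the lazy dog the quick fox sleeps".split()

        bigrams = defaultdict(Counter)
        for prev, nxt in zip(corpus, corpus[1:]):
            bigrams[prev][nxt] += 1

        def p(nxt, prev):
            total = sum(bigrams[prev].values())
            return bigrams[prev][nxt] / total if total else 0.0

        print(p("quick", "the"))   # 2 of the 3 occurrences of 'the' are followed by 'quick'
        print(p("lazy", "the"))    # 1 of 3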

  • Expensive LLMs

    First of all, the total cost of building an LLM consists of the man-hours spent by highly talented manpower, the expensive chips used to train the model, and various operational costs. We are leaving aside fixed overheads here.

    LLMs, and even smaller models, are expensive to train and deploy. There is the hardware cost: training requires GPUs, or graphics cards. Nvidia's A100, which is commonly used, costs around $10,000, and the computation requires tens of thousands of these GPUs. A GPT-3 model has 175 billion parameters; training it would take 285-plus years of computation on a single GPU. To make this manageable, OpenAI used thousands of GPUs in parallel to reach its computation goals. According to one estimate, OpenAI may have used more than 30,000 GPUs to commercialise ChatGPT, which would have cost $30 million. While integrating it into Bing, Microsoft could have spent over $4 billion on hardware. A Bard-powered Google could cost Alphabet $100 billion.

    The running costs of such a model are also very high. An exchange with an LLM costs several times more than a search on a search engine, and a ChatGPT-like model receives millions or billions of queries daily. Basic running costs are too high even for organisations with deep pockets. According to one estimate, OpenAI spends about $7 lakh (i.e. $700,000) per day to run ChatGPT.
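
    A back-of-the-envelope sketch of how such running costs add up. Every number below (queries per day, cost per query) is an illustrative assumption, not a figure from OpenAI.

        # Back-of-the-envelope running-cost estimate (all inputs are illustrative assumptions).
        def daily_running_cost(queries_per_day: float, cost_per_query_usd: float) -> float:
            return queries_per_day * cost_per_query_usd

        # Hypothetical: 200 million queries/day at a fraction of a cent per query.
        print(f"${daily_running_cost(200e6, 0.0035):,.0f} per day")   # $700,000 per day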

    Salaries for talented manpower (with compensation of over a million dollars per annum) are another cost component. Skilled talent comes at a premium.

    There is also an environmental cost on account of carbon emissions.

    There is constant research on LLMs. There are data collection costs, electricity costs and a host of administrative costs.

    All these costs put the best models out of the reach of the public. All these factors are antithetical to mass adoption; even Big Tech will have to ration these services.

    Not-for-profit models are not sustainable.

    Organisations must give serious thought to how these models can be monetised. OpenAI has converted itself into a for-profit entity.

    Research will have to focus on reducing training and hardware costs. Already, organisations produce their own chips, e.g. Google's TPU or Tensor Processing Unit, Amazon's Inferentia and Trainium, Facebook's MTIA, etc. There should also be research on computer memory.

  • Paucity of Training Data for LLMs

    LLMs are fed on massive amounts of data. Stuart Russell of the University of California, Berkeley feels that soon there will be no data left to ingest, and bots like ChatGPT may hit a brick wall. In the near future, the whole field of generative AI may be adversely affected by a paucity of data. It is this anxiety that compels companies to resort to data harvesting. Data collection processes are already under the scrutiny of those whose copyrighted material is being used, and much of the collection is being done without consent. The most worrying factor is the shortage of data: all high-quality data could be exhausted by 2026. Such high-quality data is sourced from books, news articles, scientific papers, encyclopedias and web content.

    OpenAI has bought datasets from private sources. We can infer that there is an acute shortage of high-quality data.

    GPT-4 has been created by making use of public as well as private data. OpenAI has, however, not revealed the sourcing of data for GPT-4.

    Sam Altman, CEO of OpenAI, has no plans to offer an IPO, as there could be conflicts with investors in view of the company's unorthodox structure and decision making.

  • Fine Tuning an LLM Model

    Pre-trained LLMs are retrained on specific datasets; that is called fine tuning the model. It makes the model ready for the specific content of present needs, say training an LLM on medical datasets to help diagnose a specific disease. An LLM can be made more specific to a particular domain by fine tuning: the model is trained on a smaller but targeted dataset relevant to the desired task or subject matter.

    A fine-tuning API can be used to fine tune a model; this is done online. If the weights of the model are open source, fine tuning can be done on our own premises. Hugging Face offers an easy AutoTrain feature: you can select the parameters on your own (manually) or use automatic parameter selection, which picks the best parameters for the task.
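
    A rough sketch of what on-premises fine tuning with open weights might look like, using the Hugging Face transformers Trainer (this is not AutoTrain itself). The model name, the two toy sentences and the hyperparameters are placeholders standing in for a real domain dataset.

        # Sketch: fine tune an open-weights causal LM on a tiny toy dataset (names are placeholders).
        from datasets import Dataset
        from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                  DataCollatorForLanguageModeling, Trainer, TrainingArguments)

        model_name = "gpt2"                                   # placeholder open-weights model
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        tokenizer.pad_token = tokenizer.eos_token             # gpt2 has no pad token by default
        model = AutoModelForCausalLM.from_pretrained(model_name)

        texts = ["Patient presents with fever and cough.",    # toy stand-ins for a domain dataset
                 "Recommend rest, fluids and follow-up in a week."]
        train_dataset = Dataset.from_dict(dict(tokenizer(texts, truncation=True, padding=True)))

        args = TrainingArguments(output_dir="finetuned-model",
                                 num_train_epochs=1,
                                 per_device_train_batch_size=2,
                                 learning_rate=5e-5)
        trainer = Trainer(model=model, args=args, train_dataset=train_dataset,
                          data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
        trainer.train()                                       # updates the model's weights
        trainer.save_model("finetuned-model")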

    A fine-tuned model is evaluated on three criteria: perplexity, accuracy and F1 score. Perplexity refers to how well the model predicts the next word in a sequence; the lower the perplexity, the better the model's ability to predict the next word. Accuracy refers to how well the model performs on a given task; it is the number of correct predictions divided by the total number of predictions. The F1 score refers to how well the model performs on binary classification tasks; it is the harmonic mean of precision and recall.
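
    A quick sketch of the three metrics computed from made-up numbers (the loss, counts, precision and recall below are all hypothetical).

        # Toy computation of perplexity, accuracy and F1 (all inputs are made-up numbers).
        import math

        # Perplexity: exp of the average cross-entropy (per-token negative log-likelihood).
        avg_cross_entropy = 2.1                      # hypothetical loss
        perplexity = math.exp(avg_cross_entropy)     # lower is better

        # Accuracy: correct predictions divided by total predictions.
        accuracy = 87 / 100

        # F1: harmonic mean of precision and recall (binary classification).
        precision, recall = 0.80, 0.70
        f1 = 2 * precision * recall / (precision + recall)

        print(round(perplexity, 2), accuracy, round(f1, 3))   # 8.17 0.87 0.747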

    If fine tuning is to be done for a different application, one has to repurpose the model with a small change in its architecture. Here the embeddings produced by the transformer part of the model are used (embeddings are numerical vectors).

    In repurposing, the model's embedding layer is connected to a classifier model, e.g. a set of fully connected layers. The LLM's attention layers are frozen so that they are not updated, which saves compute costs. The classifier is trained on a supervised learning dataset.
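
    A sketch of this repurposing pattern in PyTorch: freeze the transformer, pool its embeddings, and train only a small classifier on top. The model name and the pooling choice (first token) are assumptions made for the example.

        # Sketch: frozen transformer embeddings feeding a small, trainable classifier head.
        import torch
        import torch.nn as nn
        from transformers import AutoModel, AutoTokenizer

        base = AutoModel.from_pretrained("bert-base-uncased")        # placeholder encoder
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

        for p in base.parameters():      # freeze the LLM's layers: no gradient updates, less compute
            p.requires_grad = False

        classifier = nn.Sequential(      # only these fully connected layers are trained
            nn.Linear(base.config.hidden_size, 128),
            nn.ReLU(),
            nn.Linear(128, 2),           # e.g. two classes for a binary task
        )

        inputs = tokenizer(["an example sentence"], return_tensors="pt")
        with torch.no_grad():
            embeddings = base(**inputs).last_hidden_state[:, 0]      # first-token pooling (assumption)
        logits = classifier(embeddings)
        print(logits.shape)              # torch.Size([1, 2]); train classifier on a labelled dataset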

    In some instances, the parameter weights of the transformer are also updated; here the attention layers are not frozen and fine tuning covers the entire model. This is computationally expensive.

    To update a model's knowledge, say on medical literature, an unstructured dataset is used. The model is trained through unsupervised or self-supervised learning. Foundation models are trained this way.

    At times, more than knowledge upgradation, it is an LLM's behaviour that is to be modified. Here a supervised fine-tuning (SFT) dataset is used: a collection of prompts and the responses to be elicited. This is also called instruction fine tuning.
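
    A sketch of what a couple of records in such an instruction (SFT) dataset might look like; the prompts and responses below are invented for illustration.

        # Invented examples of instruction fine-tuning (SFT) records: prompt + desired response.
        sft_dataset = [
            {"prompt": "Summarise the patient's symptoms in one sentence.",
             "response": "The patient reports a persistent dry cough and mild fever."},
            {"prompt": "Explain what a bigram model is to a beginner.",
             "response": "A bigram model predicts each word from the single word before it."},
        ]
        for record in sft_dataset:
            print(record["prompt"], "->", record["response"])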

    Some organisations use reinforcement learning from human feedback (RLHF), taking SFT to the next level. It is an expensive process. Human reviewers and auxiliary models are needed for RLHF. Only well-equipped AI Labs can afford this. RLHF brings humans in the loop.

    Research is directed towards parameter-efficient fine tuning (PEFT), e.g. low-rank adaptation (LoRA).
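
    A sketch of LoRA fine tuning with the peft library; the base model and the target_modules names are assumptions that depend on the chosen architecture.

        # Sketch: wrap a causal LM with LoRA adapters so only small low-rank matrices are trained.
        from transformers import AutoModelForCausalLM
        from peft import LoraConfig, get_peft_model

        model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder base model
        config = LoraConfig(
            r=8,                          # rank of the low-rank update matrices
            lora_alpha=16,                # scaling factor for the updates
            lora_dropout=0.05,
            target_modules=["c_attn"],    # attention projection names; these vary per architecture
            task_type="CAUSAL_LM",
        )
        peft_model = get_peft_model(model, config)
        peft_model.print_trainable_parameters()   # typically well under 1% of all parameters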

    Some models cannot be fine tuned, especially models available only through an API. At times there is not sufficient data, or the data changes frequently, or the application is dynamic or context-sensitive. Here one can use in-context learning or retrieval augmentation.
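
    A sketch of retrieval augmentation: embed a few documents, retrieve the closest one for a query, and prepend it to the prompt. The embed() function here is a deliberately crude stand-in (hypothetical); in practice a real embedding model would be used.

        # Sketch of retrieval augmentation with a stand-in (hypothetical) embedding function.
        import numpy as np

        def embed(text: str) -> np.ndarray:
            """Stand-in embedding: normalised character-frequency vector. Replace with a real model."""
            v = np.zeros(26)
            for ch in text.lower():
                if ch.isalpha():
                    v[ord(ch) - ord("a")] += 1
            return v / (np.linalg.norm(v) or 1.0)

        documents = [
            "Refund requests must be filed within 30 days of purchase.",
            "The warranty covers manufacturing defects for two years.",
        ]
        query = "How long do I have to ask for a refund?"

        scores = [float(embed(doc) @ embed(query)) for doc in documents]
        best = documents[int(np.argmax(scores))]

        prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
        print(prompt)   # this augmented prompt is what would be sent to the LLM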

  • From MCP to Dr. Licata on Neural Networks

    In 1943, Warren McCulloch, an American neurophysiologist and cybernetician at the University of Illinois, Chicago, and the psychologist Walter Pitts published the paper ‘A Logical Calculus of the Ideas Immanent in Nervous Activity’, describing the ‘McCulloch-Pitts’ (MCP) neuron. It was the first mathematical model of a neural network.

    They described brain functions in abstract terms, and showed that simple elements connected in a neural network can have immense computational power.

    The paper received little attention. The ideas were applied by John von Neumann, Norbert Wiener and others.

    The MCP paper was a pioneering work in the fields of artificial intelligence (AI) and cognitive science, and a core event in computer science and AI. In it, the brain is considered a neural network and the mind is interpreted as a product of its functional properties.

    A biological neuron takes an input signal (dendrite), processes it like a CPU (soma), and passes it through a cable-like structure to other connected neurons (axon to synapse to another neuron's dendrite). There is a lot more to the functioning of a biological neuron, but broadly what happens in our brain is that there is an input, there is processing, and there is an output. The sensory organs send the input that activates a neuron. The decision making is actually done by a collection of neurons.

    The human brain consists of an interconnected network of about 10^11 (100 billion) neurons. The connections are complex.

    The output of this processing is passed on to the next layers in a hierarchical manner. There is a division of work: a neuron may perform a certain role in response to a certain stimulus, and each layer has its own role and responsibility. Some functions, e.g. face recognition, may involve many layers.

    McCulloch and Pitts designed a network consisting of nodes, each with a part that takes an input and a part that makes a decision. The neuron represents Boolean functions: the inputs are Boolean and the output is also Boolean.

    The MCP neuron is a binary neuron, which is in either an active or an inactive state. Its activation is determined by the sum of the inputs it receives: if the sum is greater than a certain threshold, the neuron fires and is active; if the sum is less than or equal to the threshold, the neuron remains inactive. MCP neurons can represent any logical expression; logical functions such as AND, OR and NOT can be implemented by MCP neurons.
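
    A small sketch of an MCP-style threshold neuron and how it realises AND, OR and NOT. The particular weights and thresholds are one standard choice for illustration, not taken from the original paper.

        # MCP-style binary threshold neuron: fires (1) iff the weighted input sum exceeds the threshold.
        def mcp_neuron(inputs, weights, threshold):
            total = sum(x * w for x, w in zip(inputs, weights))
            return 1 if total > threshold else 0

        AND = lambda a, b: mcp_neuron([a, b], [1, 1], 1.5)   # fires only when both inputs are 1
        OR  = lambda a, b: mcp_neuron([a, b], [1, 1], 0.5)   # fires when at least one input is 1
        NOT = lambda a:    mcp_neuron([a],    [-1],  -0.5)   # fires only when the input is 0

        for a in (0, 1):
            for b in (0, 1):
                print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
        print("NOT 0:", NOT(0), "NOT 1:", NOT(1))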

    Brain simulation is problematic because of the complexity of the brain's structure, consisting of 100 billion neurons and 1,000 trillion synaptic interconnections. Besides, communication in the brain is not digital; it is electrochemical. There are inter-related timing and analogue components. Simulation of the brain is beyond today's technological reach.

    Neural networks roughly resemble the structure of the brain. The architecture is arranged into layers, and each layer has processing units called nodes, which are in turn connected to nodes in the layers above or below. Data fed into the lowest layer is passed on to the next layers. Artificial neural networks are fed huge amounts of data and are designed to function like biological neural networks; however, the brain's functioning is much more complex.
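
    A minimal NumPy sketch of this layered, feed-forward flow. The layer sizes, the random weights and the ReLU nonlinearity are purely illustrative choices; an untrained network like this computes nothing meaningful yet.

        # Tiny feed-forward network: data enters the lowest layer and passes upward layer by layer.
        import numpy as np

        rng = np.random.default_rng(0)
        layer_sizes = [4, 8, 3]                     # input -> hidden -> output (illustrative)
        weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

        def forward(x):
            for W in weights:
                x = np.maximum(0, x @ W)            # each layer: weighted sum + ReLU nonlinearity
            return x

        print(forward(rng.normal(size=4)))          # output of the top layer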

    Real neurons do not compute their output by simply summing weighted inputs, and they do not stay ‘on’ until the inputs change. Their output might encode information using patterns of pulses.

    Dr. Licata published a paper in the Journal of Computer Science and Biology which questions whether artificial neural networks are good models of human minds.

    According to Dr. Licata, they are not good models of the human mind. However, this does not make them useless, since they perform computation in parallel.

    Modern science has yet to clearly distinguish between the human mind and the brain. Research is needed on the concept of consciousness, and it is necessary to understand how thought emanates. Artificial feedback is unstable. The neurons in the brain that do thinking and planning have tree-like structures, and it is not clear how the brain solves the credit assignment problem.

    It is necessary to integrate research in neuroscience and AI; since the MCP paper of 1943, there has been very little such integration.