Author: Shabbir Chunawalla

  • Parallelization in AI Chips

    Parallelization in AI chips (say, Nvidia chips) denotes their ability to perform many calculations simultaneously rather than one after the other, as traditional CPUs do. It accelerates data processing in AI and ML tasks.

    First of all, parallelization is a great help in training neural networks and in executing AI tasks generally. These workloads involve massive matrices, repetitive mathematical operations (additions and multiplications), and independent computations on large datasets.

    These tasks are broken into smaller pieces and are executed at the same time.

    In such chips, there are thousands of mini-processors or cores. Each core can handle a small task, so many tasks are processed in parallel.

    A CPU processes one task at a time, while an AI chip (GPU) can process thousands of operations simultaneously.

    In training a neural network, there are matrix multiplications on data batches, e.g. 1,000 images. Each image can be processed independently. The task is split across many cores, and each core processes one image or part of an image. This results in faster training and better scalability, as the sketch below illustrates.
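    A minimal sketch of this batch parallelism, assuming PyTorch and a CUDA-capable GPU (the shapes here are illustrative):

        import torch

        # Use the GPU if one is available; fall back to the CPU otherwise.
        device = "cuda" if torch.cuda.is_available() else "cpu"

        x = torch.randn(1000, 784, device=device)   # a batch of 1,000 flattened images
        W = torch.randn(128, 784, device=device)    # one shared weight matrix

        # A single call multiplies all 1,000 images by W at once; on a GPU
        # the work is spread across thousands of cores instead of run serially.
        z = x @ W.T                                  # result shape: (1000, 128)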

    Matrices in neural networks are made of floating-point numbers. These numbers represent data and model parameters. Input data such as an image is represented as a matrix of pixel values (say, a 28×28 matrix of numbers ranging from 0, black, to 255, white). On being fed to the model, it is flattened into a 784×1 vector. Neural networks learn by adjusting weights, which are stored in weight matrices. If the 784 input pixels are connected to 128 neurons in the next layer, the weight matrix is 128×784 and the result Z is a 128×1 vector. Each value of Z is the output of a neuron in the hidden layer. It is followed by a non-linear activation function such as ReLU or sigmoid.

    In short, matrix multiplication combines input matrices and weight matrices to produce activations and predictions.
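    As a concrete illustration, the forward pass just described, a flattened 28×28 image passing through a 128-neuron layer with ReLU, takes only a few lines of NumPy (the input is random data standing in for a real image):

        import numpy as np

        x = np.random.rand(784, 1)       # flattened 28x28 image as a 784x1 vector
        W = np.random.randn(128, 784)    # weight matrix: 128 neurons x 784 inputs
        b = np.zeros((128, 1))           # one bias per neuron

        z = W @ x + b                    # matrix multiplication: 128x1 pre-activations
        a = np.maximum(0, z)             # ReLU activation, applied element-wise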

  • Assignment of Weights in a Neural Network

    At the start of training, the weights in a neural network are typically assigned at random. More precisely, the randomness is structured: the weights are randomly initialized, but from carefully chosen distributions.

    Random Initialization

    Randomness avoids symmetry. If all weights were initialized to the same value (say, zero), neurons in the same layer would learn the same things, which makes training ineffective.

    Random weights ensure neurons process inputs differently, allowing gradients to flow and useful features to emerge.

    Initialization Methods

    Uniform or Normal Random Initialization

    Weights are chosen from a uniform or normal distribution. This is not ideal for deep networks, due to issues such as vanishing or exploding gradients.

    Xavier Initialization

    Designed for sigmoid (and tanh) activations, weights are scaled so that the variance of activations stays stable across layers.

    He Initialization

    Weights are chosen for ReLU and its variants. The scaling is designed to maintain the variance of activations and gradients through the network, as the formulas below show.
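    For a layer with n_in inputs and n_out outputs, the standard variance targets behind these two schemes are:

        \operatorname{Var}(W)_{\text{Xavier}} = \frac{2}{n_{\text{in}} + n_{\text{out}}}, \qquad \operatorname{Var}(W)_{\text{He}} = \frac{2}{n_{\text{in}}}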

    Bias Initialization

    Biases are typically initialized to zero, since they do not affect the symmetry-breaking problem.

    In short, weights are initialized randomly, but not arbitrarily. They are initialized in a way that ensures effective learning from the start.

    Frameworks such as PyTorch and TensorFlow/Keras provide these weight initializers out of the box; in Keras, for example, one can use GlorotNormal, and pick other initializers for other activations, as sketched below.
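    A minimal sketch in PyTorch, applying the schemes above to a single linear layer (the layer sizes are illustrative):

        import torch.nn as nn

        layer = nn.Linear(784, 128)

        # Pick one scheme, depending on the activation that follows the layer:
        nn.init.xavier_uniform_(layer.weight)                         # sigmoid/tanh
        # nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')  # ReLU (He)

        # Biases are zeroed; this does not reintroduce the symmetry problem.
        nn.init.zeros_(layer.bias)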

  • Parameters of a Neural Network

    By the parameters of a neural network, we mean the internal values the model uses to make predictions or generate text. Parameters, namely weights and biases, are learnt during training. They determine how the model transforms input data (words or tokens) into output (the next word or sentence).

    The number of parameters is a rough measure of the size and capacity of the model. More parameters give a better ability to understand language patterns, greater memory and a greater ability to generalize. Of course, more parameters also mean greater demands on computational power and hardware.

    A small language model has on the order of 100 million parameters, e.g. GPT-2. A medium language model has 1-6 billion parameters and gives balanced performance. A large language model has 10 to 100 billion plus parameters and is a powerful model. Some frontier models have 100 billion to 1 trillion parameters, e.g. GPT-4, Gemini 1.5, Claude 3 Opus.

    In transformer models, words are turned into vectors or embeddings. There are layers of computation. Each layer has millions to billions of weights or parameters. Each layer may be a combination of attention, feedforward and normalization blocks. These parameters are tuned during training across trillions of words to capture grammar, logic, facts etc. Small models have few layers, and large models have many deep layers.

    Parameters are learnt using a training dataset which may consist of billions of sentences. When the model makes a prediction, a loss function measures how wrong it is. Backpropagation then adjusts the weights to minimize the error. The process updates each parameter little by little across millions of iterations, and these adjustments make predictions better. Inference is the use of the learnt parameters to generate or predict text. The sketch below shows one step of this loop.
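    A minimal sketch of one training step in PyTorch, with a toy model standing in for a real language model (all sizes and data here are illustrative):

        import torch
        import torch.nn as nn

        # A toy stand-in; real LLMs stack many attention and feedforward layers.
        model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
        print(sum(p.numel() for p in model.parameters()))  # parameter count: 101,770

        loss_fn = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        x = torch.randn(32, 784)             # a dummy batch of inputs
        y = torch.randint(0, 10, (32,))      # dummy target labels

        prediction = model(x)                # forward pass (inference uses only this)
        loss = loss_fn(prediction, y)        # how wrong was the prediction?
        loss.backward()                      # backpropagation computes gradients
        optimizer.step()                     # nudge every parameter a little
        optimizer.zero_grad()                # reset gradients for the next batch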

  • Small Language Models (SLMs)

    While Large Language Models (LLMs) such as GPT-4 and Gemini have been in the news and attract investments, small language models (SLMs) are gaining traction in India. They are affordable, efficient and relevant in resource-constrained environments.

    SLMs are compact, energy-efficient and able to operate on local devices without requiring continuous cloud access. They are well suited to Indian use cases.

    CoRover launched BharatGPT Mini, a compact SLM, in December 2024. It ensures reliable performance even in offline mode and offers a practical, cost-effective alternative to cloud-dependent LLMs.

    LLMs have significant infrastructure demands. SLMs have an edge here.

    SLMs are good for task-specific applications and suited to domain-specific needs: Fractal, for example, has introduced Kalaido.ai for image generation and Vaidya.ai for medical and health queries. India’s linguistic diversity further boosts the relevance of SLMs. Gnani.ai has developed voice-first SLMs trained on Indic-language conversations; its Automate365 and Armour365 have been widely adopted.

  • Natural Diamonds

    Natural diamonds are obtained from the depths of the earth and are among the oldest things one could ever hold, born between 90 million and 3.5 billion years ago. In fact, they are miracles of time, heat and pressure. They form under very rare conditions and are shaped by nature. Only a few make it to light. Some pass through rivers and are even carried out to sea. Still, they retain their brilliance. Literally, their Latin name means ‘unbreakable’, an apt metaphor for their resilience.

    A diamond is characterized by the four Cs: carat, clarity, colour and cut. What brings it to life, however, is its ability to play with light, bending and reflecting it. It scintillates, and that defines its beauty. Cut and shape contribute to its visual presence much more than its size. Jewelers have long recognized the importance of light in giving it its unmistakable sparkle.

    Extracted from the bowels of the earth and processed, they reach the heights of fashion. They are expressions of authenticity and permanence in an ephemeral world. That is why we say diamonds are forever, just like love, legacy and time itself. They are passed down and held close.

    They have ruled lifetimes and celebrated many milestones. They are the most loving of promises and represent the best of style.

    India processes 90 per cent of the world’s diamonds. Their prices have risen by about 3 per cent every year. They are traceable from mine to market through blockchain technology. Beyond their beauty, natural diamonds support real people: miners, artisans and communities. Synthetic diamonds witnessed a decline in value between 2015 and 2024 on account of mass production, while natural diamonds retain their rarity, prestige and long-term value. They represent moments and memories, and carry the warmth of the people who pass them on to us.

    Natural diamonds never lose their light and meaning. They are fragments of the earth’s deep past, and each tells its own story.

    They have witnessed the most iconic moments, not only of royalty but also of everyday fashion.

  • Fifth State of Matter

    School textbooks talk about three states of matter: solid, liquid and gas. Later, we learn that matter also exists in a fourth state, called plasma. Scientists have gone beyond this and identified a fifth state of matter. Schools should revise their textbooks.

    In liquids, particles move freely, giving them fluidity. In gases, particles are dispersed and occupy the available space. In solids, particles are organized and vibrate in fixed positions. Plasma is a kind of soup of loose electrons and ions, seen in lightning and the sun; it is not commonly observed in daily life. Scientists wondered whether there was any state not covered by these four.

    In 1924-25, Albert Einstein, building on the work of physicist Satyendra Nath Bose, predicted the fifth state of matter. The prediction was that if bosons are cooled to temperatures nearing absolute zero, the particles lose their individual identities and act as one super particle. With so little energy, they do not move sufficiently; they become indistinguishable and merge into a collective quantum state. This predicted state was named the Bose-Einstein Condensate (BEC). It was confirmed experimentally seventy years later, in 1995, when Eric Cornell and Carl Wieman managed to cool rubidium-87 atoms to millionths of a degree above absolute zero.
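    For a uniform ideal gas of bosons, the temperature below which condensation sets in has a standard closed form, where n is the particle number density and m the particle mass:

        T_c = \frac{2\pi\hbar^2}{m k_B}\left(\frac{n}{\zeta(3/2)}\right)^{2/3}, \qquad \zeta(3/2) \approx 2.612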

    BECs could help us develop ultra-sensitive sensors that record variations in magnetic fields, gravity and acceleration. They could also advance quantum technologies and help us find new materials.

  • Digital Financial Advice

    Generative AI is being used to develop digital financial assistants. Gen Z are the potential clients for these services, and the assistants speak in a vocabulary their users understand. It all started with robo-advisors and finfluencers; digital assistants are another cultural shift. However, human advisors need not be pessimistic about their own role, since the machines lack one quality humans have: soft skills.

    Financial advice is a broad concept. An advisor also has to act as a therapist, coach and teacher. Real people go beyond mere execution of orders; they teach you how to survive a turbulent market. Advice should go beyond data and stats.

    Financial advisors are paid in two ways: they receive a percentage of a customer’s assets under management (AUM) each year, or they are paid a flat-rate fee. The AUM fee varies between 0.25 per cent and 1.5 per cent, and advisors lower the rate as the size of the portfolio grows. Comprehensive financial advice costs thousands of dollars. What is the entry barrier? It is the minimum investable assets requirement, which ranges from $500,000 to $1 million. A decade and a half back, these factors prohibited access for millennials.

    This paved the way for robo-advisors, which worked with lower fees and no asset minimums. Many wondered whether machines would supplant humans. As time advanced, it became clear that the two could exist in a symbiotic relationship. Millennials, too, desired some human soft skills, and advising platforms launched versions with access to humans.

    Gen Z is lured by generative AI since it simulates their vocabulary, and they are early adopters. Teens and twenty-somethings are fond of using online platforms and social media to receive financial advice. They should be clear that the technology is as fallible as humans are.

    Wealth management startups respond to queries by voice and text. They are available to accredited investors and offer access to human professionals.

  • AI-Generated Campaigns

    Brands such as DermiCool, Tata Gluco+ and Keya Foods have released fully AI-generated campaigns. AI is no longer just a toy; it has become a tool of the trade.

    AI today is the Swiss Army knife of creative departments, deployed across ideation, scripting, trend mapping, asset resizing and personalization. These tasks once took a great deal of time but can now be accomplished in minutes, leaving more time for strategic thinking.

    Though AI can be used for storyboarding, versioning, visual development and optimization loops, the creative idea and strategy still begin and end with a human. AI does not replace the human element; at best it is an intern on steroids, fast, but still needing human direction and sharp instincts. There is a human edge in instinct and insight that AI definitely lacks. Emotional intelligence is inherently human.

    Creatives must be able to brief a machine without losing the soul of the message. We do not need to outwrite AI; we can outthink it.

    AI has not yet been able to crack the formula that creates memorable advertising.

    The line between human-made and machine-made is fading. In the most resonant campaigns, the human touch and the algorithm work hand in hand, not at cross-purposes.

  • Digital Film Distribution

    Indian film producers have typically distributed their films digitally through OTT channels. The deal is an upfront licence ranging between Rs 100 crore and Rs 300 crore, and the producer gets the cheque. The headache of piracy rests with the OTT platforms (piracy means a loss of viewership). The streaming platforms are the main digital distributors.

    Aamir Khan would like to take a new initiative: skip the OTT channels and use YouTube’s Transactional Video on Demand (TVOD) model, for the first time for a Hindi film release. YouTube has a massive reach, over 2.5 billion users globally and 460 million in India, making it the most accessible distribution channel available today, far outpacing the standalone OTT services.

    The YouTube TVOD model is performance driven, and Aamir wants to use it for his film Sitaare Zameen Par. YouTube TVOD is pay-per-view, whereas OTT channels are subscription or membership based. With the movie absent from OTT channels, the other option is to view it in a cinema hall. This could lead to a longer theatrical run, since users will have to pay to watch the movie one way or the other, either on YouTube or in theatres.

    Under YouTube TVOD, the content producer typically retains 70 per cent of the revenue, while YouTube takes a 30 per cent share. This reduces traditional distribution overheads (which account for about 25 per cent of a film’s revenue). The cost advantage is moderate, since the producer shares the revenue with YouTube.

    The dreaded menace of piracy prevails in cyberspace. In the case of YouTube, tackling it falls squarely on the producer.

    The disadvantage is that if the movie does not click with the audience, the producer suffers. The OTT platforms offer an upfront licensing deal, whereas the YouTube model is performance driven: earnings depend on audience engagement and the content’s quality. It is a commercial risk a producer takes.

  • Supercomputing in Space

    On May 14, China launched the first 12 satellites of its plan to create a computing constellation in space. Each satellite is equipped with onboard computing hardware and fast laser inter-satellite communication links, enabling in-orbit data processing. On a full-fledged rollout, the constellation will provide 1,000 peta operations per second (POPS), matching the earth’s strongest supercomputers.

    The computing power itself is impressive, but what makes it special is where it occurs. Conventional satellites transmit information to earth for analysis; here the satellites will make decisions on raw data in space. The system carries an AI model with 8 billion parameters, connected by 100 Gbps laser links.

    It reduces dependence on ground-based infrastructure and curtails the time needed to process and respond to enormous volumes of data. It will prove to be a game-changer: real-time data crunching in space could outpace the earth’s top supercomputers.

    The network is planned to expand to 2,800 satellites. It is a big deal for the future of computing.

    There are, of course, security implications of such a satellite-based system. Ground-based data centers are power hungry; orbital data centers can be powered by solar energy and cooled by radiating heat naturally into the vacuum of space, with no need for massive air-conditioning (and the emissions it brings).

    It will cut carbon emissions on earth and bring cloud computing to space.

    India, especially ISRO, should look seriously into all this.

    There are strategic and ethical issues. Who controls information processed in space? Will space be turned into a theatre of war? Earth orbit could become a digital nervous system, and the future of computing could relocate to orbit.