Author: Shabbir Chunawalla

  • Humanoid Robot

    Robots have emerged as a critical new frontier for the AI industry. There is potential to apply state-of-the-art technology to real-world tasks. Robots can be deployed to perform tasks that are too dangerous or unpalatable for human beings, and they can also take over many laborious and monotonous tasks.

    A company called Figure AI Inc. is working on a robot that looks and moves like a human. The humanoid machine, called Figure 01, will perform tasks unsuitable for people and could help alleviate labour shortages.

    Backed by OpenAI and Microsoft, Figure AI Inc. is raising about $675 million in a pre-money round at a valuation of about $2 billion.

    Jeff Bezos of Amazon has committed $100 million through his firm Explore Investments LLC. Microsoft is investing $95 million. Nvidia and an Amazon.com Inc.-affiliated fund are each providing $50 million, and OpenAI is investing $5 million. Other investors include Intel, LG, Parkway and Align.

    In May 2023, Figure raised $70 million in a funding round led by Parkway. It announced then that it intends to be the first to bring to market a humanoid that is actually useful and can perform commercial activities.

  • AI: The Road Ahead

    Surprisingly, Altman compares OpenAI to the Manhattan Project. Both are treated as projects that require protection against catastrophic risks. Many scientists are skeptical that AI will acquire world-ending capabilities anytime soon, or for that matter ever.

    Instead, they argue, attention should be focused on AI bias and toxicity. Sutskever believes that AI, whether from OpenAI or elsewhere, could threaten humanity. At OpenAI, 20 per cent of compute is reserved for the superalignment team's research.

    The team is currently developing the framework for AI’s governance and control.

    It is difficult to define superintelligence, and to decide whether a particular AI system has reached that level. The present approach is to use less sophisticated models, such as GPT-2, to guide more sophisticated models in the desired direction.
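    The weak-guiding-strong idea described above can be sketched in miniature. In this toy example (all names and numbers are invented for illustration), an unreliable "weak supervisor" stands in for GPT-2, and a "strong" model trained only on its noisy labels still recovers the underlying rule:

```python
import random

random.seed(0)

# Toy task: the true label of x is 1 when x >= 0.5.
def true_label(x):
    return 1 if x >= 0.5 else 0

# The weak supervisor (standing in for GPT-2) labels correctly
# only 80% of the time.
def weak_supervisor(x):
    return true_label(x) if random.random() < 0.8 else 1 - true_label(x)

data = [random.random() for _ in range(2000)]
weak_labels = [(x, weak_supervisor(x)) for x in data]

# "Train" the strong model: pick the threshold that best fits the
# weak supervisor's noisy labels.
best_t, best_fit = 0.0, -1
for t in [i / 100 for i in range(101)]:
    fit = sum((1 if x >= t else 0) == y for x, y in weak_labels)
    if fit > best_fit:
        best_t, best_fit = t, fit

def strong(x):
    return 1 if x >= best_t else 0

# Despite noisy supervision, the strong model recovers a threshold
# near the true 0.5 -- weak-to-strong generalization in miniature.
acc = sum(strong(x) == true_label(x) for x in data) / len(data)
print(round(best_t, 2), round(acc, 2))
```

    The strong model ends up more accurate than its 80 per cent-reliable supervisor, which is the hoped-for effect of the approach.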

    Research will also focus on a model's egregious behaviour. Researchers are effectively trading off between weak models and sophisticated ones. But can a school student direct a college student? The weak-strong model approach may nonetheless lead to some breakthroughs, for instance on hallucinations.

    Internally, a model has some ability to recognise its hallucination, that is, whether what it says is fact or fiction. However, models are rewarded with a thumbs up or thumbs down, and at times they are rewarded even for false statements. Research should enable us to summon a model's internal knowledge and use it to discriminate whether what is said is fact or fiction. This would reduce hallucinations.

    As AI is reshaping our culture and society, it is necessary to align it with human values. The most important thing is the readiness to share such research publicly.

  • AI in Cancer Therapies

    Internationally, there is increasing interest in using AI to treat cancer, both by facilitating new therapies and by diagnosing patients at early stages. Doctors can also select appropriate treatment by identifying patients at high risk (say, those likely to develop pancreatic cancer, flagged up to three years before a conventional diagnosis). This is game-changing, since most patients are diagnosed only when the cancer is advanced or has metastasized.

    AIIMS researchers have developed a system in which a supercomputer and AI help doctors identify the best cancer therapies (out of the many available) for their patients. A supercomputer, a server and AI help doctors understand the genetic mutations in their patients, and doctors can then select the most appropriate therapy for those mutations. To illustrate, an HER2-positive breast cancer is cross-referenced with therapies that have worked for most patients of a similar genetic make-up. Doctors thus make informed, faster and more precise therapy choices.
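    The cross-referencing step can be pictured as a simple lookup. The sketch below uses entirely invented outcome data; the therapy names and success rates are illustrative, not clinical facts:

```python
# Invented outcome table: mutation profile -> therapy -> recorded
# success rate among previously treated, genetically similar patients.
outcomes = {
    "HER2-positive": {"targeted_therapy": 0.72, "chemo_only": 0.48},
    "BRCA1": {"parp_inhibitor": 0.65, "chemo_only": 0.41},
}

def best_therapy(mutation_profile):
    # Cross-reference the profile and pick the therapy with the best
    # recorded outcome -- the informed, faster choice described above.
    by_therapy = outcomes[mutation_profile]
    return max(by_therapy, key=by_therapy.get)

print(best_therapy("HER2-positive"))  # targeted_therapy
```

    The real system works over genomic sequences and thousands of patient records rather than a two-row table, but the selection logic is the same in spirit.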

    Here iOncology AI is used. The supercomputer is located at Pune, and the server at the National Cancer Institute in Jhajjar. iOncology AI aims to sequence the genomes of 3,000 cancer patients at AIIMS, Delhi, and to correlate this data with diverse cancer therapies to find the most efficacious one. The model has been tested on breast and ovarian cancer patients and has recorded 75 per cent accuracy compared with clinical diagnosis. Genomic data is a more powerful tool for medical researchers and doctors. The system is being validated in several hospitals in MP. After studying the clinical data and genomic make-up of several thousand cancer patients, the model will be able to help doctors select the appropriate treatment for the next patient. Treatment thus becomes targeted.

    A doctor has to upload a scan or histopathology report to the platform. The trained AI can automatically flag certain anomalies; it may also indicate a very small tumour that a radiologist could miss. The system is useful in the early detection of cancers.

    Harvard Medical School has developed a specific tool for colon cancers.

    Patient confidentiality is maintained. A radiologist can see the scans he himself has uploaded, with personal details, along with anonymized analyses of other scans. A clinician can see the clinical history and scans of his own patients only.

  • A House Cat Smarter than AI

    Yann LeCun is Facebook's chief AI scientist and is known as one of the godfathers of deep learning. He joined Facebook in 2013 as the company's director of AI research and was later named VP and Chief AI Scientist. He is part of the Fundamental AI Research (FAIR) team. He is also a computer science professor at New York University, teaching part-time at the NYU Center for Data Science and the Courant Institute of Mathematical Sciences.

    In an interview at the World Government Summit in Dubai in February 2024, he said that we are far from human-level intelligence, and dismissed fears that AI models are dangerous. Current AI technology is not even on par with cat-level intelligence. He is, of course, certain that one day AI will surpass human beings, leading to artificial general intelligence, or AGI.

    The current systems have been trained on public data and cannot go beyond it. In future they are likely to be smarter, and will give better information than a search engine.

    LeCun, a French scientist, won the 2018 Turing Award with Geoffrey Hinton and Yoshua Bengio for his contributions to artificial neural network research.

    The most advanced AI systems today have less common sense than a house cat. A cat's brain has about 800 million neurons. Multiplying this figure by 2,000 gives the number of synapses (the connections between neurons), and the largest LLMs have roughly as many parameters as a cat's brain has synapses. ChatGPT was initially powered by GPT-3.5, which had 175 billion parameters; GPT-4 is said to have 220 billion. Maybe we are at the size of a cat. Still, the systems are not as smart as a cat: a cat understands the physical world around it, remembers it, can plan complex actions and can do some reasoning, all better than an LLM. This means we conceptually lack something needed to make machines as intelligent as animals and humans. For comparison, a dog's brain has about 2 billion neurons, and the human brain about 100 billion.
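    The back-of-the-envelope arithmetic in the passage can be checked directly:

```python
# Numbers quoted in the passage.
cat_neurons = 800e6            # neurons in a cat's brain
synapses_per_neuron = 2000     # multiplier used in the passage
cat_synapses = cat_neurons * synapses_per_neuron

gpt35_params = 175e9           # GPT-3.5 parameter count
dog_neurons = 2e9
human_neurons = 100e9

print(f"cat synapses  : {cat_synapses:.2e}")   # 1.60e+12
print(f"GPT-3.5 params: {gpt35_params:.2e}")   # 1.75e+11
print(f"human/cat neurons: {human_neurons / cat_neurons:.0f}x")  # 125x
```

    On these numbers, a cat's synapse count (1.6 trillion) is about an order of magnitude above GPT-3.5's parameter count, which puts the "we are at the size of a cat" claim in context.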

    Getting to the AGI level may take 10 or maybe 20 years; researchers tend to be overly optimistic.

  • China Awed and Impressed by Sora

    Sora, the text-to-video platform announced by OpenAI in February 2024, has stirred awe and concern in the Chinese tech landscape. Yin Ye, CEO of BGI Group, calls this the Newton moment of AI development. Sora has the potential to disrupt various sectors: advertising, education, entertainment and healthcare. Chinese experts are impressed by Sora's ability to generate natural-looking videos and its seamless integration of text and video.

    This has widened the gap between the USA and China in AI development. China's LLMs have reached GPT-3.5-level capabilities; with GPT-4 released in 2023, China's current models lag the state of the art by about 1.5 years.

    Some Chinese entrepreneurs are less impressed by Sora's capabilities, remarking that it still has far to go in understanding the world.

    It is well known that the US has imposed strict sanctions on the export of semiconductors to China, preventing China from accessing cutting-edge AI technology. China cannot even access the GPUs made by Nvidia.

  • Despite Limitations, Sora Is a Game-Changer

    Sora is based on a diffusion model structure. Facebook's latest V-JEPA (Video Joint Embedding Predictive Architecture), by contrast, is a model that analyzes interactions between objects in videos; it is not generative and makes predictions in representation space. LeCun wants to impress upon us that this self-supervised model is superior to Sora's diffusion transformer model.

    A world model must go beyond LLMs or diffusion models. Elon Musk likewise feels that Tesla's video-generation capabilities are superior to OpenAI's Sora with respect to predicting accurate physics.

    Sora uses a transformer architecture similar to that of the GPT models. The foundation model aims to understand and simulate the real world. It may have used data generated by Unreal Engine 5 to train the model. Jim Fan points out that Sora learns in its neural parameters through gradient descent over massive amounts of video.

    According to critics, Sora does not learn physics but merely manipulates pixels in 2D. Fan calls this a reductionist view: it is like saying GPT has not learnt coding but has merely learnt to sample strings.

    By that logic, transformers just manipulate sequences of integers (token IDs), and neural networks just manipulate floating-point numbers. Fan does not agree with such reductionist views.
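    The gradient descent that Fan refers to can be illustrated with a single-parameter toy problem: the same adjust-weights-to-reduce-loss loop, minus the billions of parameters and the video data:

```python
# Fit a single parameter w so that w * x approximates the target 3 * x,
# by repeatedly stepping w against the gradient of the squared error.
data = [(x, 3.0 * x) for x in range(1, 6)]  # (input, target) pairs

w = 0.0     # parameter, initialized at zero
lr = 0.01   # learning rate

for _ in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))  # converges to 3.0
```

    Sora's training is this loop scaled up enormously, with the loss measuring how well the model reconstructs video rather than a straight line.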

    Sora may not be able to simulate the physics of a complex scene, may not grasp cause and effect, and can get confused by the spatial details of a prompt.

    Fan describes heavy prompting for Sora as babysitting.

    Of course, there are limitations, but these do not dim the outstanding video quality from Sora. Sora has the potential to disrupt the video game industry.

  • CUDA Libraries in GPUs

    CUDA libraries facilitate the harnessing of GPUs for various computing tasks. They provide optimized implementations of common algorithms and functions, enabling developers to write high-performance applications that leverage the parallel processing capabilities of GPUs.

    The various CUDA libraries are:

    cuBLAS provides optimized routines for basic linear algebra operations (matrix multiplication, vector addition and so on).

    cuFFT accelerates Fast Fourier Transforms (FFTs). It is crucial for signal processing and image analysis.

    cuSPARSE handles sparse matrix computations. It is useful for scientific simulations and ML.

    cuDNN is designed for deep learning. It offers high-performance implementations of essential neural network primitives.

    In addition to these libraries, there is CUDA-X suite. It empowers developers to create apps that run faster. It unlocks the potential in various fields such as AI, graphics and scientific computing.

    The linear algebra libraries are cuBLAS, cuSOLVER and cuSPARSE. The deep learning library is cuDNN, which accelerates frameworks such as TensorFlow and PyTorch. Data science libraries include cuFFT and cuRAND, for data analysis and ML. Computer vision work uses cuFFT and cuBLAS to accelerate image and video processing. Other domains are covered by nvJPEG (JPEG decoding), NCCL (multi-GPU communication) and NPP (image and signal processing primitives).
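    As a rough illustration, here are naive CPU reference versions of two operations from the list above. cuBLAS and cuFFT compute the same mathematics, but tiled and parallelized across thousands of GPU threads:

```python
import cmath

def matmul(a, b):
    # Naive dense matrix multiply -- the GEMM operation that cuBLAS
    # provides in heavily optimized, GPU-parallel form.
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def dft(xs):
    # Naive O(n^2) discrete Fourier transform -- cuFFT computes the
    # same transform with a parallel O(n log n) FFT.
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * i * t / n)
                for t, x in enumerate(xs)) for i in range(n)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

    In practice applications call these operations through the libraries rather than writing them by hand; the point of the CUDA libraries is precisely that hand-written versions like the above are orders of magnitude slower.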

    These libraries bridge the gap between hardware capabilities of GPUs and the software applications that leverage them. There are significant performance gains.

  • OpenAI’s Sora: How Open?

    On February 15, OpenAI announced the red teaming of its text-to-video platform, Sora. It can create high-quality videos of up to a minute in length. It has caused concern among stock video producers, startup founders, actors and filmmakers.

    OpenAI has not revealed the data used to train Sora. When Facebook released its text-to-video model in 2022, it disclosed that it had used 10.7 million Shutterstock videos and 3.3 million YouTube videos for training. Such information enables researchers to check for bias, and creators to know whether their work is being exploited.

    Some gaming and AI experts speculate that Sora could have been trained on the output of computer game physics engines. This cannot be confirmed, since OpenAI will not disclose the information, just as it has declined to do for its other recent AI models.

    Since GPT-4 was tested for about six months before its release, Sora could take a similar amount of time. It could then be released in August 2024, just three months before the elections in the US.

    Deepfakes of politicians generated by AI could affect the elections. OpenAI uses safety filters to keep its models away from violence, sexual content and hateful imagery. Still, it is impossible to know whether such AI systems will be misused until they are in the market. Judging from the use of ChatGPT by millions of people, Sora is likely to make an even bigger impact: it will put video generation capabilities into the hands of millions.

    It is obvious that the secrecy OpenAI maintains about its new products is meant to keep it ahead of competitors. OpenAI is also enhancing the computing power used to train its models, a strategy that seems to have worked. This is why Sam Altman is seeking trillions of dollars for a chip-making venture.

    OpenAI's stated goal is to attain AI that surpasses our own capabilities. It puts out products for the public to try out the transformative technology on the way to that goal. That is the open part of OpenAI; the tactics remain closed.

  • AI Chip Venture

    SoftBank Group founder Masayoshi Son wants to set up a chip venture that can compete with Nvidia. The unit will make chips essential for AI.

    SoftBank's capital investment would be $30 billion, with another $70 billion possibly raised from institutions in the Middle East. The total project cost would be $100 billion, one of the largest investments in the AI arena since the advent of ChatGPT, dwarfing Microsoft's contribution of $10 billion to OpenAI.

    The project has been code-named Izanagi, after the Japanese god of creation and life, partly because the name contains the initials AGI, which stand for artificial general intelligence.

    It is not clear which company or companies will play a role in developing technology that can challenge Nvidia, the leader in high-end AI accelerators. There could be collaboration between SoftBank and Arm Holdings, its chip design unit. Arm's CEO Rene Haas is a member of SoftBank's board of directors and a technology expert, and has been advising Son on the project. They would like to focus on compute, power efficiency and energy in developing AGI.

    Son has a history of changing his mind abruptly, and he keeps floating ideas and technologies when meeting people. He is, however, unwavering in his enthusiasm for AGI and is convinced that it will be real within 10 years.

  • AI Challenge for India

    India too wants to make its place in this age of generative AI. However, there are two formidable challenges: a lack of hardware accelerators suited to AI requirements, and a shortage of talent.

    LLM training is very capital intensive, and here the shortage of talent is a big issue. Only a few people in the world really know how to train LLMs: there are issues of curating data, carefully running evaluation metrics and ensuring that the models generalize. Most of these people are US-based and work in a handful of companies, such as OpenAI, Facebook, Anthropic, DeepMind and Mistral. The knowledge of training a model with GPT-4-level capability is thus concentrated, both by individual and by geography.

    Computing capacity (compute) is another challenge in building a large AI system. Then there are issues of algorithmic innovations and datasets.

    AI accelerators are specialized data processing systems that accelerate computer science applications, especially artificial neural networks, machine vision and ML.

    India would have to set up hardware accelerators and then train models on them, which is a difficult task. Alternatively, India can think in terms of 'inference hardware'. Inference is the process of running live data through a trained AI model to process that data. AI hardware is coupled with software: Nvidia's GPUs come with the CUDA libraries needed to make good use of the hardware, which is a big advantage for Nvidia.
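    Inference, as described above, can be pictured as a single forward pass through already-fixed weights. The weights in this sketch are made up for illustration; in practice they come from the expensive training phase:

```python
import math

# Hypothetical weights produced by a training run; inference only
# reads them, it never updates them.
weights = [0.8, -0.4, 0.2]
bias = 0.1

def predict(features):
    # Logistic-regression forward pass: weighted sum, then sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

score = predict([1.0, 2.0, 0.5])  # live input arriving at serving time
print(round(score, 3))
```

    Because inference involves no weight updates, it needs far less compute and memory than training, which is why 'inference hardware' is a more attainable target.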

    India can use open-source models such as Llama from Facebook, take these base models and build on top of them, effectively bootstrapping off them.

    This summarizes the thinking of Aravind Srinivas, the CEO of Perplexity AI, who is making waves in Silicon Valley.