Blog

  • Meme Coins

    Meme coins are cryptocurrencies inspired by an internet meme or viral image. The underlying technology is blockchain. Some of them, such as Dogecoin, can be mined like other cryptocurrencies. They can be bought and sold on crypto exchanges.

    Meme coins are generally associated with culture and community, and some are created for fun or recreation. Other cryptocurrencies are known for their technical features, whereas meme coins are known for their association with a meme.

    Some popular meme coins are Dogecoin (DOGE), Shiba Memu (SHMU), Pepe Coin (PEPE) and Wojak Token (WOJAK).

    The key difference between meme coins and cryptocurrencies such as Bitcoin and Ethereum is utility. Meme coins are mostly bought by investors hoping to make fast money. Bitcoin has a limited supply of 21 million coins; once that limit is reached, people can no longer mine Bitcoin. This scarcity drives demand and price, and makes Bitcoin an expensive buy. This is not the case with meme coins. Dogecoin has an unlimited supply, and Shiba Inu coin has a supply of 1 quadrillion. On July 4, 2023, a Dogecoin was priced at about $0.068, so $100 bought about 1,463 Dogecoins. The proposition is attractive to the younger generation: an investment of a few dollars buys thousands of meme coins.
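
    The purchase arithmetic above can be checked with a short sketch. It assumes that the quoted $0.068 is the price of one Dogecoin (the coin being bought) and that only whole coins are purchased; the function name `coins_for_budget` is illustrative only.

```python
from decimal import Decimal


def coins_for_budget(budget_usd, price_usd):
    """Return how many whole coins a budget buys at a given per-coin price."""
    # Decimal avoids binary floating-point surprises with very small prices.
    return int(Decimal(str(budget_usd)) / Decimal(str(price_usd)))


# Assumption: $0.068 per coin, the rounded price quoted in the text.
dogecoins = coins_for_budget(100, 0.068)
```

    With the rounded price this gives 1,470 coins; the figure of 1,463 quoted above corresponds to an unrounded price of roughly $0.0684.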

    Meme coins have a significant presence on social media, and their prices are driven by their popularity.

    As meme coins do not have any fundamental economic or business use, their prices are volatile and vulnerable to sentiment. At times, the creators or investors of a meme coin disappear with investors’ money. This is called a rug pull.

    Governments have initiated efforts to rein in some meme coins.

    Despite the inherent risks, some meme coins have a strong following. Their future, however, remains uncertain.

  • Time and Cost Reduction in Pretraining LLMs

    Training an LLM is very costly, with budgets ranging from $10 million to tens or hundreds of times that amount. At such costs, LLMs are not affordable for smaller organisations or research and academic groups, so it is necessary to revisit the optimization methods currently used for LLMs. Stanford researchers started working on this, aiming to cut the training time of these models in half. LLMs have millions or billions of parameters. These parameters have curvature, which can be thought of as the maximum achievable speed they reach as they progress towards the final goal of LLM pretraining. In short, curvature is the workload of the parameters in an LLM. Estimating curvature is costly, and it is for this reason that the curvature estimation step is usually forgone when optimizing LLM pretraining.

    The researchers noticed a possible inefficiency in previous methods that used parametric curvature estimation: the curvature estimates were updated at every optimization step. They asked whether the process could be improved by decreasing the number of updates. The idea was tested by designing Sophia, their new optimizer, to estimate the parameters’ curvature only about every ten steps. That was a winning proposition.

    Another trick tried was clipping. An inaccurate curvature estimate increases the workload. Clipping prevents that by setting a threshold, that is, a maximum curvature estimate.
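
    The two ideas above, refreshing the curvature estimate only every few steps and clipping, can be sketched on a toy problem. This is a minimal illustration and not the researchers’ released code: the function names, the hyperparameter values and the quadratic loss are all assumptions, the exact curvature is used where a real optimizer would use a cheap estimate, and clipping is applied here to the curvature-scaled step as one way of capping the damage from a poor estimate.

```python
def clip(value, limit):
    """Clamp value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def quadratic_loss(curvatures, x):
    """Toy loss 0.5 * sum(a_i * x_i**2); its curvature per parameter is a_i."""
    return 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))


def sophia_like_minimize(curvatures, start, steps=300, lr=0.05, k=10,
                         beta1=0.9, beta2=0.99, gamma=1.0, eps=1e-12):
    """Minimise the toy loss, refreshing curvature estimates every k steps."""
    x = list(start)
    m = [0.0] * len(x)   # moving average of gradients
    h = [0.0] * len(x)   # moving average of curvature estimates
    curvature_updates = 0
    for t in range(steps):
        grad = [a * xi for a, xi in zip(curvatures, x)]
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
        if t % k == 0:
            # Assumption: the exact curvature a_i is available for this toy
            # problem; a real optimizer would use a cheap stochastic estimate.
            h = [beta2 * hi + (1 - beta2) * a for hi, a in zip(h, curvatures)]
            curvature_updates += 1
        # Divide by curvature, then clip so a poor estimate cannot cause a
        # huge step: each parameter moves by at most lr per iteration.
        x = [xi - lr * clip(mi / max(gamma * hi, eps), 1.0)
             for xi, mi, hi in zip(x, m, h)]
    return x, curvature_updates


curvatures = [100.0, 1.0]   # one steep and one flat direction
start = [3.0, -2.0]
final, updates = sophia_like_minimize(curvatures, start)
```

    In this sketch the curvature is refreshed only once every `k` steps, so 300 optimization steps cost just 30 curvature updates, while the gradient average is still updated at every step.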

    Sophia was used to pretrain a small LLM comparable to GPT-2. It reached the same level of optimization in half the number of steps and half the time, which means a substantial improvement in pretraining and a massive cost reduction.

    In future, the researchers would like to use Sophia on a larger LLM and on other models, such as computer vision (CV) and multi-modal models.

    As Sophia is open source, others can carry the experiment forward. Sophia is a new approach developed by Stanford researchers to train LLMs. The optimization algorithms used previously include Stochastic Gradient Descent (SGD), RMSProp and Adam.

  • Use of Calculus in Neural Networks

    Calculus helps us understand the internal workings of different ML algorithms. One application of calculus in ML is the gradient descent algorithm, which is used together with backpropagation to train a neural network.

    Backpropagation takes the error from forward propagation and feeds this loss backward through the layers of the neural network. The aim is to fine-tune the weights. This is the essence of neural network training.

    The method calculates the gradient of the error function with respect to the weights of the neural network. Starting from randomly allocated weights and biases, it adjusts them to reduce the error so that the network produces the desired output.

    The gradient of the loss function with respect to a single weight is computed by the chain rule. It is computed one layer at a time, moving backward from the output layer.
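
    The chain rule computation described above can be sketched for the smallest possible network: one input, one hidden unit with a sigmoid activation and one linear output. The network shape, the squared-error loss, the learning rate and all names below are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """Forward propagation: hidden activation h, then output y."""
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y


def loss(w1, w2, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def gradients(w1, w2, x, target):
    """Backpropagation: apply the chain rule one layer at a time."""
    h, y = forward(w1, w2, x)
    dL_dy = y - target                 # derivative of the loss w.r.t. output
    dL_dw2 = dL_dy * h                 # output layer weight
    dL_dh = dL_dy * w2                 # error fed back to the hidden layer
    dL_dw1 = dL_dh * h * (1 - h) * x   # sigmoid derivative is h * (1 - h)
    return dL_dw1, dL_dw2


def train(w1, w2, x, target, lr=0.5, steps=200):
    """Gradient descent: step each weight against its gradient."""
    for _ in range(steps):
        g1, g2 = gradients(w1, w2, x, target)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2


x, target = 1.0, 0.8
w1_init, w2_init = 0.3, -0.2   # arbitrary starting weights
w1_fit, w2_fit = train(w1_init, w2_init, x, target)
```

    The gradient for the hidden weight reuses `dL_dy`, the quantity already computed for the output layer, which is what feeding the loss backward through the layers amounts to.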

  • Threads

    Zuckerberg released Threads on 6th July, 2023 as a competitor to Twitter. It will soon be widely available. At present, the app is not available in European markets, but it is available in India.

    Twitter shows the collective mood of the community through trending topics. Threads lacks this. It has not yet enabled hashtags, a key tool for organising and discovering basic information and social movements. Spaces is the audio chat tool of Twitter, and Threads may lack this too. Perhaps these features will soon arrive on Threads.

    Within two hours of appearing, Threads garnered two million users, and within seven hours of launch it collected 10 million sign-ups. However, its staying power remains to be seen. Twitter has 238 million daily active users.

    Threads may not have a messaging system. Instead, it may encourage private chats through Meta’s other apps, so that people need not have an extra inbox to check.

    Currently, people have no control over what they see in the Threads feed, whereas Twitter has the option of restricting posts to people you follow. Threads may carry excessive ads, and be a bait for influencers and engagement. However, Meta too is changing. It has promised to make Threads compatible with the ‘fediverse’, which is built on an open protocol, so that it can all be accessed from other apps.

    Twitter faces competition from a formidable rival less than nine months after Elon Musk took control of it.

    Threads has been built on the Instagram platform, with its massive user base. If users adopt Threads, advertisers will follow. Currently there are no ads, and the company will think about monetisation after it reaches 1 billion users.

    Threads is a stand-alone app. However, users can log in using their Instagram credentials. It is available in 100 countries. Posts can be up to 500 characters long, and can include links, photos and videos of up to five minutes.

    Public figures and governments may take a while to warm up to Threads, since Twitter is an established town square. To begin with, Threads may not have social and political impact. Twitter has earned a place as an opinion-shaping platform that sets the narrative, and it may retain this position for the foreseeable future.

  • Controlling Super-intelligent AI

    The AI of the future, a decade hence, could be super-intelligent. OpenAI has formed a team led by Ilya Sutskever (chief scientist) to develop ways to steer and control such a system. Ilya Sutskever is a Russian-born Canadian computer scientist who co-authored the seminal AlexNet paper, ImageNet Classification with Deep Convolutional Neural Networks, in 2012. He cofounded OpenAI.

    The AI of the future may not be benevolent, which makes it necessary to exercise control over it. The system could go ‘rogue’.

    At present, we align AI using techniques such as reinforcement learning from human feedback. This depends on human beings’ ability to supervise AI. How will they supervise AI that is smarter than they are?

    We, therefore, need superintelligence alignment. The team will have access to 20 per cent of the compute the company has secured to date. It will address the technical challenges of controlling superintelligent AI over the next four years.

    They want to build ‘a human-level automated alignment researcher’. The aim is to train the AI systems using human feedback. Alternatively, one AI system can be trained to evaluate other AI systems. The ultimate aim is to build AI that can do ‘alignment research’, that is, research on ensuring that AI systems achieve the desired outcomes and do not go off the rails. The assumption is that AI can do the controlling job better than humans can.

    Future AI systems should be better aligned with humans. The alignment research will be reviewed by human researchers.

    However, such alignment has the potential to scale up biases and vulnerabilities. Is alignment related to engineering only? Still, it is worth a try.

    The team will share the results with others.

  • Machine Learning Frameworks

    Machine learning frameworks play a vital role in deploying AI and ML. They are tools, libraries and resources that facilitate AI implementation. Here are 10 ML frameworks.

    1. TensorFlow : It has been developed by Google Brain. It is both flexible and scalable. It facilitates the deployment of ML models across platforms and devices. It is supported by APIs.

    2. PyTorch : It has a Python interface and builds dynamic computational graphs. It is developed by Facebook’s AI Research Lab. It is useful in building and training deep learning models.

    3. Keras : It is based on TensorFlow and is useful for building and training deep learning models. It allows experimentation with different architectures and hyper-parameters. It is preferred by both novices and experienced professionals.

    4. Scikit-learn : It is a library in Python. It has a good collection of algorithms and tools for data preprocessing. It helps with feature selection and model evaluation.

    5. Microsoft Cognitive Toolkit (CNTK) : CNTK is a deep learning framework developed by Microsoft Research. It supports distributed training across multiple GPUs and machines. It suits large-scale projects.

    6. Theano : It is a Python library. It facilitates the computation and optimization of the mathematical expressions used in deep learning. It is used in building and training neural networks.

    7. MXNet : It is a deep learning framework supporting multiple programming languages (Python, R and Julia). It facilitates building models and deploying them across platforms and devices. It has a modular design.

    8. Caffe : It is a deep learning framework. It is suited for image classification, object detection and segmentation. It is preferred for computer vision (CV).

    9. Torch : It is a scientific computing framework. It is used for building neural networks and for image processing.

    10. XGBoost : It stands for Extreme Gradient Boosting, an optimized implementation of gradient boosting algorithms.

  • AI Is Not Perfect

    AI has evolved into generative AI, which can generate new content such as text, images, audio and video. Further, the latest generative AI algorithms are based on deep reinforcement learning. What does it mean? It means that, unlike previous generations of algorithms, we do not have to teach them the strategies to accomplish a task. We feed them only the basic rules and a historical data set, and the algorithm learns the optimal strategy on its own. Google’s AlphaGo beat the champion of Go, a board game. However, the system was not taught particular strategies to achieve this. It learnt by studying older matches and by playing thousands of games against itself.

    The current generation of AI algorithms works like a black box. They produce the desired outcomes, but there is no explanation of how they arrived at these outcomes. Even the developers who build these models do not know the exact decision-making process. Facial recognition software recognises a face correctly, but it is difficult to say how exactly it does so.

    An AI system may predict that a person is likely to suffer from a stroke based on the scans, but we do not know the exact process behind this conclusion. An autonomous vehicle may choose to collide with a person in order to avoid a collision with a truck. The logic is obscure.

    An AI system may not be free from bias. Facial recognition systems have shown poor recognition of younger black women. Certain words in a resume may bias an algorithm used in selection. Bias enters algorithms through the training data. It can be based on historical or social inequities.

    We assume that AI decisions are objective and fair. Unless we know how they are arrived at, we cannot accept them without an informed discussion.

    AI systems are known to hallucinate. AI hallucinations are defined as the generation of nonsensical or unfaithful output. These hallucinations can be intrinsic or extrinsic. Intrinsic hallucinations occur when the output contradicts the input. Extrinsic hallucinations occur when the output is not supported by the input. Hallucinated output is presented with confidence and fluency.

    There are reasons for such flaws. Encoders make wrong correlations. There could be biases. Decoders attend to the wrong inputs or sample with higher randomness. Some inaccuracies cannot be attributed to the knowledge and intention of AI or its developers.

    AI technologies can be classified on a scale from low-risk to prohibited. AI legislation should deal with each technology accordingly.

  • Something to Ponder About

    We have drawn some striking and thought-provoking sentences from different sources to stimulate your thinking about the recent strides of generative AI.

    *AGI (artificial general intelligence) is an autonomous system that surpasses human capabilities in most tasks.

    ChatGPT needs humans, though its creators aim to obviate this need.

    It is an astonishing paradox.

    *The pantheon of global greats who have shaped humanity includes Bill Gates, Steve Jobs, Elon Musk, Sergey Brin and Larry Page. Sam Altman could also be included in this pantheon. Of course, he has greater responsibility, as he is both Pope and reformer. It is difficult to decide where Pope and reformer begin and end. The roles often change depending on the context, countries and eco-systems.

    *User safety, user privacy and AGI’s capabilities all need to move forward together.

    *Big Tech and governments are adversarial. For the first time, someone like Sam Altman says, ‘Don’t leave us alone. Please regulate us.’ A collective action with myriad political and economic complexities is unlikely to happen.

    *Perhaps there could be a beginning if there is agreement on a universal ‘Labelling Framework.’ This would distinguish machine-generated content from human-generated content.

    Just as we label our foods and medicines, we could label AI-generated content as well.

    *What about the accountability of the firms who build on top of AGI?

    *There should be agreement on a basic set of metrics that can be reported and regulated, without waiting for a global AI Act or universal agreement, which can never happen.

    *AGI may materialise in a decade or two.

    Questions: Sharing of AGI profits. Sharing of AGI access. Sharing of AGI benefits. Structure and governance of the company. Allocation of governance across society.

    *Non-profit startups. However, there cannot be charity for ever. Capped profits is a flawed concept. Start as a startup and convert it into a C-Corp when it is on the cusp of being cash flow positive.

    *ChatGPT and governments are caught in a mating dance. The governments want to control for the lowest common denominator. There is lobbying against the application programming interface (API)-led AI economy.

    Open source models will commoditize the field, with free options such as Mosaic becoming cheaper than, and as capable as, ChatGPT, which has its GPU limitations.

    *OpenAI’s Jukebox can generate music samples, while Stable Diffusion and Midjourney can churn out illustrations, probably cheaper and faster than human musicians and artists.

    *ChatGPT acquired 100 million active users in January 2023, two months after its introduction towards the end of November 2022. It is unprecedented.

    *Generative AI could displace jobs at unparalleled scale.

    *As AI depends on data, it would increase the users’ risk to data exposure. The technology can be weaponised.

    *May 30, 2023. A group of 350 AI executives and experts signed and published a statement warning of the risk of extinction from AI. They said mitigating it ‘should be a global priority.’

    *Open letter by the Future of Life Institute in March 2023. It requested a moratorium on training powerful AI systems and garnered 1,000+ signatures.

    *Pragmatism suggests that the balance could be tilted in favour of innovation and growth without being naive about AI’s potential pitfalls.

  • vLLM : Open Source LLM Inference and Serving Library

    AI has advanced because of the NLP capabilities of large language models (LLMs). They interpret vast amounts of existing data and generate human-like text. Despite their tremendous capabilities, they pose a significant challenge: computational inefficiency. They run slowly even on the most powerful hardware. These models are built on millions or billions of parameters, so they require huge computational resources, memory and processing power, and one may not always have access to such resources. Besides, LLMs have poor response times, making them unsuited to real-time or interactive apps. The issue is how to address these challenges so as to make LLMs widely accessible.

    Researchers from the University of California, Berkeley worked on this and developed vLLM. It is a simpler, faster and more economical alternative for LLM inference and serving. The library is already being used to power deployed models. With vLLM as the backend, they can handle up to five times more peak traffic than before while using limited computational resources, which reduces the operational cost. vLLM supports HuggingFace transformers.

    Research indicated that memory-related issues slow down the performance of LLMs. LLMs use the input tokens to generate attention key and value tensors, which are cached in GPU memory and used to generate subsequent tokens. These dynamic key and value tensors (the KV cache) occupy substantial memory, and managing them is cumbersome.

    The innovative concept of PagedAttention was introduced to resolve this challenge. It is an attention algorithm that extends the idea of paging in operating systems to LLM serving. PagedAttention is a flexible approach to managing KV tensors: it stores them in non-contiguous memory blocks, which obviates the need for long contiguous memory blocks. Using a block table, these blocks can be retrieved independently during attention computation. The result is efficient memory utilization, with less memory waste. PagedAttention can also batch five times more sequences together, which gives good GPU utilization and throughput. It also allows efficient memory sharing.
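
    The block-table idea above can be sketched in a few lines. This is a toy model of paged KV storage, not vLLM’s implementation: the class `PagedKVCache`, its methods and the use of plain strings in place of key and value tensors are all assumptions made for illustration.

```python
class PagedKVCache:
    """Toy KV cache that stores each sequence in fixed-size physical blocks."""

    def __init__(self, num_physical_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_physical_blocks))
        self.storage = {}        # physical block id -> list of (key, value)
        self.block_tables = {}   # sequence id -> ordered physical block ids
        self.lengths = {}        # sequence id -> number of cached tokens

    def append(self, seq_id, key, value):
        """Cache one token's key and value, taking a new block only if needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:   # last block is full (or none yet)
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            block = self.free_blocks.pop()
            table.append(block)
            self.storage[block] = []
        self.storage[table[-1]].append((key, value))
        self.lengths[seq_id] = length + 1

    def read(self, seq_id):
        """Follow the block table to return the entries in logical order."""
        entries = []
        for block in self.block_tables.get(seq_id, []):
            entries.extend(self.storage[block])
        return entries

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        for block in self.block_tables.pop(seq_id, []):
            del self.storage[block]
            self.free_blocks.append(block)
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_physical_blocks=4, block_size=2)
cache.append("a", "k0", "v0")
cache.append("b", "q0", "r0")   # another sequence takes the next free block
cache.append("a", "k1", "v1")
cache.append("a", "k2", "v2")   # sequence "a" spills into a second block
```

    Because the two sequences are interleaved, sequence "a" ends up in physical blocks that are not adjacent, yet `cache.read("a")` still returns its entries in order by following the block table. Memory is claimed one block at a time, so at most one partly filled block per sequence is wasted.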

    vLLM manages attention key and value memory effectively by implementing the PagedAttention mechanism. It also integrates with HuggingFace models. The library can be installed with a simple pip command, and it is available for both online serving and offline inference. vLLM, an open source LLM inference and serving library, accelerates HuggingFace transformers by up to 24x.

  • Linear Algebra in Neural Networks

    Linear algebra, a subject within applied maths, is a prerequisite for machine learning. Neural Language Models (NLM) address the n-gram data sparsity issue by parameterizing words as vectors, called word embeddings. These are then used as inputs to a neural network. The parameters are learned as part of the training process.

    The output of a machine learning model can be viewed in terms of decision boundaries. Linear models have a straight line as a decision boundary. Non-linear models have curves as decision boundaries.

    Neural networks follow the same mathematics. Each unit resembles a logistic regression, but the network as a whole classifies more powerfully than logistic regression. A minimal neural network has a single hidden layer and can represent any function that logistic regression can represent.
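
    A small sketch can show why one hidden layer adds classifying power. XOR cannot be separated by a single straight line, so logistic regression alone cannot classify it, but a network of three logistic units can. The weights below are hand-picked for illustration rather than learned, and all names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_unit(inputs, weights, bias):
    """Logistic regression: a sigmoid applied to a weighted sum."""
    return sigmoid(sum(w * v for w, v in zip(weights, inputs)) + bias)


def one_hidden_layer_net(inputs, hidden_units, output_unit):
    """Feed the inputs through the hidden units, then through the output unit."""
    hidden = [logistic_unit(inputs, w, b) for w, b in hidden_units]
    out_weights, out_bias = output_unit
    return logistic_unit(hidden, out_weights, out_bias)


# Hand-picked (not learned) weights, chosen for illustration only.
HIDDEN_UNITS = [([20.0, 20.0], -10.0),    # behaves roughly like OR
                ([-20.0, -20.0], 30.0)]   # behaves roughly like NAND
OUTPUT_UNIT = ([20.0, 20.0], -30.0)       # roughly AND of the two hidden units

XOR_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [round(one_hidden_layer_net(p, HIDDEN_UNITS, OUTPUT_UNIT))
               for p in XOR_INPUTS]
```

    Each hidden unit draws one straight-line boundary, and the output unit combines the two, which yields a decision region that no single line could produce.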

    Word embeddings are useful for finding nearest neighbours in the embedding space. The embeddings can also be used as input to supervised learning tasks. An embedding creates a mapping from discrete variables, e.g. words, to vectors of continuous variables. It also tackles the curse of dimensionality.
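
    The nearest-neighbour lookup can be sketched with cosine similarity, a common choice of similarity measure (an assumption here, since the text does not name one). The three-dimensional vectors below are invented for illustration and are not taken from any trained model.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration.
EMBEDDINGS = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.85, 0.90, 0.15],
    "apple":  [0.10, 0.20, 0.90],
    "banana": [0.15, 0.10, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbour(word, embeddings):
    """Return the other word whose vector is most similar to the given word's."""
    query = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(query, embeddings[w]))
```

    With these toy vectors, `nearest_neighbour("king", EMBEDDINGS)` picks "queen" and `nearest_neighbour("apple", EMBEDDINGS)` picks "banana", because words placed close together in the space point in nearly the same direction.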

    Word2Vec creates word embeddings using one of two models: Continuous Bag of Words (CBOW) and Skip-Gram.

    The CBOW model learns to predict a word from its context. It tries to maximize the probability of the target word by looking at the context.

    The Skip-gram model is designed to predict the context of a word. Both models are neural networks with an input layer, a hidden layer of word embeddings and an output layer. Skip-gram models are larger because they use more parameters to predict multiple target words.
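
    The difference between the two models shows up in how the training examples are built from a sentence. The sketch below only generates those examples and does not train a network; the function names and the window size of one word on each side are assumptions.

```python
def context_window(tokens, index, window):
    """Return the words within `window` positions of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def cbow_examples(tokens, window=1):
    """CBOW: the context words are the input, the centre word is the target."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=1):
    """Skip-gram: the centre word is the input, each context word a target."""
    return [(tokens[i], context_word)
            for i in range(len(tokens))
            for context_word in context_window(tokens, i, window)]


sentence = "the quick brown fox".split()
cbow = cbow_examples(sentence)
skipgram = skipgram_examples(sentence)
```

    For this sentence CBOW produces four examples, one per centre word, while Skip-gram produces six pairs, because each centre word is paired with every one of its context words separately.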

    GloVe, or Global Vectors for Word Representation, is an unsupervised learning algorithm for obtaining vector representations of words. Word2Vec uses a shallow neural network to create vectors, whereas GloVe uses a global matrix factorization technique. Word2Vec is suited to larger tasks and GloVe to smaller ones. Word2Vec is slower to train, while GloVe is faster.