Blog

  • Context Windows

    An LLM considers a fixed amount of text while generating a response to a prompt. This limit is called its context window, and it effectively serves as the model’s working memory. It allows the model to analyze a certain range of preceding text so as to better grasp the context and produce more relevant, coherent responses. A context window of 2,000 tokens, for example, lets the model consider the previous 2,000 tokens of text (roughly 1,500 English words, since a token is usually a word or word fragment) while generating a response.
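
    To make the distinction between tokens and words concrete, here is a minimal sketch in Python. It assumes the openly available tiktoken tokenizer package (not something this blog otherwise relies on); it counts the tokens in a short passage and keeps only the most recent ones, the way a fixed context window does.

    ```python
    # Illustrative sketch, assuming the tiktoken package is installed
    # (pip install tiktoken). Shows that tokens are not the same as words
    # and how text can be truncated to fit a context window.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models

    text = "Context windows limit how much text a model can attend to at once."
    tokens = enc.encode(text)

    print(len(text.split()), "words")   # word count
    print(len(tokens), "tokens")        # token count is usually a bit higher

    # Keep only the last N tokens, the way a fixed window keeps recent context.
    window = 8
    truncated = enc.decode(tokens[-window:])
    print(truncated)
    ```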

    The context window size shapes a model’s capabilities and performance. A larger window lets the model draw on more material and generate a more comprehensive answer. However, larger windows also increase computational complexity and memory requirements, since self-attention compares every token in the window with every other token.

    The techniques LLMs use to manage the context window include tokenization, positional encoding (recording the position of each token within the window), the attention mechanism (weights are assigned to tokens based on their relevance to the current task), and adaptive context windows (the window is adjusted dynamically to suit the input and task).
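
    To give a rough feel for two of these pieces, here is a minimal, self-contained sketch in Python with NumPy. It is an illustration under simplified assumptions, not the implementation of any particular model: sinusoidal positional encoding tags each position in the window, and scaled dot-product attention assigns every token a relevance weight for every other token. The window-by-window attention matrix also shows why cost grows quadratically with window size.

    ```python
    # Illustrative sketch (not any specific model's code): positional encoding
    # and scaled dot-product attention over tokens inside a small context window.
    import numpy as np

    def positional_encoding(seq_len, d_model):
        """Sinusoidal encoding: gives each position in the window a unique signature."""
        pos = np.arange(seq_len)[:, None]
        i = np.arange(d_model)[None, :]
        angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
        enc = np.zeros((seq_len, d_model))
        enc[:, 0::2] = np.sin(angle[:, 0::2])
        enc[:, 1::2] = np.cos(angle[:, 1::2])
        return enc

    def scaled_dot_product_attention(Q, K, V):
        """Each token attends to every other token; weights reflect relevance."""
        scores = Q @ K.T / np.sqrt(Q.shape[-1])        # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the window
        return weights @ V, weights

    # Toy example: 8 tokens in the window, 16-dimensional embeddings.
    seq_len, d_model = 8, 16
    rng = np.random.default_rng(0)
    x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
    out, attn = scaled_dot_product_attention(x, x, x)
    print(attn.shape)   # (8, 8): one relevance weight per pair of tokens in the window
    ```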

    The choice of context window depends on the specific application, the accuracy desired, and the fluency expected. Machine translation and question answering, for example, require large context windows.

    GPT-3 has a context window of 2,048 tokens, and GPT-4 has a context window of 32,768 tokens.

    GPT-4 Turbo can take in still more. It has a context window of 128K tokens, which is equivalent to more than 300 pages of text in a single prompt. The larger the context window, the more scope an LLM has to understand the question and offer more thought-out, deliberate responses. Previous versions of ChatGPT had context windows of 8K and 32K tokens.

    The context window is a fundamental aspect of LLM architecture. It enables the model to process language comprehensively and in a context-aware manner.

    A large context window, apart from its higher computational cost, also makes the model more likely to generate hallucinations.

    As LLMs evolve, context windows will play a significant role in shaping their capabilities and applications.

  • Funding for AGI

    Sam Altman says that in order to create machine intelligence equal to human intelligence, he is considering asking Microsoft to back the pursuit financially. The costs of building such sophisticated AI models are known to be punishing.

    Microsoft values OpenAI at $29 billion.

    Still, OpenAI has a long way to go, and a great deal of compute stands between the present and the AGI goal it has set for itself.

    Though OpenAI earns good revenue, it has not been able to show good profits on account of training costs. Its tie-up with Microsoft is a mutually beneficial relationship.

    OpenAI has already introduced GPT-4 and wants to build on that. The GPT Store should carry the best apps; ultimately, OpenAI wants to set up an Apple-style app store.

    Altman has hired Brad Lightcap as Chief Operating Officer, while Altman himself thinks about superintelligence and the compute power it will require.

    The company is working on GPT-5, its next-generation AI model, though there is still no timeline commitment for it. It will require more data to train on; some of that data will come from public sources (say, the internet) and some from private sources (say, companies). Private data that is not easily accessible online could be used. What capabilities GPT-5 will have is a matter of speculation, a guessing game at this point.

    Models are trained using Nvidia’s H100 chips, which cost about $40,000 apiece. Competing chips from rival companies could emerge.

    Though OpenAI’s models have achieved success, the company is ultimately after building AGI through LLMs plus many other pieces layered on top of them. OpenAI focuses on LLMs, while competitors are working on alternative strategies to advance AI.

    Language is the best way to compress information and is therefore a good substrate for developing intelligence, a point several researchers in the field miss. Between what we have and what we want (AGI), systems will need to make fundamental leaps of understanding; the issue is to find the missing link.

  • GPT-5

    The idea of GPT-5 is not yet concrete; OpenAI has to figure out a lot of things before training it.

    There are difficult scientific issues and there are many more tasks the model is expected to execute. GPT-5, therefore, may require more compute power.

    GPT-3 was good at writing text. GPT-3.5 was good at 5-8 tasks. GPT-4 is good at dozens of tasks. GPT-5 will be good at most things one might want it to do.

    Maybe the current AI will look antiquated by 2024; that prospect is a way to provoke the next round of innovation. It is rumored that we might get the first version of GPT-5 in 2024.

    OpenAI knows that competitors will have to work hard to match the success of ChatGPT and its API services. What it proposes to build may change the digital ecosystem. It is rumored that the multi-modal Gobi model could become GPT-5, and that it would have a self-correction capability and a small degree of self-awareness. It could also align with the government’s regulatory plans.

    Google, too, is expected to put out Gemini, which is comparable to GPT-4 but more up to date. GPT-5 may compete with Gemini.

    Altman, however, declares that GPT-5 is still far away: it is not ready for training and remains a work in progress.

    Bill Gates does not expect GPT-5 to be far superior to GPT-4.

  • LLMs: Pros and Cons

    The way we interact with software has been changed a great deal by what we call large language models. LLMs are where deep learning and large-scale computational resources combine.

  • AI Safety Summit, UK

    A two-day AI Safety Summit is being held at the UK’s Bletchley Park campus on November 1 and 2, 2023, and around 1,000 world leaders, tech executives, academics and AI researchers will participate in the gathering. Significantly, the venue was home to the code-breakers whose work helped assure victory in World War II.

    Discussions will revolve around how AI can best be used to benefit mankind, e.g. new drug discovery and climate change remedies, and how best to avoid AI risks.

    There will be discussions about the extreme threats posed by the most advanced forms of AI, called the ‘tip of the spear’ by Demis Hassabis of DeepMind.

    The UK PM will be in conversation with Elon Musk of X on November 2, 2023.

    Facebook believed in fast innovation and quick growth, and Zuckerberg therefore set its motto as ‘move fast and break things.’ For a time, that phrase described the disruption Big Tech caused.

    As Hassabis puts it, the old maxim should be set aside and the focus should be on providing great services and applications. He cited AlphaFold, an AI programme that advances the discovery of new medicines by predicting the structure of every protein in the body.

  • Generative AI: Global Regulation

    It is not right to allow generative AI to proliferate without any restrictions, and most people agree with this proposition. Even Sam Altman of OpenAI and Geoffrey Hinton, the godfather of deep learning, are on the same page. Hinton has alerted the world to the existential threat generative AI can pose, and Altman has urged US lawmakers to set some parameters for AI so that it does not harm the world.

    The US president’s recent executive order expects AI companies to establish new standards for AI safety and security. The European Union, too, wants to frame laws to regulate AI, but is unsure how foundational models should be regulated. Should they be regulated only during testing and release, or should they be monitored even post-release?

    India has lagged behind in passing digital laws. The Digital Personal Data Protection (DPDP) Act seems ineffective at regulating either AI or generative AI.

    Big Tech is expected to follow ‘responsible and ethical AI’. Governments across the globe expect the companies to ensure AI safety and to further ensure that rogue models do not get developed. It is naive to expect the companies to do this. The onus lies with the governments.

    It is not clear to the law-makers what exactly should be regulated and controlled in generative AI. Mostly, they trust Big Tech to self-regulate.

    However, law-makers cannot ignore open-source generative AI models, which give developers the tools to build their own models.

    Are copyright and privacy laws enough to deal with generative AI? In addition, can local laws regulate a world-wide technology? There should be a global consensus on this issue.

    It is necessary to understand the fundamentals of AI models to work out regulations. Generative AI models are called foundational models. They are neural models which are pre-trained on a massive corpus of data. The data is scraped from the internet and other sources such as books, periodicals, research papers and so on. These foundational models are hungry for more data so as to remain effective.

    To keep these models functional, companies use web crawlers or data scrapers, which are essentially computer programmes that go through websites to extract data. Search engines have long used web crawlers, but in generative AI these crawlers continuously gather data for training the models. There is an absence of laws specifically about data scraping. Older laws on copyright and privacy put some curbs on how data can be gathered, but they are ineffective against massive scraping, which also sweeps up vast amounts of personal data on the web. The crux of the matter is how to regulate this data collection for the foundational models.
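
    For concreteness, here is a hypothetical, minimal sketch of such a scraper in Python, using the widely used requests and BeautifulSoup libraries; the URL is a placeholder, and a real crawler would also follow links and respect robots.txt.

    ```python
    # Hypothetical, minimal illustration of the kind of data scraper described
    # above; the URL is a placeholder, and copyright/robots.txt rules still apply.
    import requests
    from bs4 import BeautifulSoup

    def scrape_page(url: str) -> str:
        """Fetch one page and return its visible text, stripped of HTML markup."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Drop scripts and styles; keep only human-readable text.
        for tag in soup(["script", "style"]):
            tag.decompose()
        return " ".join(soup.get_text(separator=" ").split())

    if __name__ == "__main__":
        text = scrape_page("https://example.com")  # placeholder URL
        print(text[:200])  # first 200 characters of the extracted text
    ```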

  • Artificial Intelligence or Artificial Consciousness

    The capability of AI systems, especially ChatGPT, leads to the proposition that artificial intelligence will soon have consciousness. Still, this may underestimate the biological consciousness that human beings have.

    It is amazing to get the kind of responses the AI model produces. Does it work only on the strength of algorithms and smart pattern recognition? The quality of ChatGPT’s output sways us into believing that it emanates from a conscious system.

    Though ChatGPT’s responsiveness appears to come from a conscious system, that is a wrong inference. Neuroscientists point out that the model lacks the sensory contact a living organism has with the world around it, and that its architecture differs from the brain’s thalamocortical system.

    Can consciousness emerge whenever intelligent systems, whether biological or artificial, become large enough? The conversations we have with a bot seem so real that it is tempting to attribute consciousness to it. However, consciousness is what we perceive it to be; it is in the eye of the beholder.

    These models lack feelings and emotions. Besides, they are not cellular organisms, and their nodes are not the same as the brain’s neurons.

    Prompt engineers can provoke ChatGPT to the point where it feels like a self-aware entity. The quest will continue, but ChatGPT will never be able to prove that it is a conscious entity, because it surely is not.

    Of course, in the fullness of time, such systems will be able to do anything that humans can do.

    In 2022, Blake Lemoine, a Google engineer, argued that the LaMDA chatbot was conscious, though most other experts disagreed with him. ChatGPT lacks empathy and emotional intelligence.

    Stephen Hawking imagined AI that improves and replicates itself, perhaps a new form of life. He also warned that superhumans would use gene editing to take over: their expertise in genetic engineering could let them surpass their fellow beings.

    Evolutionary theory describes an altogether different trajectory by which conscious living organisms emerged; there is no parallel to such evolution in AI systems.

    We have a long way to go before we understand consciousness, and hence a long way to go before we have conscious machines.

  • Executive Order to Regulate AI

    AI’s use by federal agencies in the US will be governed by an executive order the US president has signed. The order was released on Monday, 30 October 2023. It is a step in the right direction, as the government is a top customer for tech companies.

    Since the arrival of generative AI, developers of AI systems are expected to report their training and testing processes to the federal government.

    As we have already written here, executives of tech companies have already alerted the society at large about the potential risks of the AI systems.

    The presidential order aims to promote AI with a government-wide strategy.

    Lawmakers are already working on establishing guardrails while promoting AI.

    The government wants to prevent bias in hiring while using the AI systems in recruitment and selection.

    The draft order wants to encourage the use of AI to prevent unwanted calls.

    There should be safeguards on the part of federal agencies while using citizens’ data. Privacy is a prime concern.

    Big tech has given a voluntary commitment to use AI responsibly.

    AI has raised both concern and acclaim. The US president has signed a lengthy executive order. Big tech will be directed to put powerful AI models through safety tests and submit results to the government before their public release.

    The power to vet technology will apply to future systems, not to those already on the market, and it covers national and economic security risks.

    Infrastructure will be created to watermark AI-generated content, which is often referred to as deepfakes.

  • Project Indus by Tech Mahindra

    Tech Mahindra, an IT major and the fifth largest software services firm in India, has launched Project Indus to build LLMs for Indian languages. A 15-member Project Indus team plans to release the first LLM for Hindi and its 37 dialects, with the model expected to be ready by December 2023 or January 2024. It is the company’s attempt to build a foundational model for Indian languages.

    Over the past two months, the team has collected a 1.2-terabyte corpus of data in Hindi and related dialects. From this they expect to develop a refined web text by November 2023, which will be open source.

    Later, work will start on other languages. AI’s growth will become verticalised in future: these LLMs are the base, on top of which domain-specific models can be built, such as rural finance, agritech and healthcare models.

  • ScriptGPT

    A TV channel receives feedback from viewers, but not continuously; there is a time lag of three months, which affects ratings and, in turn, advertising revenue. Zee therefore decided to leverage the capabilities of its Technology and Innovation Centre in Bangalore.

    Zee developed ScriptGPT in collaboration with OpenAI using 1.3 million variables, including character archetypes, plot twists and the like. It was trained on 42,000 episodes from channels across the Hindi general entertainment landscape, and by April 2024 it will have ingested 100,000 episodes. It was also fed BARC data, brand-track data, and content and audience research. After multiple iterations, its forecast accuracy is 90 per cent.

    While it is a challenge to source the copious amounts of data needed to build its intelligence, there is also the issue of copyright over that data. Zee has a team of lawyers vetting what is fed to the model. The IP rights over the model’s output, of course, remain a moot point; the jurisprudence is still evolving.

    After a serial has been screened for a few episodes, ScriptGPT can be asked for suggestions on altering the show so as to improve its ratings.

    ScriptGPT facilitates the understanding of the characters, stories, plots and twists that the audience seeks.

    There are efforts to use the model to generate a full movie script, which is worrisome for scriptwriters. However, they can upskill themselves in AI technology so that their jobs are not affected.

    ScriptGPT makes you alive to what works and why it works. It brings the organisation closer to the audiences.

    Media firms can use AI tools to edit and dub the videos and films.