Blog

  • New ChatGPT Based On GPT-4o

    OpenAI was to hold a livestream conference on Monday, 13 May 2024 at 1700 GMT to demo some updates to ChatGPT and GPT-4. This was announced on X. OpenAI planned to do this just before Google's I/O developer conference.

    However, OpenAI decided to delay this. As we know, ChatGPT wowed the world with its human-like written content and top-notch software code. Soon after its launch in late November 2022, it became the fastest application to reach 100 million active users. Traffic to ChatGPT fluctuates, however, and is only now returning to its May 2023 peak. OpenAI needs to expand the user base of ChatGPT, its flagship product.

    The new model is called GPT-4o, where the 'o' stands for 'omni', indicating that it is multi-modal.

    OpenAI's ChatGPT can receive and respond to voice commands, images and videos. GPT-4o juggles audio, images and videos faster than previous versions of the technology.

    The app is available, free of charge, for both smartphones and desktop computers. It is a glimpse of the future of interaction between humans and machines.

    It is a way to combine chatbots with voice assistants, much as Google is combining its Gemini chatbot with Google Assistant. OpenAI will share the technology with users over the coming weeks.

    The new app cannot generate video but can generate still images that represent frames of video.

    ChatGPT has already demonstrated that machines can handle requests more like people. Previously, it was a patchwork of three different AI technologies: one that converted voice to text, one that generated a text response and one that converted this text into a synthetic voice. The new app, built on the single model GPT-4o, does all this with one system. Because the technology is more efficient, the company can afford to offer it to users for free.

  • Generative AI in Education

    LLMs are in a position to provide answers to research questions on varied subjects and create images if necessary. ChatGPT and its counterparts have fared well at various examinations. They are wonderful tutors. They can change the way syllabi are designed and the way students learn and are assessed.

    The big question, however, is whether generative AI can change the way education has been imparted for generations. Though it can, we will have to wait for this to happen. There are many rough edges to iron out, and guardrails must be set up.

    Gen AI is being used to create content, to learn about any topic and to stimulate creativity. Generative AI assists in brainstorming and offers feedback. These tools help teachers personalize education.

    Generative AI poses certain issues: it can lack coherence, accuracy and reliability, so it cannot be used for critical tasks. The models do hallucinate while giving answers confidently. There are also issues of model bias, copyright and privacy.

    Besides, LLMs are expensive. They require GPUs to process large volumes of data, and they require intensive training. It remains to be seen how academia will bear the costs.

    There could be a divide between digital haves and have-nots. It is an issue of access.

    It is necessary to tread cautiously.

  • AI Implications

    Madhumita Murgia has written a book, Code Dependent, exploring the grey and murkier areas of AI. She tells the story of data labellers in Kenya who contribute their labour to train the algorithms of self-driving cars. There is a story of a mother whose child is added to a list of potential criminals, not by a human agency but by a machine that uses facial recognition technology with all its racial bias. The book goes beyond Silicon Valley and points out the global north-south divide.

    The Big Tech elite in Silicon Valley are paid hefty sums of money while the mundane, monotonous, repetitive work is outsourced to the poor of the developing world. This is referred to as data colonialism, a term first used by the academics Nick Couldry and Ulises A. Mejias.

    AI jobs in Kenya and the Philippines are handled by data labellers who, for the first time, could afford school education for their children or healthcare for their parents out of their earnings. They are doing digital jobs (maybe better than physical jobs in not-so-conducive conditions). Governments see these as employment generation. AI should bring prosperity to everyone, but no such thing is happening in the global south. AI labelling is still a minimum-wage job. The workers sign an NDA, a non-disclosure agreement: they cannot talk about any of this, cannot unionize, and at times do not know who their employer is.

    There is hype around AI as companies feel FOMO — fear of missing out.

    First, people should understand the implications of this technology. Second, they should understand the dynamics of change in the tech industry. Third, they should understand how AI affects other industries.

  • Microsoft’s Hedge Against Rivals

    Microsoft's research division is 30 years old and is staffed by esteemed scientists who have won Turing Awards and Fields Medals. Still, CEO Satya Nadella is concerned that the division is falling behind Google in AI research.

    To hedge against the research of its rivals, Microsoft is working on an AI model, MAI-1, which is large enough to match the models of OpenAI. This was confirmed in a LinkedIn post by Kevin Scott, Microsoft's Chief Technology Officer.

    Microsoft was careful about not missing out on the work being done at OpenAI and took heed of Kevin Scott's warning that both Google and OpenAI could process human language in ways Microsoft could not easily replicate. That led to Microsoft's initial $1 billion investment in OpenAI. Microsoft has since invested more than $10 billion in OpenAI.

    Over the last 10 years, Satya Nadella has diversified Microsoft's business rather than relying on a single product, the Windows operating system. In 2014, he pushed Microsoft into the cloud business; it now holds about 20 per cent of the global cloud-computing market.

    Microsoft has an observer seat on OpenAI's board, but if OpenAI ever decides to pull the rug out from under Microsoft, it needs to be well protected. OpenAI may not share the benefits of AGI with Microsoft as and when it is developed. That makes Microsoft's position precarious.

    Microsoft has therefore decided to strengthen its internal AI teams. It failed earlier when it released the chatbot Tay, which spouted racist and abusive messages. The former DeepMind executive Mustafa Suleyman has been roped in to head its AI division.

  • May 13 OpenAI Event

    OpenAI has announced an event for Monday, May 13 to launch "something magical". The event is scheduled a few hours before Google's I/O 2024 developer conference.

    No, the event is not launching GPT-5. It will announce new ChatGPT and GPT-4 updates. There was speculation that OpenAI might introduce a Google-style search engine. OpenAI has now confirmed on X that it will host a live stream to unveil some ChatGPT and GPT-4 updates, promising that people will love the new stuff.

    The live stream will be on http://openai.com at 10 AM PT on Monday, May 13.

    Some new features will allow users to ask ChatGPT questions and receive replies that draw information from the internet and provide citations. Experts feel the updates may pair written answers with visuals when necessary. Details are awaited.

  • RNNs vs. LLMs

    RNNs are neural networks that handle sequential data by maintaining an internal state, processing input of varying length.

    LLMs learn to predict the probability of a word given its context in a sentence or sequence of words. Models such as the GPT series use the transformer architecture to learn contextual representations of words and generate text.

    Both RNNs and LLMs handle sequential data, but LLMs do it better because they capture long-range dependencies and understand context more effectively. LLMs work more efficiently because of their attention mechanisms, which allow them to attend to all positions in the input sequence simultaneously. They understand each word in relation to all other words in a sequence, irrespective of how far apart they are.

    Long-range dependencies are relationships between words (tokens) in a sequence that are separated by many other words. Capturing such dependencies is crucial for understanding context and meaning in language.

    Consider: "The prisoner they have kept isolated in the cell is my brother." Here the word 'brother' depends on 'isolated' and 'cell', and all these words are separated by other words. RNNs struggle with long-range dependencies because they process input sequentially and find it difficult to retain information over long distances in the sequence.

    LLMs can handle large amounts of data, especially with the transformer architecture, through parallel processing: all elements of the input sequence are processed simultaneously. They are thus faster and take less inference time, which makes them scalable and practical.
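
This parallel, distance-independent attention can be illustrated with a minimal sketch in NumPy. This is an illustrative toy, not any production model's code: scaled dot-product self-attention where, in one matrix multiply, every token compares itself with every other token, so 'brother' can attend to 'prisoner' directly no matter how many words sit between them.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over a sequence X of shape (T, d).
    All T x T pairwise comparisons happen in a single matrix product,
    so distance within the sequence costs nothing extra."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 8))            # 10 tokens, 8-dim embeddings
out, w = self_attention(X)
# w[0, 9] is token 0's attention to token 9: a direct connection,
# with no information passed step by step as in an RNN
```

(Real transformers add learned query/key/value projections and multiple heads; the core distance-independent comparison is the same.)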

    RNNs were the building blocks of earlier sequence models, while modern LLMs are built on transformers instead. RNNs are still in use in machine translation, speech recognition and sentiment analysis.

    In the past, Google Translate used a type of RNN called LSTM for machine translation; the system was called Google Neural Machine Translation (GNMT). Of late, Google Translate has moved to more advanced architectures such as transformers to achieve better translation quality.

  • RNNs: Recurrent Neural Networks

    An RNN is a Recurrent Neural Network, a type of neural network that can handle sequential data. It is commonly used in tasks like NLP, speech recognition and time-series prediction.

    Feedforward networks process each input independently. In RNNs, the order of inputs matters: they capture information about previous inputs, and this information is retained over time. Decisions are based on both current and past inputs.

    RNNs maintain a hidden state, a memory of past inputs. It is updated recursively, considering both the current input and the previous state, forming a feedback loop. This lets RNNs capture temporal dependencies in the data.
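
The recurrence above can be sketched in a few lines of NumPy. This is a toy vanilla RNN cell, with made-up dimensions and random weights purely for illustration: the new hidden state is a function of the current input and the previous hidden state, and the same step is reused for a sequence of any length.

```python
import numpy as np

def rnn_step(h_prev, x, Wxh, Whh, b):
    """One recurrence: the new hidden state mixes the current input x
    with the previous hidden state h_prev (the feedback loop)."""
    return np.tanh(Wxh @ x + Whh @ h_prev + b)

rng = np.random.default_rng(1)
input_dim, hidden_dim = 4, 3            # toy sizes for illustration
Wxh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
Whh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                # initial hidden state
sequence = rng.normal(size=(5, input_dim))  # any sequence length works
for x in sequence:
    h = rnn_step(h, x, Wxh, Whh, b)     # h summarizes everything seen so far
```

The same loop drives training and inference; only the weights change during learning.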

    They handle inputs and outputs of variable length, making them suitable for text generation, machine translation and speech recognition.

    RNNs face the vanishing gradient problem: gradients become extremely small during training, making it difficult for the model to learn long-term dependencies. That has led to more advanced RNNs such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), which are better at capturing long-term dependencies.

    In short, RNNs are good at processing sequential data since they retain memory and capture temporal dependencies. They have limitations and these are overcome by using advanced versions such as LSTM and GRU.

    At the LSTM's heart are memory cells, which store information over longer periods of time. These memory cells have an internal state that can be updated over time based on the input data and previous states.

    There are three types of gates in LSTMs, controlling the flow of information into and out of the memory cells.

    The Forget Gate decides which information to discard from the cell state. It takes the previous hidden state and the current input, and outputs a number between 0 and 1 for each element of the cell state: 0 means forget completely, 1 means remember completely.

    The Input Gate decides which new information to store in the cell state. It also takes the previous hidden state and the current input, and outputs numbers between 0 and 1 indicating how much of the new information should be added to the cell state.

    The Output Gate decides which information to output from the current cell state. It takes the previous hidden state and the current input, and outputs a number between 0 and 1 for each element, indicating how much of it should be output.

    The cell state is updated based on the forget gate, the input gate and the current input.

    The hidden state of the LSTM is then computed from the updated cell state, filtered by the output gate.
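
The three gates and the two state updates can be written out directly. The following is a bare-bones sketch in NumPy with toy dimensions and random weights, not any library's implementation; it shows how the forget, input and output gates combine to update the cell state c and the hidden state h.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to four stacked pre-activations:
    forget gate f, input gate i, candidate update g, output gate o."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])          # 0 = forget completely, 1 = keep completely
    i = sigmoid(z[H:2*H])        # how much new information to write
    g = np.tanh(z[2*H:3*H])      # candidate new content for the cell
    o = sigmoid(z[3*H:4*H])      # how much of the cell state to expose
    c = f * c_prev + i * g       # cell state: keep some old, add some new
    h = o * np.tanh(c)           # hidden state: gated view of the cell
    return h, c

rng = np.random.default_rng(2)
input_dim, hidden_dim = 4, 3     # toy sizes for illustration
W = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x, h, c, W, b)
```

Because the cell state is carried forward additively (f * c_prev + i * g) rather than being squashed at every step, gradients survive over longer spans, which is what mitigates the vanishing gradient problem.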

  • AI-powered PCs

    The PC market is experiencing a shift as AI is integrated into it. There is a lot of excitement among consumers. Companies such as Dell, Lenovo and HP are introducing AI-powered computers. These computers have Intel or AMD processors with dedicated neural processing units (NPUs).

    AI PCs run generative AI apps locally with smaller data sets. They enhance the security of models that use proprietary data.

    These PCs are designed for AI and ML tasks, and the NPU is their distinguishing feature. It eliminates the need for cloud processing, enabling secure on-device performance for tasks such as object detection and video calls. At the same time, the GPU and CPU are freed up for more demanding tasks, which extends battery life.

    AI is being integrated into various applications such as office tools, video editing, VFX and music production. The shift of these tasks from cloud-based to local processing on PCs enhances performance and privacy.

    Such PCs elevate workplace productivity. AI-powered PCs will be big game changers. As the processing is on-device, data privacy is enhanced.

    Smarter PCs resonate with users' lifestyles. AI is a fundamental reimagining of digital interaction.

  • Diffusion Transformer: DiT

    Sora, OpenAI's video-generation software, is powered by DiT. AI innovation is being stimulated by two architectures: transformers and diffusion models. Transformers, used in neural networks, have rejuvenated language and text-based models. Diffusion models are used for image generation.

    Diffusion basically means the process of spreading: particles spread from a dense region to a less dense one. A Diffusion Transformer (DiT) is a diffusion model based on the transformer architecture.

    DiT was developed by William Peebles (formerly at UC Berkeley, now at OpenAI) and Saining Xie (New York University) in 2023. It replaces the U-Net backbone traditionally used for iterative image de-noising with a transformer. Making an image is akin to solving a jigsaw puzzle: the pieces can be arranged in many ways, and not every arrangement is the best solution. DiT enables us to solve bigger puzzles.

    Sora uses diffusion to generate videos and then uses the strength of transformers for scaling. At the attention-pooling stage, it focuses on the important parts of the video. Random noise is added, and the model learns how the noise affects the video. The model predicts what should come next by trying various combinations; it learns from its mistakes and improves. The finished video is smooth and clear, bereft of all extra noise. DiT thus helps Sora understand text prompts and produce cool videos, drawing on the images it has seen during its training.

    DiT deploys a transformer to slowly turn noise into a target image, reversing the diffusion process under the transformer's guidance. It is like sharpening a blurry image step by step. DiT can handle larger input data without compromising quality.
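
The forward half of this process, progressively burying a clean image in Gaussian noise, can be sketched in NumPy. This is a generic toy illustration of diffusion noising with a standard linear noise schedule, not Sora's or DiT's actual code; the denoising model (a U-Net or, in DiT, a transformer) is trained to predict and remove the added noise so that generation can run this process in reverse.

```python
import numpy as np

def noise_image(x0, t, alpha_bar, seed=0):
    """Forward diffusion: blend a clean image x0 with Gaussian noise.
    alpha_bar[t] near 1 means mostly image; near 0 means mostly noise."""
    eps = np.random.default_rng(seed + t).normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # per-step noise amounts
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal remaining

x0 = np.ones((8, 8))                     # toy "image" of constant pixels
x_early, _ = noise_image(x0, 10, alpha_bar)   # still close to x0
x_late, _ = noise_image(x0, 900, alpha_bar)   # almost pure noise
```

Training shows the model (x_t, t) pairs and asks it to recover eps; sampling starts from pure noise and subtracts the predicted noise step by step until an image remains.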

  • Chip Production

    It takes about 450 steps, starting from the procurement of a raw wafer, through cleaning steps and the complex processes that form multiple chips on the wafer, until the wafer goes for packaging and assembly. The process involves patterning, or lithography, on the wafer: transferring the chip design onto the wafer to define the circuits, after which the wafers are cleaned and polished.

    Next comes etching: removing material to create the layers of the chips.

    After this, the transistors are interconnected and joined to aluminium pads, which are then connected to the outside world with thin wires.

    Of the total 450 steps, about 30 per cent pertain to cleaning. A clean-room environment is critical for semiconductor fabs to reduce contamination during processing.

    There are machines such as lithography systems, plasma etchers, defect-inspection stations and photoresist coaters.

    Technicians load recipes to process wafers; a recipe is a set of instructions or parameters used to perform a particular manufacturing process.

    The facility must have ultrapure water and a variety of gases; over 30 types of gases are used.

    There are expenses for the repair and maintenance of equipment. There are more than 70 pieces of equipment, and if any one of them is down, the whole line stops working.