Types of Generative AI

When AI creates human-like content, whether text, images, video, poetry, or computer code, it is called generative AI. It is a relatively new field. Four main techniques, which have evolved over the last decade, make it possible. All of them draw on deep learning, neural networks, and transformers, and all rely on training data to learn how to generate content.

Large language models (LLMs) are foundation models: neural networks trained on huge amounts of text to learn the relationships between words. This enables them to predict the next word that should appear in any sequence of words. They can then be fine-tuned, that is, further trained on task-specific data, to carry out specific tasks.
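Next-word prediction can be illustrated with a deliberately tiny sketch. The toy below just counts which word follows which in a small corpus and predicts the most frequent successor; a real LLM learns these statistics with a deep neural network over billions of documents, but the prediction task itself is the same.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count word successions in a tiny corpus.
# Real LLMs learn these statistics with deep networks, not raw counts.
corpus = "the cat sat on the mat and the cat slept and the cat ate".split()

successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    counts = successors[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often here
```

Scaled up by many orders of magnitude, and with counts replaced by learned parameters, this is the core task an LLM is trained on.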

In LLMs, the first step is tokenization: text is split into tokens, which may be whole words, parts of words, prefixes and suffixes, or other linguistic elements. The next step is embedding, in which each token is converted into numerical data that computers can analyze.
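These two steps can be sketched in a few lines. This is a minimal illustration, not a real tokenizer: it splits on whitespace (real systems use subword schemes such as byte-pair encoding), and the `embed` function is a deterministic placeholder standing in for a learned embedding lookup.

```python
import math

text = "generative models predict the next token"

# Step 1: tokenization (naive whitespace split; real tokenizers use
# subword schemes like BPE).
tokens = text.split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]  # tokens -> integer ids

# Step 2: convert ids into numerical vectors a model can process.
def embed(token_id, dim=4):
    """Deterministic stand-in for a learned embedding table lookup."""
    return [math.sin(token_id + j) for j in range(dim)]

vectors = [embed(i) for i in token_ids]
print(token_ids)
print(vectors[0])
```

In a trained model the embedding values are learned parameters, and the vectors are typically hundreds or thousands of dimensions wide rather than four.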

LLMs are useful across natural language processing (NLP) tasks such as translation, summarization, and question answering.

Diffusion models are another approach to generative AI. They follow a process called iterative denoising. Given a text prompt, the model starts from an image of pure random noise, much like a page covered in random scribbles. Guided by its training data, it then refines this image step by step: at each step some noise is removed and the image is adjusted toward the desired characteristics. The result is an entirely new image that matches the text prompt and is not found in the training data.
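The denoising loop can be made concrete with a toy sketch. In a real diffusion model, a trained neural network predicts the noise to remove at each step, conditioned on the text prompt; here, a simple blend toward a fixed "target image" (just a short list of numbers) stands in for that learned step.

```python
import random

random.seed(0)

# Pretend this short vector is the desired image.
target = [0.2, 0.8, 0.5, 0.9]

# Start from pure random noise, as diffusion models do.
image = [random.gauss(0, 1) for _ in target]

def denoise_step(img, strength=0.2):
    """One refinement step: remove a little noise by nudging each
    value toward the target (a stand-in for the learned denoiser)."""
    return [x + strength * (t - x) for x, t in zip(img, target)]

for _ in range(30):
    image = denoise_step(image)

print([round(x, 2) for x in image])  # very close to target after 30 steps
```

Each pass removes only a fraction of the remaining noise, which is why the process is iterative: many small refinements turn random scribbles into a coherent result.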

Stable Diffusion and DALL-E follow this process to create photorealistic images, and, as Sora has shown, it can generate video as well.

Generative adversarial networks (GANs) emerged in 2014 to generate synthetic content, both text and images. Two algorithms are pitted against each other: one called the generator, the other the discriminator. The generator tries to produce realistic content; the discriminator tries to decide whether that content is real or fake. Each learns from the other, and both improve continually until the generator can create content that is as close as possible to the real thing.
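The adversarial loop can be shown in one dimension. In this sketch, which is a toy and not a real GAN architecture, "real data" is drawn from a Gaussian centred at 4, the generator is a single learnable offset `g`, and the discriminator is a logistic classifier `sigmoid(w*x + b)`. Both are updated with hand-derived gradients; real GANs use deep networks and automatic differentiation.

```python
import math
import random

random.seed(1)

g = 0.0          # generator parameter: fake samples are g + noise
w, b = 0.1, 0.0  # discriminator parameters

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.05
for step in range(3000):
    real = random.gauss(4, 1)        # a sample of "real" data
    fake = g + random.gauss(0, 1)    # the generator's attempt

    # Discriminator update: push D(real) toward 1, D(fake) toward 0.
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    w += lr * ((1 - d_real) * real - d_fake * fake)
    b += lr * ((1 - d_real) - d_fake)

    # Generator update: change g so the discriminator rates the
    # same fake sample as more "real".
    d_fake = sigmoid(w * fake + b)
    g += lr * (1 - d_fake) * w

print(round(g, 2))  # g has drifted toward 4, the real-data mean
```

As the discriminator gets better at telling real from fake, its gradients push the generator's output distribution toward the real one, which is exactly the "each learns from the other" dynamic described above.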

GANs are versatile tools for generating pictures, video, text and sound. They are extensively used for computer vision and NLP tasks.

Neural radiance fields (NeRFs) are the most recent of these techniques, emerging in 2020. They use deep learning to create representations of 3D objects and scenes.

Certain portions of a scene are not visible in any one image: an object in the background that an object in the foreground obscures, say, or the rear of an object photographed from the front.

A NeRF must therefore predict the volumetric properties of objects and map them onto 3D spatial coordinates, using neural networks together with the geometry of the scene and the way light reflects around an object.
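At its core, a NeRF is a function that maps a 3D point (and a viewing direction) to a colour and a volume density. In a trained NeRF that function is a neural network; in the sketch below a hard-coded unit sphere stands in for it, purely to make the mapping concrete. The `positional_encoding` helper shows one real NeRF ingredient: coordinates are expanded into sines and cosines at several frequencies so the network can represent fine spatial detail.

```python
import math

def radiance_field(x, y, z, view_dir=(0.0, 0.0, 1.0)):
    """Toy scene: a red unit sphere at the origin.
    Returns (r, g, b, density); a real NeRF learns this mapping."""
    inside = x * x + y * y + z * z < 1.0
    density = 1.0 if inside else 0.0            # opaque inside, empty outside
    colour = (0.8, 0.2, 0.2) if inside else (0.0, 0.0, 0.0)
    return (*colour, density)

def positional_encoding(p, n_freqs=4):
    """Encode a coordinate as sines/cosines at increasing frequencies,
    as NeRFs do so the network can capture fine detail."""
    out = []
    for k in range(n_freqs):
        out += [math.sin((2 ** k) * math.pi * p),
                math.cos((2 ** k) * math.pi * p)]
    return out

print(radiance_field(0.0, 0.0, 0.0))  # inside the sphere: coloured, dense
print(radiance_field(2.0, 0.0, 0.0))  # empty space: transparent
```

Rendering an image then amounts to shooting rays through the scene and accumulating these colour and density values along each ray, which is how a NeRF fills in the unseen portions of a scene.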

Nvidia has been a prominent pioneer of this technique. NeRFs are used in simulations, video games, robotics, architecture, and urban planning.

Hybrid models of generative AI combine the techniques described above to enable new kinds of content generation.

DeepMind's AlphaCode, for example, combines LLMs with reinforcement learning (RL) to generate high-quality computer code.

