Fei-Fei Li Fosters AI Boom

In a recent memoir, The Worlds I Saw, Fei-Fei Li tells the story of ImageNet, a large database of labelled images. Before ImageNet, few researchers believed that data was what the field needed. The project began at Princeton in 2008, and Li pursued it for more than a couple of years. She took a job at Stanford in 2009 and brought ImageNet with her to California.

In 2012, a team from the University of Toronto trained a neural network on the ImageNet dataset and achieved astonishing accuracy at image recognition. The model was called AlexNet, after its lead author Alex Krizhevsky. It set in motion a deep learning boom that continues to this day.

AlexNet was made possible by Nvidia's CUDA platform, which opened GPUs up to non-graphics applications. Fei-Fei Li was the first of three visionaries who pursued unorthodox ideas. The second was Geoffrey Hinton, a computer scientist at the University of Toronto, who had promoted neural networks for decades despite widespread skepticism. The third was Jensen Huang, CEO of Nvidia.

Hinton teamed up with two former colleagues at UCSD, David Rumelhart and Ronald Williams, to describe backpropagation, a technique for efficiently training neural networks, in a landmark 1986 paper. They did not discover backpropagation, but their paper popularized it. Hinton moved to the University of Toronto in 1987.
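As a rough sketch of the idea (in standard textbook notation, not the formulation of the 1986 paper): for a layer with weights $W^{(l)}$, pre-activations $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$ and outputs $a^{(l)} = f(z^{(l)})$, backpropagation computes each layer's error signal from the layer above by the chain rule,

$$\delta^{(l)} = \big(W^{(l+1)}\big)^{\top}\delta^{(l+1)} \odot f'\big(z^{(l)}\big), \qquad \frac{\partial L}{\partial W^{(l)}} = \delta^{(l)}\big(a^{(l-1)}\big)^{\top},$$

so a single backward pass yields the gradient of the loss $L$ with respect to every weight, and the weights can then be nudged downhill: $W^{(l)} \leftarrow W^{(l)} - \eta\,\partial L/\partial W^{(l)}$.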

The French computer scientist Yann LeCun was drawn to Toronto to do postdoctoral work with Hinton before moving to Bell Labs in 1988.

Backpropagation made handwriting recognition practical, and by the mid-1990s LeCun's technology had been adopted by commercial banks to process cheques.

At the time, however, neural networks could not cope with larger, more complex images.

GPUs pack many execution units, each in effect a tiny CPU, onto a single chip. This parallel processing delivers better image quality and higher frame rates. GPUs were introduced in 1999. In the mid-2000s, Nvidia CEO Jensen Huang suspected that GPUs could be useful beyond gaming, for tasks such as weather forecasting or oil exploration. In 2006, Nvidia announced the CUDA platform, on which programmers write kernels: short programmes that run on a single execution unit.

Kernels allow a computing task to be split up into bite-sized chunks that can be processed in parallel.
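A minimal sketch of what such a kernel looks like (a generic CUDA C++ illustration, not code from Nvidia's documentation or from any project described here): each thread handles exactly one element of the task, and the hardware runs thousands of threads at once.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: a short programme executed by every GPU thread.
// Each thread adds one pair of elements, i.e. one "bite-sized chunk".
__global__ void vector_add(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's index
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;                 // about a million elements
    const size_t bytes = n * sizeof(float);

    float *a, *b, *out;
    cudaMallocManaged(&a, bytes);          // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&out, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(a, b, out, n);  // launch n parallel chunks
    cudaDeviceSynchronize();

    printf("out[0] = %.1f\n", out[0]);     // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```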

In 2006, CUDA was widely dismissed as useless. In 2008, Nvidia's stock declined by 70 per cent. CUDA downloads peaked in 2009 and then declined for three years. Huang had not conceived of neural networks or AI when CUDA was designed, but Hinton's backpropagation algorithm, built on repeated matrix operations, was exactly the kind of task that splits into bite-sized chunks. Hinton was quick to recognize CUDA's potential, and in 2009 his group used the platform for human speech recognition. Even so, Nvidia never gave Hinton a free chip; Hinton, Krizhevsky and Sutskever obtained Nvidia GTX 580 chips for the AlexNet project.
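To make the "bite-sized chunks" point concrete for neural networks: most of the work in both the forward pass and backpropagation is matrix arithmetic, and each row of a matrix-vector product can be computed independently. A hypothetical kernel in the same style as the earlier sketch (illustrative only, not AlexNet's actual code); host-side allocation and launch would follow the same pattern as the vector addition example above.

```cuda
// Hypothetical sketch: one thread computes one output neuron of a dense layer,
// i.e. one row of y = W x. Launched with enough threads to cover all rows, e.g.
//   dense_forward<<<(rows + 255) / 256, 256>>>(W, x, y, rows, cols);
__global__ void dense_forward(const float *W, const float *x, float *y,
                              int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float sum = 0.0f;
        for (int c = 0; c < cols; ++c) {
            sum += W[row * cols + c] * x[c];  // dot product of one weight row with the input
        }
        y[row] = sum;
    }
}
```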

At Princeton, Li wanted to build a comprehensive image dataset. It was suggested that she use WordNet, a database in which some 1.4 lakh (140,000) words were organized. She called her new dataset ImageNet because she used WordNet as the starting point for choosing her categories. Verbs, adjectives and intangible nouns were eliminated, leaving about 22,000 categories of countable objects ranging from 'ambulance' to 'zucchini'.

She used Google's image search to find candidate images and human reviewers to verify them. The images then had to be chosen and labelled by hand, a humongous task that would have taken years.

She then learnt about Amazon Mechanical Turk, which cut the time needed to complete ImageNet to two years. ImageNet was ready for publication in 2009 and was presented at a computer vision conference, but it did not get the kind of recognition Li expected. She therefore carved out a smaller dataset with 1,000 categories and 1.4 million images and arranged a competition around it in 2010 and again in 2011. Still, ImageNet proved too much for the algorithms of the day to handle.

A third competition was held in 2012. Geoffrey Hinton's team submitted a model based on deep neural networks, and its accuracy was astonishing. The winners were to be announced at the European Conference on Computer Vision. Li was not inclined to attend since she had a baby to nurse, but after seeing how well AlexNet had worked on her dataset she reluctantly made the trip. Yann LeCun was also in the audience. It was a turning point in the history of computer vision, and it vindicated Hinton's faith in neural networks.

AlexNet was a convolutional neural network (CNN), the architecture LeCun had developed twenty years earlier for handwriting recognition. There were few differences between AlexNet and LeCun's image recognition networks of the 1990s, except that AlexNet was far larger: it had eight layers and 60 million parameters. LeCun could not have trained a model of this magnitude in the 90s, since there were no GPUs then. Even collecting the images would have been tough in the absence of supercomputers, Google and Amazon Mechanical Turk.

Li provided the training data that large neural networks need to reach their full potential.

Hinton and his students formed a company, which was later purchased by Google; Hinton then worked at Google while retaining his academic post in Toronto.

AlexNet made Nvidia chips the industry standard for training neural networks.

Three elements of modern AI converged for the first time — neural networks, big data and GPU computing.
