ImageNet Classification

Researchers at the University of Toronto advanced the state of the art in image classification on the ImageNet dataset. Their work is described in the 2012 paper ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky et al. Evaluated on the ILSVRC-2010 data and entered in the ILSVRC-2012 competition, their model improved considerably on the previous state of the art for both top-1 and top-5 error rates.

ImageNet is a database of millions of labeled images in roughly 22,000 categories. Human labelers classified and labeled the images. At the time, the best models were inefficient and computationally expensive to train. The paper outlines a variety of techniques, each of which improves the model's training speed or accuracy in a different way.

The network consists of five convolutional layers followed by three fully connected (dense) layers. The convolutional layers make the model considerably more efficient at working with images: convolutions extract local features, ReLU activations and pooling layers transform and condense them, and the resulting feature maps are passed to each subsequent layer.
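As a rough illustration of what one convolutional stage does, the sketch below applies a convolution, a ReLU, and max pooling to a tiny single-channel image in plain Python. The function names, the 5x5 image, and the 2x2 edge kernel are all made up for this example; the network in the paper works on multi-channel images with learned kernels and is far larger.

```python
def conv2d(image, kernel):
    """Stride-1 convolution without padding on a single-channel image.

    As in most deep learning code, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative value in the feature map with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical input: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

features = max_pool(relu(conv2d(image, kernel)))
print(features)
```

The pooled feature map is nonzero only where the edge sits, which is the sense in which a convolutional stage extracts a specific feature and hands a smaller summary to the next layer.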

The network uses ReLU activation, f(x) = max(0, x), which outputs zero for negative inputs and is linear for positive inputs. This simplifies the computation on the forward pass as well as the gradient computation during backpropagation. Because ReLU does not saturate for positive inputs, training is several times faster than with the saturating tanh nonlinearity; the paper reports a four-layer convolutional network reaching 25% training error on CIFAR-10 six times faster with ReLUs than with tanh. ReLUs also do not require input normalization to keep them from saturating. Separately, the researchers translated and reflected existing training images, which the model learned from as separate and unique images.
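To make the saturation argument concrete, here is a minimal sketch comparing the gradients of ReLU and tanh at a few sample inputs. The function names and input values are chosen only for illustration.

```python
import math


def relu(x):
    """ReLU activation: zero for negative inputs, identity for positive ones."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU; the value at exactly x == 0 is set to 0 by convention."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Gradient of tanh, which shrinks toward zero as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


for x in (-5.0, 0.5, 5.0):
    print(f"x={x:5.1f}  relu={relu(x):4.1f}  "
          f"relu_grad={relu_grad(x):3.1f}  tanh_grad={tanh_grad(x):.6f}")
```

For large positive inputs the tanh gradient is close to zero while the ReLU gradient stays at 1, so weight updates flowing through a ReLU unit do not shrink in the same way.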

To reduce overfitting, the researchers used dropout, which helps the model generalize to unseen data. Data augmentation was also used to improve the model's ability to recognize variations of existing images.
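The two techniques can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: the function names, the drop probability default, and the tiny inputs are hypothetical. The dropout variant shown zeroes units during training and scales outputs at test time, and translation is approximated here by taking a random crop.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Zero each activation with probability p_drop while training.

    At test time nothing is dropped; outputs are scaled by (1 - p_drop)
    so their expected magnitude matches what was seen during training.
    """
    if training:
        return [0.0 if rng.random() < p_drop else a for a in activations]
    return [a * (1.0 - p_drop) for a in activations]


def horizontal_reflection(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a crop_h x crop_w patch at a random offset (a translated view)."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


rng = random.Random(0)  # seeded only so repeated runs behave the same

hidden = [0.2, 1.5, 0.7, 3.0]
train_out = dropout(hidden, rng=rng)
test_out = dropout(hidden, training=False)

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flipped = horizontal_reflection(image)
patch = random_crop(image, 2, 2, rng=rng)

print(train_out, test_out, flipped, patch)
```

Each call to random_crop or horizontal_reflection yields a new training example with the same label as the original image, which is how augmentation enlarges the training set without collecting new data.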

Ultimately the research resulted in a model with a top-1 error rate of 37.5% and a top-5 error rate of 17.0% on the ILSVRC-2010 test set. In the ILSVRC-2012 competition, a variant of the model achieved a winning top-5 error rate of 15.3%, considerably better than the second-best entry's 26.2%. The research also showed that the depth of the network was important to its accuracy: removing any of the middle layers degraded performance.
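For readers unfamiliar with the metrics, a top-k error rate is the fraction of test images whose correct label is not among the model's k highest-scoring classes. The sketch below computes it in plain Python; the function name and the three-class scores are invented for illustration, so it uses k = 2 where the competition uses k = 5.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k top-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical per-class scores for four images and three classes.
scores = [
    [0.10, 0.60, 0.30],
    [0.50, 0.20, 0.30],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 2, 2, 1]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

A top-k error can never exceed the top-1 error on the same predictions, which is why reported top-5 figures are always the lower of the two.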