Subscribe to DSC Newsletter


Alex Krizhevsky, Geoffrey Hinton and Ilya Sutskever created a neural network architecture called ‘AlexNet’ and won Image Classification Challenge (ILSVRC) in 2012. They trained their network on 1.2 million high-resolution images into 1000 different classes with 60 million parameters and 650,000 neurons. The training was done on two GPUs with split layer concept because GPUs were a little bit slow at that time.

The original paper is available at ImageNet Classification with Deep Convolutional Neural Networks

Also check: Convolutional Neural Network and LeNet-5

AlexNet Architecture:

The AlexNet architecture consists of five convolutional layers, some of which are followed by maximum pooling layers and then three fully-connected layers and finally a 1000-way softmax classifier.

Original Architecture Image from [Krizhevsky et al., 2012.]

In the original paper, all the layers are divided into two to train them on separate GPUs. Since it is a complex arrangement and difficult to understand, we will implement AlexNet model in one layer concept.

Simplified AlexNet Architecture Video

First Layer:

The input for AlexNet is a 227x227x3 RGB image which passes through the first convolutional layer with 96 feature maps or filters having size 11×11 and a stride of 4. The image dimensions changes to 55x55x96.

Then the AlexNet applies maximum pooling layer or sub-sampling layer with a filter size 3×3 and a stride of two. The resulting image dimensions will be reduced to 27x27x96.

Second Layer:

Next, there is a second convolutional layer with 256 feature maps having size 5×5 and a stride of 1.

Then there is again a maximum pooling layer with filter size 3×3 and a stride of 2. This layer is same as the second layer except it has 256 feature maps so the output will be reduced to 13x13x256.

Third, Fourth and Fifth Layers:

The third, fourth and fifth layers are convolutional layers with filter size 3×3 and a stride of one. The first two used 384 feature maps where the third used 256 filters.

The three convolutional layers are followed by a maximum pooling layer with filter size 3×3, a stride of 2 and have 256 feature maps.

Sixth Layer:
 The convolutional layer output is flattened through a fully connected layer with 9216 feature maps each of size 1×1.

Seventh and Eighth Layers:
 Next is again two fully connected layers with 4096 units.

Output Layer:
Finally, there is a softmax output layer ŷ with 1000 possible values.

Summary of AlexNet Architecture:

Views: 1459


You need to be a member of Data Science Central to add comments!

Join Data Science Central


  • Add Videos
  • View All

© 2020   Data Science Central ®   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service