Home » Uncategorized

Accelerating Convolutional Neural Networks on Raspberry Pi

This article was written by Koustubh. 


Unless you have been living under the rock, you must have heard of the revolution that deep learning and convolutional neural networks have brought in computer vision. Computers have achieved near-human level accuracy for most of the tasks. However, most of the top performing neural networks for state of the art image recognition problem suffer from three problems:

1. Low speed:

which deter them to get deployed in real time applications on embedded devices like Raspberry PI, Nvidia Jetson etc. This problem gets worse for an application like object detection where multiple windows at different locations and scale need to be processed.

2.Very large model size:

Models that achieve state of the art accuracy are too large to fit into mobile devices or small devices like Raspberry Pi.

3. Large memory footprint:

Even if you somehow manage to live with the large size of models, the amount of run-time memory(RAM) required to run these models is way too high and limits their usage.

Hence, the current trend is to deploy these models on servers with large graphical processing units(GPU), but issues like data privacy and internet connectivity demand usage of embedded deep learning. So huge efforts from people all around the world are geared towards accelerating the inference run-time of these networks, decreasing the size of the model and decreasing the run-time memory requirement.

Why are CNNs slow?

The major chunk of time in a CNN is spent on convolutional layers while the most of the storage is spent on fully connected layers. For example in the AlexNet, 90% of computation time is spent on convolutional layers and 90% of model size is from the weights of FC layers. The newer models like Inception and Resnet have removed the FC layers completely, hence reducing the storage. But still, the models cannot be deployed with low memory footprint. The reason is that during inference, additional memory is utilized for facilitating fast matrix multiplication. Therefore these models can be run on Graphical Processing Units(GPU) with massive memory, but for devices like Raspberry Pi(RAM of 2GB), one has to manage with shallower CNN. Unfortunately, shallower CNNs do not achieve human level accuracy.

This long article has the following sections:

  • Decomposition of Response Map
  • Convolutional Layer
  • Lower Rank Subspace
  • Decomposition

To check out all this information, click here.

DSC Resources

Popular Articles

Leave a Reply

Your email address will not be published. Required fields are marked *