Implementing a Distributed Deep Learning Network over Spark
Authors: Dr. Vijay Srinivas Agneeswaran, Director and Head, Big Data Labs, Impetus {}

Ghousia Parveen Taj, Lead Software Engineer, Impetus {}

Sai Sagar, Software Engineer, Impetus {}

Padma Chitturi, Software Engineer, Impetus {}

Deep learning is becoming an important AI paradigm for pattern recognition, image/video processing, and fraud detection applications in finance. The computational complexity of a deep learning network dictates the need for a distributed realization. Our intention is to parallelize the training phase of the network and consequently reduce training time. We have built the first prototype of our distributed deep learning network over Spark, which has emerged as a de facto standard for realizing machine learning at scale.


Geoffrey Hinton presented a fast learning algorithm for deep belief networks [Hinton 2006]. This paper, together with the advent of GPUs and the widespread availability of computing power, led to the breakthrough in this field. Consequently, every big software technology company is working on deep learning, and many startups are using it. A number of applications are being built on it, in fields such as credit card fraud detection (see, for example, Deep Learning Analytics from FICO) and multi-modal information processing. This is in addition to areas such as speech recognition and image processing, which have already been transformed by the application of deep learning [Deng 2013].

The team at Google led by Jeffrey Dean came up with the first implementation of distributed deep learning [Dean 2012]. Architecturally, it was a pseudo-centralized realization, with a centralized parameter server acting as the single source of parameter values across the distributed system. 0xdata has recently released its H2O software, which includes a deep learning network in addition to several other machine learning algorithms; they have also made H2O work over Spark, as evident from this blog on Sparkling Water. To the extent we have explored, only Microsoft's Project Adam comes close to a fully distributed realization of a deep learning network.

Distributed Deep Learning over Spark

Spark is the next-generation Hadoop framework from the UC Berkeley and Databricks teams; even the Hadoop vendors have started bundling and distributing Spark with their Hadoop distributions. Currently, there is no deep learning implementation that we are aware of, either in MLlib (the machine learning library on top of Spark) or outside of it.

We have implemented stacked Restricted Boltzmann Machines (RBMs), similar to those in [Roux 2008]. The architecture of our deep learning network over Spark is given in the diagram below.
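As context for the stacked-RBM implementation, a single RBM layer is typically trained with contrastive divergence. The sketch below shows one CD-1 update in plain Python; it is an illustrative assumption of the training rule (biases and mini-batching omitted), not the authors' actual code.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(p):
    # draw a binary sample from a Bernoulli probability
    return 1.0 if random.random() < p else 0.0

def cd1_step(v0, W, lr=0.1):
    """One contrastive-divergence (CD-1) update of the RBM weight
    matrix W (hidden x visible) for a single binary visible vector v0."""
    n_hid, n_vis = len(W), len(W[0])
    # positive phase: hidden probabilities given the data
    h0 = [sigmoid(sum(W[j][i] * v0[i] for i in range(n_vis))) for j in range(n_hid)]
    h0_s = [sample(p) for p in h0]
    # negative phase: reconstruct visibles, then hidden probabilities again
    v1 = [sigmoid(sum(W[j][i] * h0_s[j] for j in range(n_hid))) for i in range(n_vis)]
    h1 = [sigmoid(sum(W[j][i] * v1[i] for i in range(n_vis))) for j in range(n_hid)]
    # gradient approximation: <v h>_data - <v h>_reconstruction
    for j in range(n_hid):
        for i in range(n_vis):
            W[j][i] += lr * (h0[j] * v0[i] - h1[j] * v1[i])
    return W

W = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(3)]
W = cd1_step([1.0, 0.0, 1.0, 0.0], W)
```

Stacking RBMs then amounts to training one layer at a time and feeding each layer's hidden activations as the visible input of the next.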

To achieve our goal of a fully distributed deep learning implementation, we rely on Hadoop's distributed file system (HDFS) and Spark's in-memory computation for parallel training.

The input dataset is stored as an HDFS file and is thus distributed across the cluster. Each node in the cluster runs an Akka actor, whose role is to share the training results on that node with every other node in the cluster. On receiving a request to train the network, the deep learning framework initializes the weight matrices and makes them available on every node's local file system. The training phase is a Spark application that loads the input file from HDFS into a Spark RDD. Once training for a single RDD partition is complete, the results (the weight matrices) are written to HDFS, and the local actor publishes an update message to every other node in the cluster. On receiving the update message, the actors on the other nodes copy the published weight matrices and update their local copies accordingly. The updated weight matrices are then used for subsequent partitions. The training output is the final weight matrix obtained after every dataset block has been trained on.
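The per-partition loop described above can be sketched as follows. This is a single-process simulation; the merge rule (simple element-wise averaging) and all function names are assumptions for illustration, since the post does not specify exactly how received weight matrices are combined with local ones.

```python
# Hypothetical simulation of the per-partition training flow: each
# "node" trains on its own data partition starting from its current
# local weights, then publishes the result so every other node can
# merge it into its local copy.

def train_partition(weights, partition):
    # stand-in for one training pass over a data partition
    return [w + 0.01 * sum(partition) for w in weights]

def merge(local, received):
    # illustrative merge rule: element-wise average of the two copies
    return [(a + b) / 2.0 for a, b in zip(local, received)]

partitions = [[1, 2], [3, 4], [5, 6]]              # data blocks, one per node
weights = {node: [0.0, 0.0] for node in range(3)}  # per-node local copies

for node, part in enumerate(partitions):
    updated = train_partition(weights[node], part)
    weights[node] = updated
    for other in weights:                          # "publish" to every other node
        if other != node:
            weights[other] = merge(weights[other], updated)
```

In the real system the publication step goes through HDFS plus the Akka actors rather than a shared dictionary, and nodes process their partitions concurrently rather than in sequence.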

We have eliminated the central parameter server from deeplearning4j, which itself is a realization of Jeffrey Dean's paper [Dean 2012]. In its place we have built a publish-subscribe system, implemented using the Akka framework over Spark, which is responsible for distributing the learning across the different nodes. The weight matrix represents this learning and is shared at a location in HDFS in our current implementation; in future, we may augment the Akka distributed queue to carry larger files.
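The publish-subscribe layer can be illustrated with a minimal sketch. The `Bus` and `Node` classes below are illustrative stand-ins for the Akka actor system (single-process, with in-memory inboxes in place of actor mailboxes and HDFS), not the actual implementation.

```python
import queue

class Bus:
    """Stand-in for the publish-subscribe layer: delivers a published
    weight update to every subscriber except the sender."""
    def __init__(self):
        self.nodes = []

    def subscribe(self, node):
        self.nodes.append(node)

    def broadcast(self, sender_id, weights):
        for n in self.nodes:
            if n.node_id != sender_id:
                n.inbox.put(weights)

class Node:
    """Stand-in for one cluster node's actor, with an inbox of updates."""
    def __init__(self, node_id, bus):
        self.node_id = node_id
        self.inbox = queue.Queue()
        self.bus = bus
        self.weights = None
        bus.subscribe(self)

    def publish(self, weights):
        self.bus.broadcast(self.node_id, weights)

    def drain(self):
        # apply all pending updates; here the latest one simply wins
        while not self.inbox.empty():
            self.weights = self.inbox.get()

bus = Bus()
nodes = [Node(i, bus) for i in range(3)]
nodes[0].publish([0.1, 0.2])   # node 0 shares its learned weights
for n in nodes[1:]:
    n.drain()                  # other nodes pick up the update
```

Because nodes drain their inboxes whenever convenient rather than blocking on `publish`, this mirrors the asynchronous exchange described in the comments below.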

Concluding Remarks

To the best of our knowledge, this is the first attempt at realizing a distributed deep learning network directly over Spark. We shall be augmenting this first-cut implementation with further work, especially with respect to achieving high accuracy of the deep learning network. We shall also build a few applications, including image search and NLP (providing a natural language interface to relational queries), to showcase the power of our deep learning platform.



[Dean 2012]  Dean, Jeffrey, et al. “Large scale distributed deep networks.” Advances in Neural Information Processing Systems. 2012.

[Deng 2013] Deng, Li, and Dong Yu. "Deep Learning: Methods and Applications." Foundations and Trends in Signal Processing, vol. 7, no. 3, pages 197-387, 2013.

[Hinton 2006] Hinton, G. E., Osindero, S., and Teh, Y. "A fast learning algorithm for deep belief nets." Neural Computation, 18, pages 1527-1554, 2006.

[Roux 2008] Le Roux, Nicolas, and Yoshua Bengio. “Representational power of restricted boltzmann machines and deep belief networks.” Neural Computation 20.6 (2008): 1631-1649.




Comment by Dr. Vijay Srinivas Agneeswaran on December 2, 2014 at 1:29am

Debanjan, all nodes keep computing on different parts of the training data simultaneously. They exchange updates after some mini-batches through the publish-subscribe layer; kindly note that this is done asynchronously, so nodes do not wait for updates from neighbours.

Comment by Debanjan Bhattacharyya on December 2, 2014 at 1:26am

Vijay, if you are doing mini-batches, how are you optimizing the use of nodes? When a single node is working on a mini-batch (this might take time when we are considering images), are the other nodes just sitting idle?

Comment by Padma Ch on November 30, 2014 at 8:10am

The reason for using HDFS is that the networks on different nodes are trained on different datasets. Hence we maintain a local copy of the deep learning network, and after a certain number of iterations we ensure that the updated weight matrices are the same across all the nodes. Also, as the network size increases, it could become memory-intensive. We need to explore Tachyon and see how well it suits the application.

Comment by sai sagar on November 29, 2014 at 5:09am

We are exploring Tachyon.

Comment by Sam Bessalah on November 29, 2014 at 3:45am

Have you tried using Tachyon, the in-memory file system project from Berkeley? It provides better throughput compared to HDFS, especially for storing the local weight matrices.

Comment by Peng Cheng on November 28, 2014 at 1:40pm

Why use HDFS for frequently updated parameters instead of an in-memory distributed cache?
