.

Pavan Kumar's Blog (1)

Apache Spark: distributed data processing faster than Hadoop

This blog is extrapolated from DataScience Hacks by the author himself. 

Apache Spark, another apache licensed top-level project that could perform large scale data processing way faster than Hadoop (I am referring to MR1.0 here). It is possible due to Resilient Distributed Datasets concept that is behind this fast data processing. RDD is basically a collection of objects,…

Continue

Added by Pavan Kumar on September 28, 2014 at 7:00am — 1 Comment

Blog Topics by Tags

Monthly Archives

2014

© 2021   TechTarget, Inc.   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service