Self-Learn Yourself Apache Spark in 21 Blogs – #2

In this blog we share the list of topics for learning Apache Spark, the basics of Hadoop (one of the major big data tools), and the motivations for Apache Spark. Spark is not a replacement for Apache Hadoop; it is a companion to it in the big data ecosystem.

Blog 1 – Introduction to Big Data

Blog 2 – Hadoop, Spark’s Motivations

Blog 3 – Apache Spark’s History and Unified Platform for Big Data

Blog 4 – Apache Spark’s First Step – AWS, Apache Spark

Blog 5 – Apache Spark Languages with basic Hands-on

Blog 6 – The RDD, RDDs Input, Hands-on

Blog 7 – Transformation, map, mapPartitions

Blog 8 – RDD Combiner

Blog 9 – Actions, Persistence Actions, Hands-on

Blog 10 – Implicit Conversions, Hands-on

Blog 11 – Key Value Methods

Blog 12 – Caching Data, Hands-on

Blog 13 – Accumulator

Apache Hadoop is an open source big data management platform, most closely associated with big data analytics applications. The distributed processing framework was created in 2006, primarily at Yahoo, and was based partly on ideas outlined by Google in a pair of technical papers. Other Internet companies such as Facebook, LinkedIn and Twitter soon adopted the technology and began contributing to its development. In the past few years, Hadoop has evolved into a complex ecosystem of infrastructure components and related tools, which various vendors package together in commercial Hadoop distributions.

The Yahoo team's Hadoop tutorial is one of the best available.

Below are the key reasons for using Apache Spark and the motivations behind it:

Readability
Expressiveness
Fast
Testability
Interactive
Fault Tolerant
Unified Big Data Platform

In Blog 3 we will share a detailed study of Apache Spark's History and Unified Platform for Big Data.

Originally posted here.