
Apache Spark Introduction – A Comprehensive Guide for Beginners

This article was posted on Data Flair. Below is a quick overview of the original article.


1. Objective

This tutorial provides an introduction to Apache Spark: its ecosystem components, its core abstraction (the RDD), and RDD transformations and actions. The objective of this introductory guide is to give a detailed overview of Spark, including its history, architecture, deployment modes, and RDDs.

2. History

Apache Spark began in 2009 at UC Berkeley's RAD Lab, which later became the AMPLab. It was open-sourced in 2010 under a BSD license. In 2013 Spark was donated to the Apache Software Foundation, and it became a top-level project in 2014. By 2015 Spark had become one of the most active projects at Apache.

3. Introduction

Apache Spark is a general-purpose, lightning-fast cluster computing system for running large-scale data applications. It is written in Scala but provides high-level APIs in Java, Scala, Python, and R. Spark can run workloads up to 100 times faster than Hadoop MapReduce when data fits in memory, and about 10 times faster when reading from disk. It can be integrated with Hadoop and can process existing HDFS data.
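
To make this concrete, here is a minimal sketch of a standalone Spark application in Scala. The object name, application name, and HDFS path are illustrative assumptions (not from the original article), and the job would typically be launched with spark-submit, which supplies the cluster master:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal Spark application: count the lines of a text file on HDFS.
// The path below is a placeholder; any reachable file path works.
object LineCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LineCount")
    val sc   = new SparkContext(conf)

    val lines = sc.textFile("hdfs:///data/input.txt") // build an RDD of lines
    println(s"Line count: ${lines.count()}")          // action: runs the job

    sc.stop()
  }
}
```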

4. Need for Spark

5. Components

  • Spark Core
  • Spark SQL (see the sketch after this list)
  • Spark Streaming
  • MLlib
  • GraphX
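
As a quick illustration of one of these components, here is a hedged Spark SQL sketch. It assumes the SparkSession entry point introduced in Spark 2.x, and the people.json file and its name/age columns are made up for the example:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative Spark SQL usage: load JSON data and query it with SQL.
// File path and column names are assumptions for the sketch.
object SqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlSketch")
      .master("local[*]")   // run locally for this sketch
      .getOrCreate()

    val people = spark.read.json("people.json")
    people.createOrReplaceTempView("people")

    spark.sql("SELECT name, age FROM people WHERE age > 21").show()

    spark.stop()
  }
}
```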

6. Resilient Distributed Dataset (RDD)
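
The article's overview mentions RDD transformations and actions; a small sketch may help show the difference. It assumes an existing SparkContext bound to sc, as provided inside spark-shell:

```scala
// Transformations (map, filter) are lazy: they only describe the computation.
// An action (reduce) triggers actual execution across the cluster.
val numbers = sc.parallelize(1 to 10)      // create an RDD from a local range
val squares = numbers.map(n => n * n)      // transformation: lazy
val evens   = squares.filter(_ % 2 == 0)   // transformation: lazy
val total   = evens.reduce(_ + _)          // action: runs the job
println(total)                             // 220 (4 + 16 + 36 + 64 + 100)
```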

7. Spark Shell
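
The Spark shell is an interactive Scala REPL, launched with bin/spark-shell, that comes with a preconfigured SparkContext bound to sc. A hypothetical session might look like this (the README.md file name is an assumption):

```scala
// Typed at the scala> prompt inside spark-shell; sc is provided by the shell.
val readme     = sc.textFile("README.md")               // RDD of the file's lines
val sparkLines = readme.filter(_.contains("Spark"))     // transformation: lazy
sparkLines.count()                                      // action: number of matching lines
```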

To read the full article or get Spark training, click here.
