This article was posted on Data Flair. Below is a quick overview of the original article.
1. Objective
This tutorial provides an introduction to Apache Spark: its ecosystem components, the core Spark abstraction – the RDD – and transformations and actions. The objective of this introductory guide is to give a detailed overview of Spark, its history, architecture, deployment modes and RDDs.
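To make the RDD abstraction and the transformation/action distinction concrete, here is a minimal Scala sketch (not from the original article; it assumes a local Spark installation, and the object name RddBasics is only illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  def main(args: Array[String]): Unit = {
    // Local SparkContext for demonstration; a real deployment would target a cluster master.
    val conf = new SparkConf().setAppName("RddBasics").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Create an RDD from an in-memory collection.
    val numbers = sc.parallelize(1 to 10)

    // Transformations are lazy: nothing executes yet.
    val evenSquares = numbers.map(n => n * n).filter(_ % 2 == 0)

    // An action triggers the actual computation and returns a result to the driver.
    println(s"Sum of even squares: ${evenSquares.reduce(_ + _)}")

    sc.stop()
  }
}

Transformations such as map and filter only build up a lineage of operations; the reduce action is what actually launches the job.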
2. History
Apache Spark started in 2009 as a research project at UC Berkeley's R&D lab, which later became the AMPLab. It was open sourced in 2010 under a BSD license. In 2013 Spark was donated to the Apache Software Foundation, where it became a top-level project in 2014, and by 2015 it was one of the most active projects at Apache.
3. Introduction
Apache Spark is a general-purpose, lightning-fast cluster computing system. It is written in Scala but provides rich high-level APIs in Scala, Java, Python and R for building and running Spark applications. Spark can run workloads up to 100 times faster than Hadoop MapReduce when data fits in memory, and up to 10 times faster when processing data from disk. It can be integrated with Hadoop and can process existing HDFS data.
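As a hedged illustration of the Hadoop integration mentioned above, the following Scala sketch reads existing HDFS data and runs a word count; the HDFS paths and the HdfsWordCount object name are placeholders, not details from the original article:

import org.apache.spark.{SparkConf, SparkContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HdfsWordCount")
    val sc = new SparkContext(conf)

    // Read an existing file from HDFS (placeholder path).
    val lines = sc.textFile("hdfs://namenode:9000/data/input.txt")

    // Classic word count: split lines into words, pair each word with 1, sum per word.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Write the results back to HDFS (placeholder path).
    counts.saveAsTextFile("hdfs://namenode:9000/data/wordcount-output")

    sc.stop()
  }
}

Submitted with spark-submit against a YARN or standalone cluster, the same code processes HDFS data in place without copying it out of Hadoop.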
4. Need for Spark
5. Components
6. Resilient Distributed Dataset – RDD
7. Spark Shell
To read the full article or get Spark training, click here.