
Self-Learn Yourself Apache Spark in 21 Blogs – #1

We have received many requests from friends who regularly read our blogs to provide a complete guide to getting started with Apache Spark. So we have come up with a learning initiative called "Self-Learn Yourself Apache Spark in 21 Blogs".

We have distilled various sources and archives to chart a clear learning path for you to understand and excel in Apache Spark. These 21 blogs, written over a period of time, will be a complete guide to understanding and working with Apache Spark quickly and efficiently.

We wish you all a Happy New Year 2016; start the year with rich knowledge. From dataottam, we wish you good luck to "ROCK Apache Spark & the New Year 2016".

Blog 1 – Introduction to Big Data


Assume that you're choosing the 50 most popular and best books in the big data space to purchase for your college library. When you run a web search such as "good and best books for big data", you land on page after page of results, including presentations, PDFs, images, and more. You will even see links to social media such as Google+, Facebook, LinkedIn, and Twitter.

So now the game begins: how do we decide what is most relevant to our need? Due to time constraints we cannot go through every link, and this is where our big data problem starts. Now assume you have a friend who can analyze all of this listed data and share just the information you need.

So let's learn what Big Data is, and what role and dimensions it has in the enterprise. Consider a LinkedIn post that gets 200 likes and 10+ comments per day; there are many posts like it, so the volume of data generated is enormous, beyond what legacy database systems can store or measure. This collection of very large amounts of data is referred to as Big Data.

Big Data can come from internal data sources, external data sources, social platforms such as LinkedIn, Google, Facebook, and Twitter, and personal devices. If all this data is sorted, filtered, and analyzed, it can produce insightful information that serves as key pointers for enterprises making business decisions.
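As a toy illustration of that sort-filter-analyze idea (a minimal sketch in plain Python; the post names, like counts, and engagement threshold below are all made up for the example, not real data):

```python
# Hypothetical illustration: filtering and aggregating raw social-media
# activity records to extract a simple insight. All values are invented.

# Each record: (post_id, likes, comments)
records = [
    ("post-1", 200, 12),
    ("post-2", 15, 1),
    ("post-3", 320, 25),
    ("post-4", 8, 0),
]

# Filter: keep only high-engagement posts (the threshold is arbitrary).
popular = [r for r in records if r[1] >= 100]

# Analyze: average likes across the popular posts.
total_likes = sum(likes for _, likes, _ in popular)
avg_likes = total_likes / len(popular)

print(popular)    # [('post-1', 200, 12), ('post-3', 320, 25)]
print(avg_likes)  # 260.0
```

At real Big Data scale this same filter-then-aggregate pattern is what frameworks like Apache Spark parallelize across a cluster, which is where this blog series is headed.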


Big Data is characterized by the 4Vs, but in my view it also brings the dimension of 4Ms: Volume, Velocity, Variety, Value, and "Makes Me More Money".

But what are the sources of Big Data? Broadly, they are social data, machine data, and transactional data.

In Blog 2, we will share the motivation behind Apache Spark and Apache Hadoop 101.

The original blog can be seen here.

As always, please feel free to suggest or comment.