
Is Spark The Data Platform Of The Future?

Hadoop has been the foundation of data programmes since Big Data hit the big time. It has been the launching point for almost every company that is serious about its data offerings.

However, as we predicted, the rise of in-memory databases has created a need for companies to adopt frameworks that can harness this power effectively.

It was therefore no surprise that Apache launched Spark, a framework that uses in-memory primitives to deliver performance around 100 times faster than Hadoop's two-stage, disk-based MapReduce.
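To make that difference concrete, here is a minimal PySpark sketch (it assumes PySpark is installed, and the file path and filtering logic are purely illustrative): caching keeps the working set in memory, so repeated passes over the data avoid the disk reads a two-stage MapReduce job would incur.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()

    # Load a hypothetical log file as an RDD of lines.
    lines = spark.sparkContext.textFile("events.log")

    # cache() keeps the filtered RDD in memory after the first action,
    # so later passes read from RAM rather than re-reading the file from disk.
    errors = lines.filter(lambda line: "ERROR" in line).cache()

    print(errors.count())                                   # first pass builds the cache
    print(errors.filter(lambda l: "timeout" in l).count())  # served from memory

    spark.stop()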

This kind of product has become increasingly important as we move into a world where the volume and velocity of data are increasing exponentially.

So is Spark going to be the Hadoop beater that it seems to be?

Yes

Technology that allows decisions to be made more quickly, and on larger amounts of data, is something that companies will be clamouring for.

It is not simply in principle that this platform will bring about change, either. As an open source project, it has more developers working on it than any other Apache product.

This suggests that people support the idea through their willingness to dedicate their time to it. Many of the data scientists working on Apache products are the same people who will be using them in their day-to-day roles at different companies, which suggests they are likely to adopt Spark in the future.

No

One of the main reasons for Hadoop's success in the last few years has been not only its ease of use, but also the fact that companies can get started for nothing. The basics of Hadoop can run on commodity hardware, and companies only need to upgrade when they ramp up their data programmes.

Spark, by contrast, runs in memory, which requires high-performance hardware, something that companies new to data initiatives are unlikely to invest in.

So which is it more likely to be?

In my opinion, Hadoop will remain the foundation of data programmes, and with more companies looking to adopt it as the basis for their implementations, this is unlikely to change.

Spark may well become the upgrade adopted by companies that reach a stage where they want, or need, improved performance. The fact that Spark can work alongside Hadoop suggests that this was in the minds of its creators at Apache from the start.
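As a rough illustration of that "works alongside Hadoop" point, the sketch below (the paths, namenode address and column names are hypothetical) reads data that a Hadoop pipeline has already landed in HDFS, does the fast in-memory work in Spark, and writes the result back to HDFS for the rest of the stack to use.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-demo").getOrCreate()

    # Read CSV data already sitting in HDFS (illustrative path).
    sales = spark.read.csv("hdfs://namenode:8020/data/sales.csv",
                           header=True, inferSchema=True)

    # Do the fast, in-memory aggregation in Spark...
    summary = sales.groupBy("region").sum("revenue")

    # ...then persist the result back to HDFS as the system of record.
    summary.write.mode("overwrite").parquet("hdfs://namenode:8020/data/sales_summary")

    spark.stop()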

Therefore, it is unlikely to be a Hadoop beater; instead, it will become more like Hadoop's big brother: capable of doing more, but at increased cost and only necessary at certain data volumes and velocities. It is not going to be a replacement.


Comment by Tom Huffman on March 15, 2015 at 4:11pm

I like @Narayanarao's comparison. Spark allows you to access a vast amount of data in near real-time fashion. As he explains, Spark should not be used as a replacement but as a companion to your Hadoop ecosystem. One scenario it is well suited to is as a preliminary filter, determining whether the data being considered fits the criteria before storing it to HDFS. This would minimize and help control the amount of data retrieved. I have been an advocate of Spark for over a year and saw the potential even then.
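A rough sketch of that "preliminary filter" idea might look like the following (the input path, column names and criteria are made up for illustration): Spark screens incoming records, and only the rows that meet the criteria are ever written to HDFS.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("prefilter-demo").getOrCreate()

    # Read a hypothetical batch of incoming JSON records.
    incoming = spark.read.json("incoming/batch.json")

    # Keep only the records that fit the criteria before anything reaches HDFS.
    accepted = incoming.filter((F.col("status") == "valid") & (F.col("score") > 0.8))

    # Append the accepted rows to HDFS (illustrative path).
    accepted.write.mode("append").parquet("hdfs://namenode:8020/warehouse/accepted")

    spark.stop()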

Comment by Narayanarao Pavuluri on March 15, 2015 at 3:11pm

IMHO, Hadoop and Spark are two distinct platforms that serve different applications. One cannot load petabytes of data into an in-memory database. Spark is built for applications that need faster access to data and quick answers, whereas Hadoop is for storing large amounts of data and making sense of it. If I have to compare them, it is like comparing an elephant and a horse. Compared to an elephant, a horse is small, very fast and agile, but it cannot pull as much load as an elephant can. While an elephant cannot be as fast as a horse even with a small load, it can pull large amounts of load. Apologies to animal lovers for the animal analogy; I just want to paint a picture. I do not want to use animals for manual labour. At this age, machines are doing all or most of our work...

Comment by Chris Towers on March 15, 2015 at 9:44am

...& so the comment gets deleted, marvellous!

Comment by Chris Towers on March 15, 2015 at 6:07am

Hadoop will not be made "obsolete", Ilya - despite the number of companies I meet claiming so with their 'next gen' solutions, Hadoop just gets further support as it is built better and better.
