Hadoop – the software framework which provides the necessary tools to carry out Big Data analysis – is widely used in industry and commerce for many Big Data related tasks.

It is open source, which essentially means that anyone is free to use it and modify it for any purpose. While designed to be user-friendly, in its “raw” state it still takes considerable specialist knowledge to set up and run.

Because of this, a large number of commercial versions have come onto the market in recent years. Vendors have created their own distributions designed to be easier to use, or supply them alongside consultancy services to get you crunching through your data in no time.

These days, this is often provided in the form of “Hadoop-as-a-service”: the whole installation actually runs within the vendor’s own cloud, with customers paying a subscription to access the services.

Here’s a run-down, in no particular order, of 10 of the most popular or interesting commercial Hadoop platforms on the market today.

Cloudera

One of the first commercial Hadoop offerings and still the most popular, Cloudera reportedly has more installations running than any of its competitors. It also contributes Impala, which brings real-time, massively parallel processing of Big Data to Hadoop.

Amazon Web Services

Open source Big Data frameworks may not be the first thing that springs to mind when you think of Amazon, but the retailer was another one of the first to offer Hadoop in the cloud as part of its Amazon Web Services package. AWS offers a hosted solution integrating Hadoop with Amazon’s Elastic Compute Cloud (EC2) and Simple Storage Service (S3), its cloud-based data processing and storage services.

Hortonworks

Of the vendors listed here, Hortonworks is one of the few that offer 100% open source Hadoop technology without any proprietary (non-open) modifications. It was also the first to integrate support for Apache HCatalog, which maintains “metadata” (data about your data, such as table definitions), simplifying the process of sharing your data across other layers of service such as Apache Hive or Pig.
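To make the idea of shared metadata concrete, here is a minimal sketch in plain Python. It is not HCatalog's API: the `CATALOG` dictionary, the `page_views` table, the sample data and all function names are hypothetical stand-ins. It only shows why a single table definition helps, since two separate "tools" can read the same raw file without each redefining its columns and types.

```python
import csv
import io

# Hypothetical in-memory "catalog": table name -> schema and file format.
# This only mimics the idea of shared table metadata; it is not HCatalog's API.
CATALOG = {
    "page_views": {
        "columns": ["user_id", "url", "seconds"],
        "types": [int, str, float],
        "delimiter": "\t",
    }
}

# Made-up sample data standing in for a raw file stored in the cluster.
RAW_DATA = "1\t/home\t2.5\n2\t/pricing\t10.0\n1\t/docs\t7.5\n"


def read_table(name, raw_text):
    """Parse raw text into typed rows using the schema stored in the catalog."""
    meta = CATALOG[name]
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    rows = []
    for record in reader:
        typed = [cast(value) for cast, value in zip(meta["types"], record)]
        rows.append(dict(zip(meta["columns"], typed)))
    return rows


def total_seconds_per_user(rows):
    """Stand-in for one tool: aggregate time on site per user."""
    totals = {}
    for row in rows:
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + row["seconds"]
    return totals


def urls_longer_than(rows, threshold):
    """Stand-in for a second tool: filter rows using the same schema."""
    return [row["url"] for row in rows if row["seconds"] > threshold]


rows = read_table("page_views", RAW_DATA)
print(total_seconds_per_user(rows))
print(urls_longer_than(rows, 5.0))
```

Both functions rely on the one schema held in the catalog, so a change to the table definition only has to be made in a single place.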

MapR

MapR uses some differing concepts, such as native support for UNIX file systems rather than HDFS, meaning it will be more familiar to DBAs used to working in a UNIX environment. MapR Technologies is also spearheading development of the Apache Drill project, which provides advanced tools for interactive, real-time querying of big datasets.

IBM

It might be a relative newcomer to the Hadoop ecosystem, but IBM has deep roots in the computing industry, particularly in distributed computing and data management. Its BigInsights package adds its proprietary analytics and visualization algorithms to the core Hadoop infrastructure.

Microsoft HDInsight

Engineered to run on Microsoft’s Azure cloud platform, Microsoft’s Hadoop package is based on Hortonworks’, and has the distinction of being the only big commercial Hadoop offering which runs in a Windows environment.

Intel Distribution for Apache Hadoop

Intel is another giant of the tech world which has recently turned its attention towards Hadoop. Its distribution adds the company’s Graph Builder and Analytics Toolkit functions to Hadoop, and Intel claims that its security enhancements to the infrastructure give your data added protection.

Datastax Enterprise Analytics

Datastax offers its own distribution of the Apache Cassandra database management system on top of its Hadoop installation. It also includes custom proprietary systems to handle security, search, dashboard and visualization. Customers include Netflix, where it powers the recommendation engine by analyzing over 10 million data points every second!

Teradata Enterprise Access for Hadoop

Teradata offers hardware and software for implementing Big Data solutions, as well as its own Hadoop package, which is also based on the Hortonworks distribution. Proprietary technology supplied alongside the open source components includes its QueryGrid analytics engine and Viewpoint dashboard.

Pivotal HD

Pivotal was formed as a joint venture between storage system provider EMC and virtualization specialist VMware. Pivotal HD (Hadoop Distribution) forms part of the company’s Big Data Suite, which also includes the Greenplum database and the GemFire in-memory data grid. Customers include China’s national rail operator, China Railway; sorting out the logistics of rail journeys for 3.5 billion passengers certainly qualifies as Big Data!

Which others would you add to my list? What are your views on any of these? Please share in the comments below.

Comment by Donal Daly on October 5, 2015 at 3:32am

I thought Intel was getting out of the Hadoop distribution and going with Cloudera? - see this article: http://www.theinquirer.net/inquirer/news/2336750/intel-dropping-its...
