
Hadoop is an open source framework for storing massive amounts of data on clusters of commodity hardware.

A haboob is a dense dust storm that moves fast across the landscape and reduces visibility.

The goal of modern big data analytical ecosystems and data science is to help organizations gain greater visibility, make better decisions and secure competitive advantage. While Hadoop has certain cost advantages over traditional data warehouses for storing and batch processing raw and structured data, the complicated framework may be reducing visibility for many organizations.

Indeed, the Hadoop framework ecosystem appears to be evolving into an expensive, complicated morass of moving parts, vendors and consultants that produces more pain than gain and reduces return on investment (ROI), for the following reasons:

1. Shortage of qualified engineers to build and operate the Hadoop framework.

2. Shortage of qualified data scientists to extract actionable, valuable meaning from the huge, mixed mass of valuable and worthless data.

3. The Hadoop framework is complicated and confusing to operate, which limits its usefulness.

4. Numerous, complicated and confusing vendor offerings around the Hadoop framework.

5. Hadoop is batch oriented, while (near) real-time data ingestion and analysis are increasingly where real value is created from data. MapReduce has high latency, and bolting on a separate speed layer for real-time processing adds unwanted complexity (see the sketch after this list).

6. Many organizations are using Hadoop simply to dump data ("Hadump") with no strategic information plan.

7. The Hadoop framework is more expensive to build and operate than vendors and consultants disclose (although cheaper than the traditional data warehouse architecture).

8. Expensive consultants are required to evaluate and select technology to operate the Hadoop framework efficiently, as well as to design and help execute a customized information strategy that integrates legacy data warehouses with Hadoop.

9. It is very difficult to integrate traditional data warehouse / business intelligence systems with the Hadoop framework.

10. The Hadoop framework lacks useful software and big data-analysis applications for identifying valuable, relevant data, so organizations end up storing massive amounts of worthless data, which makes answering both simple and difficult questions harder.
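Item 5 above describes what is commonly called a Lambda architecture: a high-latency batch layer (MapReduce) paired with a separate speed layer that delivers (near) real-time answers. As a rough illustration of what that second layer adds, below is a minimal sketch of a micro-batch streaming word count, assuming the Spark Streaming Java API (roughly Spark 2.x) and an illustrative socket source on localhost:9999; the class name and parameters are placeholders, not a prescription.

import java.util.Arrays;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SpeedLayerWordCount {
  public static void main(String[] args) throws Exception {
    // local[2]: at least one thread for the receiver and one for processing (local testing only).
    SparkConf conf = new SparkConf().setAppName("SpeedLayerWordCount").setMaster("local[2]");

    // Micro-batches every 5 seconds: "near real-time", not true event-at-a-time streaming.
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    // Illustrative source: text lines arriving on a local socket.
    JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

    // The same word-count logic as the batch layer, maintained a second time in a second system.
    JavaDStream<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
    JavaPairDStream<String, Integer> counts =
        words.mapToPair(word -> new Tuple2<>(word, 1)).reduceByKey((a, b) -> a + b);
    counts.print();

    jssc.start();
    jssc.awaitTermination();
  }
}

The point is not the word count itself but that the same logic now has to live in two systems with different APIs, failure modes and operational demands, which is exactly the unwanted complexity item 5 refers to.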

Organizations need an information strategy and data platform that provide valuable, actionable insights from raw unstructured and semi-structured data as well as from internal and external data sources. Little value is gained from analyzing internal structured data alone; the big value comes from analyzing a mixture of internal and external data sources. Additionally, organizations increasingly need both batch and (near) real-time data processing capabilities from big data systems.

Yet the Hadoop framework only efficiently stores and processes batch data (high volumes of data where a group of transactions is collected over a period of time: data is collected, entered and processed, and then batch results are produced), and it is complicated to build and use. Hadoop implementation and operation are expensive and challenging given the shortage of professionals with the expertise and experience to work with Hadoop, MapReduce, HDFS, HBase, Pig, Hive, Cascading, Scalding, Storm, Spark, Shark and other new technologies.
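To make the batch model and the skills burden concrete, here is a minimal sketch of the canonical MapReduce word count, assuming the Hadoop 2.x Java API (org.apache.hadoop.mapreduce). Even this trivial job needs a mapper class, a reducer class and a driver, and it only produces results after the entire input has been read, shuffled and reduced on a running HDFS/YARN cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configure the job and block until the whole batch has finished.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A job like this is packaged as a jar and submitted to the cluster (for example, hadoop jar wordcount.jar WordCount /input /output); nothing comes back until the whole batch completes, which is why item 5 above calls for a separate speed layer.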

The bottom line is that the Hadoop framework may not be sustainable, given these serious pain points and flaws.

The haboob is moving fast to cloud Hadoop's future, and I predict many organizations will fail to get a reasonable return on their Hadoop investment.

There must be a better "smart" data framework and platform in the future: one that is simple to use, cost-effective and delivers a better return on investment.

See: http://bit.ly/1896gM



Comment by Riya Saxena on April 28, 2015 at 11:11pm
Thanks for your post! When Apache Hadoop debuted more than eight years ago, it was seen as the future of Big Data – data would be stored cheaper, processed faster and the enterprise would be more efficient overall. Today, Hadoop is used sparingly in the Enterprise world, but the promise of the technology is still true. Recently, Allied Market Research forecasted that the global Hadoop market value will reach $50.2 billion by 2020. More at www.youtube.com/watch?v=1jMR4cHBwZE
Comment by Donna Dupree on April 25, 2014 at 12:03pm

Hadoop can be a "Hadump", if you are a data hoarder. A good approach is to determine what data justifies its storage and is analysis worthy. Then sweep the data to pull out key elements for mining and analysis. Basically, you need to treat the data as garbage - some should be "recycled" and some should be destroyed. Why store it all, forcing you to dig through it all to find the golden nuggets? Sift and sort it first, then toss out the fluff.

Comment by Simon Thompson on March 31, 2014 at 1:03pm

1) It's very easy to learn 

2) Yes - problem

3) Only if you are an idiot

4) See 3. Ignore vendors

5) This was the case 2 years ago, now, not so much

6) Why is this a problem? 

7) They say it is free and gives you Christmas presents, why does reality surprise you?

8) You mean you have to have IS people? really? 

9) Nope (see 3)

10) Killer: but then no one can value data a priori, meaning the alternative is not storing data and not getting answers. This can be bad. Alternatively, store lots cheaply and have it there to use if you need it - like data insurance.
