Hadoop is an open-source framework for storing and processing massive amounts of data on clusters of commodity hardware.
A haboob is a dense dust storm that moves quickly across the landscape and reduces visibility.
The goal of modern big data analytical ecosystems and data science is to help organizations gain greater visibility so they can make better decisions and obtain competitive advantage. While Hadoop has certain cost advantages over traditional data warehouses for storing and batch processing raw and structured data, the complicated framework may be reducing visibility for many organizations.
Indeed, the Hadoop framework ecosystem appears to be evolving into an expensive, complicated morass of many moving parts, vendors and consultants that is producing more pain than gain and reducing return on investment (ROI) for the following reasons:
1. Shortage of qualified engineers to build and operate the Hadoop framework.
2. Shortage of qualified data scientists to extract actionable, valuable insights from the huge mass of mixed valuable and worthless data.
3. The Hadoop framework is complicated and confusing to operate, limiting its usefulness.
4. Numerous, complicated and confusing vendor offerings around the Hadoop framework.
5. Hadoop is batch-oriented at a time when (near) real-time data ingestion and analysis are in growing demand to create real value from data. MapReduce has high latency, so a separate speed layer is needed for real-time processing, creating unwanted complexity.
6. Many organizations are using Hadoop simply to dump data ("Hadump") with no strategic information plan.
7. The Hadoop framework is more expensive to build and operate than vendors and consultants disclose (although cheaper than the traditional data warehouse architecture).
8. Expensive consultants are required to evaluate and select technology to operate the Hadoop framework efficiently, as well as to design and help execute a customized information strategy that integrates legacy data warehouses with Hadoop.
9. It is very difficult to integrate traditional data warehouse / business intelligence systems with the Hadoop framework.
10. The Hadoop framework lacks useful software and big-data analysis applications that help identify valuable, relevant data. As a result, organizations store massive amounts of worthless data, which makes answering both simple and difficult questions harder.
Organizations need an information strategy and data platform that provide valuable, actionable insights from raw unstructured and semi-structured data as well as from internal and external data sources. Little value is gained from analyzing internal structured data alone; the big value comes from analyzing a mixture of internal and external data sources. Additionally, organizations increasingly need both batch and (near) real-time data processing capabilities from big data systems.
Yet the Hadoop framework only efficiently stores and processes batch data (high volumes of data where a group of transactions is collected over a period of time; the data is collected, entered and processed, and then the batch results are produced) and is complicated to build and use. Hadoop implementation and operation is expensive and challenging given the shortage of professionals with the expertise and experience to work with Hadoop, MapReduce, HDFS, HBase, Pig, Hive, Cascading, Scalding, Storm, Spark, Shark and other new technologies.
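The batch model described above can be sketched in a few lines of framework-free Python (this is an illustration of the MapReduce processing pattern, not Hadoop's actual Java API): the entire batch must be collected before processing starts, and no results appear until the whole job finishes, which is the source of the latency criticized earlier.

```python
# Minimal sketch of the batch MapReduce pattern: map emits key/value
# pairs, pairs are grouped by key, and reduce aggregates each group.
# Results are only available after the entire batch has been processed.

from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Emit a (word, 1) pair for every word in the collected input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Group pairs by key and sum the counts, as a MapReduce reducer would."""
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

# The whole batch must be present before any result is produced.
batch_of_lines = ["big data big pain", "big gain"]
print(dict(reduce_phase(map_phase(batch_of_lines))))
# {'big': 3, 'data': 1, 'gain': 1, 'pain': 1}
```

A real Hadoop job adds job configuration, HDFS I/O, and cluster scheduling on top of this simple pattern, which is where much of the operational complexity lives.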
The bottom line is the Hadoop framework may not be sustainable considering serious pain points and flaws.
The Haboob is moving fast to cloud Hadoop's future, and I predict many organizations will fail to get a reasonable return on their Hadoop investment.
There must be a better "smart" data framework and platform in the future that is simple to use, cost-effective and delivers a better return on investment.