We've given Hadoop almost 10 years to mature and invested billions in it, yet very few companies are seeing a return on that investment. Several companies have tried to turn Hadoop into a real-time analytical platform by layering SQL-like facades on top, but the latency is still not where it needs to be for interactive applications. Even Google, a true big data user, has moved on to dataflow and flow-based programming approaches. Why? It just makes sense...
Instead of saving all data into HDFS and then running full parallel table scans to reduce it, alternative systems can answer the same questions while the data is in motion. We live in an extremely chaotic data environment: data is everywhere, exists in many different formats, and comes at us from many different directions. It's unrealistic to expect that we'll ever be able to duplicate all of it into a single HDFS cluster.
By defining your processing scheme as functional blocks with inputs and outputs, and then describing the dependencies between those blocks as dataflow connections, a platform can systematically parallelize the execution of the whole scheme. As data flows through, it can be filtered, routed, aggregated, and disseminated to other systems. The result is a system that makes decisions and gets answers out of its data in real time, as in the sketch below.
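To make the idea concrete, here is a minimal sketch of the flow-based pattern in plain Python. It is a hypothetical illustration, not any particular platform's API: each block (the `keep_valid` filter and the `aggregate` running-average block are invented for this example) is a function wired to its neighbors through queues, and each runs on its own thread, so answers emerge while the data is still in motion rather than after a batch table scan.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream

def run_block(fn, inbox, outbox):
    """Run one functional block on its own thread until the stream ends."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream downstream
            return
        for result in fn(item):
            outbox.put(result)

def keep_valid(reading):
    # Filter block: drop malformed readings as they arrive,
    # instead of storing everything for a later scan.
    if reading.get("value") is not None:
        yield reading

running_total = {"sum": 0.0, "count": 0}

def aggregate(reading):
    # Aggregate block: update a running answer while the data is in motion.
    running_total["sum"] += reading["value"]
    running_total["count"] += 1
    yield running_total["sum"] / running_total["count"]

if __name__ == "__main__":
    source, filtered, averages = queue.Queue(), queue.Queue(), queue.Queue()

    # The dataflow connections: source -> keep_valid -> aggregate.
    # Each block executes in parallel on its own thread.
    threads = [
        threading.Thread(target=run_block, args=(keep_valid, source, filtered)),
        threading.Thread(target=run_block, args=(aggregate, filtered, averages)),
    ]
    for t in threads:
        t.start()

    # Stream data in; answers come out as the data arrives.
    for value in [10.0, None, 14.0, 12.0]:
        source.put({"value": value})
    source.put(SENTINEL)

    while (avg := averages.get()) is not SENTINEL:
        print(f"running average: {avg:.2f}")

    for t in threads:
        t.join()
```

Because the blocks only communicate through their input and output connections, a real platform is free to schedule them across cores or machines; the same wiring diagram describes both the logic and the parallelism.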
It's platforms like Composable Analytics that are going to be the big players in this next big data wave. Flow-based programming provides agile, fluid mechanics for processing data.
Data got you overwhelmed? Go with the flow!
Posted 1 March 2021