A few days back I attended a good webinar conducted by Metascale on the topic “Are You Still Moving Data? Is ETL Still Relevant in the Era of Hadoop?” This post is a response to that webinar.
In summary, the webinar nicely explained how an enterprise can use Hadoop as a data hub alongside its existing data warehouse setup. The phrase “Hadoop as a Data Hub” itself raised a lot of questions in my mind, mostly about the situations where this approach does not fit:
Your use case does not suit the MapReduce framework for ETL.
You process a relatively small amount of data with Hadoop; Hadoop is not meant for this and takes longer than it should.
You join data with the existing data warehouse and end up unnecessarily duplicating the information in HDFS as well as in the conventional RDBMS (a minimal sketch of this duplication pattern follows this list).
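To make the last point concrete, here is a minimal sketch of that duplication pattern, assuming a hypothetical dim_customer table, with sqlite3 standing in for the warehouse RDBMS and a local delimited file standing in for the copy that gets pushed to HDFS:

```python
# Minimal sketch of the duplication concern: a warehouse dimension table is
# exported to a flat file just so a Hadoop-side job can join against it.
# sqlite3 stands in for the warehouse RDBMS and a local CSV/PSV file stands
# in for the HDFS copy; table, column, and file names are hypothetical.
import csv
import sqlite3

def export_dimension_for_hadoop(db_path: str, out_path: str) -> int:
    """Copy dim_customer out of the warehouse into a delimited file.

    After this runs, the same rows exist in the RDBMS *and* in the file
    that will be pushed to HDFS -- the duplication described above.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT customer_id, customer_name, region FROM dim_customer"
        ).fetchall()
    finally:
        conn.close()

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerows(rows)
    return len(rows)

# Usage (hypothetical paths):
# n = export_dimension_for_hadoop("warehouse.db", "dim_customer.psv")
# then: hadoop fs -put dim_customer.psv /data/dims/   <- second copy of the data
```

Once that file lands in HDFS, the same rows live in two systems and have to be kept in sync, which is exactly the overhead the concern above points at.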
Now, having these questions in place doesn’t mean Hadoop can’t be positioned as a replacement for, or amendment to, the existing data warehouse strategy. On the contrary, I can see several possibilities and ways for Hadoop to sneak into the existing enterprise data architecture. Here are my two cents:
I think the effort to bring Hadoop into the enterprise requires diligent changes to the data warehouse reference architecture. We are going to change a lot in our reference architecture when we bring Hadoop into the enterprise.
So, it is not just important to ask WHERE HADOOP IS THE RIGHT FIT; it is far more important to understand HOW HADOOP IS THE RIGHT FIT.
Original post: http://datumengineering.wordpress.com/2013/11/17/etl-elt-and-data-h...
Comments
Our CM instance was responding very slowly. When the day started it worked well, but as the day moved on it started becoming slow, and sometimes it was slow from the start of the day itself (it was totally random). Locking an item, navigating from one item to another, saving an item, previewing an item, etc. were all taking a lot of time.
Thank you for the article. Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. More at https://www.excelr.com/big-data-hadoop/
Thanks for your post!
Most data warehouse systems are front-ended with an ETL (Extract-Transform-Load) system. The purpose of this system is to extract data from the operational source systems, transform and cleanse it to fit the warehouse model, and load it into the warehouse.
This methodology for data warehousing became commonplace in an age when storage was expensive, processing was slow, and the reports you wanted to extract from your data at the end of the entire flow were known in advance. But the world has changed, and some of the limitations of ETL are making data architects think differently. Learn more at https://intellipaat.com/informatica-training/
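As a quick illustration of that front end, here is a minimal ETL sketch, with a CSV export standing in for the operational source, sqlite3 standing in for the warehouse, and all file, table, and column names being hypothetical:

```python
# Minimal ETL sketch: extract from a flat-file source, transform in memory,
# load into a warehouse table. sqlite3 stands in for the warehouse; the
# file, table, and column names are hypothetical.
import csv
import sqlite3

def extract(path: str):
    """Extract: read raw order records from the operational export."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: cleanse and conform records to the warehouse model."""
    for row in rows:
        yield (
            row["order_id"].strip(),
            row["customer_id"].strip(),
            round(float(row["amount"]), 2),   # normalize currency precision
            row["order_date"][:10],           # keep YYYY-MM-DD only
        )

def load(rows, db_path: str):
    """Load: insert conformed rows into the warehouse fact table."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fact_orders "
            "(order_id TEXT, customer_id TEXT, amount REAL, order_date TEXT)"
        )
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.close()

# Usage (hypothetical paths):
# load(transform(extract("orders_export.csv")), "warehouse.db")
```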
DMX-h, Syncsort’s ETL/DI product for Hadoop, runs natively on Hadoop and integrates very closely with the MapReduce paradigm to perform high-volume batch ETL operations such as large joins and aggregations, without requiring users to rip the data out of Hadoop, do the ETL, and put it back into Hadoop, as you referenced. Let me explain how.
DMX-h’s ETL engine integrates via Syncsort’s contribution to the Apache open-source community, patch MAPREDUCE-2454, which introduced a new feature in the Hadoop MapReduce framework to allow alternative implementations of the sort phase. This engine is the same ETL engine Syncsort offers outside of Hadoop and uses the same graphical UI, making the transition to ETL in Hadoop/MapReduce easy and seamless for existing ETL developers and architects, and eliminating the need for Java/Pig expertise. The same lightweight DMX-h engine can be used to extract data from disparate source systems (mainframe, RDBMS, files, etc.), pre-process, cleanse, validate, and load it into HDFS, and then implement very efficient, high-speed MapReduce ETL in Hadoop. (For contrast, a sketch of what a hand-coded MapReduce join looks like appears after the links below.) Here are some links to blogs where you can learn more:
http://blog.syncsort.com/2013/02/hadoop-mapreduce-to-sort-or-not-to...
http://blog.syncsort.com/2013/05/our-making-a-better-etl-for-hadoop...
http://blog.syncsort.com/2013/10/big-data-warehouse-meetup-use-case...
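For contrast with what such a tool abstracts away, here is a minimal sketch of a hand-written reduce-side join for Hadoop Streaming in Python. This is not Syncsort’s product or API; the file layouts, field positions, tags, and the sample invocation are assumptions made purely for illustration.

```python
#!/usr/bin/env python
# join_streaming.py -- minimal reduce-side join for Hadoop Streaming.
# Illustration only (not DMX-h). Assumed input layouts:
#   customers.psv : customer_id|customer_name
#   orders.psv    : order_id|customer_id|amount
# Example run (assumed paths, script shipped with -files):
#   hadoop jar hadoop-streaming.jar -files join_streaming.py \
#     -input /data/customers.psv -input /data/orders.psv \
#     -mapper "join_streaming.py map" -reducer "join_streaming.py reduce" \
#     -output /data/joined
import os
import sys

def mapper():
    # Tag each record with its source so the reducer can tell them apart.
    # The streaming framework exposes the current input file via an env var
    # (name differs between Hadoop 1.x and 2.x).
    source = (os.environ.get("mapreduce_map_input_file")
              or os.environ.get("map_input_file", ""))
    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        if "customers" in source:
            customer_id, name = fields[0], fields[1]
            print(f"{customer_id}\tC\t{name}")
        else:
            order_id, customer_id, amount = fields[0], fields[1], fields[2]
            print(f"{customer_id}\tO\t{order_id}|{amount}")

def reducer():
    # Keys arrive sorted, so all records for one customer are adjacent.
    current_key, name, orders = None, None, []

    def flush():
        for order in orders:
            print(f"{current_key}\t{name}\t{order}")

    for line in sys.stdin:
        key, tag, value = line.rstrip("\n").split("\t", 2)
        if key != current_key:
            if current_key is not None:
                flush()
            current_key, name, orders = key, None, []
        if tag == "C":
            name = value
        else:
            orders.append(value)
    if current_key is not None:
        flush()

if __name__ == "__main__":
    if sys.argv[1:] == ["map"]:
        mapper()
    else:
        reducer()
```

Every join or aggregation written this way needs its own mapper/reducer pair plus the shuffle and sort to line records up by key, which is exactly the boilerplate an ETL engine running on top of MapReduce is meant to remove.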
We’re actually doing a webinar on exactly this topic: “Offload the Data Warehouse in the Age of Hadoop: Why Hadoop Means More Data Savings & Less Data Warehouse.” Use the link below to register!
http://www.syncsort.com/en/Offloading-the-Data-Warehouse-Webcast-12...
Last but not least, you can even use the Syncsort Hadoop offering in the cloud – for free!
I would be very interested in hearing other opinions on this. I am trying to understand where others feel Hadoop fits into a data warehouse solution.