
Does Apache Tez run slower than Hive on larger datasets (~2.5 TB)?

We have started testing the Tez query engine. In initial results, we get about a 30% performance boost over Hive on smaller data sets (1-10 GB), but Hive starts to perform better than Tez as the data size increases. For example, when we run a Hive query with Tez on about 1.3 TB of data, it performs about 20% worse than Hive alone. Details are in the post below.

Evaluating Tez

On a cluster with 1.3 TB of RAM, I set the following properties:

set tez.task.resource.memory.mb=10000;
set =-Xmx47364m;
set hive.tez.container.size=59205;
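
As a rough sanity check on these numbers, here is a minimal Python sketch of the arithmetic behind them. It only uses the values quoted above. The helper names are hypothetical, and it assumes binary units and that all 1.3 TB of cluster RAM is available for containers, which a real YARN cluster would not allow, so treat the container count as an upper bound.

```python
# Values copied from the settings above; everything else is an assumption.
CONTAINER_MB = 59205                      # hive.tez.container.size
HEAP_MB = 47364                           # the -Xmx value
CLUSTER_RAM_MB = 13 * 1024 * 1024 // 10   # 1.3 TB in MB, assuming binary units


def heap_fraction(heap_mb: int, container_mb: int) -> float:
    """Share of the container's memory that is given to the JVM heap."""
    return heap_mb / container_mb


def max_concurrent_containers(cluster_mb: int, container_mb: int) -> int:
    """Upper bound on containers that fit in memory at the same time."""
    return cluster_mb // container_mb


if __name__ == "__main__":
    print(round(heap_fraction(HEAP_MB, CONTAINER_MB), 2))             # 0.8
    print(max_concurrent_containers(CLUSTER_RAM_MB, CONTAINER_MB))    # 23
```

Under these assumptions the heap is 80% of the container, and at most about 23 containers of this size can run at once, which may cap the parallelism available to a 1.3 TB query.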

Is this normal, or am I missing a property or misconfiguring one? Also, I am currently using an older version of Tez. Could that be the issue too? I still need to bootstrap the latest version of Tez on EMR and test whether it does any better.

Tags: Hadoop, Hive, MapReduce, Tez
