How does Hive INSERT INTO with SELECT work at the memory and storage level?

I am trying to understand how Hive's INSERT INTO with SELECT works at the memory and storage level.
One of my Oozie workflows fails with a GC overhead error.
Our query is: insert into table <> select a,b,c,d from table2.
The MapReduce job finishes, and then we get these messages:
Ended job = jobid: <>
Loading data to table table2 
Failed with exception java.lang.OutOfMemoryError: GC overhead limit exceeded.
Failed: execution error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
After that I get a chmod error.
Can you please tell me whether, after the select query, Hive stores the result in memory or in physical temporary storage?
How does it move the result to the destination folder?
I increased the mapred and heap memory at both the script level and the Oozie launcher level, and it still failed. Do the mapred and heap memory settings even have anything to do with this?
Thanks,
nikita

Tags: GC, Oozie, heap, hive, memory, overhead

Replies to This Discussion

What is the table format? Different table formats require different amounts of memory to create the output HDFS file. Text files and SequenceFiles need very little memory.
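If you are not sure of the format, one way to check is sketched below. The name `target_table` is a hypothetical placeholder for the destination table, which is elided in the question.

```sql
-- target_table is a hypothetical placeholder for the destination table.
-- In the output, look at the SerDe Library, InputFormat, and OutputFormat rows.
DESCRIBE FORMATTED target_table;
```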

Hive copies the data from table2 to local files on some of the servers, then copies from those local files to the HDFS files in the output table. Hive does not keep the query result in memory.
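To see these steps for your own query, a minimal sketch is to print the configured temporary locations and then the query plan. It assumes a reasonably recent Hive version, and `target_table` is again a hypothetical placeholder. In the plan, the final stage that puts files into the table directory is usually shown as a Move Operator, which corresponds to the MoveTask named in the error message.

```sql
-- Print where Hive keeps temporary query output (values vary by cluster).
SET hive.exec.scratchdir;
-- Assumption: hive.exec.stagingdir is only present in newer Hive releases.
SET hive.exec.stagingdir;

-- Show the plan without running the query.
-- target_table is a hypothetical placeholder name.
EXPLAIN
INSERT INTO TABLE target_table
SELECT a, b, c, d FROM table2;
```

Because EXPLAIN only prints the plan, it is safe to run even when the real insert fails.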

I have never used Oozie. Does the Hive script fail the same way when you run it from the HiveServer2 command line? Or from a JDBC client?

Following. I have a similar issue.

© 2019 Data Science Central ®
