
Hello fellow data scientists,

This is my first forum query, so I will keep it short. I have been working on data science assignments lately and would like to try my hand at ETL on large datasets using Amazon EMR. I earlier tried to integrate R with Hadoop through RHadoop, but I failed; the setup process is a little complicated.

Someone from a large e-commerce company suggested that I try EMR. I currently have zero knowledge of it. From what I have read so far, I can write the code in either Pig or Hive and have it processed through MapReduce.

I know it is a paid service meant for organizations, but I would still like to try my luck with it. I need help getting started.

I have a similar question about Cloudera CDH: how and where do I write Pig/Hive code on CDH? Please suggest a way forward.

Tags: Big, Data, Hive, Pig

