Hello fellow data scientists,
This is my first forum query, so I will keep it short and crisp. I have been working on data science assignments lately and would like to try my hand at ETL on large datasets using Amazon EMR. I earlier tried to integrate R with Hadoop through RHadoop, but I failed; it is a somewhat complicated process.
Someone from a large e-commerce company has now suggested that I try EMR. I currently have zero knowledge of it. I have been reading about it at https://aws.amazon.com/elasticmapreduce/ and, as I understand it, I can write my code in either Pig or Hive and have it processed through MapReduce.
I know it is a paid service aimed at organizations, but I would still like to try my luck with it. I need help getting started.
I also have a similar question about Cloudera CDH: how and where do I write Pig/Hive code on CDH? Please suggest a way forward.