Rahul Patodi's Blog

S3 as Input or Output for Hadoop MR jobs

How to use S3 (S3 native, s3n) as input and output for a Hadoop MapReduce job. In this tutorial we first look at what S3 is and how s3 differs from s3n, and then at how to set s3n as the input and output location of a Hadoop MapReduce job. Configuring s3n for I/O can be useful for local MapReduce jobs (i.e. jobs run on a local cluster), but it matters most when we run Elastic MapReduce jobs (i.e. jobs run in the cloud). When we run a job in the cloud, we need to specify a storage location for the input as…
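A minimal sketch of the two pieces of s3n configuration the post refers to: credentials in `core-site.xml`, and `s3n://` URIs passed as the job's input and output paths. The property names `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` are the ones used by Hadoop's s3n connector in the 1.x/2.x era; the bucket name, credentials, jar and driver class below are hypothetical placeholders, and the script only prints the configuration and command rather than running anything.

```python
import xml.etree.ElementTree as ET


def s3n_uri(bucket: str, path: str) -> str:
    """Build an s3n:// URI from a bucket name and an object-key prefix."""
    return f"s3n://{bucket}/{path.lstrip('/')}"


def core_site_xml(properties: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical credentials: replace with your own AWS keys.
creds = {
    "fs.s3n.awsAccessKeyId": "YOUR_ACCESS_KEY",
    "fs.s3n.awsSecretAccessKey": "YOUR_SECRET_KEY",
}
print(core_site_xml(creds))

# Hypothetical bucket and prefixes for the job's input and output.
input_path = s3n_uri("my-bucket", "wordcount/input")
output_path = s3n_uri("my-bucket", "wordcount/output")

# Hypothetical jar and driver class that read args[0]/args[1] as input/output.
print(f"hadoop jar wordcount.jar WordCount {input_path} {output_path}")
```

Hadoop's `FileOutputFormat` refuses to write into an output directory that already exists, so the output prefix should not exist in the bucket before the job runs.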

Added by Rahul Patodi on November 11, 2012 at 8:00am

Hadoop: A Soft Introduction

What is Hadoop:

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. It incorporates features similar to those of the Google File System and of MapReduce. HDFS (the Hadoop Distributed File System) is a highly fault-tolerant distributed file system and like…
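To make the MapReduce part concrete, here is a single-process Python simulation of the classic word-count job, showing the map, shuffle (group by key) and reduce phases. It illustrates the programming model only and is not Hadoop's Java API; on a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the input lines; keys come out sorted."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["Hadoop runs MapReduce", "HDFS stores data for Hadoop"]))
# {'data': 1, 'for': 1, 'hadoop': 2, 'hdfs': 1, 'mapreduce': 1, 'runs': 1, 'stores': 1}
```

Sorting the grouped keys before reducing mirrors the fact that Hadoop hands each reducer its keys in sorted order.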

Added by Rahul Patodi on November 11, 2012 at 8:00am

S3 instead of HDFS with Hadoop

In this article we discuss using S3 as a replacement for HDFS (Hadoop Distributed File System) on AWS (Amazon Web Services), and why you might need S3 at all. Before coming to the actual use case and the performance of S3 with Hadoop, let's understand …
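As a sketch of what "S3 instead of HDFS" means in configuration terms, the script below renders a `core-site.xml` that points Hadoop's default filesystem at an S3 bucket. It assumes the Hadoop 1.x-era S3 block filesystem (`s3://`) and its property names `fs.default.name`, `fs.s3.awsAccessKeyId` and `fs.s3.awsSecretAccessKey`; Hadoop 2.x renamed `fs.default.name` to `fs.defaultFS`, and later releases replaced these connectors, so check the documentation for your version. The bucket name and keys are placeholders.

```python
import xml.etree.ElementTree as ET


def s3_as_default_fs(bucket: str, access_key: str, secret_key: str) -> str:
    """Render core-site.xml properties that make an S3 bucket the default filesystem.

    Assumes the Hadoop 1.x-era S3 block filesystem (s3://) property names.
    """
    properties = {
        "fs.default.name": f"s3://{bucket}",
        "fs.s3.awsAccessKeyId": access_key,
        "fs.s3.awsSecretAccessKey": secret_key,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder bucket and credentials: replace with your own.
print(s3_as_default_fs("my-hadoop-bucket", "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY"))
```

With the default filesystem set this way, job paths given without a scheme resolve against the S3 bucket rather than HDFS.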

Added by Rahul Patodi on June 27, 2012 at 9:07pm

Deploy Hadoop Cluster

Step-by-step tutorial to deploy a Hadoop cluster (fully distributed mode):

Setting up Hadoop as a cluster requires multiple machines (nodes): one node acts as the master and all the others act as slaves.
If you want a quick introduction to Hadoop, please click here.
If you want to set up Hadoop in pseudo-distributed mode, please …
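The master/slave layout described above maps onto a handful of files under Hadoop's `conf/` directory. The sketch below generates their contents for one master and a list of slaves, assuming a Hadoop 1.x layout (`masters`, `slaves`, `core-site.xml`, `mapred-site.xml`, `hdfs-site.xml`). The hostnames, the ports 9000/9001 and the replication rule are illustrative assumptions, not values from the original tutorial; the same files would then be copied to every node.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def render_configuration(properties: Dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def cluster_conf_files(master: str, slaves: List[str],
                       hdfs_port: int = 9000,
                       jobtracker_port: int = 9001) -> Dict[str, str]:
    """Build Hadoop 1.x conf/ file contents for one master and N slave nodes.

    Ports and the replication rule are illustrative assumptions.
    """
    if not slaves:
        raise ValueError("a fully distributed cluster needs at least one slave node")
    return {
        # In Hadoop 1.x, conf/masters lists the SecondaryNameNode host (here: the master).
        "masters": master + "\n",
        # One DataNode/TaskTracker host per line.
        "slaves": "\n".join(slaves) + "\n",
        # NameNode address that every node uses to reach HDFS.
        "core-site.xml": render_configuration(
            {"fs.default.name": f"hdfs://{master}:{hdfs_port}"}),
        # JobTracker address that every TaskTracker reports to.
        "mapred-site.xml": render_configuration(
            {"mapred.job.tracker": f"{master}:{jobtracker_port}"}),
        # Replication cannot usefully exceed the number of DataNodes.
        "hdfs-site.xml": render_configuration(
            {"dfs.replication": str(min(3, len(slaves)))}),
    }


# Hypothetical hostnames for a three-node cluster.
files = cluster_conf_files("master", ["slave1", "slave2"])
for name, content in files.items():
    print(f"--- conf/{name} ---")
    print(content)
```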

Added by Rahul Patodi on May 25, 2012 at 8:36am

© 2019 Data Science Central ®