This article was posted on Intellipaat. 

Hadoop is an open-source framework, developed in Java, for storing and analyzing large sets of unstructured data. It is a highly scalable platform that can run many concurrent tasks across anything from a single server to thousands of machines.

It includes a distributed file system that splits files into blocks, replicates them across nodes, and moves data between nodes quickly. Its ability to keep processing even when a node fails makes it a reliable technology for companies that cannot afford to delay or stop their operations.
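
To make this concrete, below is a minimal sketch of writing a file to the distributed file system (HDFS) and reading it back through Hadoop's Java FileSystem API. It is an illustration only: the NameNode address (hdfs://namenode:9000) and the /demo path are placeholder assumptions, not details from the article. The block replication behind this API is what lets the read succeed even if a node holding one copy of the data fails.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; substitute your own cluster's.
            conf.set("fs.defaultFS", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/demo/hello.txt");

            // Write a small file; HDFS splits it into blocks and
            // replicates each block across DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.writeBytes("hello from hdfs\n");
            }

            // Read it back from whichever replica is reachable; this is
            // why the loss of a single node does not stop processing.
            try (FSDataInputStream in = fs.open(path)) {
                byte[] buf = new byte[64];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n));
            }
        }
    }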

How did Hadoop evolve? 

Inspired by Google’s MapReduce, which splits an application into small fragments that run on different nodes, Doug Cutting and Mike Cafarella created Hadoop and launched it in 2006 to support distribution for the Nutch search engine.
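
The classic way to see that splitting in action is the word-count example from the Hadoop MapReduce tutorials, condensed below: each mapper independently processes one split of the input and emits (word, 1) pairs, and the reducers sum the counts for each word, wherever they were produced. The input and output paths here are hypothetical command-line arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Map phase: each mapper sees one split of the input
        // and emits (word, 1) for every token it finds.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer it = new StringTokenizer(value.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: counts for each word arrive grouped together,
        // possibly produced by many different mappers on many nodes.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

When such a job is submitted, the framework schedules map tasks close to the HDFS blocks they read, moving the computation to the data rather than the data to the computation.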

Hadoop became a top-level project of the Apache Software Foundation in 2008. Named after a yellow stuffed toy elephant belonging to Doug Cutting’s son, Hadoop has been continuously revised since its launch.

As part of that ongoing revision, Hadoop 2.3.0 was released on 20 February 2014 with some major changes to the architecture.

What makes up the Hadoop architecture/ecosystem?

The Hadoop architecture/ecosystem can be broken down into two branches: core components and complementary/other components.

Core Components:

As per the latest version of the Apache Hadoop framework, there are four core components: Hadoop Common, HDFS (the Hadoop Distributed File System), YARN, and MapReduce.

What you will find in the full article:

  • Why should we use Hadoop?
  • Scope of Hadoop
  • Why do we need Hadoop?
  • Who is the right audience to learn Hadoop?
  • How will Hadoop help you in career growth?

To read the original article, click here.

Comment by William Vorhies on April 4, 2017 at 11:28am

The interesting thing about this type of article is that it focuses on the architecture and not really on the fundamentals.  Most folks would say the importance of Hadoop is that it ushered in the era of NoSQL in that it could store and retrieve not only very large volumes of data but also unstructured data and data in motion.

Doug Cutting, who is widely credited as the father of Hadoop, says otherwise. He says the most important thing about Hadoop is that it allowed multiple computers to function as one, ushering in the era of MPP (massively parallel processing) on commodity servers. In data science, this is what made deep learning possible, as well as the analysis of IoT data, image and speech processing, much of robotics/reinforcement learning, and the analysis of very large databases like web logs containing very sparse signals, on which the whole art of recommenders relies.
