
Top Hadoop Terms You Need to Know

This post is by Joseph Macwan. He is a technical writer with a keen interest in business, technology and marketing topics.


What Is Hadoop Distributed File System (HDFS)?

You will come across this term very frequently. HDFS is the distributed storage system of the Hadoop framework: it spreads data across the machines in a cluster. As a data repository, it stores data and grants access to it wherever required. Architecturally, its two main components are the NameNode, which tracks file metadata and block locations, and the DataNodes, which store the data blocks themselves. It is generally the default storage system in the Hadoop ecosystem and plays a major role in giving applications access to data.
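To make the NameNode/DataNode split more concrete, here is a toy sketch in plain Python. It is not the HDFS API: the function names, the tiny block size and the round-robin placement are all assumptions made for illustration (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, datanodes: list, replication: int) -> dict:
    """Toy 'NameNode' metadata: block index -> DataNodes holding a replica.

    Assumption: simple round-robin placement, chosen only for illustration.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return placement


# Hypothetical file contents and node names.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
metadata = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
for index, nodes in metadata.items():
    print(index, blocks[index], nodes)
```

The dictionary plays the role of the NameNode's metadata, while the block contents would live on the DataNodes. Because every block has several replicas, losing one node does not lose the data.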

What Is Hadoop Common?

As the name suggests, Hadoop Common acts as a central library of utilities. These utilities support the other Hadoop modules and let them communicate to transfer information. Hadoop Common is an integral part of the Hadoop ecosystem, but it is used directly mainly by developers involved in programming.

What Is HBase?

HBase is short for Hadoop database. It acts as a storage unit, but it is not to be confused with HDFS: HDFS is the underlying system on which HBase operates. The advantage of using HBase is that it allows users to read and modify data in real time. It is also known as a column-oriented database because of the way its data is structured.
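The column-oriented layout can be pictured as rows addressed by a key, with each value stored under a "family:qualifier" column name. The sketch below is a plain-Python stand-in, not the HBase client API: the `ToyTable` class and its `put` and `get` methods are hypothetical names used only to show the shape of the data.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of a table with column families."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": value}
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write or overwrite a single cell."""
        family, _, _qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Read one cell, or a copy of the whole row when no column is given."""
        row = self.rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)


# Hypothetical table with two column families.
users = ToyTable(["info", "stats"])
users.put("user1", "info:name", "Ada")
users.put("user1", "stats:logins", 1)
users.put("user1", "stats:logins", 2)  # modify an existing cell in place
print(users.get("user1"))
```

Reads and writes go straight to a row key, which is the access pattern behind the real-time read and modify behaviour described above.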

What Is MapReduce?

MapReduce is a core component of the Hadoop ecosystem. It enables processing of large data sets. One reason for MapReduce's popularity is its ability to process unstructured data. It is compatible with almost all popular programming languages, but the preferred language remains Java. MapReduce is often characterized as fault-tolerant: it works in parallel across many nodes of a cluster, and a task that fails on one node can be rerun on another.
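The programming model itself is small enough to sketch on a single machine. The following plain-Python word count mimics the map, shuffle and reduce steps. It does not use Hadoop, runs sequentially rather than in parallel, and all function names are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of text records."""
    mapped = []
    for document in documents:
        mapped.extend(map_phase(document))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input records.
print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster, each call to the map step could run on a different node against a different block of input, and the reduce calls for different keys could likewise run in parallel.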

The full article has more definitions, including YARN, Hive, Pig, Spark, Cassandra and more.
