By Siddartha Mani
Few would argue with the statement that Hadoop HDFS is in decline. In fact, the HDFS part of the Hadoop ecosystem is in more than just decline - it is in freefall. At the time of its inception, it had a meaningful role to play as a high-throughput, fault-tolerant distributed file system. The secret sauce was data locality.
By co-locating compute and data on the same nodes, HDFS overcame the limitations of slow network access to data. The…
Added by Jonathan Symonds on August 6, 2019 at 1:00pm —
No Comments
Hadoop - MapReduce in an easy way
In the previous blog, we discussed HDFS, one of the main components of Hadoop. I highly recommend going through that blog before moving on to MapReduce. This blog will introduce you to MapReduce, which is…
Added by Aafrin Dabhoiwala on September 2, 2018 at 8:30am —
No Comments
This blog gives a brief introduction to Hadoop for those who know next to nothing about this technology. Big Data is at the foundation of all the megatrends that are happening today, from social to the cloud to mobile devices to gaming. This blog will help build the foundation for taking the next step in learning this interesting technology. Let's get started:
1. What's Big Data?
Ever since…
Added by Aafrin Dabhoiwala on August 26, 2018 at 12:30pm —
1 Comment
In this post I will share some tips I learned after using the Apache Hadoop environment for some years and running many workshops and courses. The information here assumes Apache Hadoop around version 2.9, but it could definitely be extended to other similar versions.
These are considerations for building or using a Hadoop cluster. Some concern the Cloudera distribution. Anyway, hope it…
Added by Renata Ghisloti Duarte Souza Gra on May 24, 2018 at 5:00pm —
No Comments

Imagine there are two girls standing in front of you – The first girl is cute, beautiful, interesting and has the smile that any guy would die for. And the other girl is average-looking, quiet, not-so-impressive… no different from the ones that you usually see in the restaurant cash counter. Which girl will you call out for a date? If you’re like me, you will choose the attractive girl. You see, life is full of options and making the right choice is what matters the…
Added by Venkatesan M on May 16, 2018 at 12:00am —
No Comments
Special thanks to Brandon Kaier (@bkaier) for his research and thoughts on the Digital Twins concept.
Unilever, one of the Consumer Packaged Goods (CPG) industry’s titans with over 400 brands and annual sales greater than $60B, recently bought Dollar Shave Club for $1B. Now normally I would not think twice about such an acquisition, peanuts in the world of mergers and…
Added by Bill Schmarzo on December 7, 2017 at 12:00pm —
No Comments
This blog is written in collaboration with the witty and insightful Matt Maccaux and his leading-edge work around our elastic data platform and data lake.
“Our organization is abuzz with the concept of data lakes!” a customer recently told me. And rightfully so, as the data lake holds the potential to help organizations become more effective at leveraging data and analytics to power their business models. That’s exactly what we propose when we…
Added by Bill Schmarzo on October 14, 2017 at 2:30pm —
No Comments

Technology has remarkably changed the way we live today; there is no denying it. Compared with our ancestors, we are far ahead in using different technologies for our day-to-day work.
So many technologies have been developed in the past couple of years that have revolutionized our lives, and it’s impossible to list each of them. Though technology changes fast with time, we can observe the trends in which it changes. Last year, 2016, brought so many fresh…
Added by Venkatesan M on October 6, 2017 at 9:00pm —
No Comments
Over the last few years, organizations have made a strategic decision to turn big data into competitive advantage. Owing to rapid changes in trends in the BI and DW space, big data has been driving organizations to explore how to integrate it into their existing EDW infrastructure. The process of extracting data from multiple sources such as social media, weblogs, and sensor data, and transforming that data to suit the organization’s analytical needs is…
Added by Sirish M Simha on August 24, 2017 at 10:30pm —
1 Comment
Hadoop – Introduction & features
Let us start with what Hadoop is and which features make it so popular.
Hadoop is an open-source software framework for distributed storage and distributed processing of extremely large data sets. Important features of Hadoop are:
Hadoop is an open-source project. This means its code can be modified to suit business requirements.
In Hadoop, data is highly available and…
Added by Sheetal Sharma on July 31, 2017 at 7:30pm —
No Comments
Q1. What exactly is Hadoop?
A1. Hadoop is a Big Data framework for processing huge amounts of different types of data in parallel to achieve performance benefits.
Q2. What are the 5 Vs of Big Data?
A2. Volume – Size of the data
Velocity – Speed of change of data
Variety – Different types of data: structured, semi-structured, and unstructured.
Q3. Give me examples of unstructured data.
A3. Images, videos, audio, etc.
Q4. Tell me about Hadoop file system…
Added by Sarvesh Kumar on February 20, 2017 at 1:30am —
No Comments
Does Big Data mean Hadoop? Not really. However, when one thinks of the term Big Data, the first thing that comes to mind is Hadoop, along with heaps of unstructured data. It has been an exceptional lure: data scientists get the opportunity to work with large amounts of data to train their models, and businesses gain knowledge previously never imagined. But has it lived up to the hype? In this article, we will look at a brief history of Hadoop and see how it stands today.
2015 Hype Cycle –…
Added by Kashif Saiyed on September 30, 2016 at 6:00pm —
No Comments
Hadoop is a big data management tool that is gaining popularity all over the world because of the need to manage increasingly voluminous and diverse data. With the growing popularity of social media and photo-sharing websites, the amount of unstructured data generated on the Internet is growing rapidly. This has overburdened the conventional data management…
Added by Pragati Pa on September 12, 2016 at 9:30pm —
No Comments
In the book Hadoop: The Definitive Guide, Tom White quotes Grace Hopper: “In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” For a long time, Hadoop has been the data analytics system preferred by businesses all over. The recent entry of the Spark engine has, however, given businesses an option other than Hadoop for data analytics…
Added by Tanmay Bhandari on June 7, 2016 at 7:29pm —
No Comments
Since its inception in 2008, the global Hadoop market has grown at a tremendous pace. This market, valued at US$1.5 billion in 2012, is estimated to grow at a CAGR of 54.7% from 2012 to 2018. By the end of 2018, this market could be worth US$20.9 billion. With the massive amount of data generated every day across major industries, the global Hadoop market is anticipated to see significant growth in the future as well.
Why…
Added by Ankit Jain on May 23, 2016 at 11:00pm —
No Comments
For a world that is churning out and recording vast volumes of data every second, and where dependency on data is rising steeply, the need to implement a Big Data architecture comes naturally.
Big Data solutions can resolve specific big data problems and requirements for data analysis, curation, capturing, sharing, searching,…
Added by Ritesh Gujrati on May 5, 2016 at 3:30am —
No Comments
The ongoing pursuit of data solutions occupies the mindshare of consumers, vendors and service providers alike as they invest a considerable amount of time, money and effort. Past attempts to conquer data have resulted in solutions that combined databases, applications and tools with limited success. We are still struggling with a few unresolved, persistent legacy challenges, such as:
Data everywhere
Today, every enterprise has huge data…
Added by Suhas Marathe on February 23, 2016 at 9:48am —
No Comments
Celebrate the Big Data Problems – #2
How to identify the number of buckets for a Hive table while executing the HiveQL DDLs?
The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework.
Context:
Bucketing is another…
Added by Kumar Chinnakali on January 21, 2016 at 7:41pm —
No Comments

Celebrate the Big Data Problems – #1
Every day we face many big data problems in production, in PoCs, and elsewhere. Do we have a common repository to collect and share them? As far as we know, we don’t. As always, dataottam is looking forward to sharing its learnings with the community, so that others can celebrate the same or similar problems. And…
Added by Kumar Chinnakali on January 15, 2016 at 11:30pm —
No Comments
Big Data is a problem statement, and it can be solved with tools such as Apache Hadoop. But setting up Apache Hadoop as the infrastructure for our proofs of concept and proofs of value is a little challenging. Hence we came up with a 3-click way to get your Apache Hadoop installed.
What are the prerequisites?
- Ubuntu 14.04
- Internet Connection
Can I have the Script? Yes
How…
Added by Kumar Chinnakali on January 12, 2016 at 9:53am —
No Comments