All Blog Posts Tagged 'Hadoop' (63)

Hadoop for Beginners - Part 2

Hadoop - MapReduce in an easy way

In the previous blog, we discussed HDFS, one of the main components of Hadoop. I highly recommend going through that blog before moving on to MapReduce. This blog will introduce you to MapReduce, which is…

Added by Aafrin Dabhoiwala on September 2, 2018 at 8:30am — No Comments

Hadoop for Beginners - Part 1

This blog gives a brief introduction to Hadoop for those who know next to nothing about this technology. Big Data is at the foundation of all the megatrends that are happening today, from social to the cloud to mobile devices to gaming. This blog will help build the foundation to take the next step in learning this interesting technology. Let's get started:

1. What's Big Data?

Ever since…

Added by Aafrin Dabhoiwala on August 26, 2018 at 12:30pm — 1 Comment

Apache Hadoop Admin Tips and Tricks

In this post I will share some tips I learned after using the Apache Hadoop environment for some years and running many workshops and courses. The information here is based on Apache Hadoop around version 2.9, but it can definitely be extended to other similar versions.

These are considerations for building or using a Hadoop cluster. Some are specific to the Cloudera distribution. Anyway, hope it…

Added by Renata Ghisloti Duarte Souza Gra on May 24, 2018 at 5:00pm — No Comments

7 Reasons Why Java Developers Should Learn Hadoop

Imagine there are two girls standing in front of you. The first girl is cute, beautiful, interesting and has the smile that any guy would die for. The other girl is average-looking, quiet, not-so-impressive… no different from the ones that you usually see at a restaurant cash counter. Which girl will you ask out on a date? If you’re like me, you will choose the attractive girl. You see, life is full of options and making the right choice is what matters the…

Added by Venkatesan M on May 16, 2018 at 12:00am — No Comments

CPG Industry Levels Playing Field with Power of One

Special thanks to Brandon Kaier (@bkaier) for his research and thoughts on the Digital Twins concept.

Unilever, one of the Consumer Packaged Goods (CPG) industry’s titans with over 400 brands and annual sales greater than $60B, recently bought Dollar Shave Club for $1B. Normally I would not think twice about such an acquisition, peanuts in the world of mergers and…

Added by Bill Schmarzo on December 7, 2017 at 12:00pm — No Comments

Data or Algorithms – Which is More Important?

Summary: Which is more important, the data or the algorithms? This chicken-and-egg question led me to realize that it’s the data, and specifically the way we store and process the data, that has dominated data science over the last 10 years. And it all leads back to Hadoop.

Recently I was challenged to speak on the role of data in data…

Added by William Vorhies on November 28, 2017 at 10:36am — 1 Comment

Data Lake Business Model Maturity Index

This blog is written in collaboration with the witty and insightful Matt Maccaux and his leading-edge work around our elastic data platform and data lake.

“Our organization is abuzz with the concept of data lakes!” a customer recently told me. And rightfully so, as the data lake holds the potential to help organizations become more effective at leveraging data and analytics to power their business models. That’s exactly what we propose when we…

Added by Bill Schmarzo on October 14, 2017 at 2:30pm — No Comments

Technology Trends That Will Dominate 2017: Big Data, IoT, AWS and AI

Technology has remarkably changed the way we live today; there is no denying it. Compared with our ancestors, we rely on very different technologies in our day-to-day work.

So many technologies that have revolutionized our lives have been developed in the past couple of years that it’s impossible to list each of them. Though technology changes fast, we can observe the trends in how it changes. Last year, 2016 brought so many fresh…

Added by Venkatesan M on October 6, 2017 at 9:00pm — No Comments

Beyond Datawarehouse - The Data Lake

Over the last few years, organizations have made a strategic decision to turn big data into competitive advantage. Owing to rapid changes in the BI and DW space, Big Data has been driving organizations to explore how to integrate big data into their existing EDW infrastructure. The process of extracting data from multiple sources such as social media, weblogs, sensor data etc. and transforming that data to suit the organization’s analytical needs is…

Added by Sirish M Simha on August 24, 2017 at 10:30pm — 1 Comment

Limitations of Hadoop – How to overcome Hadoop drawbacks

Hadoop – Introduction & features

Let us start with what Hadoop is and which of its features make it so popular.

Hadoop is an open-source software framework for distributed storage and distributed processing of extremely large data sets. Important features of Hadoop are:

Hadoop is an open-source project, which means its code can be modified to meet business requirements.

In Hadoop, data is highly available and…

Added by Sheetal Sharma on July 31, 2017 at 7:30pm — No Comments

Top Hadoop Interview Questions & Answers

Q1. What exactly is Hadoop?

A1. Hadoop is a Big Data framework that processes huge amounts of different types of data in parallel to achieve performance benefits.

Q2. What are the 5 Vs of Big Data?

A2. Volume – Size of the data

Velocity – Speed of change of data

Variety – Different types of data: structured, semi-structured and unstructured data.

Q3. Give me examples of unstructured data.

A3. Images, videos, audio files, etc.

Q4. Tell me about Hadoop file system…

Added by Sarvesh Kumar on February 20, 2017 at 1:30am — No Comments

Why Not So Hadoop?

Does Big Data mean Hadoop? Not really; however, when one thinks of the term Big Data, the first thing that comes to mind is Hadoop along with heaps of unstructured data. It is an exceptional lure for data scientists, who get the opportunity to work with large amounts of data to train their models, and for businesses, which gain knowledge previously never imagined. But has it lived up to the hype? In this article, we will look at a brief history of Hadoop and see how it stands today.

2015 Hype Cycle –…

Added by Kashif Saiyed on September 30, 2016 at 6:00pm — No Comments

Staying at the Cutting Edge with Hadoop; Continuous Innovation Boosts Market Globally

Hadoop is a big data management tool gaining popularity all over the world because of the need to manage increasingly voluminous and diverse data. With the growing popularity of social media and various photo-sharing websites, the amount of unstructured data generated on the Internet is growing rapidly. This has overburdened the conventional data management…

Added by Pragati Pa on September 12, 2016 at 9:30pm — No Comments

Hadoop VS Spark: Which is the best Data Analytics engine?

In the book Hadoop: The Definitive Guide, Tom White quotes Grace Hopper: “In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” For a long time, Hadoop has been the data analytics system preferred by businesses all over. The recent entry of the Spark engine has, however, given businesses an option other than Hadoop for data analytics…

Added by Tanmay Bhandari on June 7, 2016 at 7:29pm — No Comments

Why Hadoop? Streamlined Nature, Scalability and Cost-Effectiveness

Since its inception in the year 2008, the global Hadoop market has grown at a tremendous pace. This market, valued at US$1.5 billion in 2012, is estimated to grow at a CAGR of 54.7% from 2012 to 2018. By the end of 2018, this market could reach a value of US$20.9 billion. With the massive amount of data generated every day across major industries, the global Hadoop market is expected to see significant growth in the future as well.

Why…

Added by Ankit Jain on May 23, 2016 at 11:00pm — No Comments

How to Architect a Big Data Application to Unleash Its Full Potential

For a world that's churning out and recording enormous volumes of data every second, and where dependency on data is steeply rising, the need to implement Big Data architecture becomes natural.

Big Data solutions can resolve specific big data problems and requirements for data analysis, curation, capturing, sharing, searching,…

Added by Ritesh Gujrati on May 5, 2016 at 3:30am — No Comments

Two Sides of "Big?" Data

The ongoing pursuit of data solutions occupies the mindshare of consumers, vendors and service providers alike as they invest considerable time, cost and effort. Past attempts to conquer data have resulted in solutions that combined databases, applications and tools with limited success. We are still struggling with a few unresolved, persistent legacy challenges, such as:

Data everywhere

Today, every enterprise has huge data…

Added by Suhas Marathe on February 23, 2016 at 9:48am — No Comments

Celebrate the Big Data Problems – #2

How to identify the number of buckets for a Hive table while executing HiveQL DDLs?

The dataottam team has come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework.

Context:

Bucketing is another…

Added by Kumar Chinnakali on January 21, 2016 at 7:41pm — No Comments

Celebrate the Big Data Problems – #1

Every day we face many big data problems in production, PoCs, and other settings. Do we have any common repo to collect and share them? No, as far as we know, we don’t have one. As always, dataottam is looking forward to sharing its learnings with the community so that others can celebrate their similar problems. And…

Added by Kumar Chinnakali on January 15, 2016 at 11:30pm — No Comments

Just 3 clicks to get your Apache Hadoop installed!

Big Data is a problem statement, and it can be solved with tools like Apache Hadoop. But having Apache Hadoop as infrastructure for our proofs of concept and proofs of value is a little challenging. Hence we came up with a 3-click approach to get your Apache Hadoop installed.

What are the prerequisites?

  • Ubuntu 14.04
  • Internet Connection

Can I have the Script? Yes

How…

Added by Kumar Chinnakali on January 12, 2016 at 9:53am — No Comments

© 2019 Data Science Central ®