*The term "big data" comes up frequently in articles and office discussions, but what does it really mean to utilize big data?*

Big data: it’s one of those buzzwords you can’t seem to get away from. Though you might have an idea of what it means, there’s likely plenty you don’t know about what it takes to really harness the power of big data.

We sat down with Robert Swisher, Chief Technology Officer at Business.com, to get a better idea of what the big deal is with big data. He breaks down what it all means and how to make it work for you.

*What exactly is big data?*

RS: It’s basically a large set of data. People use different terms, but it’s just a huge amount of data, structured and unstructured, that’s coming in at a high velocity and a big volume. A lot of the time it’s not that “clean,” so you have to manipulate, sanitize and convert that data to clean it up and make it usable.

At its core, though, it’s just a gigantic set of data.

RS: So, for example, it could be all the point of sale data for Best Buy. That’s a huge data set—everything that goes through a cash register. For us, it’s all of the activity on a site, so a ton of people coming through, doing a bunch of different things. It’s not really exactly cohesive and structured.

With point of sale, for example, you’re looking at what people are purchasing and what they’ve done historically. You’re looking at what they’ve clicked on in email newsletters, loyalty program data and coupons that you’ve sent them in direct mail—have those been redeemed? All these things come together to form a data set around purchasing behavior. You can look at what “like” customers do in order to predict what similar customers will buy as well.

RS: I think that the technology needed time to evolve. The core technology that’s used for big data was developed about ten years ago. There’s the software component that allows you to manage these data sets, and there’s the hardware component: storage and compute costs have been getting cheaper, which makes big data more accessible for businesses. They can now make use of their large data sets with off-the-shelf, open-source technology.

RS: In my opinion, people think it’s this magical thing. They think, “We’ll just turn that on and now things will just work and we’ll know all this stuff.” But it’s just not that simple—it’s actually really complicated and you need the right equipment and people that understand how to analyze and work with big data.

Increasingly, simplified tools are coming out for non-technical users to create dashboards and get some of the information they’re looking for, but it is a really specialized skillset. It’s not something you can just turn on and have. There’s an investment in people, time and hard costs to make this stuff work.

RS: That would be one way to do it. The other way to go about it would be to make a list of what types of data you have that you’re not making use of. Ask yourself, what are all the different types of data that we collect on a regular basis that we may or may not be doing things with, and how can we combine those to find intersections? How can we analyze them? This will also help you determine where there are gaps in the information you collect.
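As a rough illustration of that inventory exercise, here is a minimal Python sketch. The source names, field names and the `wanted` list are all hypothetical, made up for the example and not taken from the interview. It lists which fields each pair of data sources shares (the intersections you could join on) and which fields you want but do not collect anywhere (the gaps).

```python
from itertools import combinations

# Hypothetical inventory: each data source and the fields it records.
sources = {
    "point_of_sale": {"customer_id", "sku", "purchase_date", "amount"},
    "email_clicks": {"customer_id", "campaign_id", "clicked_at"},
    "loyalty_program": {"customer_id", "tier", "points"},
    "direct_mail_coupons": {"household_id", "coupon_id", "redeemed"},
}


def shared_fields(inventory):
    """Return the fields that each pair of sources has in common."""
    overlaps = {}
    for (name_a, fields_a), (name_b, fields_b) in combinations(
        sorted(inventory.items()), 2
    ):
        common = fields_a & fields_b
        if common:  # keep only pairs that can actually be combined
            overlaps[(name_a, name_b)] = common
    return overlaps


def missing_fields(inventory, wanted):
    """Return the wanted fields that no source currently collects."""
    collected = set().union(*inventory.values())
    return set(wanted) - collected


overlaps = shared_fields(sources)
gaps = missing_fields(sources, {"customer_id", "store_id"})
```

In this made-up inventory, three sources share `customer_id` and can be combined, while the direct mail data uses a different key and `store_id` is not collected at all. Both findings are the kind of gap the exercise is meant to surface.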

RS: The *volume* of your entire data set, meaning everything coming in, is probably measured in gigabytes or terabytes, which is the scale of storage on disk.

*Velocity* is the rate at which the data is coming in, and it would be measured in units like records per second or bits per second, for example.

*Variety* means that you have a bunch of different pieces of information that you’re putting together to build a cohesive model around what you’re looking to solve or understand.

*Veracity* means that often, data is unclean and you have to deal with that. There’s no metric that I’m aware of to measure it, but it’s important.
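To make the units for volume and velocity concrete, here is a small Python sketch. The function names and the sample numbers are assumptions chosen for illustration. It converts a record count and byte count observed over a time window into records per second and bits per second, and expresses a total size in decimal units (1 GB = 1000 MB).

```python
def velocity(record_count, total_bytes, window_seconds):
    """Return (records per second, bits per second) over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    records_per_second = record_count / window_seconds
    bits_per_second = total_bytes * 8 / window_seconds  # 8 bits per byte
    return records_per_second, bits_per_second


def human_volume(total_bytes):
    """Format a byte count using decimal units, capped at terabytes."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(total_bytes)
    for unit in units:
        # Stop at the first unit where the number is below 1000,
        # or at the largest unit we know about.
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


# Hypothetical sample: 6,000 records totalling 3 MB arrive in one minute.
rate = velocity(record_count=6000, total_bytes=3_000_000, window_seconds=60)
stored = human_volume(2_500_000_000_000)
```

With these sample numbers, `rate` is 100 records per second and 400,000 bits per second, and `stored` reads `2.5 TB`.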

RS: A good example is junk. Let’s say people are submitting email addresses, and a lot of times, there are typos, misspellings or it’s not real. Anytime that you’re looking at things that are based on user input, there are often a lot of mistakes or just blatant, false information.
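The email case can be sketched in a few lines of Python. This is an illustrative assumption, not a method described in the interview: the pattern below is a deliberately loose syntactic check, and the sample entries are invented. It normalizes each entry, separates plausible addresses from obvious junk, and reports the share that passed, which can serve as a rough, informal indicator of how clean the input is.

```python
import re

# Loose syntactic check: something@something.something, no spaces.
# It cannot tell whether an address is real or merely well-formed.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")


def clean_emails(raw_entries):
    """Split user-submitted entries into normalized valid ones and rejects."""
    valid, rejected = [], []
    for entry in raw_entries:
        candidate = entry.strip().lower()  # sanitize whitespace and case
        if EMAIL_PATTERN.match(candidate):
            valid.append(candidate)
        else:
            rejected.append(entry)  # keep the raw value for later review
    return valid, rejected


# Hypothetical user input.
submitted = ["  Ana@Example.com ", "bob@example", "not an email", "carol@shop.co.uk"]
valid, rejected = clean_emails(submitted)
share_valid = len(valid) / len(submitted)
```

A check like this only catches malformed input. A misspelled but well-formed domain or a made-up address still passes, so it reduces junk rather than eliminating it.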

RS: You either need to have engineers and tools in-house, or you need to find a consultancy or firm that specializes in it. The latter can come in and help you get it set up and get you started, which is a good route.

There are some off-the-shelf platforms that can give you some insight, like GoodData and Tableau, where you can plug in the data sets that you have for a monthly fee. Their dashboarding features help non-technical users create charts and graphs and look for trends to analyze.
