Big Data: What’s the Buzz Mean Beyond the Hype?

In this final blog post of three with Dr. Michael Cavaretta, we explore questions about big data, as well as some of his views on emerging technologies from start-ups. As Ford Motor Company’s top data scientist, Dr. Cavaretta has helped to shape the auto manufacturer’s use of data and analytics during the last 20 years. Here he is in conversation with Anametrix CEO Pelin Thorogood. 

Pelin: Let me ask you about “big data.” It means a lot of things to a lot of people. What does big data mean to you? 

Michael: So I take a pretty simplistic view about big data. You have a “big data” problem when the data itself is one of your challenges in solving the problem. Your data problem may relate to its volume or velocity or variety; it really doesn’t matter. All that matters is that when you think about attacking the problem, you’re thinking, “Yeah, I’ve got to do something different than what I did before, because the data is X.” That’s how you can tell you have a “big data” problem. 

Pelin: There’s also a lot of talk about “small data,” which is not actually that small either. So do you differentiate between big data and small data? 

Michael: I think the biggest challenge is to figure out what data can help you solve your problem. Whether that’s big data involving very large data sets or small data addressing a small problem, it really doesn’t matter. The main question is: “What data do you have, and what kind of value can you get from it?” The reason big data has gotten so much attention recently is that you can now do things with data sets that were once too large, too varied, or coming in too fast to do anything with. But ultimately, we still have to be able to derive some value from the data, or it’s pretty much pointless. 

Pelin:  What trend or development in data science has you most excited about the field and what are you tired of hearing about? What’s hot and what’s not for you? 

Michael:  I’ve been very excited about the work by some new startups to attack the problem of cleaning up data, which is 80 percent of what has to be done to make it useful. We’ve had this problem for decades, trying to figure out where we get the data and where it goes. So I really like the fact that there are startup companies attacking this space, and I look to see more companies in this space, in fact. That’s generally the sunny side. The thing that I’m tired of is companies saying that they do “big data,” no matter what they do. So I can get a pivot table with my data. Is that big data? No, it’s not. 

Pelin:  So how do you see advances in technology making data science more accessible to more people, versus keeping it in the domain of the mathematicians and the computer scientists? 

Michael: This is an area with two dimensions. What do tools need to do to get better, and how do you guide people into better ways of doing things?  I think there’s a lot of work to do here, including training and understanding. It doesn’t matter how good the tool is. The tool is only as good as what people can do with the tool. So it needs to be attacked in both directions. 

Pelin: My hope is that we can make certain elements simple in terms of using the right algorithms, let’s say in marketing analytics when a marketer is trying to do loyalty or churn analyses. They don’t have to decide if they need a logistic regression on certain data points. The tool can be smart enough to know which data points to go after and even determine if there is sufficient data. Do you think we can actually get there so a marketer, for example, will be able to press a button that says, “I want to do a churn analysis,” and get the answers? 

Michael: There’s a possibility, yes. I don’t think that this is out of the realm of possibility at all. The question is going to be: “What’s the timeframe?” From what I’ve seen from companies, and from what I know about the data sets that most companies have, there’s still a lot of work required to enable them to do these types of analyses. Yes, we can have something that automatically presents options to a marketer. But it still takes a great deal to get that in place. While it’s not an immediate “slam dunk,” I think there’s some good work going forward. 

Even though the tools and technology still have a ways to go, I’m definitely a big booster of this area. I’m a big believer that data science has a lot of possibilities inside companies, but also in government and not-for-profit organizations. 

Pelin: Thank you so much for your insights. This was an exciting opportunity to speak to questions ranging across the fields of data science and analytics. We wish you the very best as your work at Ford evolves. 

Tags: Analytics, Anametrix, Dr. Michael Cavaretta, big data, computer science, data scientist, data viz, ford, predictive analytics, statistics, visualization

© 2019 Data Science Central ®