different than what I did before, because the data is X.” That’s how you can tell you have a “big data” problem. Pelin: There’s also a lot of talk about “small data,” which is not actually that small either. So do you differentiate between big data and small data? Michael:
I think the biggest challenge is to figure out what data can help you solve your problem. Whether that’s big data involving very large data sets or small data addressing a small problem, it really doesn’t matter. The main question is: “What data do you have, and what kind of value can you get from it?” The reason big data has gotten so much attention recently is that you can now do things with very large data sets that were once too large, too varied, or arriving too fast to do anything with. But ultimately, we still have to be able to derive some value from the data, or it’s pretty much pointless. Pelin: What trend or development in data science has you most excited about the field, and what are you tired of hearing about? What’s hot and what’s not for you? Michael:
I’ve been very excited about the work by some new startups to attack the problem of cleaning up data, which is 80 percent of what has to be done to make it useful. We’ve had this problem for decades, trying to figure out where we get the data and where it goes. So I really like the fact that there are startup companies attacking this space, and I hope to see more of them. That’s the sunny side. The thing that I’m tired of is companies saying that they do “big data,” no matter what they actually do. So I can get a pivot table with my data. Is that big data? No, it’s not. Pelin: So how do you see advances in technology making data science more accessible to more people, versus keeping it in the domain of the mathematicians and the computer scientists? Michael:
This is an area with two dimensions. What do tools need to do to get better, and how do you guide people into better ways of doing things? I think there’s a lot of work to do here, including training and understanding. It doesn’t matter how good the tool is. The tool is only as good as what people can do with it. So it needs to be attacked from both directions. Pelin: My hope is that we can make certain elements simple in terms of using the right algorithms, let’s say in marketing analytics when a marketer is trying to do loyalty or churn analyses. They shouldn’t have to decide if they need a logistic regression on certain data points. The tool can be smart enough to know which data points to go after and even determine if there is sufficient data. Do you think we can actually get there, so a marketer, for example, will be able to press a button that says, “I want to do a churn analysis,” and get the answers? Michael:
There’s a possibility, yes. I don’t think this is outside the realm of possibility at all. The question is going to be: “What’s the timeframe?” From what I’ve seen from companies, and from what I know about the data sets that most companies have, there’s still a lot of work required to enable them to do these types of analyses. Yes, we can have something that automatically presents options to a marketer. But it still takes a great deal to get that in place. While it’s not an immediate “slam dunk,” I think there’s some good work going forward.
Even though the tools and technology still have a ways to go, I’m definitely a big booster in this area. I’m a big believer that data science has a lot of possibilities inside companies, but also in government and not-for-profit organizations. Pelin: Thank you so much for your insights. This was an exciting opportunity to speak to questions ranging across the fields of data science and analytics. We wish you the very best as your work at Ford evolves.