Dr. Michael Cavaretta, leader of the Predictive Analytics Group in Ford’s Research and Advanced Engineering group, continues his conversation this week with Anametrix CEO Pelin Thorogood. He’s Ford’s top data scientist and an expert on how data can be used across corporate functions. Here he talks about his work in an internal consulting group, as well as the challenges of recruiting data experts who also understand business – a person sometimes referred to as a “unicorn” because they are so hard to find.

Pelin: You talked about how you have served as an internal consultant to so many different departments at Ford – from auto manufacturing to HR and everything in between. Can you share some specific examples of the types of problems you use data and analytics to address? What are some of the typical approaches you take as an internal consulting group?

Michael:
We try to maintain a split in which about 70 percent of our work is driven by what we call “pull.” People pull us into meetings, into conversations, and those come about primarily through word of mouth. One of the benefits of being at a company for a number of years is that you establish a reputation and credibility. We have many contacts, people and internal organizations we’ve worked with in the past.
In contrast, about 30 percent of what we do is to look for new tools, techniques and technologies to bring into the company. In these cases, we’ll recognize something that’s important, perhaps something like natural language processing. And we ask, can this kind of technology provide true value? We try it out and then, if it does have value, get people excited about it.

Pelin: At Ford you’ve expanded from a role as data scientist to having an internal consulting group with a strong reputation throughout the company. How does Ford now view data science and advanced analytics? Is this an area that the company is embracing more and more?

Michael:
It’s most definitely a growing area. At Ford, we have a yearly data science conference, which brings together analytics speakers drawn internally from all over the company. These are people who specialize in marketing analytics, data scientists who work at Ford Credit, and people who work in the customer service division. All of the Ford analytics teams get together at this conference. This is where we get an idea about the number of people, now in the hundreds, who globally work on analytics for Ford.

Pelin: Let’s talk more about the concept of data science. How do you see data scientists working within their companies generally? Do you have insight into how data science is evolving across the board?

Michael:
On the positive side, there’s definitely a lot of change from companies trying to think about how to do a better job as data-driven organizations, particularly in putting together all of the pieces. There are many opportunities with regard to data and analytics – from making quick, real-time decisions to exploring interesting, long-term strategic ones that can span years. So that’s one side. The other side, of course, is your ability to process data. Today, we can create technologies and parallel processing architectures to store a lot more data. You can process data at the lowest level to get a new perspective.
I think one of the biggest challenges for companies is the difficulty in finding people who have a set of skills in computer science, statistics and visualization, and then also have good business knowledge. These people are sometimes referred to as unicorns, because they’re just almost impossible to find. One way to solve this problem is to look for strategic hires and then train these people in data and computing. It’s often easier to take people with business knowledge and train them up with new, data-related skills, than the reverse.

Pelin: We really need people who have the left brain and right brain working in balance, while also being knowledgeable about the business. As a data scientist supporting business, what skills are needed in your day-to-day work? You talked about statistics and computer science. Can you elaborate on that a little bit?

Michael:
What we’re trying to do is build our team with overlapping skills. We don’t expect everybody to be that all-in-one data scientist with the perfect blend of skills. We look for people who have some blending of skills, but a concentration in one area. So, for example, I have a mechanical engineer on our team. She’s got a great primary background and is strong in statistics. She knows the engineering, she knows the signal and computer science, but has less knowledge of visualization. That’s fine because I have other people who are strong in visualization. What we’re trying to do is blend these skills, so that when we attack a problem, we know what skills we need to pull from to devote time and resources appropriately.

Pelin: What advice would you have for someone entering the field of data science?

Michael:
The biggest thing is that you really need to be passionate about this area. In fact, I recently wrote a blog post about this. There’s increased demand for data scientists right now, and there are always questions about people who may be chasing the dollars. But you really have to have some passion for this field. Otherwise it can be tough, just blocks of data day after day while you struggle with understanding why it’s not giving you what you want.
So before people spend a lot of time and money getting training in formal data science techniques, go take a look at some open data. Take a look at some sites that have contests. Give it a try. If you find it’s something that excites you, and you find it fun and interesting, fantastic. But if that isn’t something that you feel passionate about, you should really reconsider getting into the field.

Pelin: That’s definitely good advice, probably for almost all professions. As a data scientist, I’m sure you’ve used a wide variety of tools, which have changed over the years. What tools do you see as being important for future data scientists? What are the emerging technologies of highest relevance for you today?

Michael:
There’s so much discussion about what tools and technologies are appropriate. There’s a bit of a war, for example, between the people who are big fans of the statistical language R versus people who love Python for their data science. And, of course, there’s always going to be that group of people who are SAS fans. They have a great install base and fantastically trained people.
The thing that I would say is, “be flexible.” There are appropriate times to be looking at enterprise-level SAS, and there are other times to be looking at something you can hack in Python or do a quick analysis in R. The tools and technologies all have their strengths and weaknesses. The biggest thing is you must make the fundamentals “stick” before going forward, rather than just concentrating on a fixed tool. I believe you need to get the fundamentals down first. So learn the basic statistics, understand how basic programming works, get that stuff done, then look to specialize, whether that be in network analysis or visualizations or information systems. I think having a specialty and looking at tools and technologies in that area is a good thing to do.

Pelin: Thank you, Michael.

We’ll be back next week with Dr. Michael Cavaretta to talk about big data and what it means beyond the hype.