By Vincent Granville
Dr. Roy Marsten, author of more than 30 papers on computational optimization in academic journals, was a professor at MIT, Northwestern, the University of Arizona, and the Georgia Institute of Technology before becoming a Big Data entrepreneur and founding several companies. Most recently, he applied his Big Data expertise to found Emcien, an Atlanta-based Big Data analytics company shaping how businesses analyze their data. Dr. Marsten’s algorithms power the Emcien engine, which is deployed across sectors. Companies such as NCR and organizations like the Atlanta Police Department are using the engine to solve a variety of Big Data problems.
Dr. Marsten was the recipient of the Beale-Orchard-Hays Prize for Excellence in Computational Mathematics, awarded by the Mathematical Programming Society. He was also a finalist for the Franz Edelman Prize for Excellence in Management Science, awarded by the Institute for Operations Research and the Management Sciences.
Dr. Granville: Should companies embrace Big Data? Which ones and how? How would today’s businesses measure ROI on Big Data?
Dr. Marsten: I believe that companies need to embrace Big Data to be successful in today’s economy. Through time, companies have been looking for a new level of improvement and gain, and this has usually been achieved through finding ways to optimize processes, ultimately leading to productivity gains. Through my experience working with major companies, I’ve observed that data is the next frontier. It is everywhere and every vertical stands to benefit from it. Data has changed our lives and it is the language of technology.
Everyone is trying to move towards automation. Once a problem is identified internally, a company needs to find a vendor that specializes in solving problems with innovative methods that automate data analysis rather than adding more services. This is the most effective way to truly analyze data, and it’s the foundation of what Emcien does. Technology doesn’t need to enable more manpower; the technology businesses select needs to make people smarter, giving the capabilities of a data scientist to the business user or analyst.
Companies can begin to embrace data by first taking stock of the data that they have and identifying the business problems they need to address. Companies have all kinds of data, and they need to look at what is available and what sort of business problems they can solve by leveraging the information. When you go back over your analytics efforts, compare the problems you’ve identified and solved and what they were costing you. That’s how you measure success.
Dr. Granville: What do you consider to be Big Data? I tend to think of big data as anything 10 times larger (in terms of megabytes per day) than the maximum you are used to. Also, sparse data might not be as big as it looks, but it can be costly to process.
Dr. Marsten: Big is actually relative, in my opinion. Companies today are looking at comparatively small data sets, and the real problem isn’t size; the problem arises when the data doesn’t make sense anymore. That’s Big Data. Once it stops making sense it isn’t useful. When the data becomes so complex and multidimensional that a person can’t easily analyze it or manipulate it to find answers, then it becomes Big Data.
Dr. Granville: How did you become interested in data science?
Dr. Marsten: I’ve always been interested in data science. I love math. My religion is the number line.
Dr. Granville: What is the difference between data science, statistics, machine learning, and data engineering? Do you think a hybrid (cross-disciplinary) role would be helpful, either to small companies or to the analytic practitioner, as it opens more job opportunities?
Dr. Marsten: Everything in the industry is very fluid right now. People are starting to coin new terms, give old terms new meanings, and combine terms, but in reality it’s all about using math to understand data, and then using computers to deal with that data. Different labels will come and go, and they will change along with the emerging industry that is Big Data. The thing that matters is the community of people in these fields, bringing data together.
Dr. Granville: What kind of training do you recommend for future data scientists? Any specific program in mind?
Dr. Marsten: For training future data scientists, you really need programs that combine math and computer science. In math, you need to cover probability, statistics, linear algebra and graph theory. When I was an undergraduate, there were no computer science programs. They emerged out of the electrical engineering and math departments. There are several programs, such as Carnegie Mellon’s, that are starting to include concentrations in analytics, information management, BI and others. And of course, Georgia Tech. There are even master’s programs now in Big Data. This just solidifies that Big Data is more than a buzzword.
Dr. Granville: How can we get university professors more involved in teaching students how to process real, live, big data sets? Should curricula be adapted, outdated material removed, new material introduced?
Dr. Marsten: We really have to encourage real connections between faculty members and companies. Professors should consult for companies, and then those companies should be willing to make data available to them. A good university program should offer classes that use material that’s really cutting-edge and recent. I remember in the 70s, when I was teaching at MIT, I was approached by a cargo airline to analyze their data. Real data. The students were so excited when I came in and dropped the physical boxes of cards of data and said that they had this opportunity to analyze “real world problems.”
Dr. Granville: During the first year of my PhD program, I worked part-time for a small high-tech company, in partnership with my stats lab. This was a great experience - being exposed to the real world, and decently paid to do my PhD (in Belgium in 1988). How could we encourage such initiatives in the US?
Dr. Marsten: Real-world consulting to companies (that I mentioned in the previous answer) opens opportunities for students to get this experience working with these companies and their data.
Dr. Granville: Besides Hadoop-like and graph database environments, do you see other technology that would make data plumbing easier for big data?
Dr. Marsten: Large companies today all have relational databases, and these could be the biggest impediment to progress. It’s not that we don’t need them; it’s just that they are not where the real-world Big Data problems are coming from. We know how to do that and it’s all in the past. Today’s data doesn’t fit into relational databases.
Dr. Granville: Does it make sense to structure unstructured data (using tags, NLP, taxonomies)?
Dr. Marsten: I do not think so. I think you should avoid doing violence to the data. Companies need to find ways to deal with the data as-is. They can do this by seeking out partners and vendors that specialize in analyzing these data types. We spend a lot of time forcing old methods onto new things, but that’s not useful with data. You need to find new ways to manage the variety of today’s data.
Dr. Granville: Can you tell me 5 business activities that would benefit most from big data, and 5 that would benefit least?
Dr. Marsten: The business activities that would benefit the LEAST from Big Data are:
The business activities that would benefit the MOST from Big Data are: