
I was trying to find a good domain name for our upcoming business science website when something suddenly became clear to me. Many of us have long been confused about what data science means, how it differs from statistics, machine learning, data mining, or operations research, and what to make of the rise of the data scientist light - a new species of coders who call themselves data scientists after a few hours of Python/R training, a small project at best, and $200 spent on that training. The data scientist light is not a real data scientist, even though I believe that you can learn data science from scratch on the job, just as I did.

This introduction brings me to the ABCD's; the arguments are developed further in my conclusion below. The four domains certainly overlap, but I believe that identifying them clarifies how the roles differ and how they collaborate.

  • Analytics Science. Deals with modern statistical modeling, predictive modeling, model-free (data-driven) statistics, root cause analysis, defining and selecting metrics, and traditional techniques such as clustering, SVM, linear regression, K-NN (whether you call it machine learning, AI, statistics or data science). Analytics scientists are true geeks with generic knowledge applicable to many domains. They may work on small or big data. Analytics science, unlike data science, is well documented in college textbooks.
  • Business Science. Deals with principles, both theoretical and applied, where domain expertise and a deep, cross-department understanding of the business are critical. The purpose is to leverage analytics to deliver added value or increased profits. Business scientists might spend little time coding, unlike the three other categories. Examples of business science applications can be found in this article, and also in this article. It may overlap with BI, Six Sigma and operations research, so it can definitely involve a great deal of statistical modeling, modern or not.
  • Computer Science. Deals with architecture (including real-time, distributed and cross-platform systems such as IoT), algorithm design and refinement, platform design, Internet and communications protocols, data standards, systems engineering, software engineering and prototyping.
  • Data Science. Deals with data identification, collection, cleaning, summarizing, and insights extraction - even dashboards and visualizations that help with the executive decision process. Also includes advanced algorithms for big data, sensor data (IoT), black-box analytics, batch-mode analytics, automation of analytics processes, APIs, and analytics-driven systems based on machine-to-machine communications (for instance, automated bidding or fraud monitoring). Typically, simple black-box, machine-controlled techniques are more difficult to design than complex man-controlled analyses, because they must be made very robust, as opposed to very accurate. Many data scientists also know quite a bit of analytics science, especially modern principles (typically not published in college textbooks) for dealing with big, unstructured, fast-flowing data. While in some ways computer scientists bring data to life, data scientists take it from there and make it intelligent.

I finally decided to call myself a business scientist, as my experience is more and more aligned with this domain (being an entrepreneur), though, like many of us here, I have significant knowledge and expertise in all four domains, especially in data science and analytics science. My motivation to call myself a business scientist is also partly to avoid being confused with a data scientist light. This erroneous label is sometimes applied to us (real) data scientists by a minority of vocal analytics scientists. I believe that we need to dispel this myth. Part of the reason, I believe, is that math-free solutions that also trade accuracy for robustness (in order to fit in black-box systems or be usable by the layman) are not respected by some traditional statisticians, who erroneously believe that automating analyses and/or removing statistical jargon and mathematical background is not possible. Maybe because it could jeopardize their jobs?

In the end, I want to make data science accessible to everyone, not just to an elite of initiated, change-averse professionals. It requires a new, unified, simple, efficient, math-free or math-light (but not data science light) approach to analytics problems and solutions, as well as algorithmic ingenuity. This is feasible, but more difficult than producing extremely complicated statistical models - which is what I was doing earlier in my career.


Comment by Jamie Lawson on January 2, 2016 at 9:15am

The definitions appear somewhat ad hoc. I agree that Data Science is new, and we are still competing for good definitions for it. I've sketched my own outline for such a definition in a blog on this site:

http://www.datasciencecentral.com/profiles/blogs/control-the-uncle-...

But I'm more concerned that the definitions here just generally miss the mark. For instance, Computer Science has been around for more than a half century, and we know pretty much what it is. I know a bit of Computer Science having earned advanced degrees in the subject and studying and doing it for several decades. It is not really the stuff discussed in this post, which is largely systems integration. It's like saying fine arts is about mixing paint. Computer Science is the study of complexity: the complexity of problems, solutions, and systems. That idea isn't new. Harold Abelson and Gerald Sussman articulated it in the 1980s. And it's a good description of Computer Science because it allows us to reason about what things we should or shouldn't do with Computer Science. It's also consistent with scientific method.

Data Science, though newer, appears to be quite different from what is described in this post. Let us assume that the people who consider themselves data scientists--myself included--do not wish to abuse the term "science". And speaking for myself, Data Science is the extraction of meaning from observations. In order to do Data Science, we have to verify that the meanings extracted are consistent with the real world. This leads to data driven refinement processes that are analogous to the processes carried out by other scientific instruments.

The definitions in this post also leave out the extremely important field of Simulation Science, which mirrors Data Science, except from a model-driven rather than a data-driven perspective. Where Data Science refines what we observe about the world, Simulation Science refines what we believe about the world. Data Science and Simulation Science are like the left hand and the right hand of the same being, and it's really difficult to do either without the other. One will often dominate, but they go hand-in-hand. And this actually gets to Irv Lustig's question about optimization, because the penultimate step in both the Data Science and Simulation Science processes is based on optimization. Cutting to the chase, these refinement processes take stuff we find in the world (observations or beliefs) and refine it into actionable intelligence about the world. The next step is to turn actionable intelligence into actions, and that is a control problem which relies on optimization.

Comment by Irv Lustig on November 30, 2015 at 8:20am

I'm curious why you titled this post "The ABCD's of Business Optimization".  I.e., why use the word "optimization"?


© 2018 Data Science Central
