
What is in a Name? A Data Scientist by any other name

This article was written by Bhavani Raskutti. Bhavani joined the ANZ Teradata Advanced Analytics team in 2014. She is internationally recognised as a data mining thought leader and is regularly invited to present at international conferences on Mining Big Data. She is passionate about transforming businesses to make better decisions using their data capital.

The term “data science” was first used by the statistician William S. Cleveland in his 2001 paper entitled “Data Science: An Action Plan for Expanding the Technical Areas of t...”. Cleveland emphasized that the “[results in] data science should be judged by the extent to which they enable the analyst to learn from data”.

The scientific discipline of learning from data had been practised for centuries before the term data science came into being. Statisticians have long been collecting, processing, analysing, visualising and interpreting vast amounts of diverse data to generate models. In doing so, they developed many algorithms for regression and classification, such as GLM (Generalised Linear Modeling), which are embedded in statistical packages such as SAS and SPSS that are used extensively to this day. They also developed fundamental theories that have been the basis of many learning algorithms developed in other fields, e.g., Support Vector Machine (SVM).

The focus in statistics has been, firstly, on inference, i.e., generating stochastic models to fit the data, and secondly, on theoretical rigor in deriving statistically sound inferences, i.e., ensuring that the assumptions behind data distributions and data independence are valid. Machine learning, on the other hand, focuses on learning from data to make predictions without any reference to the underlying mechanism generating the data. Its practitioners employ different predictive algorithms, many of which were developed by statisticians or based on statistical learning theory. The use of N-fold cross-validation or leave-one-out methodology to compare the accuracies of different algorithms, the development of SVM solvers, ensemble classifiers and deep learning, and the different metrics for evaluating prediction accuracy are all the result of fundamental machine learning research.
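
To make the idea of comparing algorithms by N-fold cross-validation concrete, here is a minimal sketch in plain Python. It is an illustration only and not taken from the article: the synthetic two-cluster data, the two toy classifiers (a majority-class baseline and a nearest-centroid rule) and all function names are assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_fit(X, y):
    """Toy baseline (hypothetical): always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def centroid_fit(X, y):
    """Toy classifier (hypothetical): predict the label of the nearest class centroid."""
    centroids = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*pts)]

    def predict(x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds; each fold is the test set exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


def make_data(n_per_class=50, seed=1):
    """Synthetic data (an assumption): two Gaussian clusters in two dimensions."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)])
            y.append(label)
    return X, y


X, y = make_data()
# Both algorithms are scored on the same folds, so the comparison is like for like.
results = {
    "majority": cross_val_accuracy(majority_fit, X, y),
    "nearest_centroid": cross_val_accuracy(centroid_fit, X, y),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Setting `k=len(X)` in `cross_val_accuracy` turns the same procedure into leave-one-out, since every fold then holds a single example.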



Comment by Mark Meloon on July 8, 2016 at 4:56am

I like to see "communication" as one of the four pillars of your diagram. From all the blogs and other articles out there about data science one would get the feeling that all one needs to do is be good at machine learning and *poof* you're a data scientist. There's a lot more to being "The perfect data scientist". Thanks for a great diagram.
