This article was written by Bhavani Raskutti. Bhavani joined the ANZ Teradata Advanced Analytics team in 2014. She is internationally recognised as a data mining thought leader and is regularly invited to present at international conferences on Mining Big Data. She is passionate about transforming businesses to make better decisions using their data capital.
The term “data science” was first used by the statistician William S. Cleveland in his 2001 paper entitled “Data Science: An Action Plan for Expanding the Technical Areas of t...”. Cleveland emphasized that the “[results in] data science should be judged by the extent to which they enable the analyst to learn from data”.
The scientific discipline of learning from data existed for centuries before the term data science came into being. Statisticians have long been collecting, processing, analysing, visualising and interpreting vast amounts of diverse data to generate models. In doing so, they developed many algorithms for regression and classification, such as GLM (Generalised Linear Modelling), which are embedded in statistical packages such as SAS and SPSS that are used extensively to this day. They also developed fundamental theories that became the basis of many learning algorithms developed in other fields, e.g., the Support Vector Machine (SVM).
The focus in statistics has been, firstly, on inference, i.e., generating stochastic models to fit the data, and secondly, on theoretical rigor in deriving statistically sound inferences, i.e., ensuring that the assumptions behind data distributions and data independence are valid. Machine learning, on the other hand, focuses on learning from data to make predictions without any reference to the underlying mechanism generating the data. Its practitioners employ different predictive algorithms, many of which were developed by statisticians or are based on statistical learning theory. The use of N-fold cross-validation or the leave-one-out methodology to compare the accuracies of different algorithms, the development of SVM solvers, ensemble classifiers and deep learning, and the different metrics for evaluating prediction accuracy are all the result of fundamental machine learning research.
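To make the idea of comparing algorithms by N-fold cross-validation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a method taken from the article: the data are synthetic, and the helper names (`k_fold_indices`, `majority_class`, `nearest_centroid`, `cross_val_accuracy`) are hypothetical, chosen only for this example. Each algorithm is trained on all folds but one and scored on the held-out fold, and the scores are averaged.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and split them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def majority_class(train_X, train_y):
    """Baseline learner: always predict the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label


def nearest_centroid(train_X, train_y):
    """Simple learner: predict the label whose class mean is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def cross_val_accuracy(fit, X, y, n_folds=5, seed=0):
    """Mean held-out accuracy of the learner `fit` across n_folds folds."""
    scores = []
    for held_out in k_fold_indices(len(X), n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Synthetic two-class data (an assumption for illustration only):
# class 0 is centred at (0, 0) and class 1 at (3, 3).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

# Both learners are evaluated on the same folds, so the comparison is like for like.
baseline_acc = cross_val_accuracy(majority_class, X, y)
centroid_acc = cross_val_accuracy(nearest_centroid, X, y)
print(f"majority-class baseline: {baseline_acc:.2f}")
print(f"nearest centroid:        {centroid_acc:.2f}")
```

Setting `n_folds=len(X)` in this sketch gives the leave-one-out variant, in which every sample is held out exactly once. In practice a library routine would usually replace these hand-written helpers; they are spelled out here only to show what the procedure does.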