
One of the biggest decisions a data scientist needs to make during a predictive modeling exercise is choosing the right classifier. There is no single best classifier for all problems: a classifier's accuracy varies with the data set, and the correlation between the predictor variables and the outcome is a key influencer. The choice needs to be made through experimentation, as in the sketch below. There are two main selection criteria.
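As a minimal sketch of that kind of experimentation, the snippet below compares a few common classifiers on the same data set using cross-validated accuracy. It uses scikit-learn; the synthetic data set and the particular classifiers chosen are illustrative, not a recommendation.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Illustrative synthetic data; in practice, substitute your own data set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
}

# 5-fold cross-validated accuracy for each candidate classifier.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")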

Accuracy:  While the accuracy of an algorithm is important, sometimes meeting an accuracy threshold is all that is required. The number of false positives and false negatives generated is also a consideration: in some applications (such as medicine) false positives are not acceptable, and in others (such as fraud detection) false negatives are not acceptable. When classifiers produce similar accuracies, a statistical evaluation needs to be made to see whether the difference is significant.
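One way to look at both points is sketched below: a confusion matrix exposes the false positive and false negative counts, and a paired t-test on matched cross-validation folds gives a rough check of whether a small accuracy gap between two classifiers is significant. The data set and classifiers are again illustrative, and the paired t-test on CV folds is only one of several possible significance tests.

from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Inspect false positives / false negatives on a held-out test set.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print(f"false positives: {fp}, false negatives: {fn}")

# Paired comparison: with the same integer cv, both runs use the same
# (deterministic) stratified folds, so the fold scores are matched pairs.
a = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
b = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=10)
t_stat, p_value = ttest_rel(a, b)
print(f"mean accuracy difference {a.mean() - b.mean():+.3f}, p-value {p_value:.3f}")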

Speed: The speed of building models and predicting outcomes is of vital importance, especially since some classifiers, notably ensemble classifiers, take significant time to run. Model building and predictions may need to happen in real time or within set thresholds; in these cases, accuracy is traded off for performance.
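A simple way to make that trade-off visible is to time training and prediction alongside accuracy, as in the sketch below. The classifiers and data set sizes are illustrative; in practice you would time the candidates on data representative of your production volumes.

import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("logistic_regression", LogisticRegression(max_iter=1000)),
                  ("gradient_boosting", GradientBoostingClassifier())]:
    # Time model building.
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    fit_time = time.perf_counter() - start

    # Time prediction on the held-out set.
    start = time.perf_counter()
    preds = clf.predict(X_test)
    predict_time = time.perf_counter() - start

    print(f"{name}: fit {fit_time:.2f}s, predict {predict_time:.3f}s, "
          f"accuracy {accuracy_score(y_test, preds):.3f}")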

An experiment to compare different classifiers can be found here: Experiment to compare classifiers.

Thoughts?

 
