
How to evaluate Data Science models?

In today’s digital age, insights derived from data science are extremely important for delivering the best customer experience.
Data scientists use various techniques such as regression, SVM, neural networks, nearest neighbor, Naive Bayes, decision trees and ensemble models.
These algorithms help to identify previously unrecognized patterns and trends hidden within vast amounts of structured and unstructured information. These patterns are used to create predictive models that try to forecast future behavior.
These models have many practical business applications: they help identify patients at risk, help banks decide which customers to approve for loans, and help marketers determine which leads to target with campaigns.
But how do you determine whether the predictive models you create are accurate, meaningful representations that will prove valuable to your organization?
There are various methods used by data scientists to measure the accuracy of the model:
  • Lift Charts & Gain Charts: These are widely used in campaign targeting problems to determine which deciles of customers to target for a specific campaign. They also tell you how much response you can expect from the new target base.
  • ROC Curve: The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies.
  • Gini coefficient: This is the ratio of the area between the ROC curve and the diagonal line to the area of the triangle above the diagonal. It is equivalent to 2 × AUC - 1, where AUC is the area under the ROC curve.
  • Cross Validation: In its simplest form, the data is split into two parts: one part is used for "training" your model, and the second part is used to make predictions. This lets you test the model on data that it has "not seen" previously and check how it might behave with external data. In k-fold cross validation, the data is split into k parts and the process is repeated so that each part serves once as the test set.
  • Confusion Matrix: A table showing the number of predictions for each class compared to the number of instances that actually belong to each class. This is very useful for getting an overview of the types of mistakes the algorithm made. From this table you can derive the accuracy, true positives, false positives, sensitivity and specificity of the model.
  • Root Mean Squared Error: This is the square root of the average squared error made on the test set, expressed in the units of the output variable. This measure gives you an idea of how far off a given prediction may typically be. It is most popular in regression techniques.
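Several of the measures listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a reference implementation: the function names, the toy labels and scores, and the 0.5 threshold are all made up for the example, and a real project would more likely rely on a library.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the share of (positive, negative)
    pairs in which the positive has the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(y_true, scores):
    """Gini coefficient derived from the AUC: 2 * AUC - 1."""
    return 2 * roc_auc(y_true, scores) - 1

def rmse(actual, predicted):
    """Square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n rows.
    Assumes the rows are already shuffled; no shuffling happens here."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

# Toy data, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", (tp + tn) / len(y_true))   # 0.75
print("sensitivity:", tp / (tp + fn))         # 0.75
print("specificity:", tn / (tn + fp))         # 0.75
print("AUC:", roc_auc(y_true, scores))        # 0.6875
print("Gini:", gini(y_true, scores))          # 0.375
print("RMSE:", rmse([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0]))
```

The pairwise AUC is quadratic in the number of rows, which is fine for a toy example but would be replaced by a rank-based calculation on real data.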
In general, the assessment used should closely match the business objectives. Using the right metric can have more influence on your model's performance than the algorithm you use.
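A small, made-up campaign example can show why the choice of metric matters. Assume 1,000 customers of whom 20 respond, and compare a "model" that never targets anyone with a hypothetical model that flags 60 customers and catches 15 of the responders. All numbers and names here are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall(y_true, y_pred):
    """Share of actual positives that were predicted positive."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives

# Hypothetical data: the first 20 customers are the responders.
y_true = [1] * 20 + [0] * 980

# Baseline that never targets anyone.
never_target = [0] * 1000

# Hypothetical model: flags 15 of the 20 responders plus 45 non-responders.
targeted = [1] * 15 + [0] * 5 + [1] * 45 + [0] * 935

print(accuracy(y_true, never_target), recall(y_true, never_target))  # 0.98 0.0
print(accuracy(y_true, targeted), recall(y_true, targeted))          # 0.95 0.75
```

Judged by accuracy alone, the baseline that finds no responders looks better. Judged by recall, which is closer to the campaign's goal in this scenario, the second model is clearly preferable.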

So many data points are generated by the Internet of Things, mobile devices, social media and all the omni-channels used for customer interactions. Merely storing this data is useless unless data scientists use it to generate insights that drive the next actions.



Comment by Snear Nadav on October 1, 2018 at 4:55am

What about log loss? Is it a better measure than RMSE with regard to classification?

Comment by Vincent Granville on August 14, 2016 at 8:58am
