There are so many confusing and sometimes even counterintuitive concepts in statistics. I mean, come on… even explaining the difference between the Null Hypothesis and the Alternative Hypothesis can be an ordeal. All I want to do is understand and quantify the cost of my analytical models being wrong.
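For example, let’s say that I’m a shepherd who has bad eyesight and a hard time distinguishing between a wolf and a sheep dog. That’s obviously a bad trait, because being wrong is very expensive: mistaking a sheep dog for a wolf costs me $2,000 per occurrence, and mistaking a wolf for a sheep dog costs me $5,000 per occurrence.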
Okay, so I’m not a very good shepherd, but I am a very sophisticated shepherd, and I’ve built a Neural Network application to distinguish a sheep dog from a wolf. Through much training of the “Wolf Detection” neural network, I now have a tool that can correctly distinguish a sheep dog from a wolf with 95% accuracy (see Figure 1).
Figure 1: Source: “Why Deep Learning Is Suddenly Changing Your Life” http://fortune.com/aiartificialintelligencedeepmachinelearning...
Okay, that seems pretty great, but is 95% accuracy good enough given the costs of False Positives and False Negatives? Shouldn’t I invest more time and effort to improve that accuracy to ensure that my model is “profitable”? And how do I quantify the cost of the 5% of predictions that my analytical model gets wrong?
Enter the Confusion Matrix (if there was ever an accurate description of something, this name nails it).
So how does one go about quantifying the costs of being wrong using the Confusion Matrix? That is, how do we determine whether a model that predicts with, for instance, 95% accuracy is good enough, given the business situation and the costs associated with being wrong?
Confusion Matrices compare the ‘true condition’ (what is actually the case) with the ‘predicted condition’ (what the model says is the case), and each condition can be either positive or negative. This means that you need to understand the differences between Type I and Type II errors (and, eventually, the costs associated with each).
First, let’s set up our Confusion Matrix for testing the condition: “Is that animal in the grove a wolf?” The Positive Condition is “The Animal is a Wolf,” in which case I’d take the appropriate action (probably wouldn’t try to pet it). Below is the 2x2 Confusion Matrix for our use case.
| Predicted \ True Condition | True (Wolf) | False (Dog) |
|---|---|---|
| True (Wolf) | TP | FP |
| False (Dog) | FN | TN |
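Where:
- TP (True Positive): the model predicts a wolf, and the animal is a wolf
- FP (False Positive, a Type I error): the model predicts a wolf, but the animal is a sheep dog
- FN (False Negative, a Type II error): the model predicts a sheep dog, but the animal is a wolf
- TN (True Negative): the model predicts a sheep dog, and the animal is a sheep dog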
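As a minimal sketch of how those four counts are tallied, the Python below compares a list of true labels with a list of predicted labels, treating "wolf" as the positive class. The function name `tally_confusion_matrix` and the sample sightings are hypothetical illustrations, not part of the Wolf Detection application.

```python
from collections import Counter


def tally_confusion_matrix(actual, predicted, positive="wolf"):
    """Count TP, FP, FN and TN, treating `positive` as the positive class.

    Hypothetical helper for illustration; labels are plain strings.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # Predicted wolf: correct if it really is a wolf, else a Type I error
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted not-a-wolf: a Type II error if it really was a wolf
            counts["FN" if truth == positive else "TN"] += 1
    # Counter returns 0 for cells that never occurred
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical sightings: what each animal really was vs. what the model said
actual = ["wolf", "dog", "wolf", "dog", "dog", "wolf"]
predicted = ["wolf", "wolf", "dog", "dog", "dog", "wolf"]
print(tally_confusion_matrix(actual, predicted))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

Each prediction lands in exactly one of the four cells, so the counts always sum to the number of observations.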
Once the neural network model produces a Confusion Matrix covering all four of the above conditions, we can calculate goodness of fit and effectiveness measures such as model Precision, Sensitivity and Specificity.
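Cell Probabilities

| Predicted \ True Condition | True (Wolf) | False (Dog) | Row measure |
|---|---|---|---|
| True (Wolf) | TP | FP | Precision = TP / (TP + FP) |
| False (Dog) | FN | TN | |
| Column measure | Recall / Sensitivity = TP / (TP + FN) | Specificity = TN / (TN + FP) | |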
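The Confusion Matrix can then be used to create the following measures of goodness of fit and model accuracy:
- Precision: of all the animals the model predicts are wolves, the share that really are wolves
- Recall / Sensitivity: of all the animals that really are wolves, the share the model flags as wolves
- Specificity: of all the animals that really are sheep dogs, the share the model identifies as sheep dogs
- Accuracy = (TP + TN) / (TP + FP + FN + TN): the share of all predictions that are correct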
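Precision, Recall/Sensitivity, Specificity and Accuracy are all simple ratios of the four cells, so they translate directly into code. This is a minimal Python sketch; the function name `confusion_metrics` and the example counts are hypothetical. Because every measure is a ratio, it works equally well on raw counts or on cell probabilities.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return Precision, Recall/Sensitivity, Specificity and Accuracy.

    Inputs may be counts or cell probabilities: every measure is a ratio,
    so scaling all four cells by the same factor leaves the results unchanged.
    """
    return {
        "precision": tp / (tp + fp),                  # predicted wolves that really are wolves
        "recall": tp / (tp + fn),                     # real wolves the model catches
        "specificity": tn / (tn + fp),                # real dogs correctly identified as dogs
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # all correct predictions
    }


# Hypothetical example counts: 8 TP, 2 FP, 1 FN, 9 TN
for name, value in confusion_metrics(tp=8, fp=2, fn=1, tn=9).items():
    print(f"{name}: {value:.1%}")
```

Each measure divides by a sum of cells, so a matrix with, say, no predicted wolves at all (TP + FP = 0) would raise `ZeroDivisionError`; a production version would need to decide how to report that case.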
Now let’s get back to our shepherd example. We want to determine the costs of the model being wrong, or the savings the neural network provides. We need to determine whether the model provides a sufficient improvement over what the shepherd already does himself.
Before the Wolf Detection application, I (as the shepherd) had the following Confusion Matrix, with each cell expressed as a percentage of 5,000 observations:
Without Wolf Detection Application (5,000 observations)

| Predicted \ True Condition | True (Wolf) | False (Dog) |
|---|---|---|
| True (Wolf) | True Positive = 75% | False Positive = 10% |
| False (Dog) | False Negative = 5% | True Negative = 10% |
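By the previous definitions, the corresponding metrics without using the Wolf Detection Application are:
- Precision = 75% / (75% + 10%) ≈ 88%
- Recall / Sensitivity = 75% / (75% + 5%) ≈ 94%
- Specificity = 10% / (10% + 10%) = 50%
- Accuracy = 75% + 10% = 85%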
Now using the Wolf Detection application, we get the below Confusion Matrix:
With Wolf Detection Application (5,000 observations)

| Predicted \ True Condition | True (Wolf) | False (Dog) |
|---|---|---|
| True (Wolf) | True Positive = 4,000 (no cost); 4,000 / 5,000 = 80% | False Positive = 200 (cost per occurrence = $2,000); 200 / 5,000 = 4% |
| False (Dog) | False Negative = 50 (cost per occurrence = $5,000); 50 / 5,000 = 1% | True Negative = 750 (no cost); 750 / 5,000 = 15% |
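Using the Wolf Detection Application, the Confusion Matrix metrics are:
- Precision = 4,000 / (4,000 + 200) ≈ 95%
- Recall / Sensitivity = 4,000 / (4,000 + 50) ≈ 99%
- Specificity = 750 / (750 + 200) ≈ 79%
- Accuracy = (4,000 + 750) / 5,000 = 95%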
Bringing this all together into a single table:
| Measure | Without Wolf Detection App | With Wolf Detection App | Improvement | % Improvement |
|---|---|---|---|---|
| Precision | 88% | 95% | 7 points | 8.0% |
| Recall/Sensitivity | 94% | 99% | 5 points | 5.3% |
| Specificity | 50% | 79% | 29 points | 58.0% |
| Accuracy | 85% | 95% | 10 points | 11.8% |
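To double-check the table, the following Python sketch recomputes each measure from the two Confusion Matrices (the shepherd's cell probabilities and the app's counts). As in the table, the improvement columns are computed from the rounded whole percentages (for example, 7 points / 88% ≈ 8.0%). The helper name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision, Recall/Sensitivity, Specificity and Accuracy from the four cells."""
    return {
        "Precision": tp / (tp + fp),
        "Recall/Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Without the app: cell probabilities from the shepherd's own Confusion Matrix
before = confusion_metrics(tp=0.75, fp=0.10, fn=0.05, tn=0.10)
# With the app: counts out of 5,000 observations
after = confusion_metrics(tp=4000, fp=200, fn=50, tn=750)

rows = []
for name in before:
    # Round to whole percentages first, as the summary table does
    old = round(before[name] * 100)
    new = round(after[name] * 100)
    points = new - old
    rows.append((name, old, new, points, points / old))
    print(f"{name:<20}{old:>4}%{new:>5}%{points:>4} points{points / old:>8.1%}")
```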
Return on Investment then equals:
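Finally, the Expected Value Per Prediction (EvP) equals:
EvP = ($2,000 × Change in FP%) + ($5,000 × Change in FN%)
= ($2,000 × 0.06) + ($5,000 × 0.04)
= $120 + $200
= $320 average savings per prediction
where the Change in FP% is 10% - 4% = 6 points and the Change in FN% is 5% - 1% = 4 points.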
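The EvP arithmetic can also be sketched in Python as the difference between the expected error cost per prediction before and after the app. This uses the $2,000 per False Positive and $5,000 per False Negative costs and the FP and FN cell probabilities from the two matrices above; the function name `expected_error_cost` is hypothetical.

```python
FP_COST = 2_000  # cost per False Positive (sheep dog mistaken for a wolf)
FN_COST = 5_000  # cost per False Negative (wolf mistaken for a sheep dog)


def expected_error_cost(fp_rate, fn_rate):
    """Expected error cost per prediction from the FP and FN cell probabilities."""
    return FP_COST * fp_rate + FN_COST * fn_rate


cost_without = expected_error_cost(fp_rate=0.10, fn_rate=0.05)          # shepherd alone
cost_with = expected_error_cost(fp_rate=200 / 5000, fn_rate=50 / 5000)  # with the app
savings = cost_without - cost_with  # the EvP

print(f"Without app: ${cost_without:,.0f} per prediction")
print(f"With app:    ${cost_with:,.0f} per prediction")
print(f"EvP savings: ${savings:,.0f} per prediction")
```

Subtracting the two expected costs ($450 - $130) gives the same $320 as pricing the change in each error rate directly, because the expected cost is linear in the FP and FN rates.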
Not all Type I and Type II errors are of equal value. You need to invest the time to understand the costs of Type I and Type II errors in relation to your specific case. The real challenge is determining whether the improvement in performance from the analytic model is “good enough.” The Confusion Matrix can help us make that determination.
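To illustrate why the mix of errors matters, the sketch below compares the Wolf Detection app with a purely hypothetical “model B” that makes the same 250 mistakes on 5,000 observations but with the FP and FN counts swapped. Both are 95% accurate; only the per-occurrence costs from this example ($2,000 and $5,000) separate them. Model B and the helper names are illustrative assumptions.

```python
FP_COST, FN_COST = 2_000, 5_000  # per-occurrence costs from the shepherd example
N = 5_000  # observations


def cost_per_prediction(fp, fn, n=N):
    """Average error cost per prediction from FP and FN counts."""
    return (fp * FP_COST + fn * FN_COST) / n


def accuracy(fp, fn, n=N):
    """Share of correct predictions; only the error counts matter here."""
    return (n - fp - fn) / n


models = {
    "Wolf Detection app": (200, 50),
    "Hypothetical model B": (50, 200),  # same 250 errors, opposite mix
}
for name, (fp, fn) in models.items():
    print(f"{name}: accuracy {accuracy(fp, fn):.0%}, "
          f"cost ${cost_per_prediction(fp, fn):,.0f} per prediction")
```

Accuracy alone rates the two models as equal, while weighting each error type by its cost shows model B costing $220 per prediction versus $130 for the app.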
And if folks are still struggling with the concept of Type I and Type II errors, I hope the below image can help to clarify the difference. Hehehe
Special thanks to Larry Berk, one of my Senior Data Scientists, for his guidance on this blog. He still understands the use of Confusion Matrices much better than me!
© 2020 Data Science Central ®