Reviving from the dead an old but popular blog on Understanding Type I and Type II Errors
I recently got an inquiry that asked me to clarify the difference between type I and type II errors when doing statistical testing. Let me use this blog to clarify the difference as well as discuss the potential cost ramifications of type I and type II errors. I have also provided some examples at the end of the blog[1].
In statistical test theory, the notion of statistical error is an integral part of hypothesis testing. The statistical test requires an unambiguous statement of a null hypothesis (H_{0}), for example, "this person is healthy", "this accused person is not guilty" or "this product is not broken". In the convention used throughout this blog, rejecting the null hypothesis is the "positive" result (not healthy, guilty, broken) and failing to reject it is the "negative" result (healthy, not guilty, not broken).
If the result of the test corresponds with reality, then a correct decision has been made (e.g., person is healthy and is tested as healthy, or the person is not healthy and is tested as not healthy). However, if the result of the test does not correspond with reality, then two types of error are distinguished: type I error and type II error.
A type I error occurs when the null hypothesis is true, but is rejected. Let me say this again: a type I error occurs when the null hypothesis is actually true, but was rejected as false by the testing.
A type I error, or false positive, is asserting something as true when it is actually false. This false positive error is basically a "false alarm" – a result that indicates a given condition has been fulfilled when it actually has not been fulfilled (i.e., erroneously a positive result has been assumed).
Let’s use a shepherd and wolf example. Let’s say that our null hypothesis is that there is “no wolf present.” A type I error (or false positive) would be “crying wolf” when there is no wolf present. That is, the actual condition was that there was no wolf present; however, the shepherd wrongly indicated there was a wolf present by calling “Wolf! Wolf!” This is a type I error or false positive error.
A type II error occurs when the null hypothesis is false, but erroneously fails to be rejected. Let me say this again: a type II error occurs when the null hypothesis is actually false, but was accepted as true by the testing.
A type II error, or false negative, is where a test result indicates that a condition failed, while it actually was successful. A Type II error is committed when we fail to believe a true condition.
Let’s continue our shepherd and wolf example. Again, our null hypothesis is that there is “no wolf present.” A type II error (or false negative) would be doing nothing (not “crying wolf”) when there is actually a wolf present. That is, the actual situation was that there was a wolf present; however, the shepherd wrongly indicated there was no wolf present and continued to play Candy Crush on his iPhone. This is a type II error or false negative error.
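To make the two error types concrete, here is a minimal simulation sketch. It is my own illustration, not part of the original examples: it assumes a two-sided z-test with a known standard deviation of 1, a significance level of 0.05, samples of 30 observations, and a hypothetical effect size of 0.5 for the "wolf present" case. The function names are made up for this sketch.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (sigma assumed known)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_mean, trials=2000, n=30, seed=0):
    """Fraction of simulated experiments in which H0 (mean == 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if rejects_null(sample):
            rejections += 1
    return rejections / trials


# H0 is true (no wolf): every rejection is a type I error (a false alarm).
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (wolf present): every failure to reject is a type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"type I error rate:  {type_1_rate:.3f}")
print(f"type II error rate: {type_2_rate:.3f}")
```

When the null hypothesis is true, the type I error rate should land close to the chosen significance level. The type II error rate depends on the assumed effect size and sample size, so a smaller effect or a smaller sample would push it higher.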
A tabular relationship between truthfulness/falseness of the null hypothesis and outcomes of the test can be seen in the table below:

| | Null hypothesis is true | Null hypothesis is false |
|---|---|---|
| Reject null hypothesis | Type I error (false positive) | Correct outcome (true positive) |
| Fail to reject null hypothesis | Correct outcome (true negative) | Type II error (false negative) |
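The four cells of the table can be written as a small lookup function. This is just a sketch of the table's logic; the function name `classify_outcome` is my own.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Map (reality, test decision) onto the four cells of the table."""
    if null_rejected:
        if null_is_true:
            return "Type I error (false positive)"
        return "Correct outcome (true positive)"
    if null_is_true:
        return "Correct outcome (true negative)"
    return "Type II error (false negative)"


# Print all four combinations of reality and decision.
for null_is_true in (True, False):
    for null_rejected in (True, False):
        outcome = classify_outcome(null_is_true, null_rejected)
        print(f"H0 true={null_is_true}, rejected={null_rejected}: {outcome}")
```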
Let’s walk through a few examples and use a simple form to help us to understand the potential cost ramifications of type I and type II errors. Let’s start with our shepherd / wolf example.
| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Wolf is not present | Shepherd thinks wolf is present (shepherd cries wolf) when no wolf is actually present | Shepherd thinks wolf is NOT present (shepherd does nothing) when a wolf is actually present |
| Cost Assessment | Costs (actual costs plus shepherd credibility) associated with scrambling the townsfolk to kill the nonexisting wolf | Replacement cost for the sheep eaten by the wolf, and replacement cost for hiring a new shepherd |
Note: I added a row called “Cost Assessment.” Since it cannot be universally stated that a type I or type II error is worse (as it is highly dependent upon the statement of the null hypothesis), I’ve added this cost assessment to help me understand which error is more “costly” and for which I might want to do more testing.
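One way to put numbers on the Cost Assessment row is to weight each error's cost by how often it is expected to occur. The sketch below assumes you can estimate a per-decision probability and a cost for each error type. The class name and every figure in it are hypothetical placeholders for the shepherd example, not real data.

```python
from dataclasses import dataclass


@dataclass
class CostAssessment:
    p_type_1: float     # assumed probability of a type I error per decision
    cost_type_1: float  # assumed cost of one type I error
    p_type_2: float     # assumed probability of a type II error per decision
    cost_type_2: float  # assumed cost of one type II error

    def expected_costs(self) -> dict:
        """Expected cost per decision for each error type."""
        return {
            "type I": self.p_type_1 * self.cost_type_1,
            "type II": self.p_type_2 * self.cost_type_2,
        }

    def costlier_error(self) -> str:
        """Name of the error type with the larger expected cost."""
        costs = self.expected_costs()
        return max(costs, key=costs.get)


# Hypothetical figures: a false alarm costs 200 (scrambling the townsfolk),
# a missed wolf costs 5000 (lost sheep plus hiring a new shepherd).
shepherd = CostAssessment(p_type_1=0.05, cost_type_1=200,
                          p_type_2=0.20, cost_type_2=5000)

print(shepherd.expected_costs())
print("More costly error:", shepherd.costlier_error())
```

With these made-up inputs the type II error dominates, which would suggest spending the extra testing effort on reducing missed wolves. Different inputs can just as easily point the other way.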
Let’s look at the classic criminal dilemma next. In colloquial usage, a type I error can be thought of as "convicting an innocent person" and type II error "letting a guilty person go free".
| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Person is not guilty of the crime | Person is judged as guilty when the person actually did not commit the crime (convicting an innocent person) | Person is judged not guilty when they actually did commit the crime (letting a guilty person go free) |
| Cost Assessment | Social costs of sending an innocent person to prison and denying them their personal freedoms (which in our society, is considered an almost unbearable cost) | Risks of letting a guilty criminal roam the streets and committing future crimes |
Let’s look at some business-related examples. In these examples I have reworded the null hypothesis, so be careful with the cost assessment.

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Medicine A cures Disease B | (H_{0} true, but rejected as false) Medicine A cures Disease B, but is rejected as false | (H_{0} false, but accepted as true) Medicine A does not cure Disease B, but is accepted as true |
| Cost Assessment | Lost opportunity cost for rejecting an effective drug that could cure Disease B | Unexpected side effects (maybe even death) for using a drug that is not effective |
Let’s try one more.
| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Display Ad A is effective in driving conversions | (H_{0} true, but rejected as false) Display Ad A is effective in driving conversions, but is rejected as false | (H_{0} false, but accepted as true) Display Ad A is not effective in driving conversions, but is accepted as true |
| Cost Assessment | Lost opportunity cost for rejecting an effective Display Ad A | Lost sales for promoting an ineffective Display Ad A to your target visitors |
The cost ramifications in the Medicine example are quite substantial, so additional testing would likely be justified in order to minimize the impact of the type II error (using an ineffective drug) in our example. However, the cost ramifications in the Display Ad example are quite small, for both the type I and type II errors, so additional investment in addressing the type I and type II errors is probably not worthwhile.
Type I and type II errors are highly dependent upon the language or positioning of the null hypothesis. Changing the positioning of the null hypothesis can cause type I and type II errors to switch roles.
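The role switch is easy to see in a small sketch. Here the same mistake (an effective Display Ad A judged ineffective) is labeled under two different wordings of the null hypothesis. The function and parameter names are my own, chosen only for illustration.

```python
def error_type(null_says_effective: bool,
               truly_effective: bool,
               judged_effective: bool) -> str:
    """Label a decision as 'correct', 'type I' or 'type II'."""
    if truly_effective == judged_effective:
        return "correct"
    # A mistake was made. If H0 matches reality, H0 was wrongly rejected
    # (type I); otherwise H0 was wrongly retained (type II).
    null_is_true = (truly_effective == null_says_effective)
    return "type I" if null_is_true else "type II"


# Same mistake in both cases: the ad really works, but we judged that it does not.
h0_effective = error_type(null_says_effective=True,
                          truly_effective=True, judged_effective=False)
h0_not_effective = error_type(null_says_effective=False,
                              truly_effective=True, judged_effective=False)

print("H0 = 'Ad A is effective':     ", h0_effective)
print("H0 = 'Ad A is not effective': ", h0_not_effective)
```

The decision and the reality are identical in both calls. Only the wording of the null hypothesis changes, and with it the label on the error.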
It’s hard to create a blanket statement that a type I error is worse than a type II error, or vice versa. The severity of the type I and type II errors can only be judged in context of the null hypothesis, which should be thoughtfully worded to ensure that we’re running the right test.
I highly recommend adding the “Cost Assessment” analysis like we did in the examples above. This will help identify which type of error is more “costly” and identify areas where additional testing might be justified.
[1] More information about type I and type II errors can be found at: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
Comment
Thanks Larry for the added perspective.
I routinely practice statistical testing, and yet I admittedly still find the terms 'positive' and 'negative' confusing. The aim of a statistical test is to 'find a meaningful difference between statistics' (means, variances, etc.). In that context (as opposed to Bill's business examples), a NULL hypothesis is always that "in probabilistic terms, there is no difference".
Consequently, the outcome the statistical test desires is to *REJECT* the NULL hypothesis. This is the affirming, 'positive' outcome (for me the analyst). Likewise, the disappointing, 'negative' outcome is that, probabilistically, there is no meaningful difference in the statistics (and I have to rethink what I had hoped to demonstrate).
Therefore, I find it helpful to mentally swap the word 'positive' for the entire phrase 'REJECT the NULL hypothesis'. This reinforces in my head that:
1) the word 'positive' is not a substitute for the word 'true'; a 'positive finding' means the statistics (groups, entities, etc.) being compared are *in fact* different in probabilistic terms.
2) 'negative' is not a substitute for the word 'false'; a 'negative finding' means that the statistics (groups, entities, etc.) being compared are *in fact* no different, also in probabilistic terms.
Now I can keep the phrases straight:
'false positive': incorrectly concluding a difference when none really exists,
'false negative': incorrectly concluding no difference when a significant one really does exist.
It's easy to get confused because the 'confusion matrix' (no pun intended) for evaluating Classification algorithm performance also uses the terms 'true positive', 'false negative', etc. In this context 'positive' *is* effectively a substitute for the word 'true' and 'negative' *is* effectively a substitute for the word 'false'.
Larry Berk,
Data Scientist, Hitachi Vantara
© 2019 Data Science Central ®