# Understanding Type I and Type II Errors

Reviving from the dead an old but popular blog on Understanding Type I and Type II Errors.

I recently received an inquiry asking me to clarify the difference between type I and type II errors in statistical testing. Let me use this blog to clarify the difference and to discuss the potential cost ramifications of type I and type II errors. I have also provided some examples at the end of the blog.[1]

In statistical test theory, the notion of statistical error is an integral part of hypothesis testing. A statistical test requires an unambiguous statement of a null hypothesis (H0), for example, "this person is healthy", "this accused person is not guilty" or "this product is not broken". The test either fails to reject the null hypothesis, which is a negative result (healthy, not guilty, not broken), or rejects it, which is a positive result (not healthy, guilty, broken).

If the result of the test corresponds with reality, then a correct decision has been made (e.g., the person is healthy and is tested as healthy, or the person is not healthy and is tested as not healthy). However, if the result of the test does not correspond with reality, then two types of error are distinguished: type I error and type II error.

# Type I Error (False Positive Error)

A type I error occurs when the null hypothesis is true, but is rejected. Let me say this again: a type I error occurs when the null hypothesis is actually true, but was rejected as false by the testing.

A type I error, or false positive, is asserting something as true when it is actually false.  This false positive error is basically a "false alarm" – a result that indicates a given condition has been fulfilled when it actually has not been fulfilled (i.e., erroneously a positive result has been assumed).

Let’s use a shepherd and wolf example. Let’s say that our null hypothesis is that there is “no wolf present.” A type I error (or false positive) would be “crying wolf” when there is no wolf present. That is, the actual condition was that there was no wolf present; however, the shepherd wrongly indicated there was a wolf present by calling “Wolf! Wolf!” This is a type I error or false positive error.
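To make the type I error rate concrete, here is a minimal simulation sketch. Everything in it is an assumption added for illustration and not part of the original post: a two-sided z-test with known standard deviation, a 5% significance level, a sample size of 30, and 5,000 simulated experiments. Because the data are generated with the null hypothesis true, every rejection is a false alarm, and the fraction of rejections should land near the chosen significance level.

```python
import random
from statistics import NormalDist, mean


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test; returns True when H0 (true mean == mu0) is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=30, trials=5000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials


# H0 is true here (true_mean equals mu0), so each rejection is a type I error.
type_i_rate = rejection_rate(true_mean=0.0)
print(f"Simulated type I error rate: {type_i_rate:.3f}")
```

In shepherd terms, this counts how often the shepherd cries wolf across many nights on which no wolf ever shows up.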

# Type II Error (False Negative)

A type II error occurs when the null hypothesis is false, but erroneously fails to be rejected. Let me say this again: a type II error occurs when the null hypothesis is actually false, but the testing failed to reject it (it was accepted as true).

A type II error, or false negative, is where a test result indicates that a condition failed, while it actually was successful.   A Type II error is committed when we fail to believe a true condition.

Let’s continue our shepherd and wolf example. Again, our null hypothesis is that there is “no wolf present.” A type II error (or false negative) would be doing nothing (not “crying wolf”) when there is actually a wolf present. That is, the actual situation was that there was a wolf present; however, the shepherd wrongly indicated there was no wolf present and continued to play Candy Crush on his iPhone. This is a type II error or false negative error.
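The type II error rate can also be sketched numerically. The example below is an illustration under assumptions that do not come from the original post: a two-sided z-test with known standard deviation, a 5% significance level, and a hypothetical true effect of 0.5. It computes the probability of failing to reject the null hypothesis when the null hypothesis is actually false, and shows how that probability changes with sample size.

```python
from statistics import NormalDist


def type_ii_error_rate(effect, sigma, n, alpha=0.05):
    """P(fail to reject H0) for a two-sided z-test when the true mean
    differs from the hypothesized mean by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # rejection threshold for |z|
    shift = effect / (sigma / n ** 0.5)         # where the z statistic is centered
    # Probability that z lands inside the non-rejection region [-z_crit, z_crit].
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Hypothetical effect size and sample sizes, chosen only for illustration.
for n in (10, 30, 100):
    beta = type_ii_error_rate(effect=0.5, sigma=1.0, n=n)
    print(f"n={n:>3}: type II error rate = {beta:.3f}, power = {1 - beta:.3f}")
```

Under these assumptions, larger samples shrink the type II error rate, which is one way "more testing" reduces the risk of missing a wolf that is really there.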

The relationship between the truth or falsity of the null hypothesis and the outcome of the test is summarized in the table below:

| | Null hypothesis is true | Null hypothesis is false |
|---|---|---|
| Reject null hypothesis | Type I Error (False Positive) | Correct Outcome (True Positive) |
| Fail to reject null hypothesis | Correct Outcome (True Negative) | Type II Error (False Negative) |
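The four cells of the table can be written as a small lookup. This is only a sketch; the function name `classify_outcome` and its parameters are made up for illustration.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Map (state of reality, test decision) to one of the four table cells."""
    if null_rejected:
        if null_is_true:
            return "Type I error (false positive)"
        return "Correct outcome (true positive)"
    if null_is_true:
        return "Correct outcome (true negative)"
    return "Type II error (false negative)"


# Shepherd example with H0 = "no wolf present".
# No wolf (H0 true), shepherd cries wolf (H0 rejected):
print(classify_outcome(null_is_true=True, null_rejected=True))
# Wolf present (H0 false), shepherd does nothing (H0 not rejected):
print(classify_outcome(null_is_true=False, null_rejected=False))
```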

# Examples

Let’s walk through a few examples and use a simple form to help us to understand the potential cost ramifications of type I and type II errors.  Let’s start with our shepherd / wolf example.

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Wolf is not present | Shepherd thinks wolf is present (shepherd cries wolf) when no wolf is actually present | Shepherd thinks wolf is NOT present (shepherd does nothing) when a wolf is actually present |
| Cost Assessment | Costs (actual costs plus shepherd credibility) associated with scrambling the townsfolk to kill the non-existing wolf | Replacement cost for the sheep eaten by the wolf, and replacement cost for hiring a new shepherd |

Note: I added a row called “Cost Assessment.” Since it cannot be universally stated that a type I or type II error is worse (it is highly dependent upon the statement of the null hypothesis), I’ve added this cost assessment to help me understand which error is more “costly” and for which I might want to do more testing.

Let’s look at the classic criminal dilemma next.  In colloquial usage, a type I error can be thought of as "convicting an innocent person" and type II error "letting a guilty person go free".

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Person is not guilty of the crime | Person is judged as guilty when the person actually did not commit the crime (convicting an innocent person) | Person is judged not guilty when they actually did commit the crime (letting a guilty person go free) |
| Cost Assessment | Social costs of sending an innocent person to prison and denying them their personal freedoms (which in our society, is considered an almost unbearable cost) | Risks of letting a guilty criminal roam the streets and committing future crimes |

Let’s look at some business-related examples. In these examples I have reworded the null hypothesis, so be careful with the cost assessment.

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Medicine A cures Disease B | (H0 true, but rejected as false) Medicine A cures Disease B, but is rejected as false | (H0 false, but accepted as true) Medicine A does not cure Disease B, but is accepted as true |
| Cost Assessment | Lost opportunity cost for rejecting an effective drug that could cure Disease B | Unexpected side effects (maybe even death) for using a drug that is not effective |

Let’s try one more.

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
|---|---|---|
| Display Ad A is effective in driving conversions | (H0 true, but rejected as false) Display Ad A is effective in driving conversions, but is rejected as false | (H0 false, but accepted as true) Display Ad A is not effective in driving conversions, but is accepted as true |
| Cost Assessment | Lost opportunity cost for rejecting an effective Display Ad A | Lost sales for promoting an ineffective Display Ad A to your target visitors |

The cost ramifications in the Medicine example are quite substantial, so additional testing would likely be justified in order to minimize the impact of the type II error (using an ineffective drug) in our example.  However, the cost ramifications in the Display Ad example are quite small, for both the type I and type II errors, so additional investment in addressing the type I and type II errors is probably not worthwhile.
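One way to make the cost assessment comparable across examples is to weight each error's cost by how likely that error is. The sketch below does this under loudly stated assumptions: the class name `ErrorCostScenario`, the prior probability that the null hypothesis is true, the error rates, and every cost figure are hypothetical placeholders chosen for illustration, not numbers from this post or from any real study.

```python
from dataclasses import dataclass


@dataclass
class ErrorCostScenario:
    """All values are hypothetical placeholders for illustration only."""
    name: str
    prior_null_true: float  # assumed probability that H0 is actually true
    alpha: float            # P(reject H0 | H0 true): type I error rate
    beta: float             # P(fail to reject H0 | H0 false): type II error rate
    cost_type_i: float      # assumed cost of one type I error
    cost_type_ii: float     # assumed cost of one type II error

    def expected_costs(self):
        """Return (expected type I cost, expected type II cost)."""
        type_i = self.prior_null_true * self.alpha * self.cost_type_i
        type_ii = (1 - self.prior_null_true) * self.beta * self.cost_type_ii
        return type_i, type_ii


# Made-up inputs: large costs for the medicine, small costs for the display ad.
medicine = ErrorCostScenario("Medicine A", 0.5, 0.05, 0.20, 1_000_000, 50_000_000)
display_ad = ErrorCostScenario("Display Ad A", 0.5, 0.05, 0.20, 1_000, 2_000)

for scenario in (medicine, display_ad):
    type_i, type_ii = scenario.expected_costs()
    print(f"{scenario.name}: type I = {type_i:,.0f}, type II = {type_ii:,.0f}")
```

With placeholder inputs like these, the expected type II cost dominates in the medicine scenario while both expected costs stay small for the display ad, which mirrors the qualitative argument above. The conclusion depends entirely on the numbers you plug in.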

# Summary

Type I and type II errors are highly dependent upon the language or positioning of the null hypothesis. Changing the positioning of the null hypothesis can cause type I and type II errors to switch roles.
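The role switch can be illustrated with a short sketch. The function `label_error` and its boolean encoding are assumptions made for illustration: `null_claim` is what the null hypothesis asserts about a yes/no condition, `reality` is the actual state, and `decision` is the state we act on.

```python
def label_error(null_claim: bool, reality: bool, decision: bool) -> str:
    """Name the error, if any, for a yes/no condition.

    null_claim: what H0 asserts (True means "the drug is effective")
    reality:    the actual state of the world
    decision:   the state we concluded and acted on
    """
    if decision == reality:
        return "correct"
    null_is_true = reality == null_claim
    # A wrong decision with H0 true means H0 was rejected (type I);
    # a wrong decision with H0 false means H0 was not rejected (type II).
    return "type I" if null_is_true else "type II"


# Same mistake in both cases: the drug does not work, but we use it anyway.
print(label_error(null_claim=True, reality=False, decision=True))   # H0: "drug is effective"
print(label_error(null_claim=False, reality=False, decision=True))  # H0: "drug is not effective"
```

The mistake and its real-world cost are identical in both calls; only the label changes with the wording of the null hypothesis.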

It’s hard to create a blanket statement that a type I error is worse than a type II error, or vice versa.  The severity of the type I and type II errors can only be judged in context of the null hypothesis, which should be thoughtfully worded to ensure that we’re running the right test.

I highly recommend adding the “Cost Assessment” analysis like we did in the examples above.  This will help identify which type of error is more “costly” and identify areas where additional testing might be justified.

Comment by Bill Schmarzo on December 5, 2018 at 12:51pm

Thanks Larry for the added perspective.

Comment by Larry Berk on August 23, 2018 at 3:33pm

I routinely practice statistical testing, and yet I admittedly still find the terms 'positive' and 'negative' confusing. The aim of a statistical test is to 'find a meaningful difference between statistics' (means, variances, etc.). In that context (as opposed to Bill's business examples), a NULL hypothesis is always that "in probabilistic terms, there is no difference".

Consequently, the outcome the statistical test desires is to *REJECT* the NULL hypothesis. This is the affirming, 'positive' outcome (for me, the analyst). Likewise, the disappointing, 'negative' outcome is that, probabilistically, there is no meaningful difference in the statistics (and I have to rethink what I had hoped to demonstrate).

Therefore, I find it helpful to mentally swap the word 'positive' for the entire phrase 'REJECT the NULL hypothesis'.  This reinforces in my head that:

1) 'positive' is not a substitute for the word 'true'; a 'positive finding' means the statistics (groups, entities, etc.) being compared are *in fact* different in probabilistic terms.

2) 'negative' is not a substitute for the word 'false'; a 'negative finding' means that the statistics (groups, entities, etc.) being compared are *in fact* no different, also in probabilistic terms.

Now I can keep the phrases straight:

'false positive' -- incorrectly concluding a difference when none really exists,

'false negative' -- incorrectly concluding no difference when a significant one really does exist

It's easy to get confused because the 'confusion matrix' (no pun intended) used for evaluating classification algorithm performance also uses the terms 'true positive', 'false negative', etc. In this context 'positive' *is* effectively a substitute for the word 'true' and 'negative' *is* effectively a substitute for the word 'false'.

Larry Berk,

Data Scientist, Hitachi Vantara