(This article is now a chapter of my GitHub proto-book, Bayesuvius.)

Simpson's paradox is a recurring nightmare for every statistician overseeing a clinical trial of a medicine. If a certain "confounding" variable is left out of a study, the study's conclusion about whether the medicine is effective may be the opposite of the conclusion that would have been reached had that variable been measured. Statisticians must therefore enlist expert knowledge to assure themselves that no influential variables have been left out.

Judea Pearl considers Simpson's paradox a fundamental problem, one that is greatly clarified by his theory of causal Bayesian networks (see references below).

Here is a simple example of Simpson's paradox (or Simpson's curse, or Simpson's weed).

In a double-blind study, male and female patients are given either a medicine or a placebo. Some recover from their ailment and others don't. Let

r= recovered? No=0, Yes=1

t= took medicine? No=0, Yes=1

g= gender? Female=0, Male=1

The situation can be modeled by the Bayesian network (bnet) in Fig. 1.

Figure 1

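For this bnet, one has

$$P(r, g, t) = P(r \mid g, t)\, P(g \mid t)\, P(t).$$

Therefore,

$$P(r \mid t) = \sum_{g} P(r \mid g, t)\, P(g \mid t) = E_{g \mid t}\left[P(r \mid g, t)\right],$$

where $E_{g \mid t}$ is a conditional expected value (a kind of weighted average).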
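Here is a minimal Python sketch of the step from the joint distribution to the weighted average. It assumes Fig. 1 factorizes as $P(r,g,t) = P(r \mid g,t)\,P(g \mid t)\,P(t)$, i.e. arrows t→g, t→r and g→r with $t$ as the root (the text later calls $t$ the root). The probability tables are made-up numbers, and the names `joint`, `p_r1_given_t_from_joint` and `p_r1_given_t_as_expectation` are hypothetical. The script computes $P(r=1 \mid t)$ twice, once by marginalizing the joint over $g$ and once as $E_{g \mid t}[P(r=1 \mid g,t)]$, so the two can be compared.

```python
from itertools import product

# Hypothetical CPTs for the bnet of Fig. 1 (assumed arrows t -> g, t -> r, g -> r).
# The numbers are illustrative only, not data from any real trial.
P_t = {0: 0.5, 1: 0.5}                      # P(t)
P_g_given_t = {0: {0: 0.4, 1: 0.6},         # P(g | t=0)
               1: {0: 0.7, 1: 0.3}}         # P(g | t=1)
P_r1_given_gt = {(0, 0): 0.2, (0, 1): 0.3,  # P(r=1 | g, t), keyed by (g, t)
                 (1, 0): 0.6, (1, 1): 0.7}

def joint(r, g, t):
    """P(r, g, t) = P(r | g, t) P(g | t) P(t), the assumed bnet factorization."""
    p_r1 = P_r1_given_gt[(g, t)]
    p_r = p_r1 if r == 1 else 1.0 - p_r1
    return p_r * P_g_given_t[t][g] * P_t[t]

def p_r1_given_t_from_joint(t):
    """P(r=1 | t) obtained by summing the joint over g and dividing by P(t)."""
    num = sum(joint(1, g, t) for g in (0, 1))
    den = sum(joint(r, g, t) for r, g in product((0, 1), repeat=2))
    return num / den

def p_r1_given_t_as_expectation(t):
    """P(r=1 | t) as the weighted average E_{g|t}[P(r=1 | g, t)]."""
    return sum(P_g_given_t[t][g] * P_r1_given_gt[(g, t)] for g in (0, 1))

for t in (0, 1):
    print(t, p_r1_given_t_from_joint(t), p_r1_given_t_as_expectation(t))
```

The two columns agree up to floating-point rounding, which is the content of the "Therefore" step: the weights of the average are $P(g \mid t)$, and they are allowed to change with $t$.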

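Suppose $x_0$ and $x_1$ are non-negative real numbers. For the vector $x = (x_0, x_1)$, whose components are indexed by $t$, define

$$\Delta x = x_1 - x_0.$$

Define a positive outcome (or success, or increasing with $t$) if $\Delta x > 0$.

Define a negative outcome (or failure, or decreasing with $t$) if $\Delta x < 0$.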
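The success/failure definitions amount to looking at the sign of a difference between the $t=1$ and $t=0$ entries. A tiny Python sketch follows; the names `delta` and `outcome` are hypothetical, the vector is assumed to be a pair `(x0, x1)` indexed by $t$, and the `"neutral"` label for a zero difference is an addition of this sketch, since the text only defines the two strict cases.

```python
def delta(x):
    """Difference x_1 - x_0 for a pair x = (x_0, x_1) indexed by t = 0, 1."""
    x0, x1 = x
    if x0 < 0 or x1 < 0:
        raise ValueError("entries are assumed to be non-negative")
    return x1 - x0

def outcome(x):
    """Label the change from t=0 to t=1: 'positive' (success), 'negative' (failure).

    'neutral' for a zero difference is an assumption of this sketch.
    """
    d = delta(x)
    if d > 0:
        return "positive"
    if d < 0:
        return "negative"
    return "neutral"

print(outcome((0.2, 0.3)), outcome((0.65, 0.35)))
```

Applied to a recovery probability such as $P(r=1 \mid t)$, a `"positive"` label means the medicine raises the recovery rate.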

Figure 2

It is possible (see Fig. 2 for a graphical explanation of how) to find perverse cases in which $P(r=1 \mid g=0, t)$ and $P(r=1 \mid g=1, t)$ both increase with $t$ but $P(r=1 \mid t)$ decreases with $t$. So one can conclude that the medicine is a success for each of the two $g$ populations considered separately, yet a failure when both populations are "amalgamated". The lesson is that a "trend reversal" is possible upon amalgamation: the sign of the outcome is not necessarily preserved when we take a weighted average of type $E_{g \mid t}$. Here $E_{g \mid t}$ is an expected value over the "confounding" random variable $g$, conditioned on the root random variable $t$.
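To make the reversal concrete, here is a small Python sketch with invented numbers (not data from any real trial; the names `p_recover`, `p_gender` and `amalgamated` are hypothetical). Within each gender the medicine raises the recovery rate, but the weights $P(g \mid t)$ differ sharply between arms: the medicine group is assumed to be mostly female, and females are assumed to have the lower baseline recovery rate.

```python
# Invented numbers chosen to produce a reversal; not data from any real trial.
# p_recover[g][t] = P(r=1 | g, t)
p_recover = {0: {0: 0.2, 1: 0.3},   # g=0 (female): medicine raises recovery
             1: {0: 0.7, 1: 0.8}}   # g=1 (male): medicine raises recovery
# p_gender[t][g] = P(g | t): placebo arm mostly male, medicine arm mostly female.
p_gender = {0: {0: 0.1, 1: 0.9},
            1: {0: 0.9, 1: 0.1}}

def amalgamated(t):
    """P(r=1 | t) = E_{g|t}[P(r=1 | g, t)], the weighted average over g."""
    return sum(p_gender[t][g] * p_recover[g][t] for g in (0, 1))

for g in (0, 1):
    change = p_recover[g][1] - p_recover[g][0]
    print(f"g={g}: change with t = {change:+.2f}")
print(f"amalgamated: change with t = {amalgamated(1) - amalgamated(0):+.2f}")
```

With these numbers each gender shows a change of +0.10, while the amalgamated recovery rate goes from 0.65 to 0.35, a change of -0.30: a success within each $g$ population and a failure after amalgamation.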

Figure 3

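Footnote: Sometimes the bnet of Fig. 3 is used to explain Simpson's paradox instead of the bnet of Fig. 1. But those two bnets are equivalent because, by Bayes' rule,

$$P(g \mid t)\, P(t) = P(t \mid g)\, P(g),$$

so both give the same joint distribution $P(r, g, t) = P(r \mid g, t)\, P(g, t)$.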
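A numerical check of this equivalence, as a Python sketch. It assumes Fig. 1 has $t$ as the root (parameters $P(t)$, $P(g \mid t)$) and Fig. 3 reverses the t–g arrow so that $g$ is the root (parameters $P(g)$, $P(t \mid g)$), with both sharing $P(r \mid g, t)$. The numbers and variable names are hypothetical.

```python
from itertools import product

# Hypothetical Fig. 1 parameters (t is the root): P(t) and P(g | t), stored as P_g_given_t[t][g].
P_t = {0: 0.5, 1: 0.5}
P_g_given_t = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.9, 1: 0.1}}

# Bayes' rule yields the assumed Fig. 3 parameters (g is the root): P(g) and P(t | g).
P_g = {g: sum(P_g_given_t[t][g] * P_t[t] for t in (0, 1)) for g in (0, 1)}
P_t_given_g = {g: {t: P_g_given_t[t][g] * P_t[t] / P_g[g] for t in (0, 1)}
               for g in (0, 1)}

# Both parameterizations give the same P(g, t); sharing P(r | g, t),
# they therefore define the same joint P(r, g, t).
for g, t in product((0, 1), repeat=2):
    fig1 = P_g_given_t[t][g] * P_t[t]
    fig3 = P_t_given_g[g][t] * P_g[g]
    assert abs(fig1 - fig3) < 1e-12
    print(g, t, fig1, fig3)
```

Because only $P(g, t)$ differs in how it is parameterized, any quantity computed from the joint, including $P(r \mid t)$, comes out the same for either bnet.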

**References**

- "Understanding Simpson's Paradox", by Judea Pearl
- "Simpson's Paradox: The Riddle That Would Not Die (Comments on Four Recent Papers)", by Judea Pearl
- "Simpson's Paradox and the Implications for Medical Trials", by Norman Fenton, Martin Neil, Anthony Constantinou

© 2020 Data Science Central