
Why I think the Potential Outcomes Theory is woefully incomplete without Pearl’s enhancements to it


Photo: Judea Pearl (left) and Donald Rubin (right), taken in 2014.

Full disclosure: I am a big fan of Judea Pearl and his contributions to Bayesian Networks (bnets) and Causal Inference (CI).

There has long been a raging debate between Pearl and the advocates of the theory of Potential Outcomes (PO). These advocates include the famous statistician Donald Rubin (the main inventor of PO), and the outspoken economist Guido Imbens. I consider this debate to be Rubenesque in proportions and imbecilic in content. Rubin is on record (see this Letter to the Editor and this tweet) as saying that he never uses Pearl’s DAGs, that they just clutter the PO picture.

“[Graphs are] based on an unprincipled and confused theoretical perspective.”

“to avoid conditioning on some observed covariates,… is nonscientific ad hockery.”

I find Rubin’s attitude very silly and unscientific. DAGs (Directed Acyclic Graphs) are a beautiful graphical representation of what is going on in PO; they clarify PO immensely and suggest many extensions and caveats to the basic PO model.

As part of my activities during 2020, I’ve been writing a book that might well be called “Loving Bayesian Networks in the time of Covid”, but is called instead Bayesuvius. Bayesuvius is a free, open-source book about bnets and CI. The book contains in-depth explanations of the standard Rubin PO theory, and of how to turn it from a weakling into a Charles Atlas using Pearl’s CI and bnets. Writing those in-depth explanations has forced me to think long and hard about PO. Today I found one more important reason, which I will describe next, why I think PO theory without Pearl’s contributions is pitifully opaque and inadequate.

PO theory introduces “treatment effects” called ATE, ATT, ATU, SDO, SB. I am using here the same acronyms that are used in Bayesuvius, which are also the ones used in Cunningham’s online mixtape book. There is also something called ACE (Average Causal Effect), which I learned about from one of Pearl’s books. ACE is defined in terms of the “do operator”. It gives the difference in expected outcome between forcing the treatment on and forcing it off, where forcing means that all incoming arrows to the treatment node are amputated. Hence ACE=0 iff there is no treatment effect according to an RCT (Randomized Controlled Trial).

So I was wondering: how is ACE related to the old PO standbys ATE, ATT, ATU, etc.? This is an important question because ACE=0 is the hypothesis that RCTs test for, and RCTs are the gold standard in causality research. At first I thought, probably ACE=0 iff ATE=0 or something like that. Well, no!

What I found is written up in a small section called “Zero ACE” of Bayesuvius, in the chapter called “Potential Outcomes”. (Remember, Bayesuvius chapters are in alphabetical order, because it is a visual dictionary.) What I found is that (hold on to your papers)


So ACE=0 iff SDO=ATU and SB=0.

ATE=0 does not imply ACE=0, nor does the converse hold.
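For readers who have not met the PO acronyms used above, here is a minimal sketch of how they can be computed on a toy table of potential outcomes. It assumes the usual Mixtape-style definitions (ATE, ATT and ATU as averages of y1 - y0 over everyone, the treated and the untreated; SDO as the simple difference in observed mean outcomes; SB as the selection bias in y0), which may differ in detail from the definitions in Bayesuvius. The data and the function name `po_summary` are made up for illustration, and the sketch does not compute ACE or reproduce the “Zero ACE” derivation.

```python
from statistics import mean

# Hypothetical toy data: one tuple (d, y0, y1) per unit, where d is the
# treatment actually received and y0, y1 are the two potential outcomes.
# In real data only one of y0, y1 is ever observed for each unit.
UNITS = [
    (1, 1.0, 5.0),
    (1, 2.0, 4.0),
    (0, 0.0, 1.0),
    (0, 1.0, 3.0),
    (0, 2.0, 2.0),
]


def po_summary(units):
    """Return the PO quantities under the assumed Mixtape-style definitions."""
    treated = [(y0, y1) for d, y0, y1 in units if d == 1]
    untreated = [(y0, y1) for d, y0, y1 in units if d == 0]

    ate = mean(y1 - y0 for _, y0, y1 in units)    # E[y1 - y0]
    att = mean(y1 - y0 for y0, y1 in treated)     # E[y1 - y0 | d=1]
    atu = mean(y1 - y0 for y0, y1 in untreated)   # E[y1 - y0 | d=0]

    # Simple difference in observed outcomes: E[y1 | d=1] - E[y0 | d=0]
    sdo = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in untreated)
    # Selection bias: E[y0 | d=1] - E[y0 | d=0]
    sb = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in untreated)

    pi = len(treated) / len(units)                # fraction treated
    return {"ATE": ate, "ATT": att, "ATU": atu, "SDO": sdo, "SB": sb, "pi": pi}


summary = po_summary(UNITS)
print(summary)
```

With these definitions, two identities hold for any such table: SDO = ATT + SB, and SDO = ATE + SB + (1 - pi)(ATT - ATU), where pi is the fraction of treated units. They are handy as sanity checks on any hand calculation with these acronyms.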

I would never have asked this question without the key insights into PO theory that I’ve gained from Pearl’s CI.

And I didn’t melt any bunnies to prove this. Wow, what a time to be alive!