Causality – The Next Most Important Thing in AI/ML

Summary:  Finally there are tools that let us transcend ‘correlation is not causation’ and identify true causal factors and their relative strengths in our models.  This is what prescriptive analytics was meant to be.


Just when I thought we’d figured it all out, something comes along to make me realize I was wrong.  And that something in AI/ML is as simple as realizing that everything we’ve done so far is just curve-fitting.  Whether it’s a scoring model or a CNN to recognize cats, it’s all about association; reducing the error between the distribution of two data sets. 

What we should have had our eye on is CAUSATION.  How many times have you repeated ‘correlation is not causation’?  Well, it seems we didn’t stop to ask how AI/ML can actually determine causality.  And now it turns out it can.

But achieving an understanding of causality requires us to cut loose from many of the common tools and techniques we’ve been trained to apply, and to understand the data from a wholly new perspective.  Fortunately, the constant advance of research and ever-increasing compute capability now make it possible for us to use new, relatively friendly tools to measure causality.

However, make no mistake, you’ll need to master the concepts of causal data analysis or you will most likely misunderstand what these tools can do.


Why Causality

In an age when the call for transparency in our models is a constant social media and regulatory cry, causality offers the greatest promise.  Causal data analysis also gives you tools well beyond what you currently have to guide you to exactly what to do to get the best outcomes.  This speaks to the heart of prescriptive analytics.

It may seem that correlation and curve fitting have done just fine at answering important questions like next best offer, is it fraud, what’s the value going to be, and even is it a cat.  But there are a whole variety of questions that our users would like to have answered like:

  1. Given that there are X factors that predict preference for a product, which ones should the business actually try to influence, and in what order of importance?  (What actually causes change in the target variable?)

Just ranking the strength of different variables on their ability to predict the target is not the same as selecting those that are independently predictive and evaluating their relative contribution to the outcome.

  2. What are the rankings of those key drivers that actually cause change, and how do they compare to my competitors’, so that I can make smart marketing allocations?

Isolating a variable like ‘manufacturer’ within the same probability model doesn’t allow for re-ranking variables, and doesn’t answer the causality question to begin with.

  3. Did that medicine actually cure the disease?

This problem would require having actually performed both options on the same person, an impossibility in the real world.  Simply splitting the sample universe into two gives a probability but not a causally supportable answer.

  4. Would I still have contracted cancer if I had not smoked for the last two years?

Similarly, there is no way to express the probability associated with two years of not smoking without an a priori understanding of the strength of the causal relationship, which our probability models do not provide.
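To make the medicine question concrete, here is a minimal sketch in Python of why simply comparing treated and untreated groups can mislead.  Everything in it is an assumption made for illustration: the patient model, the probabilities, and the function names are hypothetical and do not come from any tool discussed in this article.  It assumes that sicker patients are treated more often and that the treatment truly raises the chance of recovery by 0.2 for everyone, then compares the naive difference in recovery rates with a severity-stratified (backdoor-adjusted) estimate.

```python
import random


def simulate_patients(n, seed=0):
    """Simulate (severe, treated, recovered) records.

    Hypothetical data-generating process, assumed for illustration only:
    severe cases are treated more often, and treatment adds 0.2 to the
    recovery probability of every patient.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        severe = rng.random() < 0.5
        treated = rng.random() < (0.8 if severe else 0.2)
        p_recover = (0.3 if severe else 0.7) + (0.2 if treated else 0.0)
        recovered = rng.random() < p_recover
        records.append((severe, treated, recovered))
    return records


def recovery_rate(records, treated, severe=None):
    """Share of recovered patients among those matching the filters."""
    matching = [r for s, t, r in records
                if t == treated and (severe is None or s == severe)]
    return sum(matching) / len(matching)


def naive_effect(records):
    """Raw difference in recovery rates, ignoring severity."""
    return recovery_rate(records, True) - recovery_rate(records, False)


def adjusted_effect(records):
    """Average the within-stratum differences, weighted by stratum size."""
    total = len(records)
    effect = 0.0
    for severe in (False, True):
        weight = sum(1 for s, _, _ in records if s == severe) / total
        diff = (recovery_rate(records, True, severe)
                - recovery_rate(records, False, severe))
        effect += weight * diff
    return effect


records = simulate_patients(100_000)
print(f"naive difference:    {naive_effect(records):+.3f}")
print(f"adjusted difference: {adjusted_effect(records):+.3f}")
```

Under these assumed numbers the naive comparison comes out slightly negative (the medicine looks harmful), while the stratified estimate lands close to the true +0.2.  The adjustment only works here because severity is the sole confounder and it was measured; that is a causal assumption the data alone cannot confirm.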

Some other examples of causal questions:

  • Can the data prove whether an employer is guilty of hiring discrimination?
  • What fraction of past crimes could have been avoided by a given policy?
  • What was the cause of death of a given individual, in a specific incident?

We readily understand that we can’t prove causation from observations alone.  We can observe correlation but that does not prove or even imply causation.  There is no way we can test the statement “symptoms do not cause diseases” with correlation tools.  Our models simply support that symptoms occur in the presence of disease, and disease occurs in the presence of symptoms.
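The symmetry problem can be illustrated with a small Python sketch.  The disease and symptom probabilities below are invented for the example, and the `simulate` function with its `do_disease` and `do_symptom` arguments is a hypothetical helper, not part of any library.  The point is that correlation is the same whichever variable is listed first, while forcing a variable to a value (an intervention) separates the two directions.

```python
import random


def simulate(n, seed, do_disease=None, do_symptom=None):
    """Generate (disease, symptom) pairs where disease causes symptom.

    Passing do_disease or do_symptom overrides that variable's own
    mechanism, which is how an intervention is modeled in this sketch.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        disease = rng.random() < 0.3                         # assumed base rate
        if do_disease is not None:
            disease = do_disease
        symptom = rng.random() < (0.9 if disease else 0.1)   # assumed mechanism
        if do_symptom is not None:
            symptom = do_symptom
        rows.append((disease, symptom))
    return rows


def rate(rows, index):
    """Fraction of rows in which the chosen variable is True."""
    return sum(row[index] for row in rows) / len(rows)


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


observed = simulate(50_000, seed=1)
disease = [d for d, _ in observed]
symptom = [s for _, s in observed]
corr_ds = correlation(disease, symptom)
corr_sd = correlation(symptom, disease)

# Interventions: force one variable and watch the other.
effect_of_symptom_on_disease = (
    rate(simulate(50_000, seed=2, do_symptom=True), 0)
    - rate(simulate(50_000, seed=2, do_symptom=False), 0))
effect_of_disease_on_symptom = (
    rate(simulate(50_000, seed=2, do_disease=True), 1)
    - rate(simulate(50_000, seed=2, do_disease=False), 1))

print(f"corr(disease, symptom) = {corr_ds:.3f}")
print(f"corr(symptom, disease) = {corr_sd:.3f}")
print(f"effect of forcing symptom on disease: {effect_of_symptom_on_disease:+.3f}")
print(f"effect of forcing disease on symptom: {effect_of_disease_on_symptom:+.3f}")
```

The two correlations are identical, so the observational data cannot say which way the arrow points.  Only the simulated interventions show that forcing the symptom leaves the disease rate unchanged, while forcing the disease moves the symptom rate substantially.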

Researchers have been stuck with trying to apply ‘common sense’ directional logic outside of the mathematics, but with no way to rigorously test or prove the degree of causation.

At a simple level, particularly in problems involving human behavior, we include variables which are not mutable (age, gender, home ownership) alongside variables which might under some circumstances be controllable because they represent perceptions (is it stylish, is it reliable, is it easy, were you satisfied).

Correlation suffices up to this point, but the question causality answers is ‘which levers should I actually pull to effect change?’  And beyond that, ‘what would happen if I changed some of the underlying assumptions in the model?’ – the ultimate scenario problem.



The techniques of causal modeling, more formally known as Structural Equation Modeling (SEM), have actually been employed in the social sciences and in epidemiology for many years.

It’s not possible to read much in this area without encountering the work of Judea Pearl, professor and renowned researcher in causality at the UCLA Computer Science Department Cognitive Systems Lab.  He received the Turing Award in 2011 for his work on probabilistic and causal reasoning, which includes the invention of Bayesian networks.  Since then he has been the singular force trying to insert causality into AI/ML.

The techniques and mathematics of SEM are beyond the scope of this article, but suffice it to say that the fundamental math established by Pearl and the rapid evolution of graph models have made accessible causality tools available.

There are several open source packages available including:

DAGitty is a browser-based environment for creating, editing, and analyzing causal models. The focus is on the use of causal diagrams for minimizing bias in empirical studies in epidemiology and other disciplines.

Microsoft’s DoWhy library for causal inference.

The Tetrad Project at Carnegie Mellon.

Inguo.app, a commercial spinoff from NEC backed by Dr. Pearl himself, appears to offer the most commercially ready and easily understood platform for causal analysis.  Delivered as SaaS, it comes in variations meant to directly facilitate explanation to users about key factors and what-if scenarios.

Their site claims to return results in seconds where the number of variables is 60 or fewer, longer for more complex problems or more accurate results.  One can easily imagine the combinatorial explosion that occurs as the number of variables increases.
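One way to get a feel for that explosion is to count how many causal diagrams are possible in principle.  The Python sketch below uses Robinson's recurrence for the number of labeled directed acyclic graphs on n nodes.  Treating this count as the search space is an assumption made for illustration; the article's sources do not describe how Inguo or any other tool actually searches.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes.

    Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1.
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in (2, 3, 4, 5, 10, 20, 60):
    digits = len(str(count_dags(n)))
    print(f"{n:>2} variables: a {digits}-digit number of possible graphs")
```

Three variables already allow 25 distinct graphs, four allow 543, and five allow 29,281.  Exhaustive enumeration is therefore out of the question well before 60 variables, which is why practical tools have to rely on constraints, heuristics, or prior knowledge to narrow the candidates.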

This diagram drawn from an Inguo case study shows the relative clarity with which causal factors and their relative strength can be determined.  In this case the streaming video client was able to rapidly focus in on the two or three controllable variables that caused increased adoption as well as the relative strength of their contributions.



Why Understanding Causal Data Analysis is Critical

Although the graphics appear simple to interpret, helping your users to understand what is being predicted will require you to really understand the differences between causal data analysis and the AI/ML we’ve become used to.  To give you just a few examples:

  • Although we may determine that X is a cause of Y, that doesn’t mean X is the only cause of Y.  Causal data analysis (CDA) recognizes that there may be many unexamined causes of Y, so the real purpose of CDA is to determine the contribution of X to the outcome Y.
  • Similarly, if we add up all the contribution values of the X variables in our model they may not come to 100%, recognizing the statistical presence of unseen variables.
  • Causality isn’t necessarily transitive.  If we find that X causes Y and Y causes Z, we cannot conclude that X causes Z, because we are dealing in ‘average causal effects’.
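The last point, that average causal effects need not chain together, is easy to check with a toy example.  The Python sketch below is a made-up population of two equally sized groups; the `outcome` function and the group structure are assumptions invented for illustration.  X drives Y only in the first group, and Y drives Z only in the second.

```python
GROUPS = (0, 1)  # two equally sized, hypothetical subpopulations


def outcome(group, x, do_y=None):
    """Return (y, z) for one unit, optionally forcing Y to a value."""
    y = x if group == 0 else 0   # X influences Y only in group 0
    if do_y is not None:
        y = do_y                 # an intervention overrides Y's mechanism
    z = y if group == 1 else 0   # Y influences Z only in group 1
    return y, z


def average(values):
    values = list(values)
    return sum(values) / len(values)


def ace_x_on_y():
    """Average causal effect of setting X to 1 rather than 0 on Y."""
    return (average(outcome(g, 1)[0] for g in GROUPS)
            - average(outcome(g, 0)[0] for g in GROUPS))


def ace_y_on_z():
    """Average causal effect of setting Y to 1 rather than 0 on Z."""
    return (average(outcome(g, 0, do_y=1)[1] for g in GROUPS)
            - average(outcome(g, 0, do_y=0)[1] for g in GROUPS))


def ace_x_on_z():
    """Average causal effect of setting X to 1 rather than 0 on Z."""
    return (average(outcome(g, 1)[1] for g in GROUPS)
            - average(outcome(g, 0)[1] for g in GROUPS))


print("X -> Y:", ace_x_on_y())
print("Y -> Z:", ace_y_on_z())
print("X -> Z:", ace_x_on_z())
```

X has an average effect of 0.5 on Y, and Y has an average effect of 0.5 on Z, yet the average effect of X on Z is zero, because no individual unit passes the influence along both links.  When explaining results to users, each reported link should be read as a statement about population averages and not as a chain that holds for every case.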

For a quick look at even more facts about CDA try this article.

For a deep dive into Judea Pearl’s most recent summary of the field find his current paper here.

The major takeaway here is that this is an expansion of our current toolset that finally allows us to answer user questions about what to do next based on true causal relationships.  If you are struggling with transparency issues, examine causality tools.  If you are simply ranking variables by the strength of correlation, you may be actively misleading users.



Other articles by Bill Vorhies


About the author:  Bill Vorhies is Contributing Editor for Data Science Central.  Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.




Comment by Messaoud Zouikri on January 7, 2020 at 11:51pm

Thanks to the author for this very interesting topic. Many data scientists today come from the IT sector, and so, when dealing with topics such as causality, they think about things in a more technical way and sometimes seek solutions by trying to reinvent the wheel. It is important to widen our point of view by looking also at other disciplines such as the field of econometrics, which is a mix of economic theory, statistics and mathematics. For instance, for some ideas related to the present topic, we have the seminal work of Granger (1969) and Engle et al. (1983); for causality tests in econometrics see Hsiao (1979); and for some more recent development of the concept of causality cf. Granger (1988).

Comment by Johan Steunenberg on May 3, 2019 at 9:54am

Triggered by this article I played a bit with DoWhy, but it seems quite immature.

Comment by PG Madhavan on April 26, 2019 at 1:37pm

Bill, yes, Causality Analysis IS the next wave in Data Science. Why? For BUSINESS applications, dashboards, etc., are nice but the CxO wants to know which knobs to turn for a desired business outcome! Which is Prescriptive Analytics . . . not to overload with terms but what causes what is the reason for ALL scientific inquiry . . .

Now then, give credit where credit is due - Judea Pearl persisted over 20 years and made serious advances. As he points out, Randomized Controlled Trials (RCTs) are the gold standard. These types of experiments are highly restrictive, expensive and not applicable to data already collected! One of the earlier comments mistakes this application context for Design of Experiments . . .

Using already collected data which is not from an RCT, finding causal relationships seems like magic. It is not -

See my article for some context: https://www.linkedin.com/pulse/future-machine-learning-ai-big-oppor...

There is a new class of Learning methods; I have called it "Model-based Learning" (as opposed to model-free learning such as in Deep Learning). Causality Analysis of Pearl and Inguo belongs there - a graph model is assumed (based on physical relationships) which is used to minimize certain information measures that can yield the final causal structure. So, no magic here!

SEM has been around much longer - causality is a holy grail . . . we need better and better methods.

Comment by Lance Norskog on April 26, 2019 at 1:02pm

I have not seen a basic idea described in the causality literature. Either I'm looking in the wrong place, or I really am another Einstein- but I'm guessing the former.

In information theory, signal and noise are a "package deal", you don't get one without the other. Causality can be viewed as mixing signals, just like audio tracks on a mixing board. Signal A "causes" signal B when B is the output of a combination of A and other signals.

Since noise is inescapable, under this model signal B must contain noise from signal A. If signal B comes entirely from signal A, B's noise floor must be the same as A's. This gives a negation test: if the noise from a "causation agent" is not present in the "caused output", the input signal is not really in the output signal.

Like I said, I'm not an Einstein or even a Claude Shannon. Someone else must have articulated this in the past. Does it sound familiar?


Lance Norskog

Comment by Tapan Bagchi on April 25, 2019 at 7:34pm

Dear William,

All you guys in computer science suffer from (a) little knowledge about anything outside C Sc and (b) jumping on the latest bandwagon, without the grasp of the relevant fundamentals. I have seen your colleagues argue till they turn blue as to which particular algorithm is better WITHOUT DESIGNING THE APPROPRIATE STATISTICAL EXPERIMENTS. You are doing the same here by name dropping (Pearl...). This Turing awardee had nothing to do with checking causality by hypothesis testing. I use computing as I use the English language. But I am not blinded by your flashy stuff. Many of your ML colleagues have little grasp of Linear Algebra or non-Convex optimization. They just play with the tools like a banjo.

Frequently we run into situations where we wish to control a phenomenon that is observed as a response, and we have speculations about what factors might be causing it. Why in heaven are we after AI/ML then? It is often intended that CS folks deliberately avoid the issue of establishing such causality, for they avoided statistics in college altogether. They assume that this has already been done by someone else through suitable experimentation [1, 2, 3]. The goal generally is to drive the response to some desired target, by manipulating the enlisted causes or predictors. Do you just want pretty graphics? Please read [1, 2, 3]. These are classics on causality. Hopefully you will then blog more professionally.

  1. Box, G E P, W G Hunter and J S Hunter (1978). Statistics for Experimenters, Wiley.
  2. Holland, P (1986). Statistics and Causal Inference, Journal of the American Statistical Association, Vol 81 No. 396, 945-960.
  3. Montgomery, D C (2007). Design and Analysis of Experiments, 5th (2007); 8th ed. (2012), Wiley.

© 2021   TechTarget, Inc.