Correlation vs. causation

What techniques can distinguish coincidence, correlation, and real causation? Also, if you have a model where the response is highly correlated with (say) 5 non-causal variables and 2 direct causal variables, how do you assign a weight to the two causal variables?

Can you provide examples of successful cause detection? In which contexts is it important to detect the true causes? In which contexts can causation be ignored, as long as your predictive model generates great ROI?

    Alex Esterkin

    Facebook data scientists hilariously debunk a Princeton study, based on "correlation equals causation" reasoning, that says Facebook will lose 80% of its users. They do it by "proving", with the same logic, that Princeton will lose all its students by 2021.

      Taymour Matin

      Vincent, just started to read your blogs - thanks for your contributions!  

      One thing that doesn't seem to make it into the discussion of correlation vs. causation is practicality. Let us assume that X and Y are correlated but not causally linked. Every time we observe X happening, we observe a similar pattern in Y. If I notice X going up, I can take an action whose payoff depends on Y's outcome. If Y then happens as I inferred from X, I win even though the two are not causally linked. Of course, there are a variety of situations where proving causality is necessary. My point, however, is that not all situations require it.
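A minimal simulation sketch of this argument, under assumptions that are not in the post: a hidden common cause `z` drives both X and Y, the noise levels are arbitrary, and the names `simulate` and `pearson` are hypothetical helpers. It illustrates that X can be a useful predictor of Y when we only observe, while setting X by hand does nothing to Y.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def simulate(n=5000, seed=0, intervene=False):
    """Draw n (x, y) pairs where a hidden z causes y and, unless we intervene, x.

    With intervene=True, x is set independently of z, which mimics
    forcing X to a value instead of merely observing it.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden common cause (assumed)
        if intervene:
            x = rng.gauss(0, 1)  # x chosen by us, cut off from z
        else:
            x = z + rng.gauss(0, 0.5)  # x follows z plus noise
        y = z + rng.gauss(0, 0.5)  # y never depends on x directly
        xs.append(x)
        ys.append(y)
    return xs, ys


obs_x, obs_y = simulate()
int_x, int_y = simulate(intervene=True)

r_observed = pearson(obs_x, obs_y)
r_intervened = pearson(int_x, int_y)

print(f"correlation when X is only observed: {r_observed:.2f}")
print(f"correlation when X is set by hand:   {r_intervened:.2f}")
```

With these assumed noise levels the theoretical observational correlation is 1 / 1.25 = 0.8, so watching X is a reasonable basis for a bet on Y. The correlation vanishes once X is set by hand, which is the kind of situation where causality does matter: acting on X itself will not move Y.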

      I would welcome your thoughts on this.


        Matt Anthony

        What about the decades-old literature that basically started the modern conversation on this topic? Rubin's causal model is standard fare (or should be) in graduate statistics programs ... The fact that it remains unknown or unrecognized in these circles suggests to me that data science is losing touch with its roots in statistics as it blends with other disciplines.
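For readers who have not met Rubin's causal model, here is a rough sketch of its core idea. Everything in it is an illustrative assumption rather than anything from the comment above: the simulated data, the `health` covariate, the constant `TRUE_EFFECT`, and the helper `diff_in_means`. Each unit has two potential outcomes, `y0` (untreated) and `y1` (treated), and only one of them is ever observed. A simple difference in means recovers the average treatment effect when assignment is randomized, but not when assignment depends on a covariate that also drives the outcome.

```python
import random


def mean(values):
    return sum(values) / len(values)


def diff_in_means(units, assign):
    """Naive estimate: mean observed outcome of treated minus that of controls.

    Only one potential outcome per unit is used, as in real data:
    y1 if the unit is treated, y0 otherwise.
    """
    treated, control = [], []
    for unit in units:
        if assign(unit):
            treated.append(unit["y1"])
        else:
            control.append(unit["y0"])
    return mean(treated) - mean(control)


rng = random.Random(1)
TRUE_EFFECT = 2.0  # assumed constant effect: y1 - y0 for every unit

units = []
for _ in range(20000):
    health = rng.gauss(0, 1)  # hypothetical covariate that raises the outcome
    y0 = health + rng.gauss(0, 1)  # potential outcome without treatment
    units.append({"health": health, "y0": y0, "y1": y0 + TRUE_EFFECT})

# Randomized assignment: a coin flip, independent of the potential outcomes.
randomized = diff_in_means(units, lambda u: rng.random() < 0.5)

# Confounded assignment: healthier units are the ones that get treated.
confounded = diff_in_means(units, lambda u: u["health"] > 0)

print(f"true effect:             {TRUE_EFFECT:.2f}")
print(f"randomized estimate:     {randomized:.2f}")
print(f"confounded estimate:     {confounded:.2f}")
```

In this toy setup the randomized estimate lands near the true effect, while the confounded one is biased upward because the treated group was healthier to begin with.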