Let's say that your model is *Y* = *a* + *bX* (for instance, *X* is time), but you know that *b* = 0. In short, you are trying to get the best fit for *Y* = *a*. In that case *a* is of course the average of your observations (or better, the median if outliers are present).
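As a quick illustration (a minimal NumPy sketch; the sample data is made up), the constant that minimizes the sum of squared residuals is the mean, while the median minimizes the sum of absolute residuals and is robust to the outlier:

```python
import numpy as np

y = np.array([4.8, 5.1, 5.0, 4.9, 25.0])  # made-up sample with one outlier at 25.0

# Best constant fit under squared loss: the mean (pulled up by the outlier)
a_mean = y.mean()

# Best constant fit under absolute loss: the median (robust to the outlier)
a_median = np.median(y)

print(a_mean, a_median)  # the mean is dragged toward 25, the median stays near 5
```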

But how would you measure the correlation between *Y* and the constant *a*? It simply does not exist: the correlation coefficient is undefined when one of the variables is constant. Yet intuitively, the closer the points (blue in the chart) are to the baseline (red in the chart), the higher the correlation should be, if it existed. Now rotate all the points together with the flat line so that the line gets a nonzero slope: suddenly the correlation exists. How do you explain this paradox, and which metric, bounded by 0 and 1 and equal to 1 only when all observations are exactly equal to the constant, would you use to measure the quality of the fit when *b* = 0? The issue is that correlation is not rotation-invariant (though it is scale- and translation-invariant). It would be nice to have a metric that measures the relationship, is bounded like the correlation coefficient, and is rotation-invariant. Do you know of any? One approach is to rotate your points and baseline through various angles and take the maximum absolute correlation across all the rotated charts, but I am sure this is not the best solution.
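The rotation idea in the last sentence can be sketched as follows (a minimal implementation; the angle grid of 180 steps over [0, π) is an arbitrary choice, not from the post):

```python
import numpy as np

def max_abs_correlation_over_rotations(x, y, n_angles=180):
    """Rotate the point cloud (x, y) through n_angles angles in [0, pi)
    and return the maximum absolute Pearson correlation observed.
    Angles where one rotated coordinate is constant (correlation
    undefined) are skipped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = 0.0
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        xr = c * x - s * y  # rotate every point by theta
        yr = s * x + c * y
        if np.std(xr) == 0 or np.std(yr) == 0:
            continue  # correlation undefined for a constant coordinate
        r = np.corrcoef(xr, yr)[0, 1]
        best = max(best, abs(r))
    return best
```

Note that when the points lie exactly on the flat line *Y* = *a*, any rotation to a nonzero, finite slope puts them exactly on a sloped line, so the sweep returns a value of essentially 1, which is the behavior asked for above.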

This is a typical question in quality assurance problems, where *Y* (supposed to be constant) measures the defect rate in a production setting (light bulbs, etc.)

**Note**: As an alternative, you could also test whether *X* and *Y* are independent (*X* being time) using a chi-square test.
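A minimal sketch of that note using SciPy's `chi2_contingency` (the equal-width binning and the bin count of 4 are my own choices, not from the post):

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi_square_independence(x, y, bins=4):
    """Bin x and y into equal-width bins, build the contingency table,
    and run a chi-square test of independence.
    Returns the chi-square statistic and the p-value."""
    x_idx = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    y_idx = np.digitize(y, np.histogram_bin_edges(y, bins=bins)[1:-1])
    table = np.zeros((bins, bins))
    for i, j in zip(x_idx, y_idx):
        table[i, j] += 1
    # Drop empty rows/columns so all expected frequencies are positive,
    # as chi2_contingency requires.
    table = table[table.sum(axis=1) > 0][:, table.sum(axis=0) > 0]
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p
```

A small p-value suggests the defect rate does depend on time, i.e. *Y* is not constant after all.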


© 2018 Data Science Central™
