Let's say that your model is *Y* = *a* + *bX* (for instance, *X* is time), but you know that *b* = 0. In short, you are trying to get the best fit for *Y* = *a*. In that case, *a* is of course the average of your observations (or better, their median if outliers are present).
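A minimal Python sketch of that constant fit, on made-up observations that include one outlier. `fit_constant` is a hypothetical helper: the mean is the least-squares estimate of *a*, while the median minimizes the sum of absolute deviations and is therefore less affected by the outlier.

```python
import statistics

def fit_constant(y, robust=False):
    """Best constant fit for Y = a.

    The mean minimizes squared error; the median minimizes absolute error
    and is more robust to outliers.
    """
    return statistics.median(y) if robust else statistics.fmean(y)

# Hypothetical observations; the last one is an outlier
y = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 9.0]
a_mean = fit_constant(y)                 # pulled upward by the outlier
a_median = fit_constant(y, robust=True)  # stays near the bulk of the data
print(a_mean, a_median)
```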

But how would you measure the correlation between *Y* and the constant *a*? It simply does not exist: a constant has zero variance, so the correlation coefficient is 0/0 and undefined. Yet intuitively, the closer the points (blue in the chart) are to the baseline (red in the chart), the higher the correlation (if it existed!) should be. Now, if you rotate all the points together with the flat line so that the line gets a nonzero slope, the correlation suddenly exists.

How do you explain this paradox, and which metric would you use to measure the quality of the fit when *b* = 0, one bounded by 0 and 1 and equal to 1 only when all observations are exactly equal to the constant? The issue is that correlation is not rotation-invariant (it is, however, invariant under translation and under positive rescaling of each variable). It would be nice to have a metric that measures the relationship, is bounded like the correlation coefficient, and is rotation-invariant. Do you know of any? One way to do it is to rotate your points and baseline through various angles and compute the maximum absolute correlation across all the rotated charts. But I am sure this is not the best solution.
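The paradox and the rotation heuristic can be sketched in Python as follows. This assumes made-up data (*X* = time from 0 to 99, *Y* = 5 plus Gaussian noise, so *b* = 0), and `pearson`, `rotate` and `max_abs_rotated_corr` are hypothetical helpers, not functions from any library. `pearson` returns `None` when one variable is constant, which is exactly the 0/0 case of correlating *Y* with the constant *a*.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation, or None when either variable is constant (0/0)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def rotate(xs, ys, theta):
    """Rotate every point (x, y) by the angle theta (radians) around the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * x - s * y for x, y in zip(xs, ys)],
            [s * x + c * y for x, y in zip(xs, ys)])

def max_abs_rotated_corr(xs, ys, n_angles=180):
    """Heuristic from the post: max |correlation| over rotated copies of the data.

    Angles in [0, pi) are enough: an extra rotation by pi negates both
    coordinates, which leaves the correlation unchanged.
    """
    best = 0.0
    for k in range(n_angles):
        r = pearson(*rotate(xs, ys, math.pi * k / n_angles))
        if r is not None:
            best = max(best, abs(r))
    return best

random.seed(0)
xs = [float(t) for t in range(100)]            # X = time
ys = [5.0 + random.gauss(0, 0.5) for _ in xs]  # Y = a + noise, with a = 5 and b = 0
print(pearson(ys, [5.0] * len(ys)))            # Y against the constant a: None (undefined)
print(pearson(xs, ys))                         # expected to be small despite the tight fit
print(pearson(*rotate(xs, ys, 0.3)))           # expected to be close to 1 once tilted
print(max_abs_rotated_corr(xs, ys))
```

Because a rotation mixes the two axes, the value returned by `max_abs_rotated_corr` depends on the units in which *X* and *Y* are expressed (measuring time in hours instead of days changes it), which is one reason to doubt this heuristic.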

This is a typical question in quality assurance, where *Y* (supposed to be constant) measures the defect rate in a production setting (light bulbs, etc.).

**Note**: As an alternative, you could also test whether *X* (time) and *Y* are independent using a chi-square test.
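A minimal pure-Python sketch of that test, assuming time is binned into periods (weeks here) and each bulb is classified as defective or not, so that the data form a contingency table. The counts are made up, and `chi2_independence` and `chi2_sf` are hypothetical helpers; in practice a library routine such as SciPy's `scipy.stats.chi2_contingency` performs this test.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x).

    Uses the power series of the regularized lower incomplete gamma function;
    adequate for moderate statistics, not tuned for extreme tails.
    """
    s, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (s + n)
        total += term
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)

def chi2_independence(table):
    """Pearson chi-square test of independence on an r x c table of counts.

    Common rule of thumb: expected counts should be at least about 5 per cell.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts per week: [defective, non-defective] out of 1,000 bulbs
stable = [[12, 988], [9, 991], [14, 986], [11, 989]]
drifting = [[5, 995], [10, 990], [20, 980], [40, 960]]

for name, table in [("stable", stable), ("drifting", drifting)]:
    stat, df, p = chi2_independence(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p:.3g}")
```

A small p-value (as expected for the drifting table) is evidence that the defect rate depends on time; a large one (as expected for the stable table) is consistent with *b* = 0, though it does not prove it.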

© 2019 Data Science Central ®