Probably the worst error is believing that a correlation is real when it is purely artificial. Take a data set with 100,000 variables and, say, only 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (which also discusses how to handle and fix it). This is be…
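To make the effect concrete, here is a minimal simulation sketch. It is not taken from the article mentioned above. It assumes independent standard normal noise and uses a hypothetical helper, `max_spurious_correlation`, with only 500 variables instead of 100,000 so that it runs in about a second in plain Python. Even at this reduced scale, the largest correlation among unrelated variables typically comes out very high, though somewhat lower than it would with 100,000 variables.

```python
import math
import random


def standardize(values):
    """Center to mean 0 and scale to unit length, so that the dot product
    of two standardized vectors equals their Pearson correlation."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def max_spurious_correlation(n_vars=500, n_obs=10, seed=0):
    """Generate n_vars independent noise variables with n_obs observations
    each, and return (largest absolute pairwise correlation, pair count).

    Assumption: the data are pure Gaussian noise, so any correlation
    found here is artificial by construction.
    """
    rng = random.Random(seed)
    variables = [
        standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
        for _ in range(n_vars)
    ]
    best = 0.0
    n_pairs = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r = sum(a * b for a, b in zip(variables[i], variables[j]))
            n_pairs += 1
            best = max(best, abs(r))
    return best, n_pairs


best, n_pairs = max_spurious_correlation()
print(f"{n_pairs} pairs checked, largest |correlation| = {best:.3f}")
```

Increasing `n_vars` or decreasing `n_obs` pushes the maximum closer to 1, which is the mechanism behind the example above: the more pairs you test on few observations, the more extreme the best spurious match becomes.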
© 2019 Data Science Central ®