The fundamental assumption in many predictive models is that the predictors have normal distributions. A normal distribution is unskewed. An unskewed distribution is one that is roughly symmetric about its mean, so the probability of falling to the right of the mean equals the probability of falling to the left of it.
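To make the idea of symmetry concrete, here is a minimal sketch that compares a symmetric sample with a right-skewed one. The synthetic data, the seed, and the variable names are assumptions made for illustration only; `scipy.stats.skew` computes the sample skewness, which is close to zero for symmetric data and positive for data with a long right tail.

```python
import numpy as np
from scipy.stats import skew

# Synthetic data (an assumption for illustration, not from the article).
rng = np.random.default_rng(0)
symmetric = rng.normal(loc=0.0, scale=1.0, size=10_000)
right_skewed = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Sample skewness: near 0 for symmetric data, > 0 for a long right tail.
skew_symmetric = skew(symmetric)
skew_right = skew(right_skewed)

# Share of observations that fall to the right of the sample mean.
frac_above_symmetric = np.mean(symmetric > symmetric.mean())
frac_above_skewed = np.mean(right_skewed > right_skewed.mean())

print(f"symmetric:    skew={skew_symmetric:.3f}, share above mean={frac_above_symmetric:.3f}")
print(f"right-skewed: skew={skew_right:.3f}, share above mean={frac_above_skewed:.3f}")
```

For the symmetric sample roughly half of the values should land on each side of the mean, while for the right-skewed sample noticeably fewer than half land above it.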
This article outlines the steps to detect skewness in data and resolve it in order to build better predictive models. The article specifically discusses the following:
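Finding the right transformation to resolve skewness can be tedious. In their 1964 paper, Box and Cox proposed a statistical method for finding the right transformation. They suggested using the following family of transformations and finding the value of λ:

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log x, & \lambda = 0
\end{cases}
$$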
Notice that because of the log term, this transformation requires all x values to be positive. So, if there are zero or negative values, all values need to be shifted by a constant so that they become positive before applying this method.
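The shift-then-transform step can be sketched as follows. This is a hedged illustration, not the implementation the article links to: the synthetic data, the seed, and the choice of shifting so that the minimum becomes 1 are all assumptions. `scipy.stats.boxcox` estimates λ by maximum likelihood when no λ is supplied and returns the transformed values together with that estimate.

```python
import numpy as np
from scipy.stats import boxcox, skew

# Synthetic right-skewed data that includes negative values
# (an assumption for illustration, not from the article).
rng = np.random.default_rng(42)
raw = rng.lognormal(mean=0.0, sigma=0.8, size=5_000) - 1.0

# Box-Cox needs strictly positive input, so shift all values by a constant.
# Moving the minimum to 1 is an illustrative choice, not a prescribed rule.
shift = 1.0 - raw.min() if raw.min() <= 0 else 0.0
shifted = raw + shift

# With no lambda given, boxcox returns (transformed data, estimated lambda).
transformed, lam = boxcox(shifted)

# Recompute the transformation by hand from the Box-Cox family definition.
if lam != 0:
    manual = (shifted ** lam - 1.0) / lam
else:
    manual = np.log(shifted)

skew_before = skew(raw)
skew_after = skew(transformed)
print(f"estimated lambda: {lam:.3f}")
print(f"skewness before: {skew_before:.3f}, after: {skew_after:.3f}")
```

Keep the shift constant and the estimated λ if new data must be transformed later, since both are needed to apply the same transformation consistently.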
You can find sample R and Python implementations of the Box-Cox transformation to resolve ske...
Links are not working. Looks like the articles were taken down.
"The fundamental assumption in many predictive models is that the predictors have normal distributions."
Which methods? Trees, OLS and others do not require this assumption. Could you explain? Thanks.
Thanks Mark... I fixed the link...
The link given at the end of the article gives a "You do not have permission to preview drafts" error.