Hello All,
I am new to data science and wanted to know the best practices for data preparation.
For example, converting an integer column into a categorical one: is it good practice to categorize the data?
E.g., the data contains an Age column. Is it good practice to group ages into different categories?
Please let me know if there are any articles I can refer to on this.
Thanks
Nitish
The grouping you are talking about is called binning. See articles on this topic: https://www.datasciencecentral.com/page/search?q=binning
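To illustrate what binning does, here is a minimal sketch in plain Python that maps integer ages to category labels. The bin edges and labels below are hypothetical, chosen only for illustration; pick edges that make sense for your own data. (In pandas, `pandas.cut` does the same job on a whole column.)

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds for each bin, and one label per bin
# (the last label catches everything above the final bound).
UPPER_BOUNDS = [17, 34, 54, 74]
LABELS = ["0-17", "18-34", "35-54", "55-74", "75+"]


def bin_age(age: int) -> str:
    """Map an integer age to a categorical label (binning)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the index of the first upper bound >= age,
    # so an age equal to an upper bound stays in that bin.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


ages = [3, 17, 18, 40, 74, 90]
print([bin_age(a) for a in ages])
# → ['0-17', '0-17', '18-34', '35-54', '55-74', '75+']
```

Using `bisect_left` on a sorted list of bounds keeps the lookup logarithmic in the number of bins and makes the inclusive upper edge explicit.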
Dear Nitish,
Data preparation tends to be both the most tedious and the most essential part of the modeling phase.
It can be time-consuming if it is done properly. As the adage says: "Garbage In ==> Garbage Out".
However, a data audit is crucial before data preparation, including but not limited to:
Thank you, Nitish. I have been looking for something like this.
IMHO, although it is common in medical journals, it is never good practice
to "convert" ratio data (e.g., age) into ordinal data ("categories," bins). The ordinal representation
loses information that is inherently contained in the ratio data.
For instance, take age with bins of 0-5 years, 6-10 years, 11-15 years, etc.: 6- and 10-year-olds are
treated as "equal," while 5- and 6-year-olds are treated as just as "different" as 1- and 10-year-olds.
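A small sketch can make the information loss concrete. It uses the bins from the example above (0-5, 6-10, 11-15 years) and compares the raw age gap with the gap between bin indices; treating the bin-index gap as a "distance" is an illustrative assumption, not a standard metric.

```python
from typing import Tuple

# Bins taken from the example above: 0-5, 6-10, 11-15 years (inclusive).
BINS = [(0, 5), (6, 10), (11, 15)]


def bin_index(age: int) -> int:
    """Return the index of the bin containing age (integer years)."""
    for i, (lo, hi) in enumerate(BINS):
        if lo <= age <= hi:
            return i
    raise ValueError(f"age {age} is outside the defined bins")


def distance(a: int, b: int) -> Tuple[int, int]:
    """Return (raw age gap, gap between bin indices) for two ages."""
    return abs(a - b), abs(bin_index(a) - bin_index(b))


for a, b in [(6, 10), (5, 6), (1, 10)]:
    raw, binned = distance(a, b)
    print(f"{a} vs {b}: raw gap = {raw} years, bin gap = {binned}")
# → 6 vs 10: raw gap = 4 years, bin gap = 0
# → 5 vs 6: raw gap = 1 years, bin gap = 1
# → 1 vs 10: raw gap = 9 years, bin gap = 1
```

After binning, a 4-year gap disappears entirely, while a 1-year gap and a 9-year gap look identical.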
© 2019 Data Science Central ®