I am new to data science and would like to know the best practices for data preparation.
For example, converting an integer into a category: is it good practice to categorize the data?
E.g., the data contains an Age column. Is it good practice to group ages into categories?
Please let me know if there are any articles I can refer to on this.
The grouping you are talking about is called binning. See articles on this topic: https://www.datasciencecentral.com/page/search?q=binning
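As a minimal sketch of what binning looks like in practice, assuming pandas is available: the `age` values, bin edges, and labels below are made up purely for illustration and are not a recommendation for any particular dataset.

```python
import pandas as pd

# Hypothetical data: the ages, bin edges, and labels are illustrative only.
df = pd.DataFrame({"age": [3, 17, 25, 42, 68]})

bins = [0, 18, 40, 65, 120]
labels = ["child", "young_adult", "middle_aged", "senior"]

# right=False makes each bin closed on the left: [0, 18), [18, 40), ...
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)

# Convert the categorical column to plain strings for easy inspection.
age_groups = df["age_group"].astype(str).tolist()
print(age_groups)
```

Whether to bin at all depends on the model and the question being asked; the original numeric column is kept here so nothing is lost.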
Data preparation is often the most tedious and the most essential part of the modeling phase.
It can be time-consuming if it is done properly. As the adage goes: "Garbage in, garbage out."
However, a data audit is crucial before data preparation, including but not limited to:
Thank you, Nitish. I have been looking for something like this.
IMHO, although it is common in medical journals, it is never good practice
to "convert" ratio data (e.g., age) into ordinal data ("categories," bins). It results in a loss
of the information that the ratio data inherently contains and its ordinal representation does not.
For instance, take age with bins of 0-5 years, 6-10 years, 11-15 years, etc.: 6- and 10-year-olds are
treated as "equal," while 5- and 6-year-olds are treated as just as "different" as 1- and 10-year-olds.
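The point about lost information can be checked with a few lines of plain Python. This is only a sketch: `age_bin` is a hypothetical helper written for this example, and it assumes whole-number ages and the 0-5, 6-10, 11-15 bins mentioned above.

```python
from bisect import bisect_left

UPPER_EDGES = [5, 10, 15]            # inclusive upper bound of each bin
LABELS = ["0-5", "6-10", "11-15"]


def age_bin(age):
    """Return the index of the bin a whole-number age falls into."""
    if not 0 <= age <= UPPER_EDGES[-1]:
        raise ValueError(f"age {age} is outside the illustrated bins")
    # bisect_left finds the first upper edge that is >= age.
    return bisect_left(UPPER_EDGES, age)


# Ages 6 and 10 differ by 4 years but land in the same bin.
same_bin = age_bin(6) == age_bin(10)

# Ages 5 and 6 differ by 1 year, ages 1 and 10 by 9 years,
# yet after binning both pairs are exactly one bin apart.
gap_5_vs_6 = age_bin(6) - age_bin(5)
gap_1_vs_10 = age_bin(10) - age_bin(1)

print(LABELS[age_bin(6)], LABELS[age_bin(10)], same_bin)
print(gap_5_vs_6, gap_1_vs_10)
```

After binning, a model sees no difference between a 4-year gap inside one bin and no gap at all, and it cannot tell a 1-year gap across a boundary from a 9-year one.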
Here are some best practices:
Let me know if you have any doubts.