Supervised Ratio usage for categorical features with high cardinality

Hello Members,

I would like your input and insights on the subject above. I have a feature containing supplier codes. It is part of a data set that is fed into a classification model to predict whether a particular purchase order (PO) will be received on time or not.

For the supplier code feature, I want to use the Supervised Ratio (number of positive outcomes / total outcomes for a particular code). As I understand it, this gives one ratio for each individual supplier code.
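To make the definition concrete, here is a minimal sketch of how the ratio could be computed per supplier code. The function name `supervised_ratios`, the supplier codes, and the outcome values are all made up for illustration, and "positive outcome" is assumed to mean "PO received on time".

```python
from collections import defaultdict

def supervised_ratios(codes, outcomes):
    """Return {supplier_code: positives / total} computed from historical POs."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for code, on_time in zip(codes, outcomes):
        totals[code] += 1
        positives[code] += int(on_time)  # 1 = received on time, 0 = late
    return {code: positives[code] / totals[code] for code in totals}

# Hypothetical history: one entry per delivered PO.
history_codes = ["S1", "S1", "S2", "S1", "S2", "S3"]
history_on_time = [1, 0, 1, 1, 1, 0]

ratios = supervised_ratios(history_codes, history_on_time)
print(ratios)
```

Codes with very few historical POs (such as `S3` here, with a single PO) get extreme ratios, which is worth keeping in mind with a high-cardinality feature.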

My question is this: suppose I have a set of open POs that are yet to be delivered, along with the other inputs the model requires. When I feed the data for one such PO into the model, what ratio (decimal value) should I use for its supplier code when testing the model on future POs or a new set of data?
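The scenario can be sketched as follows. This assumes one common convention, not a confirmed answer: the ratio for an open PO is looked up from the ratios computed on historical (training) POs only, and a supplier code that never appeared in the history falls back to the overall on-time rate. The names `encode_future_pos`, `train_ratios`, and `global_on_time_rate`, and all the numbers, are hypothetical.

```python
def encode_future_pos(future_codes, train_ratios, fallback):
    """Map each open PO's supplier code to the ratio learned from history.

    Codes missing from train_ratios (new suppliers) get the fallback value.
    """
    return [train_ratios.get(code, fallback) for code in future_codes]

# Hypothetical ratios computed earlier from delivered POs only.
train_ratios = {"S1": 0.75, "S2": 0.40}
# Hypothetical overall share of historical POs received on time.
global_on_time_rate = 0.60

# Supplier codes of three open POs; "S9" was never seen in the history.
encoded = encode_future_pos(["S2", "S9", "S1"], train_ratios, global_on_time_rate)
print(encoded)
```

The point of the sketch is that open POs have no outcome yet, so they cannot contribute to the ratio; they only consume a value that was fixed from past data.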

Please let me know if the question is unclear. I would appreciate your insights on it.


