
Predictions - Effect of the number of unique target classes on accuracy

When we perform classification, the target variable is a categorical (nominal) variable that has a set of unique values, or classes. It could be a simple two-class target variable like "approve application?" with classes (values) of "yes" or "no". Sometimes the classes indicate ranges, like "Excellent", "Good" etc. for a target variable like satisfaction score. We might also convert continuous variables like test scores (1 - 100) into classes like grades (A, B, C etc).
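As a small illustration of turning a continuous variable into classes, here is a minimal sketch in Python. The cutoffs and the grade letters are assumptions chosen for the example; they are not taken from any particular grading scheme.

```python
from bisect import bisect_right

# Hypothetical cutoffs (assumed for illustration only):
# below 60 -> F, 60-69 -> D, 70-79 -> C, 80-89 -> B, 90 and above -> A
CUTOFFS = [60, 70, 80, 90]
GRADES = ["F", "D", "C", "B", "A"]

def score_to_grade(score):
    """Map a numeric test score to a grade class using the cutoffs above."""
    # bisect_right counts how many cutoffs are <= score,
    # which is exactly the index of the matching grade.
    return GRADES[bisect_right(CUTOFFS, score)]

if __name__ == "__main__":
    for s in (45, 60, 79, 85, 100):
        print(s, score_to_grade(s))
```

Adding or removing a cutoff changes the number of classes, which is the quantity this experiment varies.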


This experiment looks at the effect of the number of unique classes in the target variable on the accuracy of the prediction. The hypothesis is that accuracy will go down as the number of classes increases. This is because each additional class boundary adds another chance for a predicted sample to end up on the wrong side of a boundary.
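The boundary argument can be checked with a small simulation. The sketch below is not the experiment from this post: it assumes a true value drawn uniformly from a made-up range (90 to 180), a "prediction" that is the true value plus Gaussian noise, and equal-width bins. It then counts how often the true value and the prediction land in the same bin.

```python
import random

def bin_index(value, low, high, n_bins):
    """Return the equal-width bin (0 .. n_bins-1) that value falls into."""
    width = (high - low) / n_bins
    idx = int((value - low) // width)
    # Clamp values outside [low, high) into the first or last bin.
    return min(max(idx, 0), n_bins - 1)

def binned_accuracy(n_bins, n_samples=5000, noise_sd=10.0, seed=0):
    """Fraction of samples whose noisy prediction stays in the true bin.

    The range 90-180 and the noise level are assumptions for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        true_value = rng.uniform(90, 180)
        predicted = true_value + rng.gauss(0, noise_sd)
        if bin_index(true_value, 90, 180, n_bins) == bin_index(predicted, 90, 180, n_bins):
            hits += 1
    return hits / n_samples

if __name__ == "__main__":
    for k in (2, 4, 6, 8, 10):
        print(k, round(binned_accuracy(k), 3))
```

The prediction error is the same for every bin count (the seed is fixed), so any change in accuracy comes only from how many boundaries the error can cross.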

For this experiment, I used a data set of blood pressure levels. Each observation contains the patient's demographics and the actual systolic blood pressure measured. The blood pressure value is then binned into multiple classes (blood pressure ranges), and the blood pressure range is predicted for varying numbers of bins (classes). The results are then tabulated as follows.


The experiment confirms the hypothesis. Accuracy drops sharply as the number of classes in the target variable increases. The drop does taper off beyond 8 classes.
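For readers who want to try the procedure themselves, here is a minimal sketch of the loop described above: bin the target, predict the bin, and record accuracy for each bin count. Everything specific in it is an assumption: the data is synthetic (age and BMI are hypothetical stand-ins for the demographics), the bins are equal-width, and the classifier is a plain k-nearest-neighbours vote, since the post does not say which algorithm was used.

```python
import random
from collections import Counter

def make_data(n, rng):
    """Synthetic stand-in for the blood pressure data (not the original data set)."""
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        bmi = rng.uniform(18, 40)
        # Assumed relationship plus noise, for illustration only.
        systolic = 90 + 0.6 * (age - 20) + 1.2 * (bmi - 18) + rng.gauss(0, 8)
        rows.append(((age, bmi), systolic))
    return rows

def to_class(value, low, high, n_bins):
    """Equal-width binning of a systolic value into classes 0 .. n_bins-1."""
    width = (high - low) / n_bins
    return min(max(int((value - low) // width), 0), n_bins - 1)

def nearest_neighbours(train, point, k):
    """Indices of the k training rows closest to point (squared Euclidean distance)."""
    def dist(i):
        features = train[i][0]
        return (features[0] - point[0]) ** 2 + (features[1] - point[1]) ** 2
    return sorted(range(len(train)), key=dist)[:k]

def run_experiment(bin_counts=(2, 4, 6, 8, 10), k=15, seed=1):
    """Return {number of classes: accuracy on the held-out rows}."""
    rng = random.Random(seed)
    train, test = make_data(600, rng), make_data(200, rng)
    values = [systolic for _, systolic in train + test]
    low, high = min(values), max(values)
    # Neighbours depend only on the features, so compute them once.
    neighbours = [nearest_neighbours(train, x, k) for x, _ in test]
    results = {}
    for n_bins in bin_counts:
        hits = 0
        for (_, systolic), idx in zip(test, neighbours):
            votes = Counter(to_class(train[i][1], low, high, n_bins) for i in idx)
            predicted = votes.most_common(1)[0][0]
            hits += predicted == to_class(systolic, low, high, n_bins)
        results[n_bins] = hits / len(test)
    return results

if __name__ == "__main__":
    for n_bins, accuracy in run_experiment().items():
        print(n_bins, round(accuracy, 3))
```

The numbers this prints depend entirely on the synthetic data and should not be read as a reproduction of the results in this post.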



Comment by Jay Baker on November 3, 2014 at 6:36am

Yeah, I don't understand this. I think part of the article must be missing or something.

Comment by Alex Esterkin on October 30, 2014 at 8:50pm

What are you talking about???
