
Quality and correctness of classification models. Part 3 – Confusion Matrix

In the last part of the tutorial we introduced quantitative indicators of classification model quality. In the next two parts we will take a closer look at a couple of graphical indicators. The first one is called the Confusion Matrix (the name "Contingency Table" is also used).


What is a Confusion Matrix?

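A Confusion Matrix is an N x N matrix, where N is the number of decision classes. Its rows correspond to the correct (actual) decision classes and its columns to the decisions made by the classifier. The number n_{i,j} at the intersection of the i-th row and the j-th column is the number of cases from the i-th class that have been classified as belonging to the j-th class. Correct classifications therefore lie on the main diagonal, and every off-diagonal entry counts one kind of error.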

Examples:
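As a small illustration of the definition above, here is a minimal Python sketch that counts a confusion matrix from two lists of labels. The function name `confusion_matrix`, the class names and the label data are all made up for this illustration; they do not come from the tutorial.

```python
def confusion_matrix(actual, predicted, classes):
    """Return an N x N list of lists: rows = actual class, columns = predicted class."""
    index = {label: k for k, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        # n[i][j] counts cases from class i that were classified as class j
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical labels, for illustration only.
classes = ["good", "bad"]
actual = ["good", "good", "good", "bad", "bad", "good", "bad", "good"]
predicted = ["good", "bad", "good", "bad", "good", "good", "bad", "good"]

cm = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, cm):
    print(label, row)
# good [4, 1]
# bad [1, 2]
```

The diagonal entries (4 and 2) are the correctly classified cases; the off-diagonal entries are the two kinds of error.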

 

Forms of Confusion Matrices

Different forms of the Confusion Matrix make it easier to observe particular characteristics of the classification (e.g. the cost incurred by incorrect classifications).

  • Numerical form – contains counts of observations assigned to particular classes.

  • Percentage form – contains the percentages of observations assigned to particular classes, each calculated as the ratio of the count in a given cell to the total number of observations.

  • Gains and losses form – contains information about gains and losses due to correct and incorrect classification decisions.
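The percentage form can be derived from the numerical form by dividing every cell by the total number of observations. The sketch below assumes that reading of the definition; the helper name `to_percentages` and the counts are hypothetical.

```python
def to_percentages(counts):
    """Convert a matrix of counts into percentages of the total observation count."""
    total = sum(sum(row) for row in counts)
    return [[100.0 * n / total for n in row] for row in counts]


# Hypothetical numerical form: rows = actual class, columns = predicted class.
counts = [[50, 10],
          [5, 35]]

pct = to_percentages(counts)
for row in pct:
    print(row)
# [50.0, 10.0]
# [5.0, 35.0]
```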

Confusion Matrix in the gains and losses form contains sums of costs due to classification decisions.
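One way to obtain such a matrix is to multiply each count by a per-case gain or loss. The sketch below assumes exactly that. The unit values, with positive numbers as gains and negative numbers as losses, are invented for illustration and are not taken from the tutorial.

```python
def gains_and_losses(counts, unit_values):
    """Multiply each cell count by the gain (+) or loss (-) of one such decision."""
    return [[n * v for n, v in zip(count_row, value_row)]
            for count_row, value_row in zip(counts, unit_values)]


# Hypothetical data: rows = actual (good, bad), columns = predicted (good, bad).
counts = [[50, 10],
          [5, 35]]
unit_values = [[100, -20],   # good debtor accepted: gain; good debtor rejected: lost profit
               [-500, 0]]    # bad debtor accepted: loss; bad debtor rejected: nothing

gl = gains_and_losses(counts, unit_values)
net = sum(sum(row) for row in gl)
print(gl)   # [[5000, -200], [-2500, 0]]
print(net)  # 2300
```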


Cut off point and the Confusion Matrix

The cut off point is a threshold value used to decide whether an observation belongs to a particular class.

if P(class(x) = 1) >= alpha, then assign x to class 1

where:

alpha – the cut off point

P(class(x) = 1) – the probability that the given element x belongs to the class denoted by 1
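The rule above translates directly into code. In this minimal sketch the function name `assign_class` and the probabilities are hypothetical. Labelling the other class 0 is also an assumption, since the rule itself only names class 1.

```python
def assign_class(p_class_1, alpha):
    """Return 1 if P(class(x) = 1) >= alpha, otherwise 0 (the other class, by assumption)."""
    return 1 if p_class_1 >= alpha else 0


# Hypothetical model outputs, for illustration only.
probabilities = [0.10, 0.59, 0.60, 0.85]
labels = [assign_class(p, 0.6) for p in probabilities]
print(labels)  # [0, 0, 1, 1]
```

Note that a probability exactly equal to the cut off point is assigned to class 1, because the comparison is "greater than or equal to".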

For example:

If the probability (calculated by our classification model) that a given loan applicant will fail to repay the loan is greater than or equal to 60%, then assign this applicant to the class of bad debtors; otherwise assign him/her to the class of good debtors.

Different cut off points can be considered for the same problem (e.g. assessing creditworthiness), and each of them leads to a different confusion matrix. By analyzing and comparing these matrices, the optimal cut off point can be selected.
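The sketch below illustrates this comparison: it builds one confusion matrix per candidate cut off point and picks the one with the lowest total cost. Using total misclassification cost as the selection criterion is an assumption made here; the tutorial does not say which criterion defines "optimal". The scores, outcomes, unit costs and function names are all invented for illustration.

```python
def confusion_at(probabilities, actual, alpha):
    """2 x 2 matrix for cut off point alpha; rows = actual class, columns = predicted class."""
    matrix = [[0, 0], [0, 0]]
    for p, a in zip(probabilities, actual):
        predicted = 1 if p >= alpha else 0
        matrix[a][predicted] += 1
    return matrix


def total_cost(matrix, unit_costs):
    """Sum of count * unit cost over all cells."""
    return sum(n * c
               for row, cost_row in zip(matrix, unit_costs)
               for n, c in zip(row, cost_row))


# Hypothetical data: probability of being a bad debtor (class 1) and the actual outcome.
probabilities = [0.05, 0.20, 0.35, 0.45, 0.55, 0.65, 0.80, 0.95]
actual = [0, 0, 0, 1, 0, 1, 1, 1]

# Assumed unit costs: rejecting a good debtor costs 1, accepting a bad debtor costs 5.
unit_costs = [[0, 1],
              [5, 0]]

candidates = [0.3, 0.4, 0.5, 0.6, 0.7]
costs = {alpha: total_cost(confusion_at(probabilities, actual, alpha), unit_costs)
         for alpha in candidates}
best_alpha = min(costs, key=costs.get)

for alpha in candidates:
    print(alpha, confusion_at(probabilities, actual, alpha), costs[alpha])
print("lowest-cost cut off point:", best_alpha)  # 0.4
```

With these made-up numbers the cut off point 0.4 wins, because accepting a bad debtor is assumed to be five times as costly as rejecting a good one. A different cost structure would generally shift the choice.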

Confusion Matrix – summary

  • A simple and readable way of collecting classification results
  • Makes assessment of classification quality easier
  • Different forms of the Confusion Matrix can help in observing the required properties of the classifier
  • Can be used to determine gains and losses due to classification


You can follow us at @Algolytics
