In the last part of the tutorial we introduced quantitative indicators of classification model quality. In the next two parts we will take a closer look at two graphical indicators. The first is the Confusion Matrix (also known as a "Contingency Table").

**What is a Confusion Matrix?**

A Confusion Matrix is an N x N matrix, where N is the number of decision classes. Its rows correspond to the correct decision classes and its columns to the decisions made by the classifier. The number n_{i,j} at the intersection of the i-th row and the j-th column is the number of cases from the i-th class that were classified as belonging to the j-th class.
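As a minimal sketch of this definition, the snippet below counts the cells of the matrix from two lists of labels. The function name `confusion_matrix`, the integer class labels 0..N-1, and the sample data are assumptions made for illustration only; they do not come from the tutorial.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Return an n_classes x n_classes matrix of counts.

    Rows are the correct classes, columns are the classifier's decisions,
    so matrix[i][j] counts cases from class i that were classified as j.
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for i, j in zip(actual, predicted):
        matrix[i][j] += 1
    return matrix


# Hypothetical labels for a 3-class problem (illustrative data only).
actual = [0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 2, 2, 0, 2]

cm = confusion_matrix(actual, predicted, n_classes=3)
for row in cm:
    print(row)
```

The diagonal cells hold the correctly classified cases; every off-diagonal cell is one specific kind of mistake.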

Examples:

Different forms of the Confusion Matrix make it easier to observe particular characteristics of the classification (e.g. the cost incurred by incorrect classifications).

**Numerical form** – contains the counts of observations assigned to particular classes.

**Percentage form** – contains the percentages of observations assigned to particular classes, where each percentage is the count of observations in a cell divided by the total number of observations.
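The conversion from the numerical form to the percentage form can be sketched as follows. The helper name `to_percentages` and the counts are hypothetical, and the sketch assumes each cell is divided by the grand total, as described above.

```python
def to_percentages(counts):
    """Convert a matrix of counts into percentages of all observations."""
    total = sum(sum(row) for row in counts)
    return [[100.0 * n / total for n in row] for row in counts]


# Hypothetical 2-class counts (rows: correct class, columns: decision).
counts = [[40, 10],
          [20, 130]]

pct = to_percentages(counts)
for row in pct:
    print(row)
```

With this normalization, all cells of the percentage form add up to 100.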

**Gains and losses form** – contains information about the gains and losses resulting from correct and incorrect classification decisions.

In this form, each cell of the Confusion Matrix contains the summed gains or costs of the classification decisions that fall into it.
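One way to obtain the gains and losses form, sketched here under assumptions, is to multiply each count by a per-case gain (positive) or cost (negative). The function name `gains_and_losses` and all monetary values in `value_per_case` are hypothetical and chosen only to illustrate the arithmetic.

```python
def gains_and_losses(counts, value_per_case):
    """Multiply each count by the gain (+) or cost (-) of one such decision."""
    return [
        [n * v for n, v in zip(count_row, value_row)]
        for count_row, value_row in zip(counts, value_per_case)
    ]


# Hypothetical counts (rows: correct class, columns: decision).
counts = [[40, 10],
          [20, 130]]

# Hypothetical value of a single decision in each cell.
value_per_case = [[100, -20],
                  [-500, 0]]

gl = gains_and_losses(counts, value_per_case)
total = sum(sum(row) for row in gl)
for row in gl:
    print(row)
print("total:", total)
```

In this made-up example a small number of expensive errors outweighs the gains from the correct decisions, which the numerical form alone would not reveal.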

**Example 1**

The cut-off point is a threshold value used to decide whether an observation belongs to a particular class.

if P(class(x) = 1) >= alpha, then assign x to class 1

where:

alpha – the cut-off point

P(class(x) = 1) – the probability that the given element x belongs to the class denoted by 1

For example:

If the probability (calculated by our classification model) that a given loan applicant will fail to repay the loan is greater than or equal to 60%, then assign this applicant to the class of bad debtors; otherwise assign him/her to the class of good debtors.
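The rule from this example can be written as a short function. This is only a sketch: the name `assign_class`, the encoding of bad debtors as class 1 and good debtors as class 0, and the sample probabilities are assumptions for illustration.

```python
def assign_class(p_class_1, alpha):
    """Assign class 1 when the probability reaches the cut-off point."""
    return 1 if p_class_1 >= alpha else 0


alpha = 0.60  # cut-off point from the loan example

# Hypothetical model outputs: probability that each applicant is a bad debtor.
probabilities = [0.15, 0.60, 0.59, 0.92]

labels = [assign_class(p, alpha) for p in probabilities]
print(labels)  # 1 = bad debtor, 0 = good debtor
```

Note that a probability exactly equal to the cut-off point is assigned to class 1, because the rule uses "greater than or equal to".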

Different cut-off points can be considered for the same problem (e.g. assessing creditworthiness), and each one leads to a different confusion matrix. By analyzing these matrices, the optimal cut-off point can be selected.
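A possible way to compare cut-off points is sketched below: build one confusion matrix per candidate and keep the candidate with the highest total gain. Treating "optimal" as "highest total gain" is an assumption of this sketch, as are the function names, the candidate cut-off points, the data, and the per-case values.

```python
def confusion_matrix_at(actual, probabilities, alpha):
    """2 x 2 confusion matrix for a given cut-off point (rows: correct class)."""
    matrix = [[0, 0], [0, 0]]
    for y, p in zip(actual, probabilities):
        decision = 1 if p >= alpha else 0
        matrix[y][decision] += 1
    return matrix


def total_value(matrix, value_per_case):
    """Sum of counts weighted by the gain (+) or cost (-) of each cell."""
    return sum(
        n * v
        for row, value_row in zip(matrix, value_per_case)
        for n, v in zip(row, value_row)
    )


# Hypothetical data: 1 = bad debtor, 0 = good debtor.
actual = [0, 0, 0, 0, 1, 1, 0, 1]
probabilities = [0.10, 0.30, 0.45, 0.65, 0.55, 0.70, 0.20, 0.90]

# Hypothetical value of a single decision in each cell.
value_per_case = [[100, -20],
                  [-500, 0]]

candidates = [0.4, 0.5, 0.6, 0.7]
best_alpha = max(
    candidates,
    key=lambda a: total_value(
        confusion_matrix_at(actual, probabilities, a), value_per_case
    ),
)

for a in candidates:
    m = confusion_matrix_at(actual, probabilities, a)
    print(a, m, total_value(m, value_per_case))
print("best cut-off point:", best_alpha)
```

A different selection criterion (for example, the fewest misclassified cases) could pick a different cut-off point from the same set of matrices.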

In summary, the Confusion Matrix:

- is a simple and readable way of collecting classification results
- makes assessing classification quality easier
- comes in different forms that help in observing the required properties of the classifier
- can be used to determine the gains and losses due to classification

