
Optimal Binning for Scoring Modeling (R Package)

The R package smbinning categorizes a numeric variable into bins (intervals) for later use in scoring modeling. The theory behind it falls within a branch of machine learning called supervised discretization, a categorization technique that divides a continuous variable into a small number of intervals mapped to a discrete target variable. An example is the time since an account was opened (an integer number of months) binned against credit performance (Good/Bad), as shown in Table 1.

Table 1. Binning for the characteristic Time on Books mapped to Credit Performance (Good/Bad).
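
To make the idea of mapping intervals to a Good/Bad target concrete, here is a minimal Python sketch of how such a table could be tabulated once cut points are known. It is an illustration only, not the smbinning implementation: the data, the cut points, and the helper names assign_bin and tabulate are all hypothetical, and the cut points are picked by hand rather than optimized.

```python
from bisect import bisect_left

def assign_bin(value, cuts):
    """Return the index of the interval that contains value.

    With sorted cuts c0 < c1 < ..., the intervals are
    (-inf, c0], (c0, c1], ..., (c_last, +inf).
    """
    return bisect_left(cuts, value)

def tabulate(values, targets, cuts):
    """Count goods (target == 1) and bads (target == 0) in each bin."""
    table = [{"good": 0, "bad": 0} for _ in range(len(cuts) + 1)]
    for value, target in zip(values, targets):
        key = "good" if target == 1 else "bad"
        table[assign_bin(value, cuts)][key] += 1
    return table

# Hypothetical sample: months on books and a performance flag (1 = good, 0 = bad).
months = [1, 3, 6, 9, 14, 20, 24, 30, 42, 60]
flags = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1]

# Illustrative cut points chosen by hand; a supervised method would derive them
# from the relationship between the variable and the target.
cuts = [6, 24]

bin_table = tabulate(months, flags, cuts)
for i, row in enumerate(bin_table):
    print(i, row)
```

Each row of the result corresponds to one interval of the characteristic, which is the raw material for the per-bin metrics used in scoring.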

The purpose of this package is to automate the time-consuming process of selecting the right cut points, to quickly calculate metrics such as Weight of Evidence and Information Value, and to document the SQL code, tables, and plots (Figure 1) used throughout the development stage.

Figure 1: Traditional plots for characteristics analysis.
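
For readers unfamiliar with the two metrics named above, the following Python sketch shows one common way to compute them from per-bin counts. It assumes the convention WoE = ln(share of goods / share of bads) and IV = sum over bins of (share of goods - share of bads) x WoE; the package's own sign convention and handling of empty bins may differ. The function name woe_iv and the counts are hypothetical.

```python
import math

def woe_iv(bins):
    """Compute Weight of Evidence per bin and the total Information Value.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Assumes every count is positive; a bin with zero goods or zero bads
    would need special handling, which this sketch leaves out.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes = []
    iv = 0.0
    for good, bad in bins:
        pct_good = good / total_good  # share of all goods that fall in this bin
        pct_bad = bad / total_bad     # share of all bads that fall in this bin
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv

# Hypothetical counts of (goods, bads) for three bins.
counts = [(100, 50), (300, 30), (600, 20)]
woes, iv = woe_iv(counts)
for (good, bad), woe in zip(counts, woes):
    print(good, bad, round(woe, 4))
print("IV:", round(iv, 4))
```

Under this convention, a bin with a negative WoE holds a larger share of the bads than of the goods, and a higher IV suggests that the binned characteristic separates the two groups more strongly.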

Commercial software such as STATISTICA and SAS already implements its own versions of optimal binning with similar outputs. For analysts without access to that software or the relevant module, however, this package may help them run their analysis faster.


R Package Website [Here]


Tags: Categorization, Credit Scoring, Information Value, Marketing, Analytics, Modeling, Scoring, Supervised Discretization, Weight of Evidence

