
Classifications in R: Response Modeling/Credit Scoring/Credit Rating using Machine Learning Techniques

This article was written by Ariful Mondal. Ariful is a senior manager, data science and big data analytics consultant at Tata Consultancy Services.

1. Introduction

This article showcases some worked examples of Machine Learning (ML) using the German Credit Data. Although credit scoring is the case study here, the same process applies to a wide range of classification or regression problems, such as “Response modeling”, “Risk Management”, “Attrition/Churn management”, “Cross-Sell/Up-Sell”, “Usage Patterns”, “Net Present Value”, “Life Time Value”, “Predictive Maintenance and condition-based monitoring”, “Warranty”, “Reliability”, “Failure Prediction”, “Image/Video Processing”, “Crime”, “Medical Experiments” and “Hidden pattern recognition”. These arise in Banking, Insurance, Finance, Telecom, Manufacturing, “Law Firms and Criminal Investigation”, “Surveillance”, “Catalogue”, “Travel, Transport and Hospitality”, “Healthcare”, “Utilities”, “Publishing”, “Education” and any other industry you may come across.

The basic difference between traditional modeling and machine learning is that “in traditional modeling we intend to set up a modeling framework and try to establish relationships, while in machine learning we allow the model to learn from the data by understanding the hidden patterns”. Hence the former requires the analyst to have a solid understanding of statistical techniques and business knowledge, while the latter is more complex and computationally intensive, so it requires more computing power and a tech-savvy analyst.

Note that while traditional techniques perform well on small to large amounts of data, machine learning tends to learn better on high-dimensional and complex data, such as in a Big Data setup.

Ariful has used the following machine learning techniques in this article:

  1. Logistic Regression
  2. Recursive partitioning for classification (Basic and Bayesian)
  3. Random Forest
  4. Conditional Inference Tree
  5. Bayesian Networks
  6. Unbiased Non-parametric Methods: Model Based (Logistic)
  7. Support Vector Machine
  8. Neural Network
  9. Lasso Regression


What you will find in this article:

  • 1. Introduction
  • 2. Data analysis and variable creation
  • 3. Model Selection and Development
  • 4. Model Performance Comparison
  • 4.1 Receiver Operating Characteristic (ROC) curve
  • 5. Concluding Remarks
  • Appendix A: R Packages used in this paper
  • References
  • Appendix B: About R Markdown

Check out all this information here. For more articles about classifications in R, click here.
