Response Modeling Using Machine Learning Techniques with R Programming (WIP)
In this article I present a credit scoring case study using the German Credit Data.
This article includes detailed R code for the following predictive modeling steps:
1. Univariate and Bivariate Analysis
2. Information Value and Weight of Evidence to assess the predictive power of variables
3. Multivariate Analysis and Dimension Reduction using Variable Clustering
4. Different Machine Learning Techniques and their performance evaluation using the ROC curve, AUC and the Kolmogorov-Smirnov (KS) statistic
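To make step 2 concrete, here is a minimal, language-agnostic sketch in Python (the article itself uses R) of how Weight of Evidence (WoE) and Information Value (IV) can be computed for one categorical predictor. It uses the convention WoE = ln(%good / %bad) and IV = sum of (%good - %bad) * WoE. Some texts flip the ratio, which flips the sign of WoE but leaves IV unchanged. The function name `woe_iv`, the 1 = bad / 0 = good coding, the `eps` smoothing for empty cells and the toy data are all illustrative assumptions, not taken from the original analysis.

```python
import math
from collections import Counter


def woe_iv(categories, targets, eps=0.5):
    """Return (WoE per category, total IV) for one categorical predictor.

    Assumptions (hypothetical, for illustration only):
    - targets are 0/1 with 1 = bad (default) and 0 = good;
    - both classes are present in the data;
    - eps replaces a zero count in a cell to avoid log(0).
    """
    goods = Counter(c for c, y in zip(categories, targets) if y == 0)
    bads = Counter(c for c, y in zip(categories, targets) if y == 1)
    total_good = sum(goods.values())
    total_bad = sum(bads.values())

    woe, iv = {}, 0.0
    for cat in sorted(set(categories)):
        dist_good = (goods[cat] or eps) / total_good  # share of all goods in this bin
        dist_bad = (bads[cat] or eps) / total_bad     # share of all bads in this bin
        woe[cat] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[cat]       # each bin adds a non-negative term
    return woe, iv


# Toy example: a hypothetical 3-level attribute with 6 goods and 6 bads.
cats = ["low"] * 4 + ["mid"] * 4 + ["none"] * 4
y = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
woe, iv = woe_iv(cats, y)
print({k: round(v, 3) for k, v in woe.items()}, round(iv, 3))
# {'low': -1.099, 'mid': 0.0, 'none': 1.099} 0.732
```

A bin with the same share of goods and bads ("mid") gets WoE 0 and contributes nothing to IV. The further a bin's good/bad split departs from the overall split, the more that bin adds to the variable's predictive power.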
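For step 4, the sketch below shows one way to compute AUC and the KS statistic from model scores, again in Python rather than the article's R. It sweeps a threshold from the highest score down to the lowest and tracks the true positive rate (TPR) and false positive rate (FPR) at each distinct score, which traces the ROC curve. AUC is the trapezoidal area under that curve. KS is the largest gap between the cumulative distributions of bads and goods, i.e. max |TPR - FPR|. The function name `roc_auc_ks`, the 1 = bad coding and the toy scores are assumptions for illustration. In R, packages such as ROCR or pROC provide equivalent functionality.

```python
def roc_auc_ks(scores, labels):
    """Return (auc, ks) for 0/1 labels (1 = bad/event) and scores where
    a higher score means a higher predicted risk. Both classes must be present."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos

    tp = fp = 0
    tpr_prev = fpr_prev = 0.0
    auc = ks = 0.0
    i = 0
    while i < len(pairs):
        # Move the threshold past every observation sharing this score (handles ties).
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - fpr_prev) * (tpr + tpr_prev) / 2  # trapezoid between ROC points
        ks = max(ks, abs(tpr - fpr))                    # KS = max gap between the two CDFs
        tpr_prev, fpr_prev = tpr, fpr
    return auc, ks


# Toy example with hypothetical scores: 4 bads (1) and 4 goods (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
auc, ks = roc_auc_ks(scores, labels)
print(f"AUC={auc} KS={ks}")  # AUC=0.75 KS=0.5
```

As a cross-check, AUC equals the fraction of (bad, good) pairs in which the bad case gets the higher score: here 12 of the 16 pairs, i.e. 0.75.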
This is an attempt to showcase some worked-out examples of Machine Learning (ML) using the German Credit Data https://ocw.mit.edu/courses/sloan-school-of-management/15-062-data-.... Although we have chosen a credit scoring problem as the case study in this article, the same process applies to a wide range of classification and regression problems, such as “Response modeling”, “Risk Management”, “Attrition/Churn management”, “Cross-Sell/Up-Sell”, “Usage Patterns”, “Net Present Value”, “Life Time Value”, “Predictive Maintenance and condition based monitoring”, “Warranty”, “Reliability”, “Failure Prediction”, “Image/Video Processing”, “Crime”, “Medical Experiments” and “Hidden pattern recognition”. These problems arise in Banking, Insurance, Finance, Telecom, Manufacturing, “Law Firms and Criminal Investigation”, “Surveillance”, “Catalogue”, “Travel Transport and Hospitality”, “Healthcare”, “Utilities”, “Publishing”, “Education” and any other industry you may come across.
The basic difference between traditional modeling and machine learning is that “in traditional modeling we set up a modeling framework and try to establish relationships, while in machine learning we allow the model to learn from the data by uncovering hidden patterns”. Hence the former requires the analyst to have a solid understanding of statistical techniques and business knowledge. The latter is more complex and computationally intensive, so it needs more computing power and a tech-savvy analyst.
Kindly note that while traditional techniques perform well on small to large amounts of data, machine learning tends to learn better from high-dimensional and complex data, such as in a Big Data setup.
If you want to do more experiments but are not sure where to find a problem definition or data for machine learning, you may explore the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/.
If you are looking for answers to technical questions, you may post them on Stack Overflow (http://stackoverflow.com/). And of course, do not forget to ask your best friends on the web: Google (http://www.google.com) and/or Bing (https://www.bing.com/).
PS: This paper is a work in progress. I will update it as and when I make changes.
Browse the (long) paper with source code on RPubs: http://rpubs.com/arifulmondal/216381
The original document was created using R Markdown.
All the best.
--- Ariful Mondal