*Originally posted on DataScienceCentral, by Dr. Granville. Click here to read original article and comments.*

In this article, I proposes a simple metric to measure **predictive power**. It is used for combinatorial feature selection, where a large number of feature combinations need to be ranked automatically and very fast, for instance in the context of transaction scoring, in order to optimize predictive models. This is about rather big data, and we would like to see an Hadoop methodology for the technology proposed here. It can easily be implemented in a Map Reduce framework. It was developed by the author in the context of credit card fraud detection, and click/keyword scoring. This material will be part of our data science apprenticeship, and included in our Wiley book.

*Feature selection* is a methodology used to detect the best subset of features, out of dozens or hundreds of features (also called variables or rules). By “best”, we mean with highest *predictive power*, a concept defined in the following subsection. In short, we want to remove duplicate features, simplify a bit the correlation structure (among features) and remove features that bring no value, such as a features taking on random values, thus lacking predictive power, or features (rules) that are almost never triggered (except if they are perfect fraud indicators when triggered).

The problem is combinatorial in nature. You want a manageable, small set of features (say 20 features) selected from (say) a set of 500 features, to run our *hidden decision trees* (or some other classification / scoring technique) in a way that is statistically robust. But there are 2.7 * 1035 combinations of 20 features out of 500, and you need to compute all of them to find the one with maximum predictive power. This problem is computationally intractable, and you need to find an alternate solution. The good thing is that you don’t need to find the absolute maximum; you just need to find a subset of 20 features that is good enough.

One way to proceed is to compute the predictive power of each feature. Then, add one feature at a time to the subset (starting with 0 feature) until you reach either

- 20 features (your limit)
- Adding a new feature does not significantly improve the overall predictive power of the subset (in short, convergence has been attained)

At each iteration, choose the feature to be added among the two remaining features with the highest predictive power: you will choose (among these two features) the one that increases the overall predictive power (of the subset under construction) most. Now you have reduced your computations from 2.7 * 1035 to 40 = 2 * 20.

Tags:

© 2020 Data Science Central ® Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Statistics -- New Foundations, Toolbox, and Machine Learning Recipes
- Book: Classification and Regression In a Weekend - With Python
- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions