
Feature Selection For Unsupervised Learning

This is my presentation for the IBM Data Science Day, July 24.


After reviewing popular techniques used in supervised, unsupervised, and semi-supervised machine learning, we focus on feature selection methods in these different contexts, especially the metrics used to assess the value of a feature or set of features, whether the features are binary, continuous, or categorical.

We then go into greater detail and review modern feature selection techniques for unsupervised learning, typically relying on entropy-like criteria. While these criteria are usually model-dependent or scale-dependent, we introduce a new model-free, data-driven methodology in this context, with an application to an interesting number theory problem (a simulated data set) in which each feature has a known theoretical entropy.
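As a generic illustration of entropy-like criteria for unsupervised feature selection (a minimal sketch, not the model-free methodology from the slides), one can discretize each feature and rank the features by the Shannon entropy of the resulting distribution; near-constant features score close to zero:

```python
import numpy as np

def feature_entropy(x, bins=10):
    """Shannon entropy (in bits) of a feature after histogram discretization."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def rank_features_by_entropy(X, bins=10):
    """Rank the columns of X by entropy, highest first."""
    scores = np.array([feature_entropy(X[:, j], bins) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]
    return order, scores

# Toy data: one spread-out (high-entropy) feature, one constant feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 1000), np.full(1000, 0.5)])
order, scores = rank_features_by_entropy(X)
# The uniform feature (column 0) ranks first; the constant one has entropy 0.
```

Note that this score depends on the binning, which is exactly the kind of scale dependence the presentation's model-free approach aims to avoid.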

We also briefly discuss high precision computing as it is relevant to this peculiar data set, as well as units of information smaller than the bit.
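To see how a unit of information smaller than the bit arises, consider the entropy of a biased coin: for a head probability of 0.9, the entropy is about 0.469 bits per toss, well below one bit.

```python
import math

# Entropy (in bits) of a Bernoulli variable with success probability p.
p = 0.9
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
# H is roughly 0.469 bits: less than one bit of information per observation.
```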

To download the presentation, click here (PowerPoint document).


Comment by Deb on May 8, 2019 at 8:11pm
Also, another query: will this method be useful for sparse data? I implemented it and found that it selects sparse features as the relevant ones, since two clusters clearly exist.
Comment by Deb on May 8, 2019 at 7:20pm
Hi Vincent,
This is a nice method; just a question:

Instead of using "Create an artificial response Y", can I use the first principal component as the response Y? (Refer to page 11 of the slides.)
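The commenter's suggestion can be sketched as follows: take the first principal component of the centered data as an artificial response, then score each feature by its absolute correlation with that response. This is only an illustration of the idea, not the construction from the slides, and the function names are hypothetical:

```python
import numpy as np

def pc1_response(X):
    """First principal component scores of centered X, computed via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]  # projection onto the top principal axis

def score_features(X):
    """Absolute correlation of each feature with the PC1 pseudo-response."""
    y = pc1_response(X)
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                     for j in range(X.shape[1])])

# Toy data: column 0 carries the dominant variance, column 1 is small noise.
rng = np.random.default_rng(1)
base = rng.normal(size=500)
X = np.column_stack([base + 0.1 * rng.normal(size=500),
                     0.1 * rng.normal(size=500)])
scores = score_features(X)
# Column 0 correlates strongly with PC1; column 1 does not.
```

One caveat of this substitute: PC1 is variance-driven, so features would need comparable scales (or standardization) for the ranking to be meaningful.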



© 2020 Data Science Central ®