There are many ways to select features from a given data set, and it is always a challenge to pick the ones with which a particular algorithm will work best. Here I will consider data from monitoring the performance of physical exercises with wearable accelerometers, for example, wrist bands.
The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har.
In this project, researchers used data from accelerometers on the belt, forearm, arm, and dumbbell of a few participants. The participants were asked to perform barbell lifts correctly, marked as "A", and incorrectly with four typical mistakes, marked as "B", "C", "D" and "E". The goal of the project is to predict the manner in which they did the exercise.
There are 52 numeric variables and one classification variable, the outcome. We can plot density graphs, which are in effect smoothed-out histograms, for the first six features.
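To make the "smoothed-out histogram" idea concrete, here is a minimal sketch of a Gaussian kernel density estimate in plain Python. It is not the code used for the plots in this article: the sample is synthetic (random draws standing in for one accelerometer feature, not the HAR data), and the function names and the normal reference rule for the bandwidth are assumptions chosen for illustration.

```python
import math
import random


def normal_reference_bandwidth(sample):
    """Bandwidth from the normal reference rule: 1.06 * std * n^(-1/5)."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def gaussian_kde(sample, grid, h):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in sample)
        for g in grid
    ]


def trapezoid_area(ys, xs):
    """Approximate the area under the curve (xs, ys) with the trapezoid rule."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


random.seed(0)
# Synthetic stand-in for one numeric feature; NOT the HAR data.
feature = [random.gauss(0.0, 1.0) for _ in range(300)]

h = normal_reference_bandwidth(feature)
# Extend the grid a few bandwidths past the data so the tails are covered.
lo, hi = min(feature) - 4 * h, max(feature) + 4 * h
grid = [lo + (hi - lo) * i / 199 for i in range(200)]

density = gaussian_kde(feature, grid, h)
area = trapezoid_area(density, grid)  # should be close to 1 for a density
```

The curve `density` over `grid` is what a density plot draws. Computing one such curve per outcome class for the same feature would give the overlaid curves that can then be compared.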
Comment
I am sorry but I have to say that I could not reproduce the results that this article claims based on the same HAR dataset she has cited here. I used the same R code that she has posted in her blog.
Almost every pair of variables appears to have a ratio greater than 0.75 which means that every variable has to be selected. So I don't know if she has tested it on multiple data sets and can cite some results and code here for us to confirm.
I'm sorry, but it turned out that such a long article cannot be posted.
Sorry for the belated reply.
My explanation turned out to be rather long, so I put it here:
http://myabakhova.blogspot.com/2016/02/computing-ratio-of-areas.html
I will see if I can publish it here as well.
This is an excellent article about feature selection using random forest. I found that feature selection for machine learning algorithms is a great opportunity for more research. Great addition to the literature.
Do you mind clarifying your formula exactly? By area between the curves are you referring to the sum of the (absolute value) areas where one curve is over/under the other, and by area under one curve, do we care which one?
Thank you!
© 2020 Data Science Central