
I am working on a data set that has 10 features, and my labeled output is the 'Weight of humans'.

Out of these 10 features, I want to find the 2 or 3 features that the 'Weight' varies with the most.

Please let me know how I can find this out using any ML technique or algorithm.


Replies to This Discussion

Hi, I'm a new student in data science, and I do this the following way.

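from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# X: pandas DataFrame of features, Y: target labels.
# Note: LogisticRegression is a classifier, so Y must hold class labels.
modelo = LogisticRegression()

# Recursive feature elimination, keeping 8 features
rfe = RFE(modelo, n_features_to_select=8)
fit = rfe.fit(X, Y)

# Names of the selected columns ("melhores_colunas" = "best columns")
melhores_colunas = X.columns[fit.support_]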
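
Since weight is a continuous value, the same RFE idea could be tried with a regressor instead of a classifier. The following is only a minimal sketch under assumptions that are not in the post above: the data is synthetic, the column names feature_0 to feature_9 are hypothetical, and LinearRegression is swapped in for LogisticRegression.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): 10 standard-normal features, and a
# "weight" that truly depends only on feature_0, feature_3 and feature_7.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(500, 10)),
    columns=[f"feature_{i}" for i in range(10)],
)
y = (
    5 * X["feature_0"]
    + 3 * X["feature_3"]
    - 4 * X["feature_7"]
    + rng.normal(scale=0.5, size=500)
)

# Recursively drop the weakest feature until only 3 remain
rfe = RFE(LinearRegression(), n_features_to_select=3)
rfe.fit(X, y)

# Boolean mask support_ marks the columns that were kept
selected = list(X.columns[rfe.support_])
print(selected)
```

With a linear model, RFE ranks features by the size of their coefficients, so on real data it is usually worth putting the features on a comparable scale first.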

It depends on which algorithm you are using. Assuming this is a classification problem and you are using RandomForestClassifier from sklearn, you can simply use its feature_importances_ attribute to get an importance score for each feature, sort the scores, and see which features matter most.

Sklearn RandomForestClassifier:

Check out this blog post:

Although it talks about RandomForestRegressor, it may still help you understand this.
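
As a rough illustration of the feature_importances_ idea for a continuous target such as weight, here is a minimal sketch using RandomForestRegressor. The data is synthetic and the column names are hypothetical; both are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): "weight" depends only on
# feature_0, feature_3 and feature_7; the other 7 features are noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(500, 10)),
    columns=[f"feature_{i}" for i in range(10)],
)
y = (
    5 * X["feature_0"]
    + 3 * X["feature_3"]
    - 4 * X["feature_7"]
    + rng.normal(scale=0.5, size=500)
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Pair each importance score with its column name and sort, highest first
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

top3 = list(importances.index[:3])
print(importances)
```

The scores sum to 1, so they can be read as relative shares. They describe what this particular fitted model relied on, which is a hint about the data rather than proof of a causal effect.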

Thanks, the above links are helpful. However, my case is a regression one, and I don't want to predict anything; I just want to know which features are impacting my labeled output.

Have you tried looking at the correlations between the variables? You could use a scatter plot matrix or a correlation matrix to visualize your data and see which variables are impacting your label output.

Check out this article on visualizing your data, especially the Multivariate Plots section:

Hope that helps!!
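
The correlation matrix suggested in this reply could be computed along these lines. This is a minimal sketch, assuming the data sits in a pandas DataFrame; the synthetic data and the column names (feature_0 to feature_9, weight) are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): "weight" depends only on
# feature_0, feature_3 and feature_7; the other 7 features are noise.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(500, 10)),
    columns=[f"feature_{i}" for i in range(10)],
)
df["weight"] = (
    5 * df["feature_0"]
    + 3 * df["feature_3"]
    - 4 * df["feature_7"]
    + rng.normal(scale=0.5, size=500)
)

# Pearson correlation of every column with every other column
corr = df.corr()

# Keep only the correlations with the target, ranked by absolute value
ranking = corr["weight"].drop("weight").abs().sort_values(ascending=False)

top3 = list(ranking.index[:3])
print(ranking)
```

Pearson correlation only captures linear, one-feature-at-a-time relationships, so a feature with a strongly non-linear effect on weight can still rank low here. For the scatter plot matrix, pandas.plotting.scatter_matrix(df) is one option.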



