I am working on a data set that has 10 features, and my labeled output is the 'Weight of humans'.

I want to find which 2 or 3 of the 10 features account for most of the variation in 'Weight'.

Please let me know how I can find this out using any ML technique or algorithm.


Replies to This Discussion

Hi, I'm new to data science. This is how I approach it:

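from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Weight is a continuous target, so use a regressor rather than LogisticRegression.
modelo = LinearRegression()

# Recursive feature elimination; keep the 3 strongest features.
rfe = RFE(modelo, n_features_to_select=3)
fit = rfe.fit(X, Y)

# Names of the selected columns (X is assumed to be a pandas DataFrame).
melhores_colunas = X.columns[fit.support_]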

It depends on which algorithm you are using. Assuming this is a classification problem and you are using RandomForestClassifier from sklearn, you can read its feature_importances_ attribute after fitting and sort the values to see which features are more important.

Sklearn RandomForestClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble....

Check out this blog post: https://blog.datadive.net/selecting-good-features-part-iii-random-f...

It covers RandomForestRegressor rather than the classifier, but it may still help you understand the idea.
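Since weight is a continuous target, here is a rough sketch of the same idea with RandomForestRegressor. The data is synthetic and the column names (f0 to f9) are made up for illustration; swap in your own DataFrame and label column.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption, not the real data set):
# 10 made-up features, with "weight" driven mostly by f0 and f3.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 10)),
                 columns=[f"f{i}" for i in range(10)])
y = 5.0 * X["f0"] + 3.0 * X["f3"] + rng.normal(scale=0.5, size=500)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, y)

# feature_importances_ is an unsorted array aligned with X.columns,
# so sort it explicitly to rank the features.
importances = pd.Series(forest.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(3)
print(top_features)
```

The importances sum to 1, so each value can be read as a share of the total. Keep in mind that impurity-based importances can be misleading when features are strongly correlated with each other.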

Thanks, the links above are helpful, but my case is a regression problem. I don't want to predict anything; I want to know which features are impacting my labeled output.

Have you tried looking at the correlations between the variables? You could use a scatter plot matrix or a correlation matrix to visualize your data and see which variables are impacting your labeled output.
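A minimal sketch of the correlation approach with pandas, in case it helps. The DataFrame here is synthetic, and the names f0 to f9 and "weight" are placeholders I made up, not your real columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption): "weight" depends on f1 and f7 only.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(500, 10)),
                  columns=[f"f{i}" for i in range(10)])
df["weight"] = 4.0 * df["f1"] - 2.0 * df["f7"] + rng.normal(scale=0.5, size=500)

# Pearson correlation of every feature with the label.
corr_with_weight = df.corr()["weight"].drop("weight")

# Rank by absolute value so strong negative correlations are not missed.
strongest = corr_with_weight.abs().sort_values(ascending=False).head(3)
print(strongest)
```

Pearson correlation only captures linear relationships, so a scatter plot matrix (for example pandas.plotting.scatter_matrix, which needs matplotlib) is a useful visual check for anything non-linear.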

Check out this article on visualizing your data, especially the Multivariate Plots section:


Hope that helps!!

