I am working on a data set that has 10 features, and my labeled output is 'Weight of humans'.
I want to find which 2 or 3 of the 10 features the 'Weight' depends on most. Which are the 2-3 features that make the weight vary?
Please let me know how I can find this out using any ML technique or algorithm.
Hi, I'm a new student in data science. I'm doing it the following way:
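from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
# X: DataFrame with the 10 feature columns, Y: the 'Weight' values (both assumed already loaded)
# 'Weight' is continuous, so use a regressor; LogisticRegression is a classifier and raises an error on a continuous target
modelo = LinearRegression()
# Keep the 3 best features (the question asks for 2-3); recent scikit-learn requires this as a keyword argument
rfe = RFE(modelo, n_features_to_select=3)
fit = rfe.fit(X, Y)
print(X.columns)
print(fit.support_)  # True for the selected features
print(fit.ranking_)  # 1 = selected; larger numbers were eliminated earlier
melhores_colunas = X.columns[fit.support_]  # "best columns": names of the selected features
print(melhores_colunas)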
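To check that this approach picks out the right columns, here is a minimal self-contained sketch on made-up data. The columns f0..f9 and the coefficients are hypothetical, chosen so that only f1, f4 and f7 affect Weight. It also standardizes the features first, because RFE with a linear model ranks features by the size of their coefficients, and that size depends on each feature's units.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: 10 features f0..f9, only f1, f4 and f7 drive 'Weight'
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 10)), columns=[f"f{i}" for i in range(10)])
Y = 3.0 * X["f1"] - 2.0 * X["f4"] + 1.5 * X["f7"] + rng.normal(scale=0.5, size=500)

# Put all features on the same scale so coefficient magnitudes are comparable
# (real features such as height in cm and age in years have very different units)
X_scaled = StandardScaler().fit_transform(X)

# Repeatedly drop the feature with the smallest |coefficient| until 3 remain
rfe = RFE(LinearRegression(), n_features_to_select=3)
rfe.fit(X_scaled, Y)

selected = list(X.columns[rfe.support_])
print(selected)
print(dict(zip(X.columns, rfe.ranking_)))
```

On your own data, replace the synthetic X and Y with your DataFrame and the 'Weight' column.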
It depends on which algorithm you are using. Assuming this is a classification problem and you are using RandomForestClassifier from sklearn, you can use its feature_importances_ attribute, sort it, and see which features are most important.
Sklearn RandomForestClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble....
Check out this blog post: https://blog.datadive.net/selecting-good-features-part-iii-random-f...
It discusses RandomForestRegressor, but it may help you understand this.
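Since the question is a regression one, here is a minimal sketch of the same idea with RandomForestRegressor instead of RandomForestClassifier, on hypothetical synthetic data (f0..f9, with only f1, f4 and f7 driving the target). The model is not used for prediction here; it is fitted only so that feature_importances_ can be read. That attribute is an array aligned with the columns, so it has to be sorted to get a ranking.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data standing in for the real 10-feature data set
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(500, 10)), columns=[f"f{i}" for i in range(10)])
y = 3.0 * X["f1"] - 2.0 * X["f4"] + 1.5 * X["f7"] + rng.normal(scale=0.5, size=500)

# Fit only to obtain importances, not to make predictions
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ follows the column order of X; sort it to rank the features
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)

top3 = list(importances.index[:3])
print(top3)
```

Impurity-based importances can favour features with many distinct values, so treat the ranking as a rough guide rather than a definitive answer.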
Thanks, the links above are helpful. However, my case is a regression problem, and I don't want to predict anything; I just want to know which features are impacting my labeled output.
Have you tried looking at the correlations between the variables? You could use a scatter plot matrix or a correlation matrix to visualize your data and see which variables are impacting your labeled output.
Check out this article on visualizing your data, especially the Multivariate Plots section:
https://machinelearningmastery.com/visualize-machine-learning-data-...
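A quick numeric version of the correlation-matrix idea, sketched on hypothetical synthetic data with a 'Weight' column: compute each feature's Pearson correlation with Weight and rank the features by absolute value. Keep in mind this only captures linear, one-feature-at-a-time relationships.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data; replace with your own DataFrame that has a 'Weight' column
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(500, 10)), columns=[f"f{i}" for i in range(10)])
df["Weight"] = 3.0 * df["f1"] - 2.0 * df["f4"] + 1.5 * df["f7"] + rng.normal(scale=0.5, size=500)

# Pearson correlation of every feature with Weight (drop Weight's correlation with itself)
corr_with_weight = df.corr()["Weight"].drop("Weight")

# Rank by absolute value: a strong negative correlation matters as much as a positive one
ranking = corr_with_weight.abs().sort_values(ascending=False)
print(ranking)
print(list(ranking.index[:3]))
```

For the scatter plot matrix itself, pandas.plotting.scatter_matrix(df) draws one (it needs matplotlib installed).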
Hope that helps!!
© 2019 Data Science Central®