Hi All,

I am a bit confused about how variable importance works in Random Forest under the original permutation scheme.

Here is an excerpt from page 3 of the linked document "Why conditional importance?":

Permutation scheme for the original permutation importance

The predictor variables are permuted in the computation of the importance measure: Strobl et al. (2008) show that the original approach, where one predictor variable Xj is permuted against both the response Y and the remaining (one or more) predictor variables Z = X1, ..., Xj−1, Xj+1, ..., Xp as illustrated in the attached file, corresponds to a pattern of independence between Xj and both Y and Z. From a theoretical point of view, this means that a high value of the importance can be caused by a violation either of the independence between Xj and Y or of the independence between Xj and Z, even though the latter is not of interest here. For practical applications, this means that correlated predictor variables artificially appear more important than uncorrelated ones.
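To make the scheme in the excerpt concrete, here is a minimal sketch in plain Python. It is not a random forest and not the code from Strobl et al.; it only shows what shuffling one column does to its associations. The function names `pearson` and `simulate` and the simulated data (Y depends on Z only, Xj is merely correlated with Z) are my own assumptions for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation, written out by hand to avoid extra dependencies."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def simulate(n=5000, seed=0):
    """Hypothetical data: Y depends on Z only; Xj is correlated with Z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    xj = [zi + 0.5 * rng.gauss(0, 1) for zi in z]  # Xj tracks Z
    y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]   # Y is driven by Z, not Xj

    # Original permutation scheme: shuffle the Xj column on its own,
    # leaving the Y and Z columns in their original row order.
    xj_perm = xj[:]
    rng.shuffle(xj_perm)

    return {
        "before": (pearson(xj, y), pearson(xj, z)),
        "after": (pearson(xj_perm, y), pearson(xj_perm, z)),
    }


if __name__ == "__main__":
    result = simulate()
    print("corr(Xj, Y), corr(Xj, Z) before permuting:", result["before"])
    print("corr(Xj, Y), corr(Xj, Z) after permuting: ", result["after"])
```

In this sketch both correlations are large before the shuffle and close to zero afterwards, because shuffling the Xj column alone breaks its link to Y and its link to Z in the same step. That is the sense in which the permuted data corresponds to independence between Xj and both Y and Z.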

Capture.GIF

My question: why can the violation involve both Y and Z? Why not only the independence between Xj and Y?

Regards

Khurram
