I am a bit confused about how variable importance works in random forests under the original permutation scheme.
Here is an excerpt from page 3 of the linked document, "Why conditional importance?":
Permutation scheme for the original permutation importance
The predictor variables are permuted in the computation of the importance measure: Strobl et al. (2008) show that the original approach, where one predictor variable Xj is permuted against both the response Y and the remaining (one or more) predictor variables Z = X1, . . . , Xj−1 , Xj+1 , . . . , Xp as illustrated in attached file, corresponds to a pattern of independence between Xj and both Y and Z. From a theoretical point of view, this means that a high value of the importance can be caused by a violation either of the independence between Xj and Y or of the independence between Xj and Z, even though the latter is not of interest here. For practical applications, this means that correlated predictor variables artificially appear more important than uncorrelated ones.
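The permutation step described in the excerpt can be sketched in a few lines. This is only a minimal illustration under assumed toy data, not the random forest importance computation itself: the names `xj`, `z`, `y`, and `pearson` are hypothetical, Z is reduced to a single variable, and the coefficients are arbitrary. It shows that shuffling the Xj column on its own changes its alignment with the rows of both Y and Z at the same time.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Assumed toy data: Z is a single predictor, Xj is strongly correlated
# with Z, and Y depends on Z only (Xj has no direct effect on Y).
z = [rng.gauss(0, 1) for _ in range(n)]
xj = [0.9 * zi + 0.3 * rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# Original permutation scheme: shuffle the Xj column by itself,
# leaving the rows of Y and Z where they are.
xj_perm = xj[:]
rng.shuffle(xj_perm)

before = {"xj_y": pearson(xj, y), "xj_z": pearson(xj, z)}
after = {"xj_y": pearson(xj_perm, y), "xj_z": pearson(xj_perm, z)}

print("before permutation:", before)
print("after permutation: ", after)
```

In this toy setup both correlations are clearly non-zero before the shuffle and close to zero afterwards, even though only the Xj column was touched. Correlation is used here only as a rough stand-in for dependence.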
My question is: why does the violation of independence involve both Y and Z with respect to Xj? Why is it not only between Y and Xj?