How to prove mathematically that deeper decision trees never have higher expected cross-entropy
What I discuss here is not only the mathematical derivation, which is usually skipped in treatments of decision trees, but also the following question: what does the cross-entropy really...
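As a quick numerical illustration of the claim in the title, here is a minimal sketch. It assumes that "expected cross-entropy" means the sample-weighted average entropy of the leaves on the training data, which is what the training cross-entropy equals when each leaf predicts its own empirical class frequencies. The toy labels and the helper names `entropy` and `weighted_leaf_entropy` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def weighted_leaf_entropy(leaves):
    """Average leaf entropy, weighting each leaf by its share of the samples."""
    total = sum(len(leaf) for leaf in leaves)
    return sum(len(leaf) / total * entropy(leaf) for leaf in leaves)


# Hypothetical toy labels reaching a single node (assumption for illustration).
parent = [0, 0, 0, 1, 1, 1, 1, 0]

# One arbitrary way of splitting that node into two children.
left, right = parent[:3], parent[3:]

before = weighted_leaf_entropy([parent])       # tree of depth 0
after = weighted_leaf_entropy([left, right])   # same tree, one level deeper

print(f"before split: {before:.4f}")
print(f"after split:  {after:.4f}")
```

The split here is arbitrary on purpose: the inequality does not depend on choosing a good split, because entropy is concave, so averaging the children's entropies can never exceed the entropy of their mixture. This only covers the training-set quantity under the assumption stated above; it says nothing about held-out data.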