This is becoming a bigger issue every month: authors publish articles about some statistical technique on a data science blog, yet these techniques not only fail to work well in many contexts, but also cannot be understood or interpreted by the layman or your client. It makes data science look bad.
In this case, it's about classical tests of hypotheses, implicitly (and unfortunately) assuming that the underlying distribution is normal. The drawbacks are as follows:
- I have rarely seen a normal (Gaussian) distribution in practice, and even after transformation, most distributions associated with modern problems are not normal.
- This test is not robust; a few outliers will easily invalidate your conclusions.
- This test is subject to p-hacking, a technique consisting of replicating your test dozens of times until it provides the conclusion that you like.
- This test relies on p-values, an arcane concept that nobody but an elite club of initiated professionals (charging a lot of money) understands. The term p-hacking comes from abusing p-values to lie with statistics. Remember: there are lies, damn lies, and statistics (and Amazon reviews). Frankly, do you know what a p-value means?
- It is very hard for the average person to understand these concepts. We have developed a math-free, stats-free methodology to perform a test of hypothesis: basically, you compute a (model-free, data-driven) confidence interval, and if the parameter that you measure falls outside the bounds of that interval, your assumption must be rejected (see the sketch after this list). It can easily be performed even in Excel, as shown in my article. This framework is much easier to understand, even for the uninitiated, and in addition my confidence intervals are robust and distribution-free, unlike the standard version.
- My version of "hypothesis testing" is easy to implement even in SQL. It is also universal, in the sense that it applies to any kind of data, even data with outliers, data that is not well behaved, or data with a special, unusual distribution.
- Teaching the classic statistical version of this test is just like teaching assembly language in a programming class: this stuff should be automated and used only in contexts where it works. There is no need to teach this material in data science classes; it is a waste of time, especially since most textbooks require about 100 pages of prerequisites (random variables, probability theory, and so on) before the concept can even be introduced.
- My approach is bottom-up (from data to modeling), that is, applied; the traditional test is top-down (from modeling to data), that is, theoretical.
- The only advantage of the classical test is that it has been published thousands of times in textbooks over more than 150 years, making it some kind of (bad) standard. It was invented well before computers existed, at a time when mathematical elegance prevailed over lengthy computations. But tradition guarantees neither efficiency nor robustness.
- Because there are so many theoretical statistical distributions and so many ways to test hypotheses, classical statistics offers more than 100 different types of tests (just as it offers dozens of confusing regression techniques): the normal test (the one criticized here), Student's t-test, the F-test, the chi-square test for independence, the chi-square test for goodness of fit, Kolmogorov-Smirnov, and Wilcoxon, to name a few popular ones. Each one can be one-sided or two-sided, and when testing multiple parameters, it gets even more complicated. Some require numerical algorithms to find the critical values. By contrast, my approach, being distribution-independent, offers one simple, universal test. And because it is based on confidence intervals, you don't even need to know what one-sided or two-sided means. Jargon such as Type I and Type II error is replaced by plain English: false positives and false negatives.
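To make the idea concrete, here is a minimal Python sketch of a confidence-interval-based test. It uses a percentile bootstrap as a stand-in for the model-free, data-driven confidence interval described above (the exact construction in my article may differ), tests the mean at the 95% level, and rejects the hypothesized value if it falls outside the interval. The function names, the confidence level, and the simulated data are illustrative choices only.

```python
import numpy as np

def bootstrap_confidence_interval(data, stat=np.mean, level=0.95, n_boot=5000, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic.

    A generic, distribution-free construction used here as a stand-in for
    the model-free, data-driven confidence interval described above.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    boot_stats = np.array([stat(rng.choice(data, size=n, replace=True))
                           for _ in range(n_boot)])
    alpha = 1.0 - level
    lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

def interval_test(data, hypothesized_value, stat=np.mean, level=0.95):
    """Reject the hypothesized value if it falls outside the confidence interval."""
    lower, upper = bootstrap_confidence_interval(data, stat=stat, level=level)
    reject = not (lower <= hypothesized_value <= upper)
    return reject, (lower, upper)

# Example: skewed, non-normal data with a couple of extreme outliers.
rng = np.random.default_rng(42)
sample = np.concatenate([rng.lognormal(mean=0.0, sigma=1.0, size=200),
                         [50.0, 75.0]])

reject, (lo, hi) = interval_test(sample, hypothesized_value=1.0)
print(f"95% interval for the mean: [{lo:.2f}, {hi:.2f}], reject assumption: {reject}")
```

Swapping np.mean for np.median (or a trimmed mean) turns this into the same test for a more outlier-resistant parameter, with no new theory to learn, which is what makes this kind of approach universal.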