I pulled out a dusty copy of Think Stats by Allen Downey the other day. I highly recommend this terrific little book, which teaches statistics through easily understood examples in Python. When I bought it eight years ago, the Python code proved invaluable as well.

Downey also regularly publishes informative blog posts. One I came across recently, “There is Still Only One Test,” explains statistical testing through a computational lens rather than a strictly mathematical one. The computational angle has always made more sense to me than the mathematical one, and in this post Downey articulates it clearly.

The point of departure for a significance test is the null hypothesis: the assumption that any difference between observed and expected is due to chance. A statistic that measures that difference, such as the mean absolute difference or the mean square difference between observed and expected, is then computed. In the simulated approach, data sets are next generated at random under the null hypothesis: the “observed” values are sampled from the “expected” distribution, so by design any difference between them is pure chance. The same comparison statistic is calculated for each simulated data set, and the resulting values are stored and sorted. The actual observed-expected statistic is then compared against this simulated list, and the p-value is the fraction of simulated values at least as extreme as the actual one. If the actual value falls in the middle of the list, we fail to reject the null hypothesis: the difference is plausibly just chance. If, on the other hand, it exceeds 99% of the simulated values, the p-value is < .01, and we’d be inclined to reject the null hypothesis and conclude that observed and expected really differ.
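A minimal sketch of that generic recipe in Python follows. It assumes a one-sided test in which larger statistics count as more extreme; the names simulated_p_value and simulate_stat are hypothetical and are not taken from Downey’s post.

```python
import bisect
import random

def simulated_p_value(observed_stat, simulate_stat, iters=10_000, seed=None):
    """Estimate a one-sided p-value by simulation (a sketch, not Downey's code).

    observed_stat: the test statistic computed on the actual data.
    simulate_stat: hypothetical callable that takes a random.Random, draws one
        data set under the null hypothesis, and returns its test statistic.
    """
    rng = random.Random(seed)
    # Store and sort the statistics simulated under the null hypothesis.
    sims = sorted(simulate_stat(rng) for _ in range(iters))
    # Everything from bisect_left onward is >= observed_stat, i.e. at least as extreme.
    n_extreme = len(sims) - bisect.bisect_left(sims, observed_stat)
    return n_extreme / iters
```

For instance, with a toy null under which the statistic is uniform on [0, 1), simulated_p_value(0.99, lambda rng: rng.random()) comes out close to 0.01. Sorting plus bisect mirrors the store-and-sort description; simply counting the unsorted values that are >= the observed statistic would give the same answer.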

The remainder of this post addresses whether 60 rolls of a “fair” 6-sided die could reasonably yield the frequencies (8,9,19,6,8,10), where 19 is the number of 3’s, 8 is the number of 1’s and also the number of 5’s, and so on. A fair die would be expected to show each of its 6 sides 10 times. Three comparison statistics are considered: the first is simply the maximum of the six side frequencies; the second is the mean square difference between observed and expected side frequencies; and the third is the mean absolute difference between observed and expected side frequencies.
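The three statistics applied to the actual rolls can be sketched as follows; the function names are hypothetical. Here the differences from 10 are -2, -1, 9, -4, -2, 0, so the max is 19, the mean square difference is 106/6 (about 17.67), and the mean absolute difference is 18/6 = 3.0.

```python
OBSERVED = [8, 9, 19, 6, 8, 10]  # counts of faces 1..6 in the 60 actual rolls
EXPECTED = [10] * 6              # fair die: 60 rolls / 6 faces

def max_count(observed, expected):
    # Largest single-face frequency; `expected` is unused but kept for a uniform signature.
    return max(observed)

def mean_square_diff(observed, expected):
    # Average of squared (observed - expected) differences across faces.
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed)

def mean_abs_diff(observed, expected):
    # Average of absolute (observed - expected) differences across faces.
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(observed)

for stat in (max_count, mean_square_diff, mean_abs_diff):
    print(stat.__name__, stat(OBSERVED, EXPECTED))
```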

The technology used is JupyterLab 0.32.1 with Python 3.6.5. The simulations are written with Python’s functional-style list comprehensions, and the trial frequencies are tabulated with the Counter class from Python’s nifty collections module.
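Putting the pieces together, here is a hedged sketch of the dice simulation in the style described: list comprehensions for the trials and collections.Counter for the tallies. It is not the original notebook; the use of random.randint for the rolls, the seed, the 10,000 iterations, and helper names such as simulate_counts and p_value are assumptions for illustration. Results shift slightly with the seed, so no p-values are quoted here.

```python
import random
from collections import Counter

OBSERVED = [8, 9, 19, 6, 8, 10]  # counts of faces 1..6 in the 60 actual rolls
EXPECTED = [10] * 6              # fair die: 60 rolls / 6 faces
N_ROLLS, N_SIDES = 60, 6

# The three comparison statistics (max_count ignores `exp`).
def max_count(obs, exp):
    return max(obs)

def mean_square_diff(obs, exp):
    return sum((o - e) ** 2 for o, e in zip(obs, exp)) / len(obs)

def mean_abs_diff(obs, exp):
    return sum(abs(o - e) for o, e in zip(obs, exp)) / len(obs)

def simulate_counts(rng):
    """Roll a fair die N_ROLLS times and tabulate the frequency of each face."""
    tally = Counter(rng.randint(1, N_SIDES) for _ in range(N_ROLLS))
    # A Counter returns 0 for faces that never came up.
    return [tally[face] for face in range(1, N_SIDES + 1)]

def p_value(stat, iters=10_000, seed=1):
    """Fraction of simulated fair-die trials whose statistic is >= the actual one."""
    rng = random.Random(seed)
    actual = stat(OBSERVED, EXPECTED)
    simulated = [stat(simulate_counts(rng), EXPECTED) for _ in range(iters)]
    return sum(s >= actual for s in simulated) / iters

for stat in (max_count, mean_square_diff, mean_abs_diff):
    print(f"{stat.__name__}: p ~ {p_value(stat):.3f}")
```

Because each statistic is a plain function with the same signature, swapping in a fourth comparison only means adding it to the tuple in the final loop.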

Read the entire post here.