I pulled out a dusty copy of Thinking Stats by Allen Downey the other day. I highly recommend this terrific little read that teaches statistics with easily understood examples using Python. When I purchased the book eight years ago, the Python code proved invaluable as well.

Downey also regularly writes informative blog posts. One I came across recently, "There is Still Only One Test," explains statistical testing through a computational lens rather than a strictly mathematical one. The computational angle has always made more sense to me than the mathematical one, and in this post Downey articulates it clearly.

The point of departure for a significance test is the null hypothesis: the assumption that any difference between observed and expected values is due to chance. A test statistic, such as the mean absolute difference or the mean squared difference between observed and expected, is first computed from the actual data. In the simulated case, many datasets are then generated in which the "observed" values are sampled from the "expected" distribution, so that by design any difference between observed and expected is due to chance alone. The same statistic is calculated for each simulated dataset, and the resulting values are stored and sorted. The actual observed-expected statistic is then compared with this simulated list, and the p-value is the fraction of simulated values at least as extreme as the actual one. If the actual value falls in the middle of the list, we fail to reject the null hypothesis that the difference is simply chance. If, on the other hand, it exceeds 99% of the simulated values, the p-value is < .01, and we'd be inclined to reject the null hypothesis and conclude that observed and expected really differ.
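The procedure above can be sketched as a small generic helper. This is a minimal illustration, not Downey's code: the function name `simulated_p_value` and its parameters are hypothetical, it assumes larger statistics mean "more extreme" (a one-sided test), and the caller supplies a function that generates one dataset under the null and returns its statistic.

```python
import bisect
import random


def simulated_p_value(observed_stat, simulate_null_stat, iters=10000, seed=None):
    """Estimate a one-sided p-value by simulation (hypothetical helper).

    observed_stat      -- test statistic computed from the actual data
    simulate_null_stat -- callable taking a random.Random and returning the
                          statistic for one dataset generated under the null
    """
    rng = random.Random(seed)
    # Generate the statistic many times under the null and sort the results
    null_stats = sorted(simulate_null_stat(rng) for _ in range(iters))
    # bisect_left finds the first simulated value >= observed_stat,
    # so everything from that index onward is at least as extreme
    n_extreme = iters - bisect.bisect_left(null_stats, observed_stat)
    return n_extreme / iters
```

Sorting is not strictly required to count extreme values, but it mirrors the "stored and sorted" description and lets `bisect` locate the observed statistic in the ordered list. If the observed statistic exceeds 99% of the simulated values, the function returns at most .01.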

The remainder of the post addresses whether 60 rolls of a "fair" 6-sided die could reasonably yield the frequencies (8, 9, 19, 6, 8, 10), where 19 is the number of 3's, 8 is the number of 1's (and also of 5's), and so on. A fair die would be expected to land on each of its 6 sides 10 times in 60 rolls. Three comparison statistics are considered: the first is simply the maximum of the side frequencies; the second is the mean squared difference between observed and expected side frequencies; and the third is the mean absolute difference between observed and expected side frequencies.
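As a rough sketch, the three comparison statistics might be written as plain Python functions and applied to the observed counts. The function names here are hypothetical, not taken from Downey's post; each takes observed and expected lists so that they can be swapped freely in a simulation.

```python
observed = [8, 9, 19, 6, 8, 10]   # frequencies of sides 1..6
expected = [10] * 6                # a fair die rolled 60 times


def max_count(obs, exp):
    """Largest side frequency; exp is unused but kept for a uniform signature."""
    return max(obs)


def mean_sq_diff(obs, exp):
    """Mean squared difference between observed and expected frequencies."""
    return sum((o - e) ** 2 for o, e in zip(obs, exp)) / len(obs)


def mean_abs_diff(obs, exp):
    """Mean absolute difference between observed and expected frequencies."""
    return sum(abs(o - e) for o, e in zip(obs, exp)) / len(obs)


for name, fn in [("max", max_count), ("mean sq diff", mean_sq_diff),
                 ("mean abs diff", mean_abs_diff)]:
    print(name, round(fn(observed, expected), 2))
# max 19
# mean sq diff 17.67
# mean abs diff 3.0
```

The differences from expected are (-2, -1, 9, -4, -2, 0), so the squared differences sum to 106 and the absolute differences sum to 18; dividing by 6 gives the values printed above.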

The technology used is JupyterLab 0.32.1 with Python 3.6.5. The simulations are written with Python's functional-style list comprehensions, and the trial frequencies are tabulated using the Counter class from Python's nifty collections module.
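The simulation might look roughly like the following sketch, which rolls a fair die 60 times per trial, tabulates each trial with `collections.Counter`, and builds the list of simulated statistics with a list comprehension. This is an assumption-laden reconstruction, not the code from the original post: the helper names, the 10,000-trial default, the seed, and the one-sided `>=` comparison are choices made here for illustration.

```python
import random
from collections import Counter

# Observed frequencies of sides 1..6 from the post
OBSERVED = [8, 9, 19, 6, 8, 10]
N_ROLLS = sum(OBSERVED)          # 60 rolls in total
EXPECTED = [N_ROLLS / 6] * 6     # 10.0 per side for a fair die


def mean_abs_diff(obs, exp):
    """Hypothetical helper: mean absolute difference between observed and expected."""
    return sum(abs(o - e) for o, e in zip(obs, exp)) / len(obs)


def simulate_counts(rng, n=N_ROLLS):
    """Roll a fair die n times and return the frequency of each side 1..6."""
    counts = Counter(rng.randint(1, 6) for _ in range(n))
    # Counter yields 0 for sides that never came up, so all 6 slots are filled
    return [counts[side] for side in range(1, 7)]


def p_value(stat, iters=10000, seed=0):
    """Fraction of simulated fair-die trials whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed_stat = stat(OBSERVED, EXPECTED)
    # List comprehension: one simulated statistic per trial under the null
    null_stats = [stat(simulate_counts(rng), EXPECTED) for _ in range(iters)]
    return sum(s >= observed_stat for s in null_stats) / iters


print(p_value(mean_abs_diff))
```

The other statistics plug in the same way; for example, passing a function that returns `max(obs)` tests whether a maximum count of 19 is unusual for a fair die.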

Read the entire post here.

© 2019 Data Science Central