My daughter just started a business analytics Master’s program. For the probability sequence of the core statistics course, one of her assignments is to calculate the probabilities of single five-card draw poker hands from a 52-card deck.

I well remember this exercise from back in the day, when I computed all the relevant probabilities using basic combination-counting techniques for an intro probability course. My daughter, though, a business undergrad, is less interested in the math than she is in the stats/computation, opining that’s where she’ll make her money.

It’s hard to argue with that logic, though I thought it might be an analytics “ah-hah” moment for her to connect the probability math with the statistics. The population of five-card draw hands, consisting of 52 choose 5, or 2,598,960, elements, is pretty straightforward both mathematically and statistically.
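In R, that count falls straight out of the choose function — a quick sanity check, not part of the assignment itself:

```r
# Number of distinct 5-card hands from a 52-card deck: C(52, 5)
n_hands <- choose(52, 5)
n_hands                 # 2598960

# Exact probability of being dealt any one particular hand
1 / n_hands
```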

So of course, ever the geek, I just had to attempt to show her how probability and statistics converge. In addition to explaining the “combinatorics” of the counts and probabilities, I undertook two computational exercises. The first was to delineate all possible combinations of five-card draws from a 52-card deck, counting occurrences of relevant combinations such as two pair, a straight, or nothing in a cell loop.
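To give a flavor of that enumeration logic, here is a minimal sketch that classifies a hand by its rank multiplicities alone — pairs, trips, and so on — ignoring straights and flushes, which require additional sequence and suit logic. The card encoding (1–52, with rank and suit derived by modular arithmetic) and the function name classify_by_rank are my own illustrative assumptions, not code from the notebook.

```r
# Cards are 1..52; rank = (card - 1) %% 13, suit = (card - 1) %/% 13.
# Classify a 5-card hand by its pattern of rank multiplicities.
classify_by_rank <- function(hand) {
  counts <- sort(table((hand - 1) %% 13), decreasing = TRUE)
  switch(paste(counts, collapse = ""),
         "41"   = "four of a kind",
         "32"   = "full house",
         "311"  = "three of a kind",
         "221"  = "two pair",
         "2111" = "one pair",
         "no pair")
}

classify_by_rank(c(1, 14, 2, 15, 3))   # two ranks doubled -> "two pair"

# The exhaustive tally visits all 2,598,960 hands -- the slow part:
# tallies <- table(combn(52, 5, FUN = classify_by_rank))
```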

The second, statistical approach revolves around Monte Carlo simulation, which is driven by “repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle.” So by generating a sizable number of randomly sampled hands, logic similar to that articulated above can be used to tally counts of pertinent combinations. The Law of Large Numbers suggests that as the sample size gets large, the statistical counts should look a lot like the probability calculations.
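Here is a minimal sketch of that sampling idea, using base R’s sample to deal random hands and estimate the two-pair probability. The helper is_two_pair and the simulation size are my own illustrative choices, not the notebook’s code:

```r
set.seed(42)        # reproducibility
n_sims <- 100000

# TRUE when the hand's rank multiplicities are 2, 2, 1 -- i.e., two pair
is_two_pair <- function(hand) {
  counts <- sort(table((hand - 1) %% 13), decreasing = TRUE)
  identical(as.integer(counts), c(2L, 2L, 1L))
}

estimate <- mean(replicate(n_sims, is_two_pair(sample(52, 5))))

# Exact answer for comparison: C(13,2) * C(4,2)^2 * 11 * 4 / C(52,5)
exact <- choose(13, 2) * choose(4, 2)^2 * 11 * 4 / choose(52, 5)
c(estimate = estimate, exact = exact)   # estimate should land near 0.0475
```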

Method 1, grounded in math/probability, produces exact answers, while Method 2, which revolves around sampling, is approximate and variable, generally growing more accurate as the sample size increases.
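The Law of Large Numbers point can be made concrete with one-pair hands, whose exact probability is about 0.4226. On average the estimate tightens as n grows, though any single run can wobble; the is_one_pair helper and sample sizes below are my own illustration:

```r
set.seed(1)

# Exact one-pair probability: 13 * C(4,2) * C(12,3) * 4^3 / C(52,5)
p_exact <- 13 * choose(4, 2) * choose(12, 3) * 4^3 / choose(52, 5)

# TRUE when exactly one rank appears twice (4 distinct ranks, max count 2)
is_one_pair <- function(hand) {
  counts <- table((hand - 1) %% 13)
  length(counts) == 4 && max(counts) == 2
}

for (n in c(1000, 10000, 100000)) {
  est <- mean(replicate(n, is_one_pair(sample(52, 5))))
  cat(sprintf("n = %6d  estimate = %.4f  |error| = %.4f\n",
              n, est, abs(est - p_exact)))
}
```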

Why approximate over exact? In this case the probability distribution is simple, but in more complicated instances, the math may be too complex. Better a Monte Carlo statistical estimate than intractable mathematics.

The remainder of this notebook looks first at the probability calculations and then at the corresponding sampling-based approximations. The technology is Jupyter Notebook with Microsoft R Open 3.4.4.

For the probability calculations, I draw heavily on R’s choose and combn combinatorics functions. The basic sample function feeds the statistical estimation cell’s main loop. Simple frequencies of individual hands by suit and rank, computed with the data.table package, are central to the exercise. Not surprisingly, the notebook is very compute-intensive, consuming almost 2.5 hours from end to end.
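The frequency step might look something like the following sketch, which assumes the data.table package is installed; the column names and the 1,000-hand sample are my own illustration, not the notebook’s actual code:

```r
library(data.table)

set.seed(7)
cards <- as.vector(replicate(1000, sample(52, 5)))   # 1,000 hands, 5,000 cards
dt <- data.table(card = cards,
                 rank = ((cards - 1) %% 13) + 1,     # 1..13
                 suit = ((cards - 1) %/% 13) + 1)    # 1..4

dt[, .N, by = rank][order(rank)]   # each rank should appear ~ 5000/13 times
dt[, .N, by = suit][order(suit)]   # each suit ~ 5000/4 times
```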

Read the entire blog here.