My daughter just started a business analytics Master's program. For the probability sequence of the core statistics course, one of her assignments is to calculate the probabilities of 5-card draw poker hands dealt from a 52-card deck.
I well remember this exercise from back in the day, when I computed all such relevant probabilities using basic combination-counting techniques for an intro probability course. My daughter, though, a business undergrad, is less interested in the math than in the stats/computation, opining that that's where she'll make her money.
It's hard to argue with that logic, though I thought it might be an analytics "ah-hah" moment for her to connect the probability math with statistics. The population of 5-card draw hands, consisting of 52 choose 5, or 2,598,960, elements, is pretty straightforward both mathematically and statistically.
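The notebook itself uses R's choose function for these counts; purely as an illustration (not the author's code), here is the same arithmetic in Python, along with a few of the standard textbook hand counts:

```python
import math

# Total 5-card hands from a 52-card deck: C(52, 5)
total_hands = math.comb(52, 5)   # 2,598,960

# A few standard exact counts from basic combinatorics:
four_of_a_kind = 13 * 48                                 # pick the rank, then any 5th card: 624
two_pair = math.comb(13, 2) * math.comb(4, 2)**2 * 44    # 123,552
straight = 10 * 4**5 - 40                                # 10,200 (straight flushes excluded)

print(total_hands, four_of_a_kind / total_hands)
```

Dividing any count by `total_hands` gives the exact probability of that hand.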
So of course, ever the geek, I just had to attempt to show her how probability and statistics converge. In addition to explaining the combinatorics of the counts and probabilities, I undertook two computational exercises. The first was to enumerate all possible combinations of 5-card draws from a 52-card deck, counting occurrences of relevant hands such as two pair, a straight, or nothing in a loop within the cell.
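The post doesn't show its R enumeration loop, but the idea can be sketched in Python (an illustration, not the author's code): encode cards as integers 0-51, walk every combination, and tally a hand category. Flushes make a compact example because, with suit defined as card // 13 and combinations emitted in sorted order, the first and last cards sharing a suit block implies the whole hand does:

```python
from itertools import combinations
from math import comb

# Cards are integers 0..51, suit = card // 13. Because combinations()
# yields sorted tuples, hand[0] and hand[4] falling in the same 13-card
# suit block forces the middle cards into that block too, so a single
# comparison per hand detects a flush (straight flushes included).
flushes = sum(h[0] // 13 == h[4] // 13 for h in combinations(range(52), 5))

p_flush = flushes / comb(52, 5)   # 5148 / 2,598,960, about 0.00198
```

The same loop structure extends to two pair, straights, and the rest; only the per-hand classification changes.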
The second, statistical approach revolves around Monte Carlo simulation, which is driven by "repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle." So by generating a sizable number of randomly sampled hands, logic similar to that articulated above can be used to tally counts of pertinent combinations. The Law of Large Numbers suggests that as the sample size grows, the statistical frequencies should converge to the exact probability calculations.
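The notebook drives its simulation with R's sample function; here is a Python sketch of the same idea (illustrative code and names, not the author's): deal many random hands, classify each, and compare the relative frequency against the exact probability. For exactly one pair, the exact answer is 1,098,240 / 2,598,960, about 0.4226:

```python
import random
from collections import Counter

def is_one_pair(hand):
    # Exactly one pair: the rank multiplicities sort to [1, 1, 1, 2]
    return sorted(Counter(c % 13 for c in hand).values()) == [1, 1, 1, 2]

def estimate_one_pair(n_hands, seed=1):
    rng = random.Random(seed)   # seeded for reproducibility
    hits = sum(is_one_pair(rng.sample(range(52), 5)) for _ in range(n_hands))
    return hits / n_hands

estimate = estimate_one_pair(50_000)   # should land near 0.4226
```

With 50,000 sampled hands the estimate typically sits within a percentage point or so of the exact value, and the gap shrinks as the sample grows.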
Method 1, grounded in math/probability, produces exact answers, while Method 2, which revolves around sampling, is approximate and variable, with accuracy generally improving as the sample size increases.
Why approximate over exact? In this case the probability distribution is simple, but in more complicated instances, the math may be too complex. Better a Monte Carlo statistical estimate than intractable mathematics.
The remainder of this notebook looks first at the probability calculations and then at the corresponding sampling-based approximations. The technology is Jupyter Notebook with Microsoft Open R 3.4.4.
For the probability calculations, I draw heavily on R's choose and combn combinatorics functions. The basic sample function feeds the main loop of the statistical estimation cell. Simple frequencies of individual hands by suit and rank, computed with the data.table package, are central to the exercise. Not surprisingly, the notebook is very compute-intensive, consuming almost 2.5 hours end to end.
Read the entire blog here.
Comment
I also used Monte Carlo simulations to compute confidence intervals and perform statistical tests of hypotheses, even when the distribution is known (t-test) but especially when it is intractable.
Another way to think of Monte Carlo methods is as ways to compute an intractable or inconvenient integral or sum. You need a process to generate points randomly within a set, and a map from set elements to boolean values that tells you whether a point is in an interesting subset; the "answer" is then just the ratio. Estimating pi by dropping pennies randomly in a unit square and counting how many land in the inscribed circle, or dealing random poker hands and counting pairs, are both approaches to solving tricky integrals.
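The commenter's penny-dropping example is easy to make concrete; a minimal Python sketch (my illustration, not the commenter's code): sample points uniformly in the unit square, apply the boolean map "inside the inscribed circle," and multiply the ratio by 4 to estimate pi:

```python
import random

def estimate_pi(n_points, seed=0):
    rng = random.Random(seed)   # seeded for reproducibility
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()   # a "penny" dropped in the unit square
        # Boolean map: is the point inside the inscribed circle
        # (center (0.5, 0.5), radius 0.5, area pi/4)?
        inside += (x - 0.5)**2 + (y - 0.5)**2 <= 0.25
    return 4 * inside / n_points   # ratio of areas, times 4

pi_hat = estimate_pi(100_000)   # close to 3.14159
```

The circle's area is pi/4 of the square's, so the hit ratio converges to pi/4: exactly the intractable-integral-as-ratio view described above.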
© 2019 Data Science Central