
Bayes' Theorem, which The Stanford Encyclopedia of Philosophy calls "...a simple mathematical formula", can be surprisingly difficult to actually solve. If you struggle with Bayesian logic, solving the "simple" formula involves not much more than guesswork. You have to translate a problem into "A given B" and "B given A", cross your fingers that your guess for what A and B are is right, double-check your thoughts, get thoroughly lost, and punch the resulting fractions into a calculator. The calculator will spit out an answer that may or may not be correct, because you have no idea what your point-oh-something solution means in terms of the original problem. If this sounds like you, you're not alone: various studies have shown that the vast majority of physicians can't work the formula either.

But **there's a more intuitive way to get to the same answer,** without the counter-intuitive formula. The procedure in question? None other than the humble probability tree.

This example problem is adapted from a problem in Gigerenzer & Hoffrage's *How to Improve Bayesian Reasoning Without Instruction: Frequency Formats*:

Out of 1,000 patients, 10 have a rare disease. Eight of those diseased individuals display symptoms. Out of the 990 healthy individuals, 95 display symptoms. What is the probability a patient with symptoms actually has the disease?

Here's the traditional textbook method, using the Bayesian algorithm.
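As a sketch of what that calculation looks like for this problem, write D for "has the disease" and S for "displays symptoms" (these symbols are labels chosen here for illustration, not notation from the original problem). Plugging the counts from the problem into the formula gives:

$$
P(D \mid S) = \frac{P(S \mid D)\,P(D)}{P(S \mid D)\,P(D) + P(S \mid \neg D)\,P(\neg D)}
= \frac{\frac{8}{10}\cdot\frac{10}{1000}}{\frac{8}{10}\cdot\frac{10}{1000} + \frac{95}{990}\cdot\frac{990}{1000}}
= \frac{0.008}{0.008 + 0.095}
= \frac{8}{103} \approx 0.078
$$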

If you're good with numbers, you may be able to see immediately that you can answer this question with a simple ratio: number of diseased people with symptoms / total number of people with symptoms.

Now let's construct the same answer with a probability tree:

From there, the math is a simple ratio:

Number of people with disease and symptoms (8) / Total number with symptoms (8 + 95)

which gives us:

8 / 103 = 0.078.
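If you'd like to check the arithmetic, here is a minimal Python sketch of the same tree. The variable names are made up for illustration; the counts come straight from the problem statement. It also works the formula from the same counts, to confirm that both routes land on the same fraction.

```python
from fractions import Fraction

# Leaves of the probability tree, taken from the problem statement.
total_patients = 1000
diseased = 10
diseased_with_symptoms = 8
healthy = total_patients - diseased            # 990
healthy_with_symptoms = 95

# Tree method: keep only the "symptoms" leaves and take the ratio.
with_symptoms = diseased_with_symptoms + healthy_with_symptoms   # 103
p_tree = Fraction(diseased_with_symptoms, with_symptoms)

# Formula method: P(D|S) = P(S|D) * P(D) / P(S)
p_d = Fraction(diseased, total_patients)
p_s_given_d = Fraction(diseased_with_symptoms, diseased)
p_s_given_healthy = Fraction(healthy_with_symptoms, healthy)
p_s = p_s_given_d * p_d + p_s_given_healthy * (1 - p_d)
p_formula = p_s_given_d * p_d / p_s

print(p_tree, round(float(p_tree), 3))   # 8/103 0.078
print(p_tree == p_formula)               # True
```

Using `Fraction` rather than floats keeps the comparison exact, so the equality check isn't thrown off by rounding.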

Let's try **another example** (borrowed from Bayes' Theorem Problems):

You want to know a patient’s probability of having liver disease if they are an alcoholic. 10% of patients at a certain clinic have liver disease. Five percent of the clinic’s patients are alcoholics. Out of those patients diagnosed with liver disease, 7% are alcoholics.

As in the first problem, the first branch here is "disease", but the second branch needs to address "alcoholism" instead of "symptoms". We're not told how many patients there are, so I'll use 1,000, which is usually a sufficient number for problems like this. You're also not told explicitly the number of alcoholics (or the percentage of alcoholics without liver disease), but you can use a little logical deduction:

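Out of 1,000 patients, 5% (50 total) are alcoholic.

10% of the 1,000 patients (100 total) have liver disease, and 7% of those patients are alcoholic. That gives you 7 alcoholics with liver disease (green box), leaving 43 alcoholics without liver disease for the orange box.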

Now all we have to do is figure out the ratio:

Number of people with disease and alcoholism (7) / Total number with alcoholism (50)

which gives us:

7 / 50 = 0.14

This is **exactly the same answer you would get by actually working the formula:** (0.07 × 0.10) / 0.05 = 0.14. In fact, I've never come across a Bayes-related problem that can't be answered with a probability tree and a little logical reasoning. So if the formula is giving you headaches, just do what I did and ditch it in favor of a more intuitive approach.
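The liver disease example can be checked the same way. This is only a sketch: the cohort of 1,000 is the assumed round number used above, and the variable names are invented for illustration.

```python
from fractions import Fraction

total_patients = 1000   # assumed cohort size; any size gives the same ratio

# First branch: disease. Second branch: alcoholism.
liver_disease = total_patients * Fraction(10, 100)           # 100 patients
alcoholics = total_patients * Fraction(5, 100)               # 50 patients
disease_and_alcoholic = liver_disease * Fraction(7, 100)     # 7 patients
alcoholic_no_disease = alcoholics - disease_and_alcoholic    # 43 patients

# Tree method: alcoholics with liver disease / all alcoholics.
p_tree = disease_and_alcoholic / alcoholics

# Formula method: P(disease | alcoholic)
#   = P(alcoholic | disease) * P(disease) / P(alcoholic)
p_formula = Fraction(7, 100) * Fraction(10, 100) / Fraction(5, 100)

print(p_tree, float(p_tree))   # 7/50 0.14
print(p_tree == p_formula)     # True
```

Because the cohort size cancels out of the ratio, changing `total_patients` to any other positive number leaves the result at 7/50.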

**References**

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102(4), 684–704. www.apa.org/journals/rev/

Gould, S. J. (1992). Bully for brontosaurus: Further reflections in natural history. New York: Penguin Books.

Posted 27 July 2021
