In practice, the Data Scientist wants to know which formula they will write in their Excel sheet when they enter all the data available into it: Bayes’ or usual?

The answer is that it depends: if all the data is well determined, so numerical, with closed functions, normal conditional probability. If there are doubts about the data or how to connect its parts, then it is better using the Bayesian probability.

When you use Bayesian probability, you are counting on some amount of guessing, so that one could safely say that whatever is sure to be the case will be expressed through normal conditional probability, and whatever is expressed in Bayesian probability may still be proven to be improper or not as good as something else.

Bayesian probability is then a statistical probability, and conditional probability is a mathematical probability.

The same difference one observes in non-strictly-objective research, so that research that deals with things beyond the World of Mathematics, the World of the Computers, and so on, and objective research, so that research that fully fits inside of the World of Algebra, or the World of the PCs.

If I want to know if the value of x in x+5=7 is 2, I work totally inside of the World of Mathematics, and I decide: yes, it is.

If I want to know how much chance I have to get liver disease if I am an alcoholic, and Science has progressed to the level of establishing a perfect mathematical relationship between both items, I work totally inside of the World of Mathematics. If doubts remain as to the correlation between ‘being an alcoholic’ and ‘having liver disease’, then I work inside of the ‘Bayesian World’ instead.

The way to apply the formula is:

(https://www.statisticshowto.datasciencecentral.com/bayes-theorem-pr...)

In normal conditional probability:

(https://stats.stackexchange.com/questions/250522/difference-between...)

A = having liver disease

P(A) = 0.1

B = being an alcoholic

P(B) = 0.05

P (A and B) = 0.07

P(A|B) = 0.07 x 0.1/0.05 = 0.14 = 14%

No difference.

However, if the next paragraph is the truth instead, so:

(https://www.statisticshowto.datasciencecentral.com/bayes-theorem-pr...)

We have:

P(A\B) = P(having liver disease given that the patient is alcoholic) = P (being an alcoholic given that the patient has the disease) x P(having liver disease)/P(being alcoholic)

Consequently:

P(A\B) = 0.07 x 0.1/0.05 = 0.14 = 14%

Notice that the formula above is the same as isolating P(B and A), which has to be identical to P(A and B), in the second formula of the previous picture, and replacing the numerator with the result.

In the Monty Hall Problem, we have:

- About 33% of the doors have a car on the first question. Exactly 50% on the second.
- All are doors, 3 of them on the first question, 2 of them on the second.
- Are doors, and have a car is about 33% on the first question, and exactly 50% on the second.

Calculation:

P(having a car/a door is opened) = P(a door is opened/having a car) x P(having a car)/P(door being opened)

P(having a car/a door is opened) = 100% x 0%/100% on the first question, since a door is always opened at that stage (Silvio or Monty Hall does that), and that door has no car, invariably. This probability is then 0, never a car on the first question.

P(having a car/a door is opened) = 100% x 50%/100% = 50%

Now, it is known that, as for results of the show, every 10 in 100 win the car, and 8 in 10 of the winners switched doors instead of sticking.

First question, all the same.

Second question moment, then:

P(having a car/other door is chosen) = P(other door is chosen/having a car) x P(having a car)/P(other door is chosen)

P(having a car/other door is chosen) = 0.8 x 50%/50% = 80%

On the other hand,

P(having a car/same door is chosen) = P(same door is chosen/having a car) x P(having a car)/P(same door is chosen)

P(having a car/same door is chosen) = 0.2 x 50%/50% = 20%

© 2019 Data Science Central ® Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Classification and Regression In a Weekend - With Python
- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central