Probability, in general, can be thought of as the chance of a particular event happening: winning or losing a game, say, or the clichéd example of tossing a coin.

When I say the **chance** of an event, I mean the fraction of times that event would occur if I were to repeat the experiment a **large number of times**.

Let us consider a box that contains 13 balls, out of which 5 are Blue and 8 are Red. What will be the probability of picking a red ball? Let’s see, there are 8 red balls and a total of 13 balls, so by intuition I’d say 8/13.

The most basic mathematical definition of probability, assuming every outcome is equally likely, is something like:

**Probability(A) = P(A) = (Number of outcomes in which A occurs) / (Total number of outcomes)**

According to the above formula,

**P(Red ball) = 8 / (8+5) = 8/13 = 0.615 = 61.5%**

**P(Blue ball) = 5 / (8+5) = 5/13 = 0.385 = 38.5%**
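As a quick check of the arithmetic, the same formula can be evaluated directly in R. This is only a minimal sketch; the names `counts` and `probs` are chosen here for illustration and do not appear elsewhere in the post.

```r
# Counts of each color, taken from the box described above.
counts <- c(red = 8, blue = 5)

# Probability of each color = outcomes of that color / total outcomes.
probs <- counts / sum(counts)

# Rounded to three decimals for comparison with the hand calculation.
print(round(probs, 3))
```

The two probabilities add up to 1, as they must, since every ball in the box is either red or blue.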

Now we’ll try to calculate it in R.

**Step 1: Creating the box which contains the balls.**

box <- rep( c("red", "blue"), times = c(8, 5) )

This command creates an object "box": a character vector in which "red" is repeated 8 times and "blue" is repeated 5 times.
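Before sampling, it may be worth confirming that the box really holds what we expect. The sketch below rebuilds the box so it can run on its own, then inspects it with `length()` and `table()`.

```r
# Rebuild the box: 8 red balls followed by 5 blue balls.
box <- rep(c("red", "blue"), times = c(8, 5))

# Total number of balls in the box.
print(length(box))

# Number of balls of each color.
print(table(box))
```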

**Step 2: Commencing the experiment of picking the balls.**

Let us pick a ball from the box and call it an **event.**

event <- sample(box, 1) #this takes a single ball out of the box.

output :

`[1] "red"`

This command returns a random value from the box. The first argument is the object that contains the values, and the second is the number of items to pick at random from the first.

A single draw does not tell us much about the probabilities, because probabilities only become apparent when the experiment is repeated a large number of times.

So we make an object called **no_of_times** and give it a big enough value, say 33,000.

no_of_times <- 33000

Now we can use the sample function again to repeat the experiment 33,000 times, all in the blink of an eye.

event <- sample(box, no_of_times, replace = TRUE)

Note that the **replace** argument has been used here. By default, the **sample()** function uses **replace = FALSE**. That is why picking more than 13 balls from the box throws an error: balls that have already been picked are not put back, so the number of balls left in the box decreases with each draw.

The **event** object from above will contain **33,000 outcomes**, i.e. whether each ball drawn is blue or red. Simply displaying them all is of no use to us. What we need is a **concise table** telling us the total number of blue and red balls drawn.
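The effect of the replace argument can be seen in a small sketch. It wraps the failing call in `tryCatch()` so the script keeps running and the error message is captured as text; the names `all_balls`, `result` and `many` are illustrative only.

```r
box <- rep(c("red", "blue"), times = c(8, 5))

# Drawing all 13 balls without replacement works: it just shuffles the box.
all_balls <- sample(box, 13)

# Drawing 14 balls without replacement fails, because only 13 exist.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  sample(box, 14),
  error = function(e) conditionMessage(e)
)
print(result)

# With replace = TRUE each ball is put back, so any number of draws is fine.
many <- sample(box, 14, replace = TRUE)
print(length(many))
```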

tab <- table( event ) #this will do the trick.

`event `

blue red

12760 20240

*The output will look something like the above. You might get different values because the sampling is random.*

But again, the output contains big numbers. What we are really interested in is the proportion of red and blue balls among all the balls drawn.

prop.table(tab) #this can be considered as the final output.
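If you want the same counts every time you run the code, one option is to fix the random seed with `set.seed()` before sampling. The sketch below is an illustration rather than part of the original steps, and the seed value 42 is arbitrary.

```r
box <- rep(c("red", "blue"), times = c(8, 5))
no_of_times <- 33000

# Fix the seed (42 is an arbitrary choice) and tabulate one run.
set.seed(42)
first_run <- table(sample(box, no_of_times, replace = TRUE))

# Resetting the same seed reproduces exactly the same draws.
set.seed(42)
second_run <- table(sample(box, no_of_times, replace = TRUE))

print(first_run)
print(identical(first_run, second_run))
```

Changing the seed, or leaving it out, gives slightly different counts on each run.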

`event `

blue red

0.3866667 0.6133333

This tells us that the estimated probability of getting a blue ball is **38.67%**, while that of red is **61.33%**. Comparing these values with the ones calculated at the beginning, we see a slight difference, but that is okay: it comes from the randomness of the sampling, and if we increase the number of times the experiment is done we can expect to get even closer to the ideal values.
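To see this effect, the experiment can be repeated with different numbers of draws and compared against the exact value 8/13. This is a rough sketch: the trial sizes and the seed are arbitrary choices made for illustration, and because the draws are random the error tends to shrink as the number of draws grows rather than shrinking at every single step.

```r
box <- rep(c("red", "blue"), times = c(8, 5))
exact_red <- 8 / 13

set.seed(1)  # arbitrary seed, used only to make the run repeatable
trial_sizes <- c(100, 10000, 1000000)

# For each number of draws, estimate P(red) as the share of red draws.
estimates <- sapply(trial_sizes, function(n) {
  mean(sample(box, n, replace = TRUE) == "red")
})

# Compare each estimate with the exact probability.
results <- data.frame(
  trials    = trial_sizes,
  estimate  = estimates,
  abs_error = abs(estimates - exact_red)
)
print(results)
```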

I have also uploaded a video covering the same material.
