Probability, in general, can be thought of as the chance of a particular event happening, say winning or losing a game, the clichéd example of tossing a coin, and so on.

When I say the chance of an event, I mean the percentage of times that event would occur if I were to repeat the experiment a large number of times.

Let us consider a box that contains 13 balls, of which 5 are blue and 8 are red. What is the probability of picking a red ball? There are 8 red balls out of a total of 13, so by intuition I’d say 8/13.

The most basic mathematical definition of probability, which assumes that every outcome is equally likely, is something like:

 Probability(A) = P(A) = (Number of outcomes favorable to A) / (Total number of outcomes)

According to the above formula,

P(Red ball) = 8 / (8+5) = 8/13 ≈ 0.615 = 61.5%

P(Blue ball) = 5 / (8+5) = 5/13 ≈ 0.385 = 38.5%

Now we’ll try to calculate it in R.

Step 1: Creating the box which contains the balls.

box <- rep( c("red", "blue"), times = c(8, 5) )

This command gives you an object “box”: a character vector in which “red” is repeated 8 times and “blue” is repeated 5 times.
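Before sampling, it can help to confirm that the box holds what we expect. The lines below are a minimal sketch that rebuilds the same box and inspects it; because the box contains every ball exactly once, the share of red values in it equals the exact probability 8/13.

```r
# Rebuild the box from Step 1: 8 red balls and 5 blue balls.
box <- rep(c("red", "blue"), times = c(8, 5))

length(box)          # total number of balls in the box: 13
table(box)           # how many balls there are of each colour
mean(box == "red")   # exact share of red balls, i.e. 8/13
mean(box == "blue")  # exact share of blue balls, i.e. 5/13
```

Here mean() works on the logical vector box == "red", where TRUE counts as 1 and FALSE as 0, so the mean is simply the fraction of red balls.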

Step 2: Commencing the experiment of picking the balls.

Let us pick a ball from the box and call it an event.

event <- sample(box, 1)         #this takes a single ball out of the box.

output :

[1] "red"

This command returns one value picked at random from the box. The first argument is the object that holds the values, and the second argument is the number of items to pick at random from it.
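Since sample() is random, you may see "blue" instead of "red" when you run it yourself. If you want a draw that can be repeated exactly, one option is to fix the random seed first. This is only a sketch; the seed value 42 is an arbitrary choice for illustration and is not part of the original experiment.

```r
box <- rep(c("red", "blue"), times = c(8, 5))

set.seed(42)               # arbitrary seed, chosen only for illustration
first <- sample(box, 1)    # one random draw

set.seed(42)               # resetting the same seed repeats the same draw
second <- sample(box, 1)

identical(first, second)   # TRUE: both draws give the same ball colour
```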

A single draw like this does not tell us much about the probabilities, because probability only shows itself when the experiment is repeated a large number of times.

So we make an object called no_of_times and give it a big enough value, say 33,000.

no_of_times <- 33000

Now we can again use the sample function to repeat the experiment 33000 times and all that in the blink of an eye.

event <- sample(box, no_of_times, replace = TRUE)

Note that the replace argument has been used here. By default, sample() uses replace = FALSE, so a ball that has been picked is not put back and the number of balls left in the box decreases with each draw. That is why trying to pick more than 13 balls from the box without replacement throws an error.
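The difference between the two settings can be seen in a small sketch. The names shuffled, result and many are made up for this illustration, and tryCatch() is used only so that the error is captured as a message instead of stopping the script.

```r
box <- rep(c("red", "blue"), times = c(8, 5))

# Drawing all 13 balls without replacement just shuffles the box.
shuffled <- sample(box, length(box))
table(shuffled)            # still 8 red and 5 blue, only the order changes

# Asking for a 14th ball without replacement fails, because the box is empty.
result <- tryCatch(
  sample(box, length(box) + 1),
  error = function(e) conditionMessage(e)   # keep the error text instead of stopping
)
result                     # the error message reported by sample()

# With replace = TRUE each ball goes back in, so any number of draws works.
many <- sample(box, length(box) + 1, replace = TRUE)
length(many)               # 14
```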

The event object from above will contain 33,000 outcomes, i.e., whether each ball drawn is blue or red. Simply displaying them is of no use to us. What we need is a concise table telling us the total number of blue and red balls.

tab <- table( event )        #this will do the trick.
tab                          #print the counts.

 blue   red
12760 20240

The output will look something like the above. You might get different values, as the sampling is random.

But again, the output contains big numbers. What we are really interested in are the proportions of red and blue balls among all the balls drawn.

prop.table( tab)           #this can be considered as the final output.

     blue       red
0.3866667 0.6133333

This tells us that the estimated probability of getting a blue ball is 38.67%, while that of red is 61.33%. Comparing these values with the ones calculated at the beginning, we see a slight difference. That is okay: by the law of large numbers, the estimates tend to get closer to the ideal values as we increase the number of times the experiment is done.
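To see this effect, we can repeat the experiment with different numbers of draws and compare each estimate with the exact value 8/13. The following is a rough sketch: the sample sizes, the seed and the names exact, sizes and estimates are all arbitrary choices for illustration. Because the draws are random, the error will not necessarily shrink at every single step, but it should generally be smaller for the larger sizes.

```r
box <- rep(c("red", "blue"), times = c(8, 5))
exact <- 8 / 13                         # ideal probability of a red ball

set.seed(1)                             # arbitrary seed so the run can be repeated
sizes <- c(100, 1000, 33000, 1000000)   # numbers of draws to compare

# For each size, draw with replacement and record the share of red balls.
estimates <- sapply(sizes, function(n) {
  draws <- sample(box, n, replace = TRUE)
  mean(draws == "red")
})

# Put the estimates next to their distance from the exact value.
data.frame(
  draws     = sizes,
  estimate  = estimates,
  abs_error = abs(estimates - exact)
)
```

Note that mean(draws == "red") gives the same proportion that prop.table(table(draws)) reports for red, just as a single number.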

Guys, I've also uploaded a video covering the same material; please find it here.
