During a casual conversation, a friend of mine remarked that most well-known Hollywood personalities are born between April and July. This made me curious, so I looked up a few actors at random, and most of them did seem to be born in those months. I couldn't say so for sure, though, so I wrote a script to fetch the dates of birth of the top 5,000 ranked male and the top 5,000 ranked female movie personalities on IMDb, a total of 10,000 movie personalities.
There are a few things to note about this data.
After getting the data, I plotted a simple bar plot to see how many of them were born in each month.
Okay, so the observation doesn't hold: the births are pretty much evenly distributed across the months.
Let's look at the same plot for the top 500 ranked actors.
The month of April has more highly ranked actors than the other months. The number of births in a month is a count, so it can be modeled with a Poisson distribution, and we can check the probability that a month has 120 or more movie personalities.
The mean here is 79.58 and the confidence interval of the mean is between 74.54 and 84.58. Let's assume the mean is 85, slightly above the upper end of that interval. What would the probability of a month having 120 or more be then?
The probability turns out to be 0.000199, which is very low. This means that the month of April does stand out compared to the other months.
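The tail probability above can be reproduced with a short calculation. This is a minimal sketch rather than the script used for the post: the helper name `poisson_tail` and the cutoff of 500 summed terms are assumptions made for illustration, and the only inputs taken from the text are the assumed mean of 85 and the threshold of 120. The result should land close to the 0.000199 quoted above.

```python
import math

def poisson_tail(k_min, lam, extra_terms=500):
    """Return P(X >= k_min) for X ~ Poisson(lam) by summing the upper tail.

    extra_terms is an illustrative cutoff; the terms far above the mean
    are vanishingly small, so truncating there does not change the result.
    """
    total = 0.0
    for k in range(k_min, k_min + extra_terms):
        # log of the Poisson pmf: -lam + k*log(lam) - log(k!)
        log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_pmf)
    return total

# Assumed mean of 85 and threshold of 120, as in the text.
p_month = poisson_tail(120, 85.0)
print(f"P(X >= 120 | mean = 85) = {p_month:.6f}")
```

Summing the tail directly, instead of computing one minus the cumulative probability, avoids subtracting two nearly equal numbers when the tail is this small.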
Let's see how the age of these top 500 ranked actors is distributed. We'll plot a histogram of the age distribution.
There is nothing that stands out in this graph.
Moving on, let's see if there are any patterns in the dates that people are born on. For this, the year of birth of every movie personality is set to the same year. The day of the week for a person born on 21st May 1974 differs from that of a person born on 21st May 1975, so ignore the days of the week in the graph below.
In the graph, the intensity of the color red represents the number of people born on that date. Looking at the graph, there are 7 dates that stand out compared to the others: 29th April to 5th May. These 7 days lie around Labour Day, which is May 1st, and it is May 1st that has the maximum number of births.
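The step of setting every birth year to a common year can be sketched as follows. The sample dates, the helper `to_common_year`, and the choice of 2000 as the common year are all assumptions made for illustration and are not taken from the IMDb data; a leap year is used so that anyone born on 29 February still maps to a valid date.

```python
from collections import Counter
from datetime import date

# Hypothetical sample of birth dates, made up purely for illustration.
birth_dates = [
    date(1974, 5, 21),
    date(1975, 5, 21),
    date(1962, 5, 1),
    date(1980, 2, 29),
]

# Assumed common year; a leap year keeps 29 February valid.
COMMON_YEAR = 2000

def to_common_year(d, year=COMMON_YEAR):
    """Return the same month and day in the common year."""
    return d.replace(year=year)

# Count how many people share each calendar date once the year is ignored.
births_per_date = Counter(to_common_year(d) for d in birth_dates)

busiest_date, count = births_per_date.most_common(1)[0]
print(busiest_date.strftime("%d %B"), count)
```

The resulting counts per calendar date are what a calendar-style heatmap would color by intensity.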
It would be interesting to dive deeper into these patterns and understand what is causing them. This would require other data, which I'll be scouting for.
Comment
About the second graph.
It's better not to investigate the April data alone. The correct question is: what is the probability that at least one of the twelve months has such an outstanding value? This probability is approximately 12 times higher than that of April alone, i.e. about 0.0024.
Anyway, it is still a very low value.
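The factor of 12 can be checked with a few lines of arithmetic. This is a sketch under one loud assumption: the twelve monthly counts are treated as independent, which is only approximately true when the total number of people is fixed. The variable names are illustrative.

```python
# Single-month tail probability quoted in the post.
p_single = 0.000199
n_months = 12

# Simple upper bound: at most 12 times the single-month probability.
p_union_bound = n_months * p_single

# Assuming independent months: 1 - P(no month reaches the threshold).
p_at_least_one = 1 - (1 - p_single) ** n_months

print(f"12 * p           = {p_union_bound:.6f}")
print(f"1 - (1 - p)^12   = {p_at_least_one:.6f}")
```

For a probability this small the two figures agree to about two significant digits, which is why multiplying by 12 is a good approximation here.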
© 2020 Data Science Central ®