Both very short time periods (6 months) with several crashes and long time periods (3 years) with no crashes are expected. An even distribution of plane crashes is indeed NOT expected - it would look very suspicious, and definitely not random. Here we assume that all events (plane crashes) are independent. We also assume an average of two major plane crashes per year - which is realistic if you include all passenger airlines flying anywhere in the world, sometimes in dangerous weather or above dangerous locales.
We did our own simulation in this Excel spreadsheet. The password for our spreadsheet will be published in our Monday digest. If you don't receive our digest in your mailbox this Monday, check the promotions, social or spam folder in your email client (look for an email with the subject line Weekly digest - August 4). We simulated 10 time series with 2 crashes per year on average; each time series represents 10 years of simulated observations. That is, each of the 10 time series has 20 data points (10 years x 2 crashes per year on average), with each data point representing a crash event whose time stamp was simulated using the RAND function in Excel (a random number generator).
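For readers without Excel, the setup described above can be approximated in a few lines of Python. This is only a sketch, not the spreadsheet itself: it assumes each time stamp is drawn uniformly over the 10-year window (our reading of how RAND was used), and the function names `simulate_series`, `longest_gap` and `max_in_window` are made up for this illustration.

```python
import random


def simulate_series(n_series=10, n_events=20, years=10.0, seed=0):
    """Return n_series sorted lists of crash time stamps (in years).

    Assumption: each time stamp is uniform on [0, years], mimicking
    years * RAND() in a spreadsheet.
    """
    rng = random.Random(seed)
    return [
        sorted(rng.uniform(0.0, years) for _ in range(n_events))
        for _ in range(n_series)
    ]


def longest_gap(times, years=10.0):
    """Longest crash-free stretch, counting the two edges of the window."""
    edges = [0.0] + list(times) + [years]
    return max(b - a for a, b in zip(edges, edges[1:]))


def max_in_window(times, width=0.5):
    """Largest number of crashes in any window of the given width.

    Expects sorted times; checks the window starting at each crash.
    """
    best = 0
    for i, start in enumerate(times):
        count = sum(1 for t in times[i:] if t - start <= width)
        best = max(best, count)
    return best


series = simulate_series()
for k, times in enumerate(series, start=1):
    print(
        f"series {k:2d}: longest gap = {longest_gap(times):.2f} years, "
        f"most crashes in 6 months = {max_in_window(times)}"
    )
```

Changing `seed` gives a fresh set of 10 series, so you can see how much the longest gap and the densest 6-month window vary from run to run.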
Our conclusions are as follows
Here is how we did our simulations
Test our results with a mathematical model
We performed the Monte-Carlo simulations for you, but now we invite you to solve the problem using mathematical models. There is indeed an exact solution, easy to compute, for this problem. Let us know if your theoretical solution yields similar results. Here's how to proceed:
Note
Even if there are many crashes in a particular short time period, it does not mean that the likelihood of a new crash in the next month is reduced or increased. We are dealing here with memoryless processes.
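One model that fits the assumptions stated earlier (independent events, two crashes per year on average) is a homogeneous Poisson process. This is our assumption for illustration and not necessarily the exact solution the article has in mind. Under that model the number of crashes in a window of t years is Poisson with mean 2t, and the waiting time to the next crash is exponential, which is where the memoryless property comes from. The function names below are hypothetical.

```python
import math

RATE = 2.0  # crashes per year, the average assumed in the article


def prob_no_crash(years, rate=RATE):
    """P(no crash in a given window of this length) under a Poisson model."""
    return math.exp(-rate * years)


def prob_at_least(k, years, rate=RATE):
    """P(at least k crashes in a given window of this length)."""
    mu = rate * years
    below_k = sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def prob_wait_exceeds(t, already_waited=0.0, rate=RATE):
    """P(T > already_waited + t | T > already_waited) for exponential waits."""
    return math.exp(-rate * (already_waited + t)) / math.exp(-rate * already_waited)


print(f"P(no crash in a given 3-year window)     = {prob_no_crash(3.0):.5f}")
print(f"P(3+ crashes in a given 6-month window)  = {prob_at_least(3, 0.5):.5f}")
print(f"P(no crash next month, fresh start)      = {prob_wait_exceeds(1 / 12):.5f}")
print(f"P(no crash next month, after 5 quiet yrs) = {prob_wait_exceeds(1 / 12, 5.0):.5f}")
```

Note that these are probabilities for one fixed window. The chance of seeing such a window somewhere within a 10-year series is larger, because there are many overlapping windows to choose from, so these numbers are not directly comparable with simulation counts taken over a whole series.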
Comment
Hmm... I am a bit out of practice with full mathematical rigor (mortis?) but here are my thoughts.
Assume the probability of a binary event happening per unit time is p. The probability of it not happening is q = 1 - p.
Assume p is small (say 0.01); then q is high. This means that for any value of N, the probability of an empty sequence of length N (i.e. the event not happening N times in a row) is q**N, which is much higher than the probability of a full sequence of length N, p**N.
As a result, if a realisation of this process is drawn out as a linear graph, it would be dominated by large empty spaces, with smaller spaces in which something is happening. It would look as if events were happening in clusters, even though they are random, and a naive analysis would assume there was an underlying cause.
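A quick way to see this is to simulate the sequence and look at the gaps between events. The snippet below is a minimal sketch under the comment's own assumptions (independent trials, p = 0.01); the helper name `bernoulli_gaps` is made up for the example.

```python
import random


def bernoulli_gaps(p, n_steps, seed=0):
    """Simulate n_steps independent trials and return the gaps between hits."""
    rng = random.Random(seed)
    hits = [i for i in range(n_steps) if rng.random() < p]
    return [b - a for a, b in zip(hits, hits[1:])]


p, n = 0.01, 10
q = 1 - p
print(f"P(empty run of {n}) = q**{n} = {q**n:.4f}")
print(f"P(full run of {n})  = p**{n} = {p**n:.1e}")

gaps = bernoulli_gaps(p, 100_000)
mean_gap = sum(gaps) / len(gaps)
short = sum(1 for g in gaps if g <= 10)
print(f"events: {len(gaps) + 1}, mean gap: {mean_gap:.1f}, longest gap: {max(gaps)}")
print(f"gaps of 10 steps or fewer: {short} ({short / len(gaps):.1%})")
```

The mean gap should come out near 1/p = 100, yet a noticeable share of gaps are 10 steps or shorter while a few are several hundred steps long, which is the clustered look described above.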
The interesting case is where p = q = 0.5, in which case I think it would look like an even distribution and be statistically symmetric between "did happen" and "did not happen".
As to air crashes, I heard that 40% of all air journeys involve near misses. Clearly this means a crash now and then is inevitable. It is amazing that mid-air collisions do not happen very often.
I understand this is known as the inspection paradox. I read that it applies to the fact that buses may depart from a stop with an average ten-minute gap, but you always seem to have a long wait. This is because the long gaps take up most of the timeline: if you arrive at a random time, you are likely to land in the middle of a long gap.
It would seem that the way to detect this, if the probabilities are unknown, would be to look at the distribution of intervals between the events (e.g. bus arrivals).
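The bus example is easy to check numerically. The sketch below assumes the gaps between buses are exponential with a 10-minute mean (an assumption for illustration; real timetables are more regular) and that passengers turn up at uniformly random times. The name `inspection_paradox` is hypothetical.

```python
import bisect
import random


def inspection_paradox(mean_gap=10.0, n_buses=100_000, n_passengers=20_000, seed=0):
    """Return (mean gap between buses, mean gap a passenger lands in, mean wait)."""
    rng = random.Random(seed)

    # Bus arrival times with exponential gaps (assumed), starting at time 0.
    arrivals = [0.0]
    for _ in range(n_buses):
        arrivals.append(arrivals[-1] + rng.expovariate(1.0 / mean_gap))
    scheduled = arrivals[-1] / n_buses

    landed_in, waits = [], []
    for _ in range(n_passengers):
        x = rng.uniform(0.0, arrivals[-1])
        # Index of the next bus after time x; clamp guards the far edge.
        idx = min(bisect.bisect_right(arrivals, x), len(arrivals) - 1)
        landed_in.append(arrivals[idx] - arrivals[idx - 1])
        waits.append(arrivals[idx] - x)

    return scheduled, sum(landed_in) / len(landed_in), sum(waits) / len(waits)


scheduled, experienced, wait = inspection_paradox()
print(f"mean gap between buses:         {scheduled:.1f} min")
print(f"mean gap a passenger lands in:  {experienced:.1f} min")
print(f"mean wait for the next bus:     {wait:.1f} min")
```

With exponential gaps, the gap a random passenger lands in averages about twice the mean gap between buses, so the average wait is roughly the full 10 minutes instead of the 5 you might naively expect.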
Now to get back to work while thinking about the full analysis, which is probably on Wikipedia somewhere if I get the right search term.