Very short time periods (6 months) with several crashes, as well as long time periods (3 years) with no crashes are expected. An even distribution of plane crashes is indeed NOT expected - it would look very suspicious, and definitely not random. Here we assume that all events (plane crashes) are independent. We also assume that the average is two major plane crashes per year - which is realistic if you include all passenger airlines flying anywhere in the world - sometimes in dangerous weather, or above dangerous locales.
We did our own simulation in this Excel spreadsheet. The password for our spreadsheet will be published in our Monday digest. If you don't receive our digest in your mailbox this Monday, check the promotions, social or spam folder in your email client (look for an email with subject line Weekly digest - August 4). We simulated 10 time series with 2 crashes per year on average: each time series represents 10 years of simulated observations. That is, each of the 10 time series has 20 data points (10 years x 2 crashes per year on average), with each data point representing a crash event whose time stamp is simulated using the RAND function in Excel (a random number generator).
Our conclusions are as follows
- In three of the simulated time series (out of 10), we found 4 crashes occurring within a 4-month time period; one of the ten time series had 4 crashes within a month
- In three of the ten time series, there was a 2.5 to 3 year time period with NO crash
Here is how we did our simulations
- Simulate 10 time series of random numbers in Excel using the RAND Excel function. Each time series is stored in a column. For each time series, generate 20 random numbers between 0 and 10 (RAND returns a number between 0 and 1, so multiply it by 10, for instance with =10*RAND()). Each number is a time stamp representing a crash, with the time unit being a year, so each time series has 2 crashes per year on average.
- Copy and paste the values ONLY (not the functions) in another tab, in the Excel spreadsheet
- Delete the initial tab containing the RAND function (because each time you refresh it, new random numbers are generated and it screws up the sorted numbers, see next step)
- In the final (un-deleted) tab, sort each column separately: you must perform 10 sorts. Keep in mind that each value (a number between 0 and 10 years) represents a time stamp when a crash occurred.
- Compute z = x(k+3) - x(k) for each column, where x(k) is the occurrence (time) of crash number k. Then compute r = min(z) over all the rows where z is defined (k = 1 to 17), for each column (time series). The number r represents the shortest time period (for each 10-year time series) in which 4 crashes occurred. Since we simulated 10 time series, we can easily compute confidence intervals for r.
- Compute y = x(k+1) - x(k) for each column, where x(k) is the occurrence (time) of crash number k. Then compute s = max(y) over all the rows where y is defined (k = 1 to 19), for each column (time series). The number s represents the longest time period (for each 10-year time series) with no crash. Since we simulated 10 time series, we can easily compute confidence intervals for s.
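For readers who prefer code to spreadsheets, the steps above can be sketched in Python. This is a minimal sketch, not the spreadsheet itself: it assumes Python's random module in place of Excel's RAND, keeps exactly 20 crashes per 10-year series as in the steps above, and the function names (simulate_series, shortest_span, longest_gap) and the seed are made up for illustration.

```python
import random

def simulate_series(n_crashes=20, years=10.0, rng=random):
    """Return sorted crash time stamps, drawn uniformly between 0 and `years`."""
    return sorted(rng.uniform(0.0, years) for _ in range(n_crashes))

def shortest_span(times, n_events=4):
    """r = min of x(k+3) - x(k): shortest period containing n_events crashes."""
    step = n_events - 1
    return min(times[k + step] - times[k] for k in range(len(times) - step))

def longest_gap(times):
    """s = max of x(k+1) - x(k): longest period between two consecutive crashes."""
    return max(later - earlier for earlier, later in zip(times, times[1:]))

if __name__ == "__main__":
    rng = random.Random(2014)  # arbitrary seed, only to make runs repeatable
    for i in range(10):        # 10 time series, one per spreadsheet column
        x = simulate_series(rng=rng)
        r = shortest_span(x)
        s = longest_gap(x)
        print(f"series {i + 1}: r = {r:.2f} years, s = {s:.2f} years")
```

Each run prints r and s (in years) for the 10 simulated series; changing the seed gives a fresh set of series, much like refreshing the RAND tab.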
Test our results with a mathematical model
We performed the Monte-Carlo simulations for you, but now we invite you to solve the problem using mathematical models. There is indeed an exact solution, easy to compute, for this problem. Let us know if your theoretical solution yields similar results. Here's how to proceed:
- The events (crashes) follow a Poisson process with intensity v = 2 (2 crashes per year on average; one year = time unit). So the number N of events occurring in a time period of length t has a Poisson distribution: the probability that exactly N = n events occur is exp(-vt) * (vt)^n / n!
- The probability p that a 4-month (one year divided by 3) time period has exactly n = 4 accidents, is p = exp(-v/3) * (v/3)^n / n!
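- The number of 4-month time periods with exactly 4 crashes, among the 30 non-overlapping 4-month periods in a 10-year time period (30 = 10 years multiplied by three 4-month periods per year), follows a binomial distribution of parameters (30, p). The probability that there is at least one such period is therefore 1 - (1 - p)^30. Note that this only counts periods aligned on fixed 4-month boundaries, so it likely understates the chance of seeing 4 crashes within an arbitrary 4-month window.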
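The three steps above translate directly into a few lines of Python. This is a minimal sketch under the same assumptions as the model (intensity v = 2, 30 fixed non-overlapping 4-month periods, exactly 4 crashes in a period); the function names poisson_pmf and prob_at_least_one are made up for illustration.

```python
from math import exp, factorial

def poisson_pmf(n, rate, t):
    """Probability of exactly n events in a period of length t: exp(-vt) * (vt)^n / n!"""
    mean = rate * t
    return exp(-mean) * mean ** n / factorial(n)

def prob_at_least_one(p, periods):
    """Probability that at least one of `periods` independent periods is a 'success'."""
    return 1.0 - (1.0 - p) ** periods

v = 2.0                                   # crashes per year on average
p = poisson_pmf(4, v, 1.0 / 3.0)          # exactly 4 crashes in one 4-month period
p_any = prob_at_least_one(p, 30)          # at least one such period in 10 years

print(f"p = {p:.5f}")
print(f"P(at least one 4-month period with 4 crashes in 10 years) = {p_any:.3f}")
```

Because the periods here are fixed and disjoint, the result is best read as a rough lower benchmark when comparing against the simulated frequencies, not as an exact match for the "any 4-month window" count used in the simulation.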
Even if there are many crashes in a particular small time period, it does not mean that the likelihood of a new crash in the next month is reduced, or increased. We are dealing here with memoryless processes.