Reading the news, it can be hard to figure out what is actually happening. Many evangelists on both sides of the climate change debate have a political agenda, and some really do great science, but it is hard to assess how much bias these studies contain and how many of them are affected by it. A survey inviting people all over the world to answer questions such as "Is it getting more extreme where you live?" might be one way to answer the question. After reading an article describing the South Carolina floods as the worst in a thousand years, I could not resist asking this question:

What is the chance that, among 1,000 randomly selected locations on Earth, the year 2015 will bring an extremely rare event, one that occurs no more than once every thousand years, to at least one of these 1,000 locations?

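The answer is straightforward: if each location independently has a 1/n chance of seeing such an event in a given year, the probability is 1 - (1 - 1/n)^n, which is approximately 63% with n = 1,000. That is a pretty good chance that it will happen purely by chance!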
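The 63% figure can be checked with a few lines of Python. This is a minimal sketch that assumes the locations are independent and that each has a 1/n chance of the event in a given year; the function name `prob_at_least_one` is made up for illustration.

```python
import math


def prob_at_least_one(n_locations: int, return_period_years: float) -> float:
    """P(at least one location sees the event this year).

    Assumes independent locations, each with a yearly probability of
    1 / return_period_years (an assumption, as in the post).
    """
    p_single = 1.0 / return_period_years
    return 1.0 - (1.0 - p_single) ** n_locations


p = prob_at_least_one(1000, 1000)

# For large n, (1 - 1/n)^n approaches 1/e, so the answer approaches 1 - 1/e.
limit = 1.0 - math.exp(-1.0)

print(f"n = 1,000: {p:.4f}")
print(f"1 - 1/e:   {limit:.4f}")
```

Both values round to 63%, which is why the answer barely depends on n once the number of locations equals the return period.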

It's a bit more complicated because events are not independent (the worst droughts causing the worst fires), and because such events (extreme temperatures) are widespread enough to cover several of your 1,000 sample locations, not just one. But that means that the chance in question is probably well above 63%.

My point here is that it is very easy for a journalist, or someone with political interests, to pick out one extreme event, one that occurs every thousand years on average, and use it to convince you that we are close to the apocalypse. It's one of the many ways you can lie with statistics.


### Replies to This Discussion

Great question. Your comment about lying with statistics made me think through the analysis a bit more. In your model, the N-year event horizon is the same as the number of locations sampled. This leaves open the possibility of someone choosing their own number of sampled locations to either inflate or deflate the probability of an event, leading to more lies.

Model the arrivals as a Poisson process, wave your arms wildly, and assume independence in time and space. Then the probability that one or more events occur in a given year is 1 - exp(-1/N) for a specific location and 1 - exp(-R/N) across R regions, where N is the event horizon of a once-in-N-years event. Furthermore, instead of taking a random sample of regions, let R = total area / event area. For instance, suppose meteorologists define the extreme rainfall event area to be 10,000 square miles (about the size of the Dallas/Fort Worth Metroplex); then the number of such regions in the USA is 3.8x10^6 / 10^4 = 380. Globally (land area only), the number of rainfall regions is 57.53x10^6 / 10^4 = 5,753.
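As a rough sketch, the region-count version of the model fits in one small Python function. The areas are the figures quoted above (in square miles), the 10,000 square mile event area is the supposition made in this reply rather than an established meteorological definition, and the name `prob_any_event` is invented for illustration.

```python
import math


def prob_any_event(total_area: float, event_area: float, return_period: float) -> float:
    """P(one or more once-in-N-years events somewhere in the area this year).

    Assumes a Poisson process with independence in time and space,
    as in the reply. Areas must be in the same units.
    """
    regions = total_area / event_area   # R = total area / event area
    rate = regions / return_period      # expected events per year, R / N
    return 1.0 - math.exp(-rate)


# Figures from the text: USA 3.8e6 sq mi, global land 57.53e6 sq mi,
# assumed event area 1e4 sq mi, N = 1000 years.
usa = prob_any_event(3.8e6, 1e4, 1000)
world = prob_any_event(57.53e6, 1e4, 1000)

print(f"USA:   {usa:.3f}")
print(f"World: {world:.4f}")
```

Changing `event_area` is the only real knob here, which is the point of the model: the argument shifts from how many locations were sampled to how large one event is.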

Looking just at the USA, supposing the rainfall event area is in the right ballpark and we want to know about 1,000-year events, then R = 380 and N = 1,000, and the number of events per year is Poisson with mean R/N = 0.38, so P(x > 0) = 1 - exp(-380/1000), or about 0.32. The expected number of 1,000-year events across the USA is therefore 0.38 per year (approximately one every 3 years). Furthermore, the probability that there are more than two such events in any one year is less than 0.01 (use R, MATLAB, or Python to evaluate the Poisson cumulative distribution with lambda = 0.38). Put another way, we should be concerned if we see more than two 1,000-year rainfall events in any one year across the USA.

Globally, the expected number of 1,000-year rainfall events in any one year is 5,753/1,000 = 5.753 (again, supposing the event area is in the right ballpark). The probability of there being more than ten 1,000-year events in any one year is less than 0.04.
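The two tail probabilities quoted for the USA and for the globe can be reproduced with the standard library alone. This is a sketch under the same Poisson and independence assumptions; `poisson_tail` is a hypothetical helper name, and a statistics package would give the same numbers through its Poisson survival function.

```python
import math


def poisson_tail(lam: float, k: int) -> float:
    """P(X > k) for X ~ Poisson(lam), computed as 1 minus the CDF at k."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cdf


# USA: expected 0.38 events per year, chance of more than 2 in a year.
usa_tail = poisson_tail(0.38, 2)

# Global land area: expected 5.753 events per year, chance of more than 10.
global_tail = poisson_tail(5.753, 10)

print(f"P(more than 2 events, USA):     {usa_tail:.4f}")
print(f"P(more than 10 events, global): {global_tail:.4f}")
```

Both results fall under the thresholds stated in the text (0.01 and 0.04 respectively).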

Floods and droughts can be modeled similarly, but I suspect flood regions are more narrowly defined, and drought regions are substantially larger than a metro area.

The proposed model reduces fudging and limits the arguments to two points: the size of an event area, which presumably an expert can be called upon to settle, and the hand-waved independence assumption.

It would be interesting to compare the model with actual data. Anyone know of a public database that tracks 100 year and 1000 year events?

Bob Ordemann

System 3 Data Sciences
