<p><strong>Comments - Distribution of Arrival Times for Extreme Events - Data Science Central</strong></p>
<p>Comment by David Marx, 2018-07-10:</p>
<p>I did some unpublished work on this back in 2015 you might be interested in: <a rel="nofollow noopener" href="https://htmlpreview.github.io/?https://github.com/dmarx/statisticalArgumentForSettling/blob/master/statistical_argument_for_settling.html" target="_blank">https://htmlpreview.github.io/?https://github.com/dmarx/statistical...</a> </p>
<p>I ran simulations for numerical approximations as well, but also developed some analytic theory. To summarize: you can model the arrival time of a new maximum with a geometric distribution, whose expectation is 1/p. For a continuous distribution with CDF F(x), the probability of observing an iid draw greater than a value m is 1-F(m). Therefore, if we want to know how many new draws j it will take to get a new maximum, we have E[j|m] = 1/(1-F(m)).</p>
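<p>A minimal sketch of the waiting-time argument above (my code, not the author's), using a standard exponential distribution as a concrete example: given a current maximum m, the number of iid draws j until one exceeds m is geometric with success probability p = 1 - F(m), so E[j|m] = 1/(1 - F(m)).</p>

```python
import math
import random

random.seed(0)

def expected_wait(m, F):
    """Analytic expectation: E[j | m] = 1 / (1 - F(m))."""
    return 1.0 / (1.0 - F(m))

def simulated_wait(m, sampler, trials=20_000):
    """Average number of draws needed before one exceeds m."""
    total = 0
    for _ in range(trials):
        j = 1
        while sampler() <= m:
            j += 1
        total += j
    return total / trials

F = lambda x: 1.0 - math.exp(-x)          # exponential CDF
draw = lambda: random.expovariate(1.0)    # exponential sampler

m = 2.0
analytic = expected_wait(m, F)     # e^2, about 7.39
approx = simulated_wait(m, draw)   # should land close to the analytic value
```

<p>For the exponential the analytic answer is simply e^m, and the simulated average agrees to within sampling noise.</p>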
<p>I think it's more useful if we instead parameterize this by the number of draws 'n' that we have made. I started working out E[j|n] and was able to reason out the heuristic E[j|n] ~= n (i.e. 'm' is roughly a "one in n" outlier). This works for small n, but underestimates j as n grows. To get a better estimator, we need E[m|n], which allows us to calculate this as E[j|n] = 1/(1 - F(E[m|n])). In the absence of an analytic estimator, simulation is slow for large n: instead, we can apply numerical integration to quickly achieve a high-precision estimate of E[m|n]. </p>
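<p>The numerical-integration step above can be sketched as follows (again my code, using the standard exponential so the answer is checkable: the expected max of n exponentials is the harmonic number H_n). The density of the max of n iid draws is n·F(x)^(n-1)·f(x), so E[m|n] is a one-dimensional integral.</p>

```python
import math

def expected_max(n, F, f, lo=0.0, hi=60.0, steps=200_000):
    """E[m | n] = integral of x * n * F(x)^(n-1) * f(x) dx, via the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x * n * F(x) ** (n - 1) * f(x)
    return total * h

F = lambda x: 1.0 - math.exp(-x)   # exponential CDF
f = lambda x: math.exp(-x)         # exponential density

n = 100
Em = expected_max(n, F, f)         # close to H_100 = 1 + 1/2 + ... + 1/100
Ej = 1.0 / (1.0 - F(Em))           # plug-in estimate of the expected wait
```

<p>For the exponential this plug-in estimate works out to e^(H_n) ≈ 1.78·n, which illustrates the point above: the naive E[j|n] ~= n heuristic underestimates the wait as n grows.</p>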
<p>Going a bit deeper down the rabbit hole: if we have n samples, the CDF of the max is F(x)^n. The expectation of the max can be approximated numerically by Monte Carlo simulation (above). Alternatively, we can approximate that expectation with the median, which we can calculate almost exactly from the CDF and n: it is the value k_n satisfying F(k_n)^n = 1/2. Call the median for the max after n draws k_n, and we have E[j|n] = 1/(1 - F(k_n)).</p>
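<p>The median-of-max shortcut above can be sketched as follows (my code; exponential example again). Solving F(k_n)^n = 1/2 gives k_n = F⁻¹(2^(-1/n)), which is exact whenever the quantile function F⁻¹ is available.</p>

```python
import math

def median_of_max(n, F_inv):
    """Median of the max of n iid draws: solves F(k)^n = 1/2."""
    return F_inv(2.0 ** (-1.0 / n))

F = lambda x: 1.0 - math.exp(-x)       # exponential CDF
F_inv = lambda u: -math.log(1.0 - u)   # exponential quantile function

n = 100
k_n = median_of_max(n, F_inv)
Ej = 1.0 / (1.0 - F(k_n))   # = 1 / (1 - 2**(-1/n)), about n / ln(2)
```

<p>One observation worth noting: because F(k_n) = 2^(-1/n) exactly, this version of the estimate simplifies to 1/(1 - 2^(-1/n)) ≈ n/ln 2 ≈ 1.44·n, regardless of which F is used.</p>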
<p>Comment by Vincent Granville, 2017-02-02:</p>
<p>Aaron Brown posted the following comment: </p>
<p class="qtext_para"><em>There are two problems with applying this to temperature records. First is that global mean temperature is closer to a random walk than independent observations. It takes a long time for the earth to heat up or cool down (notice that local temperatures peak about six to eight weeks after the time of maximum sunlight, and that’s just local), moreover many of the drivers of temperature have multiyear periods (El Niño, for example, as well as orbital and solar cycles).</em></p>
<p class="qtext_para"><em>With a random walk, the chance that the last 16 years have all been hotter than any previous year since 1880 is one in 50. If the temperatures were independent draws, the odds would be 1 in 3*10^20. So it makes a big difference. Of course, global temperatures are not quite a random walk, so the probability without human influence is less than 0.02, but on the other hand, it’s not true that the last 16 years have all been hotter than any previous year. If you look at long term global temperature estimates, you’ll see that its reasonably common to have runs of 14 out of 16 last years to be among the 16 hottest in a 137 year run, before human activities rose to a scale likely to influence global climate.</em></p>
<p class="qtext_para"><em>The other problem is we do know something about the distribution of global mean temperatures. That means you are not restricted to reasoning from records, you can look at how much hotter the recent temperatures were than older temperatures.</em></p>
<p class="qtext_para"><em>To see what I mean, consider the baseball record for triples hit in a season. The modern record was set at 36 in 1912 by Chief Wilson. The second best any player has ever done is 26. The record for strikeouts in a season was set at 223 in 2009 by Mark Reynolds, and the runner up is Adam Dunn who struck out 222 times in 2012. Now which record do you think is likely to stand longer?</em></p>
<p class="qtext_para"><em>If you want to make an estimate of this probability for global mean temperatures both with and without anthropogenic climate change, you either need a model for evolution of global temperatures, or use a longer dataset for control.</em></p>