Comments - The Fundamental Statistics Theorem Revisited - Data Science Central2018-04-26T15:14:13Zhttps://www.datasciencecentral.com/profiles/comment/feed?attachedTo=6448529%3ABlogPost%3A494987&xn_auth=no@Matthew - My answer is no ba…tag:www.datasciencecentral.com,2016-12-11:6448529:Comment:4972032016-12-11T02:43:22.463ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p>@Matthew - My answer is no based on my investigations, but it is worth double-checking. </p>
What about a(k) = 1/k^(3/4)?…tag:www.datasciencecentral.com,2016-12-10:6448529:Comment:4971012016-12-10T22:27:15.761ZMatthew A. Riebelhttps://www.datasciencecentral.com/profile/MatthewARiebel
<p>What about a(k) = 1/k^(3/4)? Would that converge to Gaussian?</p>
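<p>A rough numerical way to probe this question, offered as a sketch rather than a proof. It assumes the X(k) are i.i.d. with finite variance and looks at the weighted sum of a(k)X(k). A standard sufficient condition for a Gaussian limit (Lindeberg-type) is that no single term keeps a non-vanishing share of the total variance, i.e. max a(k)^2 / sum a(k)^2 tends to 0. The helper name <code>max_variance_share</code> and the comparison weight 1/k^(1/2) are illustrative assumptions, not taken from the article.</p>

```python
def max_variance_share(weight, n):
    """Largest single-term share of total variance: max a(k)^2 / sum a(k)^2, k = 1..n."""
    squares = [weight(k) ** 2 for k in range(1, n + 1)]
    return max(squares) / sum(squares)


def a_three_quarters(k):
    """Weight from the question: a(k) = 1/k^(3/4)."""
    return k ** -0.75


def a_half(k):
    """Comparison weight (an assumption for contrast): a(k) = 1/k^(1/2)."""
    return k ** -0.5


for n in (10, 1_000, 100_000):
    share_34 = max_variance_share(a_three_quarters, n)
    share_12 = max_variance_share(a_half, n)
    print(f"n={n:>7}  share for k^(-3/4): {share_34:.4f}  share for k^(-1/2): {share_12:.4f}")
```

<p>For a(k) = 1/k^(3/4) the squared weights are summable, so the first term's share levels off at a positive value instead of vanishing. In that situation the limit generally depends on the distribution of the X(k), which is consistent with the "no" given in the reply.</p>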
@Carlos, I know that the Cauc…tag:www.datasciencecentral.com,2016-12-08:6448529:Comment:4962042016-12-08T19:14:32.474ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p>@Carlos, I know that the Cauchy distribution doesn't have any moments, and there are other exceptions. It does not invalidate my article. Instead, it makes it even more interesting, as I propose to investigate a CLT where normalization is done via L1 metrics such as the median (the Cauchy distribution has a median).</p>
<p><br/>Also, I know that the NN distribution is exponential. But you haven't read my paper: sums of exponentials converge to a Gaussian once normalized. I never claimed that distances to neighbors, for a homogeneous Poisson process, are Gaussian (that would imply that some distances are negative). Indeed, my paper is not even about homogeneous Poisson processes, but about processes with a radial intensity. The homogeneous case is a particular version, and for the non-homogeneous case, distances follow neither Gaussian nor exponential distributions.</p>
<p>Finally, as Mark wrote, independence and identical distribution are NOT necessary for a CLT.</p> @Carlos Aya THE Classical CLT…tag:www.datasciencecentral.com,2016-12-08:6448529:Comment:4961042016-12-08T16:01:25.762ZMark L. Stonehttps://www.datasciencecentral.com/profile/MarkLStone
<p>@<a href="http://www.datasciencecentral.com/profile/CarlosAya" class="fn url">Carlos Aya</a> <em>THE</em> Classical CLT requiring existence of mean and variance, and independent and identically distributed variables, requires all those things. There absolutely are CLTs which do not require variables be independent or identically distributed. In fact, there are CLTs for some cases in which not only does a mean not exist, but in which not even a 1 - epsilon moment exists - I proved such a CLT as a homework assignment in graduate school.</p>
<p>That said, convergence to a Normal can be extremely slow for some of the more exotic CLTs. A sample size in the millions might be required to get as close an approximation as a sample size of 30 would give for a nice CLT case. And where is the approximation going to be particularly poor? In the tails, which are what many of the most crucial probability calculations, such as risk, rely on.</p> Mr Granville, there are few "…tag:www.datasciencecentral.com,2016-12-08:6448529:Comment:4959752016-12-08T11:18:29.161ZCarlos Ayahttps://www.datasciencecentral.com/profile/CarlosAya
Mr Granville, there are a few "simplifications" in your post that deserve clarification, even more so if you aim to reach a wide audience with only a high school level education.<br />
<br />
First, there is an important caveat in the CLT: it only applies if the original distribution has a mean. Odd cases like the Cauchy distribution do not obey the CLT because they lack this property. Is the Cauchy distribution one of those things mathematicians use to annoy people? Well, sadly it is more common than we would like to acknowledge. The ratio of two normals follows it, and you know how people love ratios, especially in the business world. Also, because the Cauchy distribution has fat tails, it is now a common "trick" when people want to be more "fuzzy" in dealing with errors in optimisation. A case in point is the visualisation technique t-SNE popularised by Google. So, use t-SNE at your own peril..!<br />
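<br />
The point about ratios can be checked with a small simulation, given here only as a sketch. It assumes two independent standard normals (whose ratio is standard Cauchy), and the function names, seed, and sample sizes are illustrative choices. It compares the spread (interquartile range) of sample means as n grows: for normal draws the spread shrinks, while for Cauchy draws it does not.<br />

```python
import random
import statistics


def cauchy_draw(rng):
    """Ratio of two independent standard normals, which is standard Cauchy."""
    return rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)


def normal_draw(rng):
    """A single standard normal draw, used as the well-behaved comparison."""
    return rng.gauss(0.0, 1.0)


def iqr_of_sample_means(draw, n, replicates, rng):
    """Interquartile range of `replicates` sample means, each over n draws."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(replicates)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 1000):
    cauchy_iqr = iqr_of_sample_means(cauchy_draw, n, 300, rng)
    normal_iqr = iqr_of_sample_means(normal_draw, n, 300, rng)
    print(f"n={n:>5}  IQR of Cauchy means: {cauchy_iqr:.3f}  IQR of normal means: {normal_iqr:.3f}")
```

Averaging more Cauchy draws does not tighten the estimate, because the mean of n independent standard Cauchy variables is itself standard Cauchy.<br />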
<br />
Second, you mention an application of weights for nearest neighbours in this framework... ouch. Distance to 1-NN is related to the exponential distribution, so one should consider that rather than a Gaussian if required. Actually, for the sake of proper teaching, one should mention that the CLT also requires independent and identically distributed variables. When one uses 2-NN variables for example, one automatically creates a dependency between variables... ouch.<br />
<br />
It is worth clarifying that, if one wanted a better "average" for the distance to the 1-NN, a combinatorial argument (perhaps not within reach at high school level) will show that 2-NN contributions are of the order of 1/n, where n is the sample size, and 3-NN contributions of the order of 1/n^2... I believe it continues like that, but I haven't worked those out.<br />
<br />
Finally, as someone already said, looking like a Gaussian and being a Gaussian are two different things. This is something that requires pen and paper... although computer experiments certainly help if one knows the trade. The Fundamental Data Science…tag:www.datasciencecentral.com,2016-12-07:6448529:Comment:4959312016-12-07T20:21:27.498ZMark L. Stonehttps://www.datasciencecentral.com/profile/MarkLStone
<p>The Fundamental Data Science Theorem: "Oversimplification and hype never go out of style". Approximately having a normal distribution of mean 0 and variance 1 is not the same as having a normal distribution of mean 0 and variance 1. Indeed, the approximation can be quite poor, especially in the tails, even for rather large values of n. Why should we care about the tails of the distribution? Because that's exactly what some of the most important life and death calculations are based on. Risk analysis, whether for a potentially catastrophic failure, as with, for example, a nuclear power plant, or for the ability of a financial institution or other company to withstand adverse business or larger economy circumstances, depends crucially on the tails of the distribution. Oversimplified analysis based on Normal approximation, not to mention not accounting for dependency of random variables, often results in underestimation of serious consequence risks by 2 or more orders of magnitude. So yes, high school kids can understand the Central Limit Theorem at some level, data science internship and bootcamp graduates can understand it at some level, a level sufficient to lead them to make risk assessments which are off by orders of magnitude.</p>
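<p>The tail issue can be illustrated with a case where the exact answer is available, given here as a sketch under stated assumptions. Take a sum S of n independent Exp(1) variables, standardized as Z = (S - n)/sqrt(n). S is Erlang distributed, so P(Z &gt; z) has a closed form that can be compared with the normal tail. The choice of Exp(1) summands, the thresholds, and the function names are illustrative assumptions.</p>

```python
import math


def exact_upper_tail(n, z):
    """P(Z > z) for Z = (S - n)/sqrt(n), with S a sum of n independent Exp(1) variables.

    Uses the Erlang survival function sum_{k<n} exp(-x) x^k / k!, evaluated
    term by term in log space to avoid overflow and underflow.
    """
    x = n + z * math.sqrt(n)
    if x <= 0:
        return 1.0
    log_x = math.log(x)
    return sum(math.exp(-x + k * log_x - math.lgamma(k + 1)) for k in range(n))


def normal_upper_tail(z):
    """P(N(0,1) > z) via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(n, z):
    """How many times larger the exact tail is than the normal approximation."""
    return exact_upper_tail(n, z) / normal_upper_tail(z)


for n in (30, 1000):
    for z in (2.0, 3.0, 4.0):
        print(f"n={n:>4}  z={z}  exact={exact_upper_tail(n, z):.3e}  "
              f"normal={normal_upper_tail(z):.3e}  ratio={tail_ratio(n, z):.2f}")
```

<p>In this example the normal approximation understates the upper tail, and at n = 30 and z = 4 the exact probability is several times larger than the normal figure. The gap narrows as n grows but is still present at n = 1000.</p>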
<p></p>
<p>Big Data, Data Science and their tools continue to advance at a breathtaking rate. Spurious Correlations per second (SCps) increases with each new generation of tools and analysts. I hereby state, on the record, the existence of Moore's Law for Spurious Correlations per second (SCps). Mark L. Stone - Dec 2016.</p> What can be applied aspects o…tag:www.datasciencecentral.com,2016-12-07:6448529:Comment:4958422016-12-07T20:13:37.820ZGregory Yom Dinhttps://www.datasciencecentral.com/profile/GregoryYomDin
<p>What are the applied aspects of your article?</p>