.

The original version of the central limit theorem (CLT) assumes *n* independently and identically distributed (i.i.d.) random variables *X*1, ..., *Xn*, with finite variance. Let *Sn* = *X*1 + ... + *Xn*. Then the CLT states that

that is, it follows a normal distribution with zero mean and unit variance, as *n* tends to infinity. Here *μ* is the expectation of *X*1.

Various generalizations have been discovered, including for weakly correlated random variables. Note that the absence of correlation is not enough for the CLT to apply (see counterexamples here). Likewise, even in the presence of correlations, the CLT can still be valid under certain conditions. If auto-correlations are decaying fast enough, some results are available, see here. The theory is somewhat complicated. Here our goal is to show a simple example to help you understand the mechanics of the CLT in that context. The example involves observations *X*1, ..., *Xn* that behave like a simple type of time series: AR(1), also known as autoregressive time series of order one, a well studied process (see section 3.2 in this article).

**1. Example**

The example in question consists of observations governed by the following time series model: *X**k*+1 = *ρX**k* + *Y**k*+1, with *X*1 = *Y*1, and *Y*1, ..., *Y**n* are i.i.d. with zero mean and unit variance. We assume that |*ρ*| < 1. It is easy to establish the following:

Here "~" stands for "asymptotically equal to" as n tends to infinity. Note that the lag-*k* autocorrelation in the time series of observations *X*1, ..., *X**n* is asymptotically equal to *ρ*^*k* (*ρ* at power *k*), so autocorrelations are decaying exponentially fast. Finally, the adjusted CLT (the last formula above) now includes a factor 1 - *ρ*. If course if *ρ* = 0, it corresponds to the classic CLT when expected values are zero.

**1.1. More examples**

Let *X*1 be uniform on [0, 1] and *X**k*+1 = FRAC(*bXk*) where *b* is an integer strictly larger than one, and FRAC is the fractional part function. Then it is known that *Xk* also has (almost surely, see here) a uniform distribution on [0, 1], but the *Xk*'s are autocorrelated with exponentially decaying lag-*k* autocorrelations equal to 1 / *b*^*k*. So I expect that the CLT would apply to this case.

Now let *X*1 be uniform on [0, 1] and *X**k*+1 = FRAC(*b*+*Xk*) where *b* is a positive irrational number. Again, *Xk* is uniform on [0, 1]. However this time we have strong, long-range autocorrelations, see here. I will publish results about this case (as to whether or not CLT still applies) in a future article.

**1.2. Note**

In the first example in section 1.1, with *X**k*+1 = FRAC(*bXk*), let *Zk* = INT(*bXk*), where INT is the integer part function. Now *Zk* is the *k*-th digit of *X*1 in base *b*. Surprisingly, if *X*1 is uniformly distributed on [0, 1], then all the *Zk*'s, even though they all depend exclusively on *X*1, are (almost surely) independent. Indeed, if you pick up a number at random, there is a 100% chance that its successive digits are independent. However, there are infinitely many exceptions, for instance the sequence of digits of a rational number is periodic, thus violating the independence property. But such numbers are so rare (despite constituting a dense set in [0, 1]) that the probability to stumble upon one by chance is 0%. In any case, this explains why I used the word "almost surely".

**2. Results based on simulations**

The simulation consisted of generating 100,000 time series *X*1, ..., *X**n *as in section 1,* *with *ρ *= 1/2, each one with *n* = 10,000 observations, computing *Sn* for each of them, and standardizing *Sn* to see if it follows a *N*(0, 1) distribution. The empirical density follows a normal law with zero mean and unit variance very closely, as shown in the figure below. We used uniform variables with zero mean and unity variance to generate the deviates *Yk*.

Below is one instance (realization) of these simulated time series, featuring the first *n* = 150 observations. The Y-axis represents *Xk*, the X-axis represents *k*.

The speed at which convergence to the normal distribution takes place has been studied. Typically, this is done using the Berry-Esseen theorem, see here. A version of this theorem, for weakly correlated variables, also exists: see here. The Berry–Esseen theorem specifies the rate at which this convergence takes place by giving a bound on the maximal error of approximation between the normal distribution and the true distribution of the scaled sample mean. The approximation is measured by the Kolmogorov–Smirnov distance. It would be interesting to see, using simulations, if the convergence is slower when *ρ* is different from zero.

*To receive a weekly digest of our new articles, subscribe to our newsletter, here.*

**About the author**: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at DataShaping.com, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). He recently opened Paris Restaurant, in Anacortes. You can access Vincent's articles and books, here.

- What is cloud management? Everything you need to know
- Best practices for defining a cloud monitoring strategy
- cloud infrastructure
- Cloud automation
- Enabling collaboration a rising analytics trend
- Can tech self-correct the skills gap by upskilling employees?
- recurrent neural networks
- 5 benefits and challenges of IT/OT convergence
- Flexify.IO CTO says multi-cloud migration within easy reach
- Infor spins off EAM business to Hexagon but retains a stake

Posted 13 July 2021

© 2021 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central