
This article is part of a new series featuring problems with solutions, to help you hone your machine learning and pattern recognition skills. Try to solve this problem by yourself first, before looking at the solution. Today's problem also has an intriguing mathematical appeal and an exact solution, which lets you check whether the answer you found using machine learning techniques is correct. The level is for beginners.

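The problem is as follows. Let *X*1, *X*2, *X*3 and so on be a sequence recursively defined by *Xn*+1 = Stdev(*X*1, ..., *Xn*). Here *X*1, the initial condition, is a positive real number or random variable, and Stdev is the standard deviation without correcting factor (dividing by *n*, not by *n* - 1). Thus,

$$X_{n+1} = \sqrt{\frac{1}{n}\sum_{k=1}^{n} X_k^2 - \left(\frac{1}{n}\sum_{k=1}^{n} X_k\right)^2}.$$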

It is clear that *Xn* = *An* *X*1, where *An* is a number that does not depend on *X*1, because multiplying all the terms by a positive constant multiplies their standard deviation by the same constant. So we can assume, without loss of generality, that *X*1 = 1. For instance, *A*1 = 1 and *A*2 = 0. The purpose here is to study the behavior of *An* (for large *n*) using simple model fitting techniques. I plotted the first few values of *An* in the figure below: the X-axis represents *n*, and the Y-axis represents *An*. The question is: how to approximate *An* as a simple function of *n*? Of course, a linear regression won't work. What about a polynomial regression?

The first 600 values of *An* are available here, as a text file.
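If you prefer to generate the values yourself, here is a minimal Python sketch, assuming *X*1 = 1 and a standard deviation that divides by *n* (the convention implied by *A*2 = 0). The function name `stdev_sequence` is my own choice for illustration; it keeps running sums so that each new term costs constant time.

```python
import math

def stdev_sequence(n_terms, x1=1.0):
    """Return [X1, ..., X_n_terms] where X_{n+1} is the population
    standard deviation (divide by n) of X1, ..., Xn."""
    values = [x1]
    total = x1          # running sum of the terms
    total_sq = x1 * x1  # running sum of the squared terms
    for n in range(1, n_terms):
        mean = total / n
        # Guard against a tiny negative value caused by round-off.
        variance = max(total_sq / n - mean * mean, 0.0)
        nxt = math.sqrt(variance)
        values.append(nxt)
        total += nxt
        total_sq += nxt * nxt
    return values

a = stdev_sequence(600)
print(a[:4])  # first four values A1, A2, A3, A4
```

With *X*1 = 1, the list `a` holds *A*1 to *A*600, so `a[n - 1]` is *An*.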

**Solution**

A tool as basic as Excel is good enough to find the solution. However, if you use Excel, the built-in function Stdev has a correcting factor (it divides by *n* - 1 rather than *n*) that needs to be taken care of, for instance by using Stdevp instead. But you can just use the values of *An* available in my text file mentioned above, to avoid this problem.
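The same distinction exists in Python's standard library, and a small sketch may help to see the size of the correcting factor. Here `statistics.stdev` plays the role of Excel's Stdev (division by *n* - 1) and `statistics.pstdev` is the uncorrected version used in the recursion. The three data points are simply *A*1, *A*2, *A*3.

```python
import math
import statistics

data = [1.0, 0.0, 0.5]  # A1, A2, A3
n = len(data)

sample_sd = statistics.stdev(data)       # divides by n - 1, like Excel's Stdev
population_sd = statistics.pstdev(data)  # divides by n, as the recursion requires

# Removing the correcting factor from the sample version gives the population version.
rescaled = sample_sd * math.sqrt((n - 1) / n)

print(sample_sd, population_sd, rescaled)
```

Multiplying the corrected value by the square root of (*n* - 1) / *n* recovers the uncorrected one, which is *A*4 in this case.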

If you use Excel, you can try various types of trend lines to approximate the blue curve, and even compute the regression coefficients and the R-squared for each tested model. You will find very quickly that the power trend line is the best model by far, that is, *An* is very well approximated (for large values of *n*) by *An* = *b* *n*^*c*. Here *n*^*c* stands for *n* at power *c*; also, *b* and *c* are the regression coefficients. In other words, log *An* = log *b* + *c* log *n* (approximately).
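Outside Excel, the power trend line amounts to an ordinary linear regression of log *An* on log *n*. Below is a rough sketch using NumPy, assuming *X*1 = 1 and regenerating the 600 values rather than reading the text file. The cutoff *n* ≥ 100 is an arbitrary choice of mine to focus on large *n* (and *A*2 = 0 has to be excluded in any case, since its logarithm is undefined).

```python
import numpy as np

# Regenerate A1..A600; np.std defaults to ddof=0, i.e. no correcting factor.
values = [1.0]
for _ in range(599):
    values.append(float(np.std(values)))

a = np.array(values)
n = np.arange(1, 601)

# Arbitrary cutoff (an assumption): keep only the larger values of n.
mask = n >= 100

# Fit log A_n = log b + c log n by least squares.
slope, intercept = np.polyfit(np.log(n[mask]), np.log(a[mask]), 1)
c_hat = slope
b_hat = np.exp(intercept)

print("estimated c:", c_hat)
print("estimated b:", b_hat)
```

Changing the cutoff shifts the estimates slightly, which is a quick way to judge how stable the fitted exponent is.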

What is very interesting is that, using some mathematics, you can actually compute the exact value of *c*. Indeed, *c* is a solution of the equation *c*^2 = (2*c* + 1) (*c* + 1)^2, see here. This is a polynomial equation of degree 3, namely 2*c*^3 + 4*c*^2 + 4*c* + 1 = 0, so the exact value of *c* can be computed. It has a single real root, approximately *c* = -0.3522011. It is however very hard to get the exact value of *b*.
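The value of *c* can be checked numerically. Expanding *c*^2 = (2*c* + 1) (*c* + 1)^2 and moving everything to one side gives 2*c*^3 + 4*c*^2 + 4*c* + 1 = 0. The sketch below passes these coefficients to `numpy.roots` and keeps the real root; the tolerance used to decide that a root is real is an arbitrary choice.

```python
import numpy as np

# Coefficients of 2c^3 + 4c^2 + 4c + 1 = 0, highest degree first.
roots = np.roots([2.0, 4.0, 4.0, 1.0])

# Keep the roots whose imaginary part is negligible (tolerance is an assumption).
real_roots = [r.real for r in roots if abs(r.imag) < 1e-9]
c = real_roots[0]

print("real root c:", c)
print("check:", c**2, (2 * c + 1) * (c + 1) ** 2)  # both sides of the original equation
```

The other two roots are complex conjugates, so there is no ambiguity about which root to keep.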

It would be interesting to plot the residual error for each estimated value of *An*, and see if it shows some pattern. This could lead to a better approximation: *An* = *b* *n*^*c* (1 + *d* / *n*), with three parameters: *b*, *c* (unchanged) and *d*.
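One possible way to explore this, sketched below under several assumptions of mine: fix *c* at the value -0.3522011 quoted above, note that *An* *n*^(-*c*) should then behave like *b* + *b* *d* / *n*, and estimate *b* and *d* by linear least squares. The cutoff *n* ≥ 20 is arbitrary, and the sums of squared errors are only a crude way to compare the two models.

```python
import math
import numpy as np

C_EXACT = -0.3522011  # value of c quoted in the article

# Regenerate A1..A600; np.std defaults to ddof=0, i.e. no correcting factor.
values = [1.0]
for _ in range(599):
    values.append(float(np.std(values)))

a = np.array(values)
n = np.arange(1, 601, dtype=float)
keep = n >= 20  # arbitrary cutoff (an assumption) to skip the first few terms

# Under the proposed model, y = A_n * n^(-c) equals b * (1 + d / n).
y = a[keep] * n[keep] ** (-C_EXACT)

# Three-parameter model (c fixed): regress y on [1, 1/n].
design = np.column_stack([np.ones_like(y), 1.0 / n[keep]])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b_hat = coef[0]
d_hat = coef[1] / coef[0]
sse_with_d = float(np.sum((y - design @ coef) ** 2))

# Two-parameter model (d = 0): the best constant b is the mean of y.
sse_without_d = float(np.sum((y - y.mean()) ** 2))

print("b:", b_hat, "d:", d_hat)
print("SSE with d:", sse_with_d, "SSE without d:", sse_without_d)
```

Plotting `y - design @ coef` against *n* would show whether any pattern remains in the residuals after the 1 / *n* correction.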

**About the author**: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at DataShaping.com, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). He recently opened Paris Restaurant, in Anacortes. You can access Vincent's articles and books, here.

© 2021 TechTarget, Inc. Powered by
