Subscribe to DSC Newsletter

The level in this article is for college students familiar with calculus, This material will be also of interest to college professors looking for new material to teach, or for original exam questions, as well as for business data scientists with some spare time, interested in refreshing their math skills. The problems cover real analysis, mathematical algorithms and numerical precision, correct visualizations, as well as geometry. The third problem is the most interesting one in my opinion, and could become a subject of active mathematical research with one new great, unsolved conjecture being proposed, of a probabilistic nature. The last problem has many applications in engineering science. 

1. The Simplest Function Defined by an Infinite Product

Some readers might be familiar with the popular product associated with the sine function:

However, there is even a more simple infinite product, converging to a function denoted below as h(x), and defined as follows: 

Note the alternance of the signs. Without the alternance, the product would not converge. There is no simple closed form (involving elementary mathematical functions only) for h(x). However, this product has interesting properties worth exploring, and this is the purpose of this exercise. The function h(x) is related to the Gamma function in more than one way. First, the Gamma function also has a similar infinite product. If you are not familiar with the subject, I encourage you to read this article (MIT class material for beginners) featuring a simple proof of the infinite product formula both for the sine and the Gamma functions. Second, we have the following result linking the Gamma function to h(x):

The function in the denominator in the rightmost expression is the Gamma function. The result is easy to prove. However, h(x) and g(x) are two different functions. 


Prove and leverage the following statements, to plot the function h(x) for x between -100 and +100:

If n is a positive integer, then

The latter approximation can be derived using the Stirling formula for n! (the factorial of n.) Note that h(0) = 1, and if n is a positive integer, then h(2n + 1) =0. Also note that the function h(x) oscillates constantly, and as x tends to infinity, we have:

  • The oscillations decrease in amplitude proportionally to the inverse of the square root function,
  • The extrema (alternating  between a minimum and a maximum) are located closer and closer to even integers. on the X-axis.

When x becomes large (say above 50) the infinite product converges very slowly and becomes extremely unstable from a numerical point of view. Using logarithms will help a bit though they are slow to compute, as well as using high precision computing (see also here.) However the best strategy is to use the above recurrence relations for the computations. 

What is the behavior of h(x) when x tends to minus infinity? Below is a plot for the function h(x).

Plot of h(x) between x = -20 and x = +20

The reason I show the chart only for x between -20 and +20 (rather than between -100 and +100 as in the suggested exercise) is because larger values of x (in absolute value) start to produce erroneous values for h(x) to the point that the chart becomes seriously and visibly wrong, despite computations looking fine at first glance.  The theoretical analysis was able to reveal this glitch. I am wondering how many charts are produced and published daily, without checking for accuracy, telling a message that is incorrect, for instance in this case, the fact that amplitudes for large x start increasing after a while (for x > 90), while the opposite is true. I believe college students should be made aware of this widespread issue, and learn how to address it, as in this problem. .

2. Surprising Series for Powers of Number 2

Everyone is familiar with Taylor expansions, for instance

Here we investigate a different type of series, based on factorial polynomials (see also here) rather than on powers of x. More precisely, the exercise here consists of proving that 

You should start by investigating this series for positive integer values of x. The result is then easy to prove; all but a finite number of terms are equal to 0. The result follows from a well known combinatorial identity. In the general case (x real), for which values of x does the series converge? For values between x = -1 and x = 0, convergence is a bit slow. How would you improve it? One way is to rewrite the formula as

The general case is actually easy to prove as well, using the Taylor expansion below, where y is considered as the main variable, and x is considered as a parameter. The result below is just a generalization of the binomial formula, to non-integer values of the exponent x

Then using y = 1, the result for 2^x (2 at power x) is proved. Can you see some analogy with the first problem in this article?

3. From Continuous Fractions to Nested Square Roots and More

Continuous fractions is a topic full of interesting and surprising results, especially about approximating irrational by rational numbers. Numbers like Pi have various continued fraction expansions exhibiting simple patterns. See also section 4 in this article. Continuous fractions, like the factorial base system (actually a base-independent system, unlike base-2 or base-10 systems), is one way to uniquely represent numbers. We have discussed other representations on Data Science Central, such as representations by infinite products, as well as by extremely fast converging Egyptian fractions

Here we discuss a different representation using square roots, as well as its generalizations to using arbitrary functions. Maybe a good word for these representations would be nested square roots or embedded square roots. Let x be a positive real number. Its nested square roots representation is an expression of the form

where the coefficients in the above expression are positive real numbers. It generalizes immediately to 

where f is an invertible function on its definition domain. For instance, if f is the square root function, it is invertible when defined for positive numbers. Let g be the inverse of f, so that f(g(x)) = x = g(f(x)). Probably the most natural way to define an f-based representation of x is to use the following iterative algorithm.

Algorithm to compute the coefficients.

The algorithm involves the following 2-step recursion, together with initializing the values when n = 1. It has been tested for the square root function only, and may need adjustments for functions having a different behavior. Here, we assume that x is positive. Again, g is the inverse of f

We ask you to write some code to implement it for the square root function, and to compute the representation (that is, the coefficients) for a few numbers, for instance for x = 0.5, 0.999, 1.001, 2, and 2.5.  Note that the brackets represent the floor function, also called integer part. We assume here that f, defined for positive real numbers only, is strictly monotone with f(0) = 0, f(1) = 1, and f(x)  <  x if x > 1. This is the case for the square root function. It might still work under less restrictive conditions, though this has not been studied yet.

Note: The algorithm fails if x is an integer. 


There are a number of interesting problems associated with f-representations of real numbers.  By representation, we mean the set of coefficients a_n produced by the above algorithm, for a specific x. You can start investigating the questions below, for the square root function.

  • Does the above algorithm converge, and does it converge to x? That is, does the f-representation of x, computed using the above algorithm, is actually equal to x? 
  • Can the coefficients in the representation be arbitrary, or only some sets of coefficients correspond to a potential representation? The latter is true for continued fractions.
  • Are there any numbers having a representation exhibiting simple patterns? This is the case for continuous fractions.
  • How fast is the convergence?
  • How is this related to continued fractions?
  • If x is between 0 and 1, the first coefficient (a_1) is equal to -1, and all other coefficients are positive. If x is greater than 1, all coefficients are positive. Prove it.  

Example: Nested Square Root for the Number Pi

As an illustration, for the number Pi, we have the following approximation:

As a short cut, the above expression consisting of the first seven coefficients, is denoted as [8; 2, 1, 0, 1, 0, 2]. The first 30 coefficients provide 14 correct decimals, and the coefficients in question are

[8; 2, 1, 0, 1, 0, 2, 0, 0, 2, 2, 1, 1, 2, 1, 1, 0, 0, 0, 0, 1, 2, 2, 0, 2, 0, 2, 2, 0, 1]

To get more than 14 correct decimals for Pi (or for any x in that matter) you need to use high precision computing libraries, as most programming languages provide only about 14 decimals precision, by default. You can download my code here; it generates the coefficients and converges to x unless x is an integer. 

No obvious patterns are found in the coefficients of number Pi, in the sense that any random number has a very similar chaotic representation consisting of 0's, 1's and 2's, except for the first coefficient. The exact proportion of 0's, 1's, and 2's, as well as whether or not the sequence of coefficients, for a given x, has an unusual Markov chain structure, will be the subject of an upcoming article. Solving this problem requires solving a stochastic integral equation. The a_n and g(x_n) sequences, thanks to their chaotic behavior, can be used in cryptographic systems, or for random number generation.

Note that the distribution of g(x_n) is close to (but not exactly equal to) a uniform distribution on [1, 2], and the sequence g(x_n) exhibits strong auto-correlations, while the a_n sequence of coefficients has much milder auto-correlations, very close to (yet different from) zero. This fact could be used to benchmark statistical tests, to check how good they are at detecting whether a number is equal to zero or not. The sequences a_n (the coefficients) and g(x_n) generated by the algorithm can be considered as ergodic stochastic processes.


Except for the first coefficient a_1 (corresponding to n = 1 in our algorithm), for the vast majority of positive real numbers (that is, except for a set of Lebesgue measure equal to 0, including positive integers), the coefficients in the nested square root representation are always equal to either 0, 1 or 2. Furthermore, regardless of x (except for the very few exceptions just mentioned) the distribution of 0's, 1's and 2's is always as follows: 

  • The coefficient a_n (n > 1) is equal to 0 about 46.4% of the time
  • It is equal to 1 about 30.4% of the time
  • It is equal to 2 about 23.2% of the time

Similar results are available for continuous fractions. For instance, the coefficients in continued fraction representations for almost all real numbers (excluding the first coefficient as in the nested square root case) have the following distribution:

  • It is never equal to 0 (except for finite continued fractions)
  • It is equal to 1 about 41.6% of the time
  • It is equal to 2 about 17.0% of the time
  • It is equal to 3 about 9.3% of the time
  • It is larger than 3 about 32.1% of the time

The exact probabilities, for continuous fractions, are known: See this article, page 10. Such results are usually known as ergodic properties. The most famous of these results is known as the Khinchin's theorem. Interestingly, the overabundance of small digits is a general phenomenon in nature, modeled by the Bendorf law. It has been exploited in criminal investigations to detect fake IDs, as man-made random digits (as opposed to computer-generated IDs) tend to have this property. 

For other fascinating number theory conjectures, including some of my own, read this article. For more on the distribution of digits in various systems, read this article.

4. Geometry: Shape Rearrangements and Coverage Problems

The first problem in this section is about cutting a regular polygon into pieces, and rearranging the pieces to form a different regular polygon. This is known as the third Hilbert problem, and it was proved that this is always feasible in two dimensions regardless of the polygon type, but not always in three dimensions. The picture below illustrates the problem.

In the above figure, a square was cut down, and the pieces were re-arranged to form a triangle. Both shapes have the same area. Can you use less than 4 pieces to make this transformation? The answer is no. This could be an interesting subject for restaurants serving pizzas, faced with a kid who does not like square pizzas, yet wants its fair share of the pie, possibly out of a triangle pizza (the kind of request I would have made when I was a kid.).

The next and final problem in this section - a coverage problem - has serious applications in engineering. This is a much more difficult problem, solved only recently in 1990, as a rebuttal to Kelvin's conjecture, after decades of simulations with computer technology. In 1887, Lord Kelvin asked how space could be partitioned into cells of equal volume with the least area of surface between them, i.e., what was the most efficient bubble foam? He proposed a foam, based on the bitruncated cubic honeycomb, which is called the Kelvin structure. The Kelvin conjecture was disproved by the discovery of the Weaire–Phelan structure. The new structure in question -- the Weaire–Phelan structure -- is the inspiration for the design of the Beijing National Aquatics Centre for the 2008 Olympics in Beijing in China.

It is related to honeycomb structures widely used in the industry. For more articles on coverage problems (in 2-D rather than 3-D) with applications for instance to optimal cell phone tower distribution, click here and here.

For related articles from the same author, click here or visit Follow me on Twitter at @GranvilleDSC or on LinkedIn.

DSC Resources

Popular Articles

Views: 4973


You need to be a member of Data Science Central to add comments!

Join Data Science Central

Comment by Vincent Granville on February 1, 2018 at 1:09pm

Regarding nested square roots (problem #3) I proved earlier this morning that the distribution of 0's, 1's and 2's (for the vast majority of positive real numbers), respectively denoted as p(0), p(1), and p(2), is as follows:

This is confirmed by empirical evidence. Note that p(0) + p(1) + p(2) = 1.

A regular number is a number satisfying this distribution, and the immense majority of numbers are regular numbers. Is Pi a regular number? Probably, and I am investigating this problem at this moment. I also created a general framework to study this kind of problems (mostly based on chaotic stochastic process theory), that also applies to the decimals of Pi and other popular numbers such as square root of 2, expressed in base 2 or 10 (or any base.) This could be a major step towards proving that the decimals of Pi are truly random - I once offered $500,000 to prove or disprove this fact. This is a 500 years old problem at least, and possibly the most challenging (unsolved) mathematical problem of all times.

To make sure that you don't miss my upcoming article on this topic, I invite you to sign up here to receive our newsletter.

Follow Us


  • Add Videos
  • View All


© 2018   Data Science Central   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service