Are the Digits of Pi Truly Random? - Updated with Blockchain Application and More

This article covers far more than the title suggests. It is written in simple English and accessible to quantitative professionals from a variety of backgrounds. Deep mathematical and data science research (including a result about the randomness of Pi, which is just a particular case) are presented here, without using arcane terminology or complicated equations.  

The topic discussed here, under a unified framework, is at the intersection of mathematics, probability theory, chaotic systems, stochastic processes, data and computer science. Many exotic objects are investigated, such as an unusual version of the logistic map, nested square roots, and representation of a number in a fractional or irrational base system. 

The article is also useful to anyone interested in learning these topics, whether they have any interest in the randomness or Pi or not, because of the numerous potential applications. I hope the style is refreshing, and I believe that you will find plenty of material rarely if ever discussed in textbooks or in the classroom. The requirements to understand this material are minimal, as I went to great lengths (over a period of years) to make it accessible to a large audience. The content (including truly original exercises and applications) is also valuable to college professors looking for off-the-beaten-path material to teach, as well as to students wishing to get exposure and interest to advanced topics usually reserved for 'elite' data scientists, without being discouraged due to the simplicity of the presentation.

The randomness of the digits of Pi is one of the most fascinating, unsolved mathematical problems of all times, having been investigated by many million of people over several hundred years. The scope of this article encompasses this particular problem as part of a far more general framework. More questions are asked than answered, making this document a stepping stone for future research.

If you are looking for a quick summary of the material presented in this long article, click here

Content of the Article:

1. General Framework

  • Questions, Properties and Notations about Chaotic Sequences Investigated Here
  • Potential Applications, Including Random Number Generation

2. Examples of Chaotic Sequences Representing Numbers

  • Data Science Step
  • Mathematical Step
  • Numbers in Base 2, 10, 3/2 or Pi 
  • Nested Square Roots
  • Logistic Map

3. About the Randomness of the Digits of Pi

  • The Digits of Pi are Random in the Logistic Map System
  • Paths to Proving Randomness in the Decimal System
  • Connection with Brownian Motions

4. Curious Facts

  • Randomness and The Bad Seeds Paradox
  • Application to Cryptography, Financial Markets, Blockchain, and HPC
  • Digits of Pi in Base Pi

5. Exercises

1. General Framework

We are dealing with sequences x(n) of positive real numbers defined by a recurrence relationship x(n+1) = g(n) for some function g, and initiated with a seed x(1) = x. We are interested only in functions g that produce chaos and preserve the domain. For instance, if x(n) is anywhere in [0, 1] we want x(n+1) to also be anywhere in [0.1].

We also study another sequence derived from x(n), and defined by a(n) = h(x(n)) for some function h. The function h is typically very simple and produces only small integer values such as (say) 0, 1, ... , 9 regardless of x(n). The sequence a(n) is called the digits of x represented in the chaotic system defined by the function g.

Using appropriate functions for g, the sequence x(n) behaves as a realization of a stochastic (random) process, each term x(n) acting as a random deviate of some random variable X(n). To be of any value here, the function g must be associated with random variables that are identically distributed, though not necessarily independent. Instead of using the notation X(1), X(2) and so on, we simply write X as all these random variables have the same distribution. The distribution of X is called the equilibrium distribution (or fixed-point distribution) of the system, and it is the solution of a simple stochastic integral equation. Read this article for some illustrations, especially section 3. Likewise, the a(n) sequence has a statistical distribution associated with it, called the digits distribution, and easily derived from the equilibrium distribution. 

Examples of functions g producing these patterns are investigated in details in this article. Such systems are usually referred to as chaotic systems. It is a special case of (ergodic) stochastic process or time series. Note that in practice, for a given chaos-inducing function g, not all seeds -- the initial value x(1) -- produce a sequence x(n) with the desired properties. Seeds failing to do so are called bad seeds and result in equilibrium and digits distributions that are degenerate. Even though all the systems discussed here have infinitely many bad seeds, these seeds are so rare in practice (compared with the good seeds) that the probability to stumble upon one by chance for a given chaos-inducing  g, is zero.  

At this point, the initiated reader probably knows what I am going to talk about, even though much of it is original research published for the first time. In the grand scheme of things, the randomness of Pi (and I will discuss it later in this article) appears as a very peculiar case, certainly not the most noteworthy result to prove, though still very fascinating.

Questions, Properties and Notations about Chaotic Sequences Investigated Here

For the remaining of this article, we are going to focus on the following, applied to a few different types of chaos-inducing functions g:

  • The digits a(n) are denoted as [a(1), a(2), a(3)... ] or {a(n)} and uniquely represent the number x. We want to build sequences {a(n)} based on the functions h and g, such that if x and x' are two distinct numbers, then the respective representations {a(n)} and {a'(n)} are different.
  • There is a function f, usually straightforward (but not always) such that for a given chaos-inducing g and a given h, we have x = f(a(1), a(2), a(3)...).
  • For a given g and h, can the algorithm producing {a(n)} generate any arbitrary {a(n)} or only some specific types of {a(n)} with some constraints on the digits? Sometimes yes (example: decimal representations), sometimes no (example: continuous fractions).
  • Are the sequences {x(n)} and {a(n)} auto-correlated? Sometimes yes, sometimes no. What are the implications? For the representation of a number in any integer base b such as b = 10, {x(n)} is auto-correlated while {a(n)} is not. If the base b is not an integer, {x(n)} is much less auto-correlated, but {a(n)} becomes auto-correlated. Also if the base b is not an integer, the digits distribution is no longer uniform. 
  • Can we compute the equilibrium and digits distributions? Yes using an iterative algorithm, but in all the examples discussed in the next section, exact solutions in closed form are known and featured, and I will show you how they were obtained.

Potential Applications

The sequences {x(n)} and {a(n)} can be used for continuous / discrete number generation and in cryptographic applications. See also this article with applications to modeling population demographics, and physics. They can also be used for machine precision benchmarking, as numerical errors accumulate so fast that after 50 iterations, regardless of the sequence, all digits are wrong unless you use high precision computing.   

2. Examples of Chaotic Sequences Representing Numbers

All the g functions presented here are chaos-inducing. This is necessary if one wants, with a given g, to represent all numbers, not just a small fraction of them. For each case, we present, whenever possible, the following features:

  • The function f discussed earlier, to reconstruct the original number x (the seed)
  • The (continuous) equilibrium distribution and its support domain
  • The (discrete) digits distribution
  • The functions g and h
  • Other properties such as auto-correlations or exact formula for x(n) when available

Three types of chaotic processes are discussed

  • Tradition representation of a number in base b, with b > 1 (integer or not) and denoted as Base(b)
  • Logistic map of exponent p, denoted as LM(p)
  • Nested powers of exponent q, denoted as NP(q)

In all cases, the seed x(1) = x is a positive real number. The equilibrium distribution (if it exists) is always solution of the stochastic integral equation

P(X  <  y) = P(g(X)  <  y).

A way to solve this apparently complex equation is described in details, and in very simple words, in this article, with an illustration for the logistic map LM(1/2) discussed below. It involves two steps:

  • Data Science Step: Gather data to compute the empirical percentile distribution for x(n), and perform various regressions (polynomial, logarithmic and so on) until you discover one with an incredibly fantastic fit with data -- like a perfect fit. 
  • Mathematical Step: Plug that tentative distribution into the stochastic integral equation, and see if it solves the equation. 

Also, by definition, as discussed earlier, x(n+1) = g(x(n)) and a(n) = h(x(n)). This can be used to design a trivial algorithm to compute all the digits in any system powered by a chaos-inducing function g. The seed x can be reconstructed using the formula x = f({a(n)}. Here the notation INT represents the integer part, sometimes called floor function. In the formula displayed as images, the floor function is represented by left and right brackets: this is the standard convention.

I now discuss three types of number representation system.

Numbers in Base 2, 10, 3/2 or Pi 

This system (the traditional decimal system in base b) is characterized by:

with b > 1. The seed x must be in [0,1] so that all x(n)'s are also in [0.1]. If x = Pi, use x-3 instead. 

We distinguishes two cases:

  • If b is an integer: The equilibrium distribution is uniform on [0,1] and the digits distribution is uniform on {0, 1, ... , b} for all but very rare bad seeds x such as rational numbers. This fact has been "known" for millennia (at least for b = 10) and it is taken for granted; it can be proved rigorously by solving the stochastic integral equation P( <  y) = P(g(X)  <  y) attached to this system. However, in general, such systems do not produce a uniform distribution for the digits: this is an exception to the examples discussed in this article. The auto-correlation between x(n+1) and x(n) is equal to 1 / b, while the auto-correlation between a(n+1) and a(n) is equal to 0. So the h function actually decorrelates the sequence {x(n)}, to use the vocabulary of time series theory.
  • If b is not an integer: The equilibrium distribution is NOT uniform on [0,1]. The auto-correlation between x(n+1) and x(n) is not 0, but much closer to 0 than in the above case. This time, the auto-correlation between a(n+1) and a(n) is NOT equal to 0. The digits take values in {0, 1, ... , INT(b)}. I did not test if the digits distribution was uniform on that domain, but my gut feelings tells me that this is not the case.

Computation details for a customizable base b, are available in this spreadsheet. Rather than computing auto-correlations based on a single seed (that is, one instance of the process) using large values of n, I computed them using a large number of seeds (all random numbers) computing the auto-correlations for a small, fixed n but  across many seeds. Due to the nature of the process (it is ergodic) both computations yield the same conclusions. The reason for using many seeds and a small n is because large n (above 20) lead to total loss of numerical accuracy. 

Nested Square Roots

This is a particular case of a far more general model described in section 3 in the following article. It is characterized byThis system is related to continued fractions, except that a(n) can only be equal to 0, 1, or 2, as opposed to taking any arbitrary positive integer value. The seed x must be in [1, 2] so that all x(n)'s are also in [1, 2]. If x = Pi, use x-2 instead. 

The equilibrium distribution is given by the following formula:

The digits distribution is given by the following formula:

Again, these results are easily derived from the stochastic integral equation. Note that the sequence {x(n)} exhibits strong auto-correlations in this case, though auto-correlations for {a(n)} are close to (but different from) 0.

This system has an infinite number of bad seeds, like the other systems discussed in this article. These are numbers x that do not have the prescribed digits distribution. Their set has measure 0, so they are very rare. An example is (1 + SQRT(5) ) / 2, with all the digits being equal to 1. Another one is SQRT(2), with a(1) = 0 and all other digits being equal to 2. Any number like SQRT(2) that at some point in its digit representation, only has 2's (the highest possible value for a digit in this system), causes the same problem as numbers in the decimal system that at some point, only have 9's (the highest possible value for a digit in the decimal system of base 10).

Finally, for the nested square root system, the formula x = f({a(n)} to reconstruct the seed x using its digits, must be computed backwards. The opposite is true for the decimal system. For more on nested square roots, click here and read entry #118 (the one that is most relevant to our problem.)

Logistic Map LM(p)

The logistic map has many applications, for instance in social sciences. Here we are interested in the logistic map of exponent p, with p = 1 corresponding to the standard, well known version. We also discuss the case p = 1/2, as these are two cases (possibly the only two cases) that provide a chaotic-inducing g covering all real numbers (one that meets the criteria outlined in section 1) with known equilibrium distribution that can be expressed with a simple closed form formula. The function h associated with this system was chosen arbitrarily, but this is also the simplest function h that one could come up with. I don't know yet if there is a simple formula for the function f.

This system is characterized by

The seed x must be in [0,1] so that all x(n)'s are also in [0.1]. If x = Pi, use x-3 or x/4 instead. A remarkable fact, for the case p = 1, is that the auto-correlations for {x(n)} are all equal to 0, making it, in some way, more random than the decimal system or other systems discussed in this article. For the case p = 1/2 the auto-correlation between x(n+1) and x(n) is equal to -1/2, it is equal to +1/4 between x(n+2) and x(n), and to -1/8 between x(n+3) and x(n), see proof here (check out section 3 after clicking on the link.) By contrast, it is equal to 1/2 for the decimal system in base 2.  

The equilibrium distributions, defined for y in [0, 1],  are equal to:

As usual, this applies to good seeds only (the vast majority of seeds). A proof of the formula for p = 1/2 can be found here. The integral for the case p = 1 can be explicitly and automatically computed using Wolfram Alpha, see here (I think the formula returned by this API is wrong though, as it involves an expression in one square root and its opposite in another square root: one of them has to be negative.) 

For the case p =1, the digits distribution is uniform on {0, 1}. The probability for a digit to be 0 is equal to P( X < 1/2) = 0.5, based on this computation. This is not true for the logistic map with p = 1/2.

3. About the Randomness of the Digits of Pi

The discussion here is not just about Pi, but about any number. The digits {a(n)} of a number x, in a specific system defined by g and h, are said to be random, if they have the proper digits distribution, as well as the proper auto-correlation structure associated with that system. In short, any finite set of digits of x must have the proper joint distribution that all but a tiny subset of numbers (of Lebesgue measure zero) have in that system. The target digits distribution can be computed from the equilibrium distribution, as discussed in the previous section. Numbers satisfying this criteria are called good seeds.

For instance, in the decimal system of base b, if b is an integer, the distribution in question is uniform, auto-correlation between successive digits are zero, and the digits are independently distributed with that same uniform distribution. If the base b is not an integer (say b = 3/2), the theoretical auto-correlation between a(n+1) and a(n) is clearly not zero, the digits distribution is clearly not uniform, and both can be computed exactly using the same mathematical arguments as in this article (see section 3.)     

In the literature, the randomness of the digits is defined as normalcy. For the traditional decimal system, the definitions (random digits, good seed, normal number) are similar though good seed is stronger in the sense that it takes the absence of auto-correlation into account.  Anyway, for Pi or SQRT(2), even proving that the digit 5 appears infinitely many times -- a very weak result indeed -- would be a phenomenal discovery.

The Digits of Pi are Random in the Logistic Map System

We show here why Pi is a good seed in the standard logistic map system, with its digits behaving exactly like those of pretty much all numbers in that system. The result is based on the fact that for the standard logistic map (corresponding to p = 1 in the previous section), an exact formula is available for {x(n)} and thus for {a(n)} as well, see here. It is given by

It implies that bad seeds (all bad seeds?) are numbers x that make w(x) a rational number, as these seeds don't follow the equilibrium distribution. Pi/4 is not one of them, so Pi/4 must be a good seed.

While this result about the randomness of the digits of Pi may seem spectacular at first glance, there are still big hurdles to overcome to formally prove it. In particular:

  1. Are the digits a(n) properly defined in the logistic map system? No function x = f({a(n)}) has been found yet, unlike in the decimal or nested square root systems. Can two different numbers have the same digits?
  2. Are the only bad seeds those that are very easy to spot in the exact formula for x(n)? If difficult to prove, we may be back to square one.
  3. Probably the biggest hurdle is to prove that w(Pi/4) is irrational. 

Note that for the logistic map system, the digits distribution (for good seeds, that is for pretty much all numbers) is uniform on {0, 1} like for the decimal system of base 2. This was proved in the previous section. Also, the a(n)'s are independent since the x(n)'s are. Thus the digits system produced by the logistic map seems sound. Interestingly, except for the first digit a(1), the numbers x and 1 - x always share the same digits representation. Finally, since x must be in [0, 1], here we use Pi/4 (say), rather than Pi. 

Paths to Proving Randomness in the Decimal System

The problem with the decimal system (in any base) is the opposite of the logistic map system. The function x = f({a(n)}) is known, but there is no explicit, closed-form formula for a(n). This makes things more complicated. Note however that there is a way to directly compute the n-th digit of Pi, a(n), in base 16, using the Bailey–Borwein–Plouffe formula.

To prove that Pi is a good seed in the decimal system, we could try one of the following approaches:

  • Find an approximation for x(n) or maybe an infinite series involving terms similar to w(x), making it easier to spot the bad seeds -- all of them. Today, the only bad seeds known are rational numbers and artificially manufactured numbers such as x = 0.101101110...
  • Study the problem in a base b that is not an integer. Try a sequence of bases that converge to (say) b =2 or b = 10. Or try sequences of bad seeds (rational numbers) that converge to Pi, and look at the convergence (or lack of) of the distributions attached to these bad seeds, to a uniform distribution on [0, 1]. See exercise 9 in the last section. Or better, consider a combination of sequence of bases converging to base 10 (say) and a sequence of bad seeds converging to Pi, simultaneously. 
  • Use a transformation of {x(n)} that more easily leads to a solution. In particular, find a status-preserving mapping v such that for any x, if v(x) is a good seed, then x is also a good seed. Maybe it could be easier to prove that v(Pi) -- rather than Pi -- is a good seed.
  • Find a mapping between the logistic map and the decimal system, for x(n). However, such mappings may be very complicated if they exist at all. For base 2, an interesting result (with mapping to the logistic map) is available: See exercise 7 in the last section.
  • The x(n)'s are auto-correlated in the decimal system, while they are not in the logistic map system. Could this be the source of the difficulties? Maybe finding a one-to-one mapping that decorrelates the x(n)'s could help. Note that in both systems, the digits a(n)'s are not auto-correlated.
  • The function g in the decimal system of base b is not continuous: g(x) is the fractional part of bx. To the contrary, it is continuous in the logistic map system. Yet, the fractional part function has a simple Fourier series (with terms consisting of sine and cosine functions)  that could possibly help. Its inverse also has a simple Fourier series, see here. In short, for the decimal system of base b, we have:

These are all very challenging problems.

Connection with Brownian Motions

The focus here is to try to further characterize bad seeds, to help answer questions about the status of Pi or other mathematical constants. We raise a few extra suggestions to solve this problem. 

The process {x(n)} is chaotic. In number theory, working with the cumulative process, in this case y(n) = x(n) + x(n-1) + ... + x(1), which is much smoother, sometimes makes things easier. We expect that z(n) = ( y(n) - n E(X) ) / SQRT(n) has a normal (Gaussian) distribution if x is a good seed (see here), since the x(n)'s all have the same distribution, and are barely correlated if x is a good seed, depending on g. The process {z(n)}, when properly scaled, is then a Brownian motion. Here E(X) represents the expectation of the equilibrium distribution; in particular, it is equal to 0.5 for the decimal system of base b (b integer) and for the logistic map. It is also easy to get its exact value for the other systems discussed in this article.

But what if x is a bad seed? For the decimal system (of any base, even non-integer ones) maybe the Fourier series for g(x), provided in the previous sub-section, could help with the computation of the distribution of z(n). Yet an irrational number such as x = 0.223388000099.. in base 10 (with duplicated decimals) might still provide a Gaussian distribution, despite being a bad seed. What happens if x is rational? Or if x only contains 0's and 1's? These are also bad seeds in the decimal system of base 10. What about applying the same methodology to the a(n)'s directly, which are known not to be correlated for the decimal system, if x is a good seed and the base is an integer?

4. Curious Facts

Here we discuss additional interesting topics related to the representation of numbers in various systems.

Randomness and The Bad Seeds Paradox

Bad seeds (or non-normal numbers) for the decimal system at least, are so rare (though there are infinitely many of them, including all rational numbers) that their set has a Lebesgue measure equal to zero. In other words, if you pick up a number at random, the chance that it is a bad seed is zero. Contrarily, the chance that all its digits are uniformly and independently distributed -- the feature of a good seed -- is 100 per cent! Yet there are as many bad seeds as good ones. Any number with duplicated digits, such as 0.339955666677..is a bad seed (its digits are highly auto-correlated). And there is a one-to-one mapping (bijection) between these numbers and all real numbers in [0, 1]. So maybe, the Lebesgue measure is not a useful metric in this context.

Application to Cryptography, Financial Markets, Blockchain, and HPC

If there is no known, fully reversible mapping between the decimal system in base 2 and the digits produced by the logistic map system -- and since there is no known f({a(n)}) available to reconstruct a seed x in the logistic map system -- one can build an encryption system as follows:

The code to be encoded is the seed x. It must be a good seed for the logistic map. The immense majority of numbers are, including simple numbers such as 3/10. 

  • The secret code is the digits of x in base 2.
  • The public code is its sequence of digits in the logistic map system, easy to compute with high precision computing. But it is impossible, one would think, to reconstruct the secret code if you only know its digits in the logistic map system, assuming it has a enough digits.

This can be used for authentication, by checking whether the logistic map code entered by some users to sign-up on some platform, is actually a valid one.  It can also be used in Blockchain / bitcoin mining, to make sure that the bits of data sent by blockchain or bitcoin miners, are legit. Currently, hash functions are used instead, but the logistic map can play the same role. In short, sending data and getting it validated by incorporating a signature into it, is made extremely computer-intensive for security reasons, whether -- as of today -- the signature is a very peculiar kind of hash with many zeroes at the beginning (and very hard to produce by chance), or whether it consists of digits that match one of numerous  pre-specified valid seeds or some digit patterns in some of the number representation systems discussed here.

Another application is about transmitting secret messages. Your secret message is the seed x, encoded in some base b in the decimal system. What you transmit publicly, is the same message but encoded in a different base, say b'. As long as one of the two bases is an irrational number, you are safe. Reconstructing the original message is very easy if you know the bases used in the encryption scheme. So the bases b and b' could be the encryption keys -- possibly one of them being public, the other one private.

Yet another application is in finance. The process y(n) = x(1) + ... + x(n) simulates a generalized random walk, and can be used (when scaled properly) as a Brownian motion to model stock price fluctuations, when using good seeds, and also with some special types or bad seeds.  

Finally, the algorithm used here to compute the digits of Pi or any other number, in any system, is not practical. On most machines, x(n) becomes totally wrong in as little as 30 iterations due to round-off errors accumulating exponentially fast. This is discussed here. Nevertheless, it would be interesting to see how far you can go, with high precision computing (HPC), until the digits being computed are no longer correct. Indeed, the computation of a(n) for large n and good seeds, can be used to benchmark various libraries (in Python, Perl and so on) to check how well and how fast they perform, to produce correct values. For Pi, more than a trillion digits are known, and we can use this number as a seed, in these benchmarking tests.

In this article, the focus was never on a single number like Pi, but on all numbers. The processes investigated here being all ergodic, it is easier to identify general statistical properties by focusing on a large collection of numbers, using small values of n and digits correctly computed, rather than on tons of digits computed for an handful of numbers, with pretty much all digits incorrectly computed due to round-off errors with standard computations. 

Digits of Pi in Base Pi

Finally, just for fun, the first 31 digits of Pi in base Pi are (after skipping the integer part equal to 3):

Pi - 3 = [0, 1, 1, 0, 2, 1, 1, 1, 0, 0, 2, 0, 2, 2, 1, 1, 3, 0, 0, 0, 1, 0, 2, 0, 0, 0, 2, 0, 3, 0, 1, ...]

They provide the approximation (when re-written in base 10) Pi = 3.14159265358979. This is just a standard sequence satisfying the equilibrium distribution of its system, with no obvious patterns. In base Pi, regardless of the seed x (as long as it is a good seed, and pretty much all of them are), all digits are either 0, 1, 2, or 3, the digits distribution is not uniform, and we have auto-correlations. So it looks like Pi is normal (that is, a good seed) in base Pi, as its digits seem to follow the (strange) prescribed distribution that applies to all good seeds in that base.

Interestingly, in base b (b being an integer or not), x = 1 / (b -1 ) is always a bad seed, as all its decimal digits a(n) are all equal to 1. This is also true for base b = Pi. However if b is irrational, xb (in this case x = Pi) seems to be a good seed, despite 1 / (b - 1) being a bad one. For comparison purposes, here are the digits for a few other seeds, in base b = Pi:

  •  3/7 = [1, 1, 0, 0, 2, 2, 0, 3, 0, 0, 2, 1, 1, 1, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 1, 2, 0, 1, 0, 2, 2, ...]
  • 1 / (Pi -1) = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
  • SQRT(2) - 1 = [1, 0, 2, 3, 0, 0, 1, 2, 1, 2, 1, 2, 0, 2, 2, 2, 2, 1, 1, 0, 1, 1, 2, 1, 0, 1, 2, 0, 2, 2, 0, ...]

For instance, in layman's terms, here is how the expansion of SQRT(2) looks like in base Pi:

It is a good exercise for statisticians (see exercise 3 below) to check, with a statistical test of hypotheses, whether the digits distribution for Pi and SQRT(2), are identical in base Pi. Also, while 1 / (Pi -1 ) is a very bad seed in base Pi, it is (most likely) a good seed in base 2 or 10, or pretty much in any other base (even non-integer ones.)

5. Exercises

[1] Randomly pick up 20 numbers b(1) ... b(30) in {0, 1, 2}. Compute 

Then compute the first 20 digits of x, denoted as a(1) ... a(20), in the nested square root system. Do we have a(1) = b(1), a(2) = b(2), and so on? Repeat this procedure with 10,000 numbers. Are the a(n)'s and b(n)'s always identical? For numbers with the last b(n)'s all equal to zero, you may run into problems.


[2] Create a summary table of all the number representation systems discussed here. For each system, include the parameters (base b for decimal system), sub-cases (b integer versus not integer), the equilibrium and digits distribution as well as their domain, the functions g(x), h(x), and f({a(n)}), constraints on the seed (for instance x must be in [0, 1]) and typical examples of bad seeds. 


[3] Compute the digits of Pi, 3/7, and SQRT(2) in base Pi, and their empirical distribution. Compute at least 30 correct digits (for larger n, you may need to use high precision computing.) Use a statistical test to asses whether these numbers have the same digits distribution, and the same auto-correlation structure, in base Pi. The exact distribution, assuming that these numbers are good seeds in the base Pi system, can be derived using the stochastic integral equation P(X  <  y) = P(g(X)  <  y) to get the equilibrium distribution for x(n), and then derive the digits distribution for a(n). You can do the same for base 10. Base 10 is actually easier since the theoretical distributions are known to be uniform, and there is no auto-correlation (they are equal to zero) and no need for high precision computing if you use tables of pre-computed digits, instead of the iterative algorithm mentioned in this article. Yet, even in base 10, you can still test whether the auto-correlations are zero, to confirm the theoretical result.


[4] This is a difficult problem, for mathematicians or computer scientists. Study the system defined by

Part I.  Which values of c and x, with x in [0, 1], yield a chaotic system? If c is big enough, the system seems chaotic, otherwise it is not. What is the threshold (for c) to induce full chaos for most seeds? Show that auto-correlations between a(n+1) and a(n) are high for good seeds, even for c = 4, but are decreasing as c is increasing.

Part II. Answer the following questions:

  • What is the distribution of the digits for this system, for a good seed, depending on c?
  • Are all the digits always in {0, 1, ... , INT(c)} ?
  • Can you find the equilibrium distribution, and the function x = f({a(n)}) ?

This is known as the iterated exponential map. Note that g(x) is sometimes denoted as c^x mod 1. See exercise 10.4.11 in the book Non Linear Dynamics and Chaos by Steven Strogatz (2nd Edition, 2015.)


[5] This problem is similar to the previous one, but a little less difficult. Study the system defined by

with the seed x in [0, 1]. Show that to get a chaotic system, the seed must be an irrational number. In that case, the digits can be any small or large positive integer (there is no upper bound), but most of the time, that is, for most n's, a(n) is a small integer. Indeed, what is the digits distribution? What about the equilibrium distribution? How is this system related to continued fractions? Are the x(n)'s auto-correlated?

Solution: The equilibrium distribution, solution of the stochastic integral equation, is given by 

You can check it out by plugging that distribution in the stochastic integral equation P(X  <  y) = P(g(X)  <  y). To find out how to discover such a solution using empirical (data science) methods, click here and read section 4.The digits distribution is known as the Gauss-Kuzmin distribution. Besides rational numbers, an example of bad seed is x = (1 + SQRT(5)) / 2, with all the digits equal to 1.


[6] For the same g and h as in the previous exercise, what is the function x = f({a(n)}) ? Can the sequence {a(n)} of digits be arbitrary? Is it possible to find a number x (of course, it would be a bad seed) such that a(n) = n for all n?

Solution: This system is actually the continued fractions system, thus we have

To answer the last question, try x = 0.697774657964008. The first digits (up to n = 11) are 1, 2, 3, ..., 11. Interestingly, if you allow the digits not to be integers, it opens the possibilities. For instance, x = f({a(n)}) with a(n) = 1/n, yields x = (Pi - 2) /2. While this is a well known result, it is not a valid representation in the standard continued fraction system (exercise: compute the digits of that number in the continued fraction system; obviously, they will all be integers.) It is like allowing a digit in the base 10 system to not be an integer between 1 and 9. 

[7] Equivalence between systems. This exercise is based on the exact formula available for the logistic map. If {(x(n)} is the sequence generated in base 2 with a given seed x, then the sequence {y(n)} with y(n) = sin^2(Pi x(n)), corresponds to the standard logistic map with seed y = sin^2(Pi x). See exercise 10.3.10 in the book Non Linear Dynamics and Chaos by Steven Strogatz (2nd Edition, 2015.) 

Conversely, if {(y(n)} is the sequence generated in the logistic map with a given seed y, then the sequence {x(n)} with x(n) equal to either arcsin(SQRT(y(n))) / Pi or 1 - arcsin(SQRT(y(n))) / Pi depending on whether the n-th digit of x = arcsin(SQRT(y)) / Pi is equal to 0 or 1 in base 2, corresponds to the base 2 system with seed x

These facts could possibly be used to prove that Pi-3 or Pi/4 is a good seed in base 2. See here for more.

[8] About the various distributions attached to bad seeds. If x is a bad seed, the digits distribution can be anything. In any base, show that there is an infinite number of bad seeds that do not even have any digits distribution.

Solution: In base 2, consider the following seed: the first digit is 0. Digits #2 to #4 are all 1's. Digits #5 to #8 are all 0's. Digits #9 to #16 are all 1's. Digits #17 to #32 are all  0's, and so on. 

[9] Sequences of bad seeds converging to a good seed, or the other way around. This is about the base 10 system. The bad seed x = 1/2 can be approximated by a sequence of (supposedly) good seeds, for instance mPi / (2mPi +1) as m tends to infinity, or by a sequence of bad seeds (m-1)/(2m) where m is a prime number. In the latter case, the exact distribution of the x(n)'s is known, for each m, since we are dealing with rational numbers. As m tends to infinity, does the distributions in question converge to the distribution associated with the limit x = 1/2? Likewise, we can use the m first factors of the expansion of Pi/4 as an infinite product, as successive approximations to Pi. All these approximations are rational numbers, and thus bad seeds. Does the (known) distributions of {x(n)} attached to these rational numbers (bad seeds) converge to the distribution attached to the limit Pi, that is, to a uniform distribution on [0, 1], as m tends to infinity? 

For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn.

DSC Resources

Popular Articles

Views: 11376


You need to be a member of Data Science Central to add comments!

Join Data Science Central

Comment by Vincent Granville on February 7, 2018 at 2:28pm

I just added an application to Blockchain in section  4, and two new exercises (#4 and #5) about other interesting number representation systems..

Comment by Vincent Granville on February 7, 2018 at 6:59am

Thank you everyone  for the nice comments. I greatly enjoyed doing this research, and it is definitely a topic that I am passionate about. I have no doubts that more is coming.

Comment by Suresh Babu on February 7, 2018 at 5:54am

Thank you, Vincent!  Very nice article.  The Pi is endlessly fascinating (pun intended).

Comment by Peter Vanroose on February 7, 2018 at 4:13am

Are the digits of square root 2 random (or chaotic)?

(For the answer to this question: read the article; just wanted to point out that pi is not too special in this respect...)

Comment by Kenneth C Black on February 6, 2018 at 6:46pm


I wish I could write articles like that. Very nicely done. You have given us something to ponder for a long time to come. 

Interestingly, earlier today I was telling my six-year-old son about "Rain Man" and computational savants. He wanted to see what I meant, so I found a youtube video of Daniel Tammet (Daniel Tammet and Pi) from the David Letterman show from Pi Day many years ago. At that time, Daniel had recited over 22,000 digits of PI from memory. He said it was like running a marathon in his brain.

I have a feeling that reading and studying this article is going to be like running a marathon in my brain.

Finally, I've been blogging about the roles of Tableau and Alteryx in my data science job for about 5 years now. One of the benefits of doing this is to remember how to do things since this field is rapidly changing.

A week or two ago, I decided to do some additional benchmarking of Tableau using files I had put together in 2013. When I opened the spreadsheet, I found a couple of PI estimation methods stuck in my benchmark examples. I had completely forgotten about those but they are shown in Figure 3 of this article.

Apparently, I was going to have Tableau compute these formulas and benchmark how long they took but I never got around to doing it. Life must have gotten in the way I suppose.

Thank you very much for this masterpiece. 


Comment by Claude Cundiff on February 6, 2018 at 1:36pm

I remember being really surprised when I looked at the md2 implementation and finding the use of pi. This is a great post!

MD2 Wiki

© 2021   TechTarget, Inc.   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service