
I introduce here a family of very peculiar statistical distributions governed by two parameters: *p*, a real number in [0, 1], and *b*, an integer > 1. These distributions were discovered by solving the following functional equation, corresponding to *b* = 2:

$$2 f(x) = f\!\left(\frac{x}{2}\right) + f\!\left(\frac{1+x}{2}\right).$$

Here *f*(*x*) is the density attached to that distribution. The support domain for *x* is also [0, 1]. This type of distribution appears in the following context.

Let *Z* be an irrational number in [0, 1] (called the *seed*) and consider the sequence *x*(*n*) = {*b*^*n* *Z*}, where the curly braces denote the fractional part function. In particular, INT(*b* *x*(*n*)) is the (*n*+1)-th digit of *Z* in base *b*, so that *n* = 0 yields the first digit. The values *x*(*n*) are distributed in a certain way due to the *ergodicity* of the underlying process. The density associated with this distribution is the function *f*, and for the immense majority of seeds *Z*, that density is uniform on [0, 1]. Seeds producing the uniform density are sometimes called *normal* numbers; their digit distribution is also uniform.
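The mechanics of the sequence *x*(*n*) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `orbit_and_digits` is hypothetical, and a rational stand-in is used for the seed so that exact arithmetic (`fractions.Fraction`) can be applied. With ordinary floating-point numbers, each multiplication by *b* = 2 discards one bit of the seed, so the orbit degenerates after about 53 steps.

```python
from fractions import Fraction
from math import floor

def orbit_and_digits(z, b, count):
    """Iterate x(n) = {b * x(n-1)} from x(0) = z and collect INT(b * x(n))."""
    x = Fraction(z)
    orbit, digits = [], []
    for _ in range(count):
        orbit.append(x)
        scaled = b * x
        digit = floor(scaled)      # INT(b * x(n)): the next base-b digit of z
        digits.append(digit)
        x = scaled - digit         # fractional part of b * x(n), i.e. x(n+1)
    return orbit, digits

# Rational stand-in for a seed: 0.0101 in base 2 equals 5/16.
orbit, digits = orbit_and_digits(Fraction(5, 16), 2, 6)
print(digits)
print(orbit[:3])
```

A rational seed such as 5/16 only shows how digits and orbit values are linked; its orbit ends at 0 and says nothing about the distributions discussed here.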

However, the functional equation 2*f*(*x*) = *f*(*x*/2) + *f*((1+*x*)/2) may have plenty of other solutions. Such solutions are called *non-standard* solutions. The set of seeds producing non-standard solutions is known to have Lebesgue measure zero, but there are infinitely many such seeds. All rational seeds belong to this set, but they produce a discrete distribution rather than a density. We are interested here in a non-discrete solution.

**1. Example with p = 0.75 and b = 2**

The uniform distribution corresponds to *p* = 0.5. By uniform, I mean uniform on the set of all normal numbers in [0, 1]. This set has its Lebesgue measure equal to 1, but it is full of holes; in particular, no rational number is a normal number.

Below is a non-standard density satisfying the requirements. Actually, the plot below represents its percentile distribution. It was produced with a seed *Z* in [0, 1] built as follows: the *n*-th binary digit of *Z* is 1 if Rand(*n*) < *p*, and 0 otherwise, using a pseudorandom number generator. Here *p* = 0.75. Note that P.25 = 0.5 and corresponds to a dip in the chart below (P.25 denotes the 25th percentile). Dips are everywhere, but only the big ones are visible. By contrast, the percentile distribution for the uniform (standard) case *p* = 0.5 is a straight line, with no dips.
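The construction above can be reproduced approximately outside a spreadsheet. The following Python sketch is not the code behind the chart: the function names, the fixed random seed, and the truncation of each *x*(*n*) after 40 binary digits are all assumptions made for illustration. It draws binary digits equal to 1 with probability *p* = 0.75, builds the values *x*(*n*) from the digits that follow position *n*, and then looks at the share of values below 0.5 and at the empirical 25th percentile.

```python
import random

def bernoulli_digits(p, count, rng):
    """Binary digits equal to 1 with probability p, drawn independently."""
    return [1 if rng.random() < p else 0 for _ in range(count)]

def orbit_from_digits(digits, n_values, depth=40):
    """x(n) approximated by its first `depth` binary digits d(n+1), ..., d(n+depth)."""
    weights = [2.0 ** -(k + 1) for k in range(depth)]
    return [
        sum(w * d for w, d in zip(weights, digits[n:n + depth]))
        for n in range(n_values)
    ]

rng = random.Random(2021)            # fixed seed, only for reproducibility
p, n_values, depth = 0.75, 20000, 40
digits = bernoulli_digits(p, n_values + depth, rng)
xs = sorted(orbit_from_digits(digits, n_values, depth))

share_below_half = sum(x < 0.5 for x in xs) / n_values
p25 = xs[n_values // 4]              # empirical 25th percentile
print(share_below_half, p25)
```

A value *x*(*n*) falls below 0.5 exactly when its leading digit is 0, which happens with probability 1 - *p* = 0.25, so the share printed first should be close to 0.25 and the percentile close to 0.5.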

**2. General solution**

The functional equation is a bit more complicated if *b* is not equal to 2. It becomes

$$b\, f(x) = \sum_{k=0}^{b-1} f\!\left(\frac{x+k}{b}\right).$$

Using the construction mechanism outlined in the previous section to generate a non-standard seed *Z* (sometimes called a non-normal number or *bad seed*), it is clear that *x*(*n*) is a random variable. We also have

$$x(n) = \sum_{k=1}^{\infty} \frac{d(n+k)}{b^k},$$

where *b* is the base and *d*(*n*+*k*) is the (*n*+*k*)-th digit of the seed *Z* in base *b*. This formula is very useful for computations. Note that *Z* = *x*(0). Furthermore, by construction, these digits are independently and identically distributed with a Bernoulli distribution of parameter *p*. Thus, using the convolution theorem, the characteristic function for the seed *Z* is

$$\varphi(t) = \prod_{k=1}^{\infty} \left(1 - p + p\, e^{i t / b^k}\right).$$

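Take the derivative of the inverse Fourier transform (see section *inverse formula* here) and you obtain, at least formally, with φ denoting the characteristic function of *Z*,

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \varphi(t)\, dt.$$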

If *p* = 0.5 and *b* = 2 we are back to the uniform case. Otherwise the solution is quite special: the density *f* appears to be nowhere differentiable. The support domain, though dense in [0, 1], has Lebesgue measure zero. Thus the characteristic function and density are non-standard, and would be considered improper in classical probability theory. See picture below for *p* = 0.55 and *b* = 2. See also here.
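The characteristic function of this section can be explored numerically. The sketch below assumes the product form that follows from independent Bernoulli(*p*) digits, namely the product over *k* ≥ 1 of (1 - *p* + *p* exp(*it*/*b*^*k*)), truncated after a finite number of factors. The function names and the truncation length are illustrative choices. For *p* = 0.5 and *b* = 2 the result is compared with the characteristic function of the uniform distribution on [0, 1].

```python
import cmath

def char_fn(t, p, b, terms=60):
    """Truncated product of (1 - p + p * exp(i t / b^k)) for k = 1..terms."""
    value = 1.0 + 0.0j
    for k in range(1, terms + 1):
        value *= (1.0 - p) + p * cmath.exp(1j * t / b ** k)
    return value

def uniform_char_fn(t):
    """Characteristic function of the uniform distribution on [0, 1], for t != 0."""
    return (cmath.exp(1j * t) - 1.0) / (1j * t)

# Uniform case: the two functions should agree up to rounding error.
for t in (0.5, 3.0, 10.0):
    print(t, abs(char_fn(t, 0.5, 2) - uniform_char_fn(t)))

# The slope at t = 0 gives the expectation: phi'(0) = i * E[Z].
h = 1e-4
mean_estimate = ((char_fn(h, 0.75, 2) - char_fn(-h, 0.75, 2)) / (2j * h)).real
print(mean_estimate)
```

The last line estimates the expectation for *p* = 0.75 and *b* = 2 with a central difference, which gives a quick consistency check against the moments of the distribution.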

Now we should prove that this case is *ergodic*, for the functional equation to apply. I also tried to check, with some sampled values of *x*, whether 2*f*(*x*) = *f*(*x*/2) + *f*((1+*x*)/2). However, since the function is discontinuous everywhere and my approximations are probably accurate to no more than two decimals, this is not easy.
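One way around the discontinuities is to test the integrated version of the functional equation instead of the density itself. Writing *F*(*x*) = P(*X* < *x*), the case *b* = 2 reads *F*(*x*) = *F*(*x*/2) + *F*((1+*x*)/2) - *F*(1/2). The Python sketch below checks this on an empirical distribution function. It is an illustration, not the computation used for the charts: the names `simulate_orbit` and `make_cdf`, the random seed, and the truncation after 40 digits are assumptions.

```python
import bisect
import random

def simulate_orbit(p, n_values, depth=40, seed=7):
    """Values x(n) built from independent binary digits equal to 1 with probability p."""
    rng = random.Random(seed)
    digits = [1 if rng.random() < p else 0 for _ in range(n_values + depth)]
    weights = [2.0 ** -(k + 1) for k in range(depth)]
    return [
        sum(w * d for w, d in zip(weights, digits[n:n + depth]))
        for n in range(n_values)
    ]

def make_cdf(values):
    """Empirical distribution function: share of values strictly below x."""
    ordered = sorted(values)
    return lambda x: bisect.bisect_left(ordered, x) / len(ordered)

F = make_cdf(simulate_orbit(p=0.55, n_values=20000))
grid = [i / 20 for i in range(1, 20)]
residuals = [F(x) - (F(x / 2) + F((1 + x) / 2) - F(0.5)) for x in grid]
print(max(abs(r) for r in residuals))
```

The residuals come out tiny almost by construction, because the empirical distribution of a long orbit barely changes when the orbit is shifted by one step. This is therefore a sanity check on the computations, not a proof of ergodicity.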

**3. Applications, properties and data**

The distribution attached to this type of density has the following moments:

- **Expectation**: *p* / (*b* - 1).
- **Variance**: *p*(1 - *p*) / (*b*^2 - 1).
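These two moments follow directly from the digit expansion of the seed, assuming (as in section 2) that the digits *d*(*k*) are independent Bernoulli(*p*) variables. A short derivation, using the sums of two geometric series:

```latex
\begin{aligned}
Z &= \sum_{k=1}^{\infty} \frac{d(k)}{b^k}, \qquad d(k) \ \text{i.i.d. Bernoulli}(p),\\
\mathrm{E}[Z] &= p \sum_{k=1}^{\infty} b^{-k} = \frac{p}{b-1},\\
\mathrm{Var}[Z] &= p(1-p) \sum_{k=1}^{\infty} b^{-2k} = \frac{p(1-p)}{b^2-1}.
\end{aligned}
```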

Why must *f*(*x*) satisfy the functional equation discussed above? This is a consequence of the fact that the underlying distribution is the equilibrium distribution for the sequence *x*(*n*) = {*b* *x*(*n*-1)} = {*b*^*n* *Z*}. In particular, the equilibrium distribution is a solution of the stochastic integral equation P(*X* < *x*) = P({*b* *X*} < *x*). For details, see pages 65-66 of my book *Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems*, available here.
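The step from the stochastic equation to the functional equation can be written out as follows, for an integer base *b*, with *F*(*x*) = P(*X* < *x*) and 0 ≤ *x* ≤ 1. The last line is only a formal differentiation, since the density need not exist in the classical sense:

```latex
\begin{aligned}
\{bX\} < x \;&\Longleftrightarrow\; \frac{k}{b} \le X < \frac{k+x}{b} \ \text{for some } k \in \{0, \dots, b-1\},\\
F(x) &= \sum_{k=0}^{b-1} \left[ F\!\left(\frac{k+x}{b}\right) - F\!\left(\frac{k}{b}\right) \right],\\
b\, f(x) &= \sum_{k=0}^{b-1} f\!\left(\frac{x+k}{b}\right) \quad \text{(formal derivative with respect to } x\text{)}.
\end{aligned}
```

For *b* = 2 the last line reduces to 2*f*(*x*) = *f*(*x*/2) + *f*((1+*x*)/2).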

Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number generation, benchmarking statistical tests (see here) and even gaming (see here). However, the most interesting application is probably to gain insights into what non-normal numbers look like, especially their chaotic nature. It is a fundamental tool to help solve one of the most intriguing mathematical conjectures of all time (yet unsolved): are the digits of standard constants such as Pi or SQRT(2) uniformly distributed or not? For instance, when *b* = 2, any departure from *p* = 0.5 (a normal seed) results in a strong discontinuity for *f*(*x*) at *x* = 0.5. If you look at the above chart, *f*(0) = *f*(1/2) = *f*(1) regardless of *p*, but discontinuities are masking this fact.

The charts featured here, as well as the underlying computations, were all produced in Excel. You can download the spreadsheet here. In particular, a very efficient algorithm is used to produce (say) one million digits of *Z*, and to compute one million successive values of *x*(*n*), each with a precision of 14 decimals. You can play interactively with the parameters *b* and *p* in the spreadsheet, and even try non-integer values of *b* (I suggest you try *b* = 1.5 and *p* = 0.5). If *b* is not an integer and *b* < 2, the functional equation is more complicated: it is found in section 2.1 of this article.
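The spreadsheet's algorithm is not described here, so the following is only one possible way to get all the values *x*(*n*) in a single pass, sketched in Python for an integer base *b* with digits equal to 0 or 1. It relies on the identity *x*(*n*) = (*d*(*n*+1) + *x*(*n*+1)) / *b*, applied from the last digit back to the first. The function names are hypothetical, and digits beyond the end of the list are treated as 0.

```python
import random

def orbit_backward(digits, b):
    """All x(n) in one pass, using x(n) = (d(n+1) + x(n+1)) / b from the end."""
    xs = [0.0] * len(digits)
    tail = 0.0                       # digits beyond the list are treated as 0
    for n in range(len(digits) - 1, -1, -1):
        tail = (digits[n] + tail) / b    # digits[n] holds d(n+1)
        xs[n] = tail
    return xs

def orbit_direct(digits, b, n, depth=60):
    """Reference value of x(n) from the truncated sum of d(n+k) / b^k."""
    return sum(d / b ** (k + 1) for k, d in enumerate(digits[n:n + depth]))

rng = random.Random(1)               # fixed seed, only for reproducibility
digits = [1 if rng.random() < 0.75 else 0 for _ in range(100000)]
xs = orbit_backward(digits, 2)
print(max(abs(xs[n] - orbit_direct(digits, 2, n)) for n in range(5)))
```

The error on *x*(*n*) caused by the missing tail is at most *b*^(-*m*), where *m* is the number of digits available after position *n*. In base 2, roughly the last 50 values should therefore be discarded if 14 correct decimals are wanted.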

**Note**

Another way to produce a well-behaved non-normal seed Z in base *b* = 2 is as follows. Let us denote as *d*(*n*, *Z*) the *n*-th binary digit of *Z*. Set *d*(*n*, *Z*) to max[ *d*(*n*, SQRT(2)/2), *d*(*n*, SQRT(3)/2) ]. Another non-normal seed is obtained as follows: *d*(*n*, *Z*) = max[ *d*(*n*, SQRT(2)/2), *d*(*n* + 1, SQRT(2)/2) ]. For both seeds, the theory remains applicable, and *p* = 3/4 (same case as the one featured in the first picture.) The reason for this is that if *Z* and *Z*' are two normal numbers linearly independent over the set of rational numbers, then their digits are distributed independently. Also the successive digits of *Z* or *Z*' behave as if they were independently distributed.

To produce 10,000 digits of SQRT(2) (or even millions), you can use the Sagecell platform (here) with the command "N(sqrt(2),prec=10000).str(base=2)". Or you can download the first 32 million binary digits here (5 MB in compressed format).
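The two seeds described in this note can also be built in plain Python, as an alternative to the Sage command. The sketch below is an assumption-laden illustration: the helper `sqrt_half_digits` is hypothetical and uses the standard-library function `math.isqrt` on large integers, based on the identity floor(2^*n* SQRT(*m*)/2) = isqrt(*m* · 4^(*n*-1)). It then measures the proportion of digits equal to 1 in each seed.

```python
from math import isqrt

def sqrt_half_digits(m, n_digits):
    """First n_digits binary digits of SQRT(m)/2, for m = 2 or 3 (value in [0, 1))."""
    # floor(2^n * SQRT(m) / 2) equals the integer square root of m * 4^(n-1).
    scaled = isqrt(m * 4 ** (n_digits - 1))
    bits = bin(scaled)[2:].zfill(n_digits)
    return [int(c) for c in bits]

n = 10000
d2 = sqrt_half_digits(2, n + 1)      # one extra digit for the shifted seed
d3 = sqrt_half_digits(3, n)

# d(n, Z) = max[d(n, SQRT(2)/2), d(n, SQRT(3)/2)]; zip stops after n digits.
seed_a = [max(u, v) for u, v in zip(d2, d3)]
# d(n, Z) = max[d(n, SQRT(2)/2), d(n + 1, SQRT(2)/2)]
seed_b = [max(d2[i], d2[i + 1]) for i in range(n)]

print(sum(seed_a) / n, sum(seed_b) / n)
```

If the digits behave as described in this note, both proportions should be close to *p* = 3/4. That is an empirical observation on the first 10,000 digits, not a proof.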


Posted 12 April 2021
