.

Back to the basics. Here we are dealing with the oldest data set, created billions of years ago -- the set of integers -- and mostly the set consisting of two numbers: 0 and 1. All of us have learned how to write numbers even before attending primary school. Yet, it is attached to the most challenging unsolved mathematical problems of all times, such as the distribution of the digits of Pi in the decimal system. The table below reflects this contrast, being a blend of rudimentary and deep results. It is a reference for statisticians, number theorists, data scientists, and computer scientists, with a focus on probabilistic results. You will not find it in any textbook. Some of the systems described here (logistic map of exponent *p* = 1/2, nested square roots, auto-correlations in continued fractions) are research results published here for the first time.

This material is described in this article, including how to derive all the results, and the equivalence between the base-2 system and the standard logistic map (with *p* = 1) mentioned in exercise 7. It includes applications to cryptography, random number generation, high performance computing (HPC), population growth modeling, financial markets, BlockChain, and more.

A higher resolution of the table below is available here.

Many other systems are not described in this table, including:

- The iterated exponential map, possibly one of the most mysterious chaotic systems. See exercise 4 here for details.
- The nested cubic root, a generalization of the nested square root.
- The generalized continued fraction of power
*p*, especially*p*= 2, defined by

**Other interesting facts**:

For the standard logistic map (*p* = 1) a closed-form formula is available for *x*(*n*):

Also, if *x*(*n*) is the base-2 sequence for *x*, then sin^2(Pi *x*(*n*)) is the logistic map sequence for sin^2(Pi *x*).

In base *b*, if *b* is an integer, the exact formulas are as follows:

Note that *x*(1) = *x*, since x is between 0 and 1.

Thus, proving that *x* is a good seed in base *b* (*b* integer) amounts to proving that the sequence { *b*^*n* *x* } is equidistributed modulo 1 (see also here for a curious fact about *x* = Pi.)The case *x* = SQRT(2) in base-2 is studied here. It was proved that for this particular case, the frequencies of 0's and 1's in the base-2 representation, are both equal to 50%. See here. I am working on a more simple proof, using the simple recursion described here. A special formula for the digits of Pi in base-16 is also available, see here. See also this link.

Not all number representation systems are as well-behaved as those describe here. For examples of more complex systems, see here. Also, even in well-behaved systems, some numbers (a subset of bad seeds) do not have any kind of statistical distribution for their digits. An example in base-2 is as follows: the first digit is 1, the next 2 digits are 0, the next 4 digits are 1, the next 8 digits are 0, the next 16 digits are 1, the next 32 digits are 0, and so on.

**Reverse-engineering number representation systems**

Some systems, like the Logistic Map 1, or base-*b* where *b* is an integer, have an exact formula to compute *x*(*n*) for any *n* without having to compute *x*(*n*-1), *x*(*n*-2) and so on. Is it possible, in these systems, given a particular *n* and *x*(*n*), to retrieve the seed *x* (and thus all its digits) by solving the equation in question?

We tried in base-2 for *x* = SQRT(2) / 2 and *n* = 10, and the answer is positive (at least for a good seed *x*, which is the case here.) In this case, for a given *n*, one has to solve the equation

where *x* is the unknown, *b* = 2, and both *n* and *x*(*n*) are known. A dichotomic search algorithm provides a solution very efficiently, since we know that *x* is in [0, 1]. The solution is not unique though. This could be an issue for NSA and other organizations developing strong encryption systems. Or, to the contrary, it could help them in their reverse-engineering activities. Or better, it could help them explore systems with a natural, built-in backdoor, without being accused of manufacturing man-made, mandatory, artificial backdoor, in existing encryption schemes.

Imagine an encryption system where *x* is the message, and *x*(*n*) is the encrypted message, using (say) the Logistic map 1 with *n* = 500. No one is ever going to figure out that you can actually retrieve *x* from *x*(*n*). And computing *x*(*n*) itself is not easy, requiring industrial-strength high precision computing. Retrieving *x* from *x*(*n*) is even more difficult, despite the simple equation providing the solution: It is all about industrial strength HPC. But the fear of reversibility (successful attacks on cryptographic keys) has led to the development of quantum algorithms and quantum computers.

**Related articles**

*For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn.*

**DSC Resources**

- Services: Hire a Data Scientist | Search DSC | Classifieds | Find a Job
- Contributors: Post a Blog | Ask a Question
- Follow us: @DataScienceCtrl | @AnalyticBridge

Popular Articles

Comment

- Comment by Vincent Granville on June 14, 2018 at 5:31am

- Dremio accelerates data lake operations with Dart Initiative
- Bundesliga delivering insight to fans via AWS
- How 5G will augment Wi-Fi in 3 industries
- AI, new skills and self-defense code emerge as app-dev musts
- Q&A: Inside data catalog vendor Alation's $110M in funding
- Cribl aims to ease data observability with LogStream update
- AI capabilities a target for merger and acquisition activity
- L'Oréal to 'revolutionize' beauty services using Google Vertex
- Evolution of analytics sped up by pandemic
- Is it legal to record virtual meetings and video conferences?

Posted 7 June 2021

© 2021 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central