*Updated on August 12. See new section entitled "Future research on the topic."*

Here I propose an alternative to traditional cryptography. Traditional cryptography relies on hash functions (Bitcoin) or large prime numbers (RSA.) The foundations of this new technology are based on my book on numeration systems, available for free to DSC members. The key idea is to use numeration systems with non-integer bases, to represent numbers.

**1. Representation of numbers in a non-integer base**

The following also applies to integer bases such as base 2 (binary), base 10 (decimal) or base 16 (hexadecimal). However our interest is in bases that are real numbers between 1.5 and 1.9. In these bases, the digits -- just like in the binary base-2 system -- are always 0 or 1. Unlike the binary system though, the proportions of 0's and 1's are not equal to 50%, and there is some auto-correlation in the sequence of digits. I explain in the next section how to handle this problem. Another issue is that in these small bases, a digit carries less than one bit of information, thus making the message to be transmitted, longer than the binary code (base-2) that represents it. More precisely, the amount of information stored in one digit is equal lo log(*b*) / log(2) bit, where *b* is the base. This is why we want to avoid bases that are smaller than 1.5.

**Algorithm to compute the digits**

To compute the digits of a number *x* between 0 and 1, in base *b*, one proceeds as follows:

- Start with
*x*(1) =*x*and*a*(1) = INT(*bx*). - Iteratively compute
*x*(*n*) =*b***x*(*n*-1) - INT(*b***x*(*n*-1)), and*a*(*n*) = INT(*b***x*(*n*)).

Here INT represents the integer function, also called floor function. The above algorithm is just a version of the greedy algorithm. In all cases, *x*(*n*) is a real number between 0 and 1, and *a*(*n*) is the *n*-th digit of *x*, in base *b*.

Once the digits in base *b* are known, it is easy to retrieve the number *x*, using the formula

Typically, you need to use high performance computing if you want to compute more than 45 digits or so, due to limitations in machine precision. How to do it is described in chapter 8, in my book. It is not a challenging problem if you only need a few hundred, maybe 2,000 digits, which is the case in practice. Note that RSA, which relies on large prime numbers with hundreds of digits, also requires high precision computing.

Finally, note that two different numbers have two different representations in the same base, an advantage over hash functions (subject to collisions.)

**2. Application to cryptography**

You have a message *x*, to be transmitted, for example the binary (base-2) representation of your original, text message. You encode *x* in a base *b*, with *b* chosen between 1.5 and 1.9. The base *b* can even be a transcendental number.

Your encoded message simply consists of the digits of *x* in base *b*, and these digits can be shared publicly. What is kept secret is the base *b*, so that if an attacker knows the digits, he still can't retrieve the message as he does not know the base. The base is the equivalent of a key in standard cryptographic systems.

If you know both the digits and the base, you can easily reconstruct the original message, though it will require a bit of high precision computing. As always, you split the original message (if it is too long) in blocks of (say) 512 bytes, and encode each block separately.

**Challenges**

As in all cryptographic systems, there are some challenges to overcome, to make it more robust against attacks.

One of the challenges here is not related to security, but about how many digits (in base *b*) you need to use to be able to reconstruct the full message. Based on entropy theory, it is likely that using twice as many digits than in the original base-2 version of the message, should be enough, if *b* is larger than 1.5. However, this has to be tested. Keep in mind that we encode small blocks one at a time, each block having up to 2,000 base-2 digits, though smaller blocks offer some advantages.

The other challenges are about the security of the system. Can an attacker try a large number of test bases to guess which base is used? Bases that are very close to each other will produce identical digits, at least for the first few digits. It might then be possible to use a dichotomic search to identify the secret base. Another issue is the distribution of 0's and 1's, as well as the auto-correlation structure of the digits, which allow you to identify the secret base, by performing some statistical analysis. In the next subsection, we address this issue.

**Improving security**

As discussed earlier, using a single base *b* produces a weak cryptographic system. Yet it still requires advanced statistical knowledge -- not tools available on the dark net -- to break it. And being a new system, it would probably take years before someone can do a successful attack. Indeed, agencies such as NSA might like it, because it appears at first glance to be safe enough to use in commercial applications, yet it gives the government a natural back door to decode messages.

The first idea that comes to mind to make this system stronger, is to use a mapping of *x*(*n*) to scramble the distribution of 0's and 1's: for instance, using *x'*(*n*) = SQRT(1 - *x*(*n*)), instead of *x*(*n*). You must also scramble the auto-correlation structure of the digits. The legit recipient of the message would have to know the base used for encoding, as well as the scrambling mechanism, to retrieve the original message. However it might be next to impossible to retrieve the original message after scrambling, even if you know all the scrambling parameters.

A much easier solution consists of using multiple bases, say two bases *b* and *b'*. Digits in even position come from base *b*, while digits in odd position come from base *b'*. The legit recipient must know both *b* and *b'* to decode the message. More sophisticated versions of this trick can be implemented, to increase security. For instance, if a digit is equal to 0, the next digit is in the alternate base, otherwise it is in the same base as the current digit: so digits from both bases are interlaced in a more complex way.

**Further research on the topic**

A new approach consists of scrambling the distribution of *x*(*n*) as discussed in the previous sub-section, and making *x*(*n*) with (say) *n* = 3, the public encoded message. Note that *x*(*n*) is a real number -- not an integer -- so in practice it needs to be truncated, but kept long enough to be able to reconstruct the full original message.

If you know

- the base
*b*(private), *x*(*n*) in that base after the scrambling (public),- the number
*n*(private), - and the mapping used for scrambling (private),

then you can retrieve *x*(1) which is equal to *x*, using a dichotomic search. In this case, the base *b*, the number *n*, and the mapping function, constitutes the key used for decryption. The larger *n*, the more difficult it is to retrieve *x* for the legit recipient.

**Final note**

Readers interested in this article can write and submit a patent, about this technology. This content is offered as open intellectual property, and can be used in any application, commercial or not. Statisticians could be interested in doing simulations to see how easy or difficult it is to break this system, especially by analyzing the digit distribution to identify secret bases.

*For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on on LinkedIn.*

**DSC Resources**

- Free Book: Applied Stochastic Processes
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- Hire a Data Scientist | Search DSC | Classifieds | Find a Job
- Post a Blog | Forum Questions

© 2019 Data Science Central ® Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

**Technical**

- Free Books and Resources for DSC Members
- Learn Machine Learning Coding Basics in a weekend
- New Machine Learning Cheat Sheet | Old one
- Advanced Machine Learning with Basic Excel
- 12 Algorithms Every Data Scientist Should Know
- Hitchhiker's Guide to Data Science, Machine Learning, R, Python
- Visualizations: Comparing Tableau, SPSS, R, Excel, Matlab, JS, Pyth...
- How to Automatically Determine the Number of Clusters in your Data
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- Fast Combinatorial Feature Selection with New Definition of Predict...
- 10 types of regressions. Which one to use?
- 40 Techniques Used by Data Scientists
- 15 Deep Learning Tutorials
- R: a survival guide to data science with R

**Non Technical**

- Advanced Analytic Platforms - Incumbents Fall - Challengers Rise
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- How to Become a Data Scientist - On your own
- 16 analytic disciplines compared to data science
- Six categories of Data Scientists
- 21 data science systems used by Amazon to operate its business
- 24 Uses of Statistical Modeling
- 33 unusual problems that can be solved with data science
- 22 Differences Between Junior and Senior Data Scientists
- Why You Should be a Data Science Generalist - and How to Become One
- Becoming a Billionaire Data Scientist vs Struggling to Get a $100k Job
- Why do people with no experience want to become data scientists?

**Articles from top bloggers**

- Kirk Borne | Stephanie Glen | Vincent Granville
- Ajit Jaokar | Ronald van Loon | Bernard Marr
- Steve Miller | Bill Schmarzo | Bill Vorhies

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives**: 2008-2014 | 2015-2016 | 2017-2019 | Book 1 | Book 2 | More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central