This may lead to a practical, feasible solution, to factor the product of two very large prime numbers (with thousands of digits), making many security systems vulnerable to new types of attacks: factoring encryption keys, en masse.
Let's start with the pictures. The patterns (as well as how to leverage them) are explained below.
Pattern associated with 37,200,034,517 = 45,853 * 811,289 (product of two primes)
Pattern associated with 651,313,413,679 = 501,077 * 1,299,827 (product of two primes)
The fact that products of two large primes exhibit such strong patterns, make you think that there must be a simple algorithm to extract the two factors, and thus compromising the encryption keys attached to it. Here we give some hints to find a solution to this (supposedly) very difficult problem. Chances are that such algorithms have been found before and are classified, used by the military exclusively. Here we want to publish solutions for the general public.
We've been interested in such problems for a long time, see for instance this article, or another one on time series with multiple periodicities (some exhibit patterns very similar to the two above charts), or an article about producing perfectly random numbers - digits of SQRT(2) - which is also a main ingredient in all encryption systems, as well as reverse-engineering / hacking of such systems.
How to factor a product of two large primes?
There's a lot of literature on the subject given the fact that it is related to national security, and widely presented as a very hard problem. Here we explore a specific strategy, hoping that it can lead to interesting results, or at least beautiful mathematics.
Let's say that n = p * q is a product of two large primes. The purpose is to try to find numbers m, r, s, such that m = r * s, with m very is close to n, and one of the factors r or s (say r), is close to either p or q (say p), where "being close" needs to be defined. Rather than a formal definition, let's provide two examples based on the two above charts:
The general idea is that it is far more easier to find r (or s) than p (or q) - because m can easily be partially factored, unlike n - and then a little work can help us find p once we have found r.
The general algorithm is as follows:
Note that the amount of computation, except maybe for the factors recombination, barely increases as n increases. The numbers d and k are pretty much the same for both n's investigated here, and indeed smaller than the value suggested. I suggested d = 1,000 and k = 1,500 (both are a function of n) to be conservative. If we choose a too small value, we may fail to discover the right m leading to the factorization.
Finally, our tests involve only two numbers n, but these numbers were picked out randomly: they might be either easier to handle, or to the contrary more challenging, than your average product of two large primes. More tests need to be carried out. Note that the largest n = 651,313,413,679 = 501,077 * 1,299,827 that we analyzed, was based on a prime factor q = 1,299,827, which is the largest prime number in our list of top 100,000 prime numbers used to produce this article. In short, 1,299,827 is the 100,000-th prime number.
So, what's the pattern in the above charts?
The first chart represents Modulo(37,200,034,517 | 45,869-t) on the y-axis. for various values of t between 1 and about 500 on the x-axis, showing that t=16 provides a strong minimum (zero) at 45,869 - 16 = 45,853 which is one of the two primes dividing 37,200,034,517. For the second chart, it's the same data, for the larger n = 651,313,413,679.
DSC Resources
Additional Reading
Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge
Comment
There are many publications on this topic in the math/computing/physics journals, trying to solve this.
"Shor Algorithm"
Here is a comment from one of our readers:
Vincent, I was fascinated to see your pictures below. They look so much like coverage probability graphs for CIs for proportions – the simplest of which is the broken curve in the cover picture on my book (attached), though I’ve seen much more complex ones for larger sample sizes that look even more like the patterns below. It could be interesting to explore why they’re so similar!
Best wishes.
Robert G. Newcombe PhD CStat FFPH HonMRCR
Professor of Biostatistics
Cochrane Institute of Primary Care and Public Health
School of Medicine
Cardiff University
Factoring is actually a great problem to implement in a Map-Reduce environment, such as Hadoop.
© 2018 Data Science Central ® Powered by
Badges | Report an Issue | Privacy Policy | Terms of Service
You need to be a member of Data Science Central to add comments!
Join Data Science Central