# .

This may lead to a practical, feasible solution, to factor the product of two very large prime numbers (with thousands of digits), making many security systems vulnerable to new types of attacks: factoring encryption keys, en masse.

Let's start with the pictures. The patterns (as well as how to leverage them) are explained below. Pattern associated with 37,200,034,517 = 45,853 * 811,289 (product of two primes) Pattern associated with 651,313,413,679 = 501,077 * 1,299,827 (product of two primes)

The fact that products of two large primes exhibit such strong patterns, make you think that there must be a simple algorithm to extract the two factors, and thus compromising the encryption keys attached to it. Here we give some hints to find a solution to this (supposedly) very difficult problem. Chances are that such algorithms have been found before and are classified, used by the military exclusively. Here we want to publish solutions for the general public.

We've been interested in such problems for a long time, see for instance this article, or another one on time series with multiple periodicities (some exhibit patterns very similar to the two above charts), or  an article about producing perfectly random numbers - digits of SQRT(2) - which is also a main ingredient in all encryption systems, as well as reverse-engineering / hacking of such systems.

How to factor a product of two large primes?

There's a lot of literature on the subject given the fact that it is related to national security, and widely presented as a very hard problem. Here we explore a specific strategy, hoping that it can lead to interesting results, or at least beautiful mathematics.

Let's say that n = p * q is a product of two large primes. The purpose is to try to find numbers m, r, s, such that m = r * s, with m very is close to n, and one of the factors r or s (say r), is close to either p or q (say p), where "being close" needs to be defined. Rather than a formal definition, let's provide two examples based on the two above charts:

• n = p*q = 37,200,034,517 = 45,853 * 811,289, while m = r*s = 37,200,034,214 = 45,869 * 811,006. Note how m is close to n, and how r (highlighted in red) is close to q.
• n =  651,313,413,679 = p*q = 501,077 * 1,299,827, while m = r*s =  651,313,413,048 = r*s = 500,872 * 1,300,359. Note how m is close to n, and how r (highlighted in red) is close to q.

The general idea is that it is far more easier to find r (or s) than p (or q) - because m can easily be partially factored, unlike n -  and then a little work can help us find p once we have found r.

The general algorithm is as follows:

1. For each number n, n-1, n-2, ... , n-d (where d is much smaller than n; d < 1,000 if n = 651,313,413,679) compute the residues modulo 2, 3, 5, 7, ... , k (where k is a very small prime number compared to p or q, k < 1,500 if n = 651,313,413,679). This is accomplished very efficiently, it does not involve divisions, and does not involve storing a very big table of prime numbers. The naive solution to factoring  n = 651,313,413,679 requires checking out about 60,000 prime numbers to find a divisor. Here, in order to do our computations, we benefit from the fact that if modulo(y, z) = x>0, then modulo(y-1, z) = x-1, where y is any number m between n and n-d, and z any prime between 2, 3, ... , k. So essentially we just perform small subtractions, and this part of the algorithm is actually the easy one.
2. The partial factoring described in step 1 produces a table containing a few hundreds of potential candidates m when n = 651,313,413,679 (see spreadsheet for illustration purposes), with partial factorization m = u *v for each m. In some cases v is a rather large number that is either prime or not completely factored.
3. Focus on the m's where v is not too large, and not too small. These are very easy to detect, and it reduces the list of candidates m from a few hundreds, to a few dozens or less.
4. One or more of these m's will provide a factorization close to the desired m = r * s, which is in turn close to the solution n = p * q. In the case where n = 651,313,413,679, we get the nice m = u * v = 433,453 * 1,502,616 = (433,453 * 3) * (1,502,616 / 3) = 1,300,359 * 500,872 = r * s. In the case where n = 37,200,034,517, we get m = 501077 * 1299827 = r * s right away, without having to recombine small divisors.
5. The biggest challenge here is to check for each m (maybe for 10-20 of m's after filtering out undesirable m's, as described in step #3), all the u's that are big enough (eliminating the many obvious ones that are way too big), and find an r (derived from an u) close to p (|- r| < 300 say, though you don't know p) so that eventually, one of these r's, after testing a few of these u's, leads to p: a prime factor of n. This may look like a tedious computation, but most candidates for m or u can actually be quickly ruled out, because of some strong patterns found in this process, suggesting a very efficient approach to solve the problem. If you check cells G304: AR309 in the spreadsheet, you'll see that residues of 37,200,034,517 mod 45,869-t, (45,869 = r here) for small values of t, is a piece-wise 2nd-degree polynomial (2nd order differences are constant, equal to 318). This greatly helps find p (one of n's two prime factors), in this case p = 45,853: you don't need to test all t's.

Note that the amount of computation, except maybe for the factors recombination, barely increases as n increases. The numbers d and k are pretty much the same for both n's investigated here, and indeed smaller than the value suggested. I suggested d = 1,000 and k = 1,500 (both are a function of n) to be conservative. If we choose a too small value, we may fail to discover the right m leading to the factorization.

Finally, our tests involve only two numbers n, but these numbers were picked out randomly: they might be either easier to handle, or to the contrary more challenging, than your average product of two large primes.  More tests need to be carried out. Note that the largest n = 651,313,413,679 = 501,077 * 1,299,827 that we analyzed, was based on a prime factor q = 1,299,827, which is the largest prime number in our list of top 100,000 prime numbers used to produce this article. In short, 1,299,827 is the 100,000-th prime number.

So, what's the pattern in the above charts?

The first chart represents Modulo(37,200,034,517 | 45,869-t) on the y-axis. for various values of t between 1 and about 500 on the x-axis, showing that t=16 provides a strong minimum (zero)  at 45,869 - 16 = 45,853 which is one of the two primes dividing 37,200,034,517. For the second chart, it's the same data, for the larger n = 651,313,413,679.

DSC Resources

Views: 8442

Comment

Join Data Science Central Comment by Sione Palu on June 22, 2015 at 9:20am

There are many publications on this topic in  the math/computing/physics journals, trying to solve this.

"Shor Algorithm"

https://en.wikipedia.org/?title=Shor%27s_algorithm Comment by Vincent Granville on June 20, 2015 at 6:17am

Here is a comment from one of our readers:

Vincent, I was fascinated to see your pictures below. They look so much like coverage probability graphs for CIs for proportions – the simplest of which is the broken curve in the cover picture on my book (attached), though I’ve seen much more complex ones for larger sample sizes that look even more like the patterns below. It could be interesting to explore why they’re so similar!

Best wishes.

Robert G. Newcombe PhD CStat FFPH HonMRCR

Professor of Biostatistics

Cochrane Institute of Primary Care and Public Health

School of Medicine

Cardiff University Comment by Mirko Krivanek on June 19, 2015 at 8:06pm

Factoring is actually a great problem to implement in a Map-Reduce environment, such as Hadoop.