.

# Introduction to Number Theory: Fascinating Facts and Conjectures about Primes and Other Special Numbers

I discuss here off-the-beaten-path beautiful, even spectacular results from number theory: not just about prime numbers, but also about related problems such as integers that are sum of two squares. The connection between these numbers and prime numbers will appear later in this article. A few important unsolved mathematical conjectures are presented in a unified approach, and some new research material is also introduced, especially an attempt at generalizing and unifying concepts related to data set density and limiting distributions. The approach is very applied, focusing on algorithms, simulations, and big data, to help discover fascinating results. Even though some of the most exciting topics of mathematics are discussed here (including fundamental, century-old problems still unresolved as well as brand new hypotheses), most of the article can be understood by the layman. Among other things, you will learn some new ways to estimate Pi based on non-traditional experiments, or how a conjecture for prime numbers somehow generalizes to apply to Fibonacci numbers as well. This article can be used as an introductory tutorial on number theory, both in post-graduate courses, or in high school classes, to to get the student hooked on mathematics. Or it can be used by the professional mathematician interested in discovering new areas of research.

We start by discussing the concept of density - fundamental to this article - and we provide a rigorous definition (and possibly a new result), for measuring the density of an infinite subset of integers. Then we investigate forward difference sequences to address the issue arising from zero (singular) densities: in the case of prime numbers, it leads to beautiful, mysterious conjectures. Finally, we provide an algorithm that computes quantities related to densities, for a number of integer families, including prime numbers, and integers that are sum of two squares. The last section discusses potential areas for additional research, such as a probabilistic number theory, generating functions for composite numbers (possibly leading to a generating function for primes) as well as strong abnormalities in the continued fraction expansions for many constants, including for the special mathematical constants (Pi, K, e, and other transcendental numbers) mentioned in this article.

1. Asymptotic density: examples, definition

Intuitively, the asymptotic density (in short, density) of odd integers, among all positive integers, is exactly equal to 1/2 as 50% of all integers are odd if you consider the first n integers, for n increasingly large. Likewise, because the prime numbers (less or equal to n) are becoming increasingly rare as n grows, the density of this set (the prime numbers) is asymptotically equal to 0. One way to put it is to say that as n tends to infinity, w(n)/n tends to 0, where w(n) is the number of primes less or equal to n. Indeed, according to the prime number theorem, w(n) is asymptotically equivalent to n/log(n), which we write as w(n) = O(/ log(n)) using the O-notation. It means that as n tends to infinity, w(n) / (/ log(n)) tends to a constant c, in this case c=1. Here "log" stands for the natural logarithm of base e.

Other examples of interesting numbers are the sum of 1, 2, 3, or 4 square numbers. Square numbers are so rare among integers, far more rare than primes, that their density is 0. Numbers that are the sum of four squares are very abundant. Actually any integer can be written as the sum of four square integers, usually in several different ways. Thus the density of this set (integers that are sum of four squares) is exactly 1. For the sum of three squares, the density is exactly 5/6, as all positive integers except those of the form 4^k * (8m+7) can be expressed as the sum of three squares.

What about integers that are the sum of two squares? They are more abundant than prime numbers, yet rare enough that their density is still 0. More precisely, if x(n) is the number of integers that are sum of two squares, and less than n, we have x(n) = O(n / SQRT(log n)). In fact, as n tends to infinity, x(n) / (n / SQRT(log n)) tends to a constant c = 0.764... known as the Landau-Ramanujan Constant. Interestingly, if v(k) is the number of ways that an integer k can be expressed as the sum of two squares (allowing zeros and distinguishing signs and order), and if y(n) is the sum of all v(k) for = 1, ..., n, then y(n) is asymptotically equivalent to n. That is, y(n) = O(n), and more precisely, as n tends to infinity, y(n)/n tends to the constant c=Pi. This provides another way to estimate Pi.

An algorithm to produce x(n) and y(n) is provided in section 3, but convergence of y(n)/n is very slow: the algorithm needs to be enhanced. Interestingly, the same algorithm can be used to get all the composite numbers, and thus all the prime numbers. So in some sense, both prime numbers and numbers that are sum of two squares belong to a same family of integers - each one being a sub-type of the general family.

What about square-free integers? The density of this set is equal to 6/(Pi^2), providing yet an other original way to estimate Pi. And if we consider numbers that are the sum of two primes, rather than the sum of two squares, what density should we expect to get? Remember, primes are far more numerous than square numbers. It turns out that each even integer bigger than two can be written as the sum of two primes, and thus the density of numbers that are sum of two primes is exactly 1/2, However, no proof has been found yet, and this is known as one of the most famous mathematical conjectures of all times.

Sums of two squares is a particular case of Waring's problem. This is a topic with numerous well-known conjectures. For a bunch of conjectures about prime numbers, click here.

A new definition of asymptotic density

In many cases, we would like to compare two densities, even when they are both equal to zero: for instance, integers that are sum of two squares, versus primes. Let's introduce a function H and its set of real coefficients p, q, r, s etc. as follows. For n (large enough), let's define

H(n; p, q, r, s, ...) = n^p * (log n)^q * (log log n)^r * (log log log n)^s * ...

where 0 <= p <= 1, and the leading coefficient (the first coefficient among p, q, r, s ... not equal to zero) is strictly positive. For an infinite subset S of positive integers, let's define z(n; S) as the number of elements in S that are less than, or equal to n. Here we conjecture that for any such subset S, there exist coefficients p, q, r, s, ... satisfying the above conditions, such that

z(n; S) = O(H(npqrs, ...)).

That is, the density of the set S is characterized by the coefficients p, q, r, s, .... Moreover, the family of limiting functions n, log n, log log n, log log log n etc. is complete in the sense that no other limiting function is needed to compute any density, and at the same time, you need all of them: removing one will make some sets to have a singular density. This statement is still a conjecture, but it is true for all the useful sets that I have ever seen, including all the examples in this article.

Definition: Let W = W(pqrs, ...) = { g(p)/2 } + { g(q)/4 } + { g(r)/8 } + { g(s)/16 } + .., where g(t) = / (1+|t|). Define U>0 as the sum of all positive terms in W's formula, while V<=0 is the sum of all negative terms, so W = U + V. The bivariate vector (U, V) uniquely defines the density of the set S in question, and can be used to compare densities.

Exercise

Let's D(n) be the set of numbers that are divisible by 2 or 3 or 5 or ... or p(n) where p(n) is the n-th prime number, with p(1) = 2. What is the density of D(n)? If we denote as d(n) the density of D(n), how fast does d(n) converge to 1 as n tends to infinity? Can you find a recurrence formula for d(n)?

Solution: We have:

• d(n+1) = d(n) +{ ( 1 - 1/p(1) ) ... ( 1 - 1/p(n) ) } / p(n+1)
• ( d(n+1) - d(n) ) / ( d(n) - d(n-1) ) = ( p(n) - 1 ) / p(n+1)
• 1 - d(n) = ( 1 - 1/p(1) ) ( 1 - 1/p(2) ) ... ( 1 - 1/p(n) ).

These formula are derived from the inclusion-exclusion principle and the fact that being divisible by 2, 3, 5, ... , p(n) are independent events. It would not work if instead of these numbers, we had chosen (say) 2, 3, 5, 7, and 10. Using the Bonferroni inequalities, one can find nested lower and upper bounds for 1 - d(n).

2. Prime gap distribution, forward differences

In this section, two more conjectures about prime numbers will be discussed. A new result presented here is that one of these conjectures is not specific to prime numbers, and can easily be generalized.

The purpose of this section is to obtain a meaningful (non-singular) density measurement when the original set (be it the prime numbers, or numbers that are sum of two squares) has a zero (that is, singular) density. In order to achieve this goal, we look at a data set D(S) derived from the original set S. In particular, we look at the data set generated by forward differences.

For instance, the set S of squares 0, 1, 4, 9, 16, ... has density 0. In this case, D(S) -- defined in the next subsection -- is the set of odd integers 1, 3, 5, 7, 9, ... and has density 1/2 as discussed earlier.

Definitions

Forward differences are defined as follows. If S = { s(1), s(2), ... , s(n), ...} is the original set, then the forward difference set is D(S) = { d(1), d(2), ... , d(n), ... } with d(n) = s(n+1) - s(n).

We also define E(S) as the set of absolute forward differences, that is E(S) = { e(1), e(2), ... , e(n), ... } with e(n+1) = |s(n+1) - s(n)|. D(S) can be somewhat chaotic and can reveal interesting patterns about S, while E(S) is more well-behaved. Successive applications of the operator E are denoted as E^2(S) = E(E(S)), E^3(S) = E(E(E(S))) and so on, and likewise for the operator D. The sequence of operators D, D^2, D^3, ..., is sometimes used  to detect errors in a sequence of real numbers.

If the infinite sequence of positive integers s(n) is growing slowly enougth without sharp gaps between s(n+1) and s(n) (this assumption applies both to prime numbers and numbers that are sum of two squares) then one would expect that the first element in each successive absolute forward differences E(S), E^2(S), E^3(S), ... is either 0 or 1. A famous conjecture by Gilbreath claims that for prime numbers, it is always equal to 1. For numbers that are sum of two squares, it oscillates between 0 and 1, in a way that is not easy to predict. In the case of Fibonacci numbers, based on my investigation, it seems to be periodic with 0, 1, 1 repeating itself indefinitely in a loop. Based on this fact, I believe that Gilbreath's conjecture is not specific to, nor a unique characteristic, of prime numbers. Also, it makes me think that prime numbers are somewhat more well behaved than numbers that are sum of two squares, as the latter lack this periodicity property. The Gilbreath conjecture for prime numbers is illustrated in the table below:

2   3   5   7   11  13  17  19  23  29  31 ...
1   2   2   4   2   4   2   4   6   2 ...
1   0   2   2   2   2   2   2   4 ...
1   2   0   0   0   0   0   2 ...
1   2   0   0   0   0   2 ...
1   2   0   0   0   2 ...
1   2   0   0   2 ...

Finally, we define the gaps as the sequence of first differences. The remaining of this section focuses on this topic, and more specifically, on the study of the forward differences s(n+1) - s(n) for primes. Also, in all the cases discussed in this article, since we are dealing with sequences of increasing numbers, we have D(S) = E(S). This is true only for the first-order forward differences D, but not for D^2, D^3 and so on.

Prime gap distribution

The first-order forward differences for prime numbers - known as prime gaps - consists of potentially all the positive, even integers (if not all of them, at least the vast majority of them), together with the number 1. Thus its density is expected to be as high as 1/2. The sequence of prime gaps have many interesting properties, most of them still un-proved:

• The set of prime gaps is believed to be infinite, and these gaps are believed to cover all even integers (there is supposedly no limit on how large a prime gap can be, though they grow incredibly slowly)
• It is believed that infinitely many times, the gap is equal to 2, corresponding to twin primes: this is known as the famous twin prime conjecture. Note however that, unlike the series of inverse of primes that diverges, the series of inverse of twin primes converges.
• With still so many unknowns about the most basic properties of these gaps, no one knows even the most basic facts about the distribution of these gaps: what is the mean? what is the variance? Why are some values (for instance, multiples of 6) over-represented?

For integers that are sum of two squares, the first-order forward differences (or gaps) are of course even more concentrated on the left-hand side due to the higher (yet still singular) density of these numbers. That is, they tend to have smaller gaps than primes. It is not even known whether these gaps cover all integers, or whether there is an upper bound M such that these gaps are never larger than M. But numbers that are sum of two squares are somewhat easier to generate than primes using the algorithm in section 3, and thus studying the distribution of their gaps may be easier, requiring less computer processing time.

Note: If we denote as p(k) the k-th prime with p(1) =2, and G(k) = p(k+1) - p(k) is the k-th prime gap for k > 0, then it is very easy to prove that SUM{ G(k) / p(k+1) p(k) } = 1/2 (The sum is over all integers k >0.)

3. Prime-generating and other algorithms

Actually I describe just one algorithm here, but it is generic enough that it can generate not just primes, but also other similar data sets including, especially,  integers that are sum of two squares. Because it is generic, it is not especially efficient for prime number generation, but it is incredibly simple, and easy to implement in a distributed environment. It is probably the easiest algorithm to build

• a table of numbers that are sum of two squares: up to n = 25,000,000 in a few seconds on an old laptop; the computational complexity is O(n),
• or a more modest table of prime numbers; the computational complexity is O(n^{3/2}).

Algorithm to generate and count numbers that are some of two squares

Below is the algorithm to generate numbers that are some of two squares. It assumes that the arrays x[ ] and y[ ] are initialized to 0. When the two loops are completed, x[t] = 1 if t is the sum of two squares, and 0 otherwise, for 0 <= t <= n, while y[t] is the number of ways that t can be written as the sum of two squares (13 = 9 + 4 = 4 + 9 counts as one combination.)

\$n=25000000;
\$m = int(sqrt(\$n));

for(\$k=0; \$k<=\$m; \$k++) {
for(\$l=\$k; \$l<= \$m; \$l++) {
\$t=(\$k*\$k) + (\$l*\$l);
if (\$t<=\$n) {
\$x[\$t]=1;
\$y[\$t]++;
}
}
}

The above algorithm is described in a meta-language easy to understand. Actually, it is written in a real programming language. Do you recognize it? (solution: it is written in Perl, and tested.)

The general algorithm

The core of the above algorithm is the line that computes t = k^2 + l^2. This can be generalized as t = f(k, l) where f is an arbitrary integer-valued function. Besides the linear function f(k, l) = k + l which is of limited interest, the two most basic cases are

• f(kl) = k^2 + l^2, corresponding to numbers that are sum of two squares,
• f(k, l) = k * l, corresponding to prime numbers (actually to composite numbers if k and l are > 1, and thus, by complementarity, to prime numbers)

By now, the connection between primes and numbers that are sum of two squares should be obvious. Another shared property: the product of two integers that are sum of two squares, is an integer that is sum of two squares; the product of two composite numbers is a composite number.

The general algorithm takes the following form:

for \$k in A(\$n) {
for(\$l in B(\$n, \$k)) {
\$t=f(\$k, \$l);
if (0<= \$t<= \$n) {
\$x[\$t]=1;
\$y[\$t]++;
}
}
}

Here A(\$n) and B(\$n, \$k) are two sets of integers (usually intervals) selected to minimize the computations; the first one in the outer loop depends only on n, while the second one in the inner loop usually depends on both n and k (although inefficient algorithms might use  B(\$n, \$k)A(\$n) not depending on k.)

For prime number generation, you can use A(\$n) = {2, 3, ... , n/2} and B(\$n, \$k) = {2, 3, ... , m} with m = INT{SQRT(n)} though this may not be the optimum choice to minimize the number of computations..

4. Selected research topics

We provide here some highlights on topics related to what we have explored so far in this article, to broaden the horizon. This section will allow you to get familiar with, or at least be aware of other important subjects in number theory, not covered in the previous sections.

Prime number generating function

Finding a simple function that would generate all the primes (and just the primes only) has been a topic of much interest for a long time. A new approach with some potential consists of first finding a generating function for composite numbers, as primes and composites are complementary. A starting point could be to define

Phi(x) = SUM[ x^{u*v} ] where the double sum is over all integers u, v > 1.

We have Phi(x) = SUM[ x^{2k} / (1 - x^k) ] where the (simple) sum is over all integers k > 1. We also have

Phi(x) = SUM[ P(k) * x^

where

• the sum is over all integers  k > 1,
• P(k) is the number of ways in which k can be factored as a product of two positive integers, none of the factors being equal to 1. The order matters: 2 * 12 and 12 * 2 counts as two instances.

Thus, P(k) = 0 if and only if k is a prime. The general formula for P(k) is not easy to obtain. If k has (say) 3 prime factors p, q, r, say k = p^a q^b r^c, then P(k) depends only on {a, b , c}, and any permutation of the exponents a, b, c yields the same count. For instance P(p^a q^b r^c) = P(p^c q^a r^b) = P(2^a 3^b 5^c) = P(2^c 3^a 5^b). So P(p^a q^b r^c) = P({a, b, c}) is a permutation-invariant function that depends only on {a, b, c} regardless of ordering. Recurrence formulas for P(k) are easy to establish, based on the exponents a, b, c in the prime factorization of k. All this generalizes easily if k has 1, 2, 3, 4, or any number of prime factors.

This type of problems belongs to the branch of mathematics called combinatorics, and more specifically, multiplicative partitions. It is analogous to our former problem of finding all the combinations in which a number can be written as the sum of two squares.

Oddities in continued fractions

Continued fractions can be used to represent numbers, and provide excellent rational approximations - the best ones in some sense - to non-rational numbers. This is actually a subject of its own in number theory.

By looking at the continued fraction expansions of some special mathematical constants below (and number theory is full of interesting constants) you will notice that the number 1 is over-represented in all cases. Also large numbers pop up now and then in all the expansions below. I would have expected these expansions to look much more random. Can you explain this erratic behavior? Do these numbers, in each expansion below (except in he first one), follow a same, known distribution?

Here are six interesting continued fraction expansions, for special transcendental mathematical constants:

• Number e: 2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, 1, 1, 14, 1, 1, 16, 1, 1, 18, 1, 1, 20, 1, 1, 22, 1, 1, 24, 1, 1, 26, 1, 1, 28, 1, 1, 30, 1, 1, 32, 1, 1, 34, 1, 1, 36, 1, 1, 38, 1, 1, 40, 1, 1, 42, 1, 1, 44, 1, 1, 46, 1, 1, 48, 1, 1, 50, 1, 1, 52, 1, 1, 54, 1, 1, 56, 1, 1, 58, 1, 1, 60, 1, 1, 62, 1, 1, 64, 1, 1, 66...
• Number π: 3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1, 7, 1, 2, 3, 7, 1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5, 2, 2, 3, 1, 2, 4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24, 1, 2, 1, 3, 1, 2, 1...
• Landau-Ramanujan constant (discussed in section 1 of this article): 0, 1, 3, 4, 6, 1, 15, 1, 2, 2, 3, 1, 23, 3, 1, 1, 3, 1, 1, 6, 4, 1, 1, 2, 4, 39, 1, 1, 3, 1, 2, 1, 3, 181, 2, 1, 103, 2, 1, 5, 1, 1, 1, 1, 6, 1, 7, 5, 1, 2, 1, 2, 2, 2, 2, 1, 21, 8, 1, 9, 2, 2, 4, 2, 6, 1, 4, 1, 1, 1, 1, 15, 1, 1, 10, 1, 7, 2, 1, 1, 2, 4, 2, 4, 1, 3, 1, 4, 1, 1, 23, 1, 1, 48, 1, 38, 1, 1, 1,,,
• Euler's constant: 0, 1, 1, 2, 1, 2, 1, 4, 3, 13, 5, 1, 1, 8, 1, 2, 4, 1, 1, 40, 1, 11, 3, 7, 1, 7, 1, 1, 5, 1, 49, 4, 1, 65, 1, 4, 7, 11, 1, 399, 2, 1, 3, 2, 1, 2, 1, 5, 3, 2, 1, 10, 1, 1, 1, 1, 2, 1, 1, 3, 1, 4, 1, 1, 2, 5, 1, 3, 6, 2, 1, 2, 1, 1, 1, 2, 1, 3, 16, 8, 1, 1, 2, 16, 6, 1, 2, 2, 1, 7, 2, 1, 1, 1, 3, 1, 2, 1, 2...
• Catalan's constant: 0, 1, 10, 1, 8, 1, 88, 4, 1, 1, 7, 22, 1, 2, 3, 26, 1, 11, 1, 10, 1, 9, 3, 1, 1, 1, 1, 1, 1, 2, 2, 1, 11, 1, 1, 1, 6, 1, 12, 1, 4, 7, 1, 1, 2, 5, 1, 5, 9, 1, 1, 1, 1, 33, 4, 1, 1, 3, 5, 3, 2, 1, 2, 1, 2, 1, 7, 6, 3, 1, 3, 3, 1, 1, 2, 1, 14, 1, 4, 4, 1, 2, 4, 1, 17, 4, 1, 14, 1, 1, 1, 12, 1...
• Twin prime constant (discussed in section 4 of this article): 0, 1, 1, 1, 16, 2, 2, 2, 2, 1, 18, 2, 2, 11, 1, 1, 2, 4, 1, 16, 3, 2, 4, 21, 2, 405, 2, 1, 33, 1, 2, 8, 2, 29, 1, 4, 4, 4, 4, 1, 9, 3, 1, 4, 1, 1, 2, 26, 1, 8, 2, 6, 1, 4, 1, 3, 9, 46, 1, 6, 1, 1, 4, 2, 1, 12, 1, 1, 7, 35, 1, 1, 2, 1, 4, 1199, 2, 3, 1, 2, 3, 3, 13, 15, 4, 1, 1, 1, 10, 9, 6, 3, 1, 3, 1

Note that the standard continued fraction expansion for e exhibits a simple pattern, see the first expansion in the above list. This is a well known, proven fact. It is also possible to generate a (non-standard) continued fraction expansion for Pi that has a similar pattern. Pi has many other interesting formulas besides its continued fraction expansions. And for the number SQRT(2)/2, I've found a curious formula that sequentially generates all its binary digits (bits) in base 2, one at a time. I used it to design a legal, lawsuit-proof lottery system, and to design high quality random number generators. Another interesting pattern is found in the product of two large primes, and this could help hackers crack strong encryption systems.

Another area of interest is to find whether some of the numerous mathematical constants are simple functions of each other. For instance, to this day, no one knows if e + Pi is a rational number or not

Probabilistic number theory

This is a branch of number theory that uses heuristic and probability theory to build conjectures. For instance, based on the prime number theorem, if you assume that the probability for a large number n being prime is 1/log n, you can easily build conjectures or compute special constants regarding the distribution of twin primes or about the prime gap distribution discussed in section 2 of this article.

An example is the Hardy-Littlewood conjecture: it claims that the number g(n) of twin primes less or equal to n is asymptotically equivalent to n / (log n)^2, more specifically g(n) / (n / (log n)^2 ) tends to a constant c = 0.660 known as the twin prime constant, as n tends to infinity.

In short, probabilistic number theory is trying to (heuristically) validate results (say) about prime numbers, by assuming the results are not specific to prime numbers, but instead to any set of numbers that behave similarly to prime numbers, from a probabilistic point of view.

Besides 'synthetic' prime numbers artificially manufactured to mimic properties of the standard prime numbers, to eventually quickly and easily build powerful conjectures or perform simulations, there are several (non probabilistic) generalizations of prime numbers, for instance primes in quadratic fields. Even the set S of numbers that are sum of two squares has its own prime numbers 2, 5, 9, 13, 17, 29, 37, 41, 49, 53, 61, 71, ... that can be used to uniquely generate all numbers in S. And yes, 9 and 49 are prime numbers in S. The reason is simple:  9 = 3 x 3 but 3 does not belong to S; 49 = 7 x 7 but 7 does not belong to S.

Vincent Granville develops state-of-the art techniques to automate data science in the context of machine-to-machine or device-to-device communications. These robust machine learning techniques are designed to be scalable, simple, yet efficient and accurate. The statistical engineering framework invented by Vincent, and known as deep data science, is essentially math-free, easy to explain to the layman, leading to easy interpretations, maintenance, and fine-tuning even by the non-initiated. By design, it works well on big data, and it has been used by search engines, fortune 500 companies, and large organisations, to process trillions of transactions (some in real time) and large data sets. Vincent is a also an expert in traditional statistics, with numerous publications in top statistical journals, as well as a self-taught number theory expert. Vincent published his first paper in Journal of Number Theory before starting his PhD studies in applied statistics. Some applications of Vincent's number theory discoveries include the design of a legal lottery system (not run by a state), based on a curious formula that sequentially generates all the (seemingly random) binary digits of the square root of 2.

Top DSC Resources