In my first article on this topic (see here) I introduced some of the complex stochastic processes used by Wall Street data scientists, using a simple approach that can be understood by people with no statistics background other than a first course such as stats 101. I defined and illustrated the continuous Brownian motion (the mother of all these stochastic processes) using approximations by discrete random walks, simply re-scaling the X-axis and the Y-axis appropriately, and making time increments (the X-axis) smaller and smaller, so that the limiting process is a time-continuous one. This was done without using any complicated mathematics such as measure theory or filtrations.
Here I am going one step further, introducing the integral and derivative of such processes, using rudimentary mathematics. All the articles that I've found on this subject are full of complicated equations and formulas. It is not the case here. Not only do I explain this material in simple English, but I also provide pictures to show how an Integrated Brownian motion looks like (I could not find such illustrations in the literature), how to compute its variance, and focus on applications, especially to number theory, Fintech and cryptography problems. Along the way, I discuss moving averages in a theoretical but basic framework (again with pictures), discussing what the optimal window should be for these (time-continuous or discrete) time series.
1. General Framework
As in my previous article, we define a time-continuous process as the limit of a time-discrete process. The time-discrete process is referred to as the base process. The fundamental example is as follows:
Start with discrete random variables U(k), k = 1, 2, and so on (the base process) that are independently and identically distributed, with mean equal to 0. Typically, the time series { U(k) } is a white noise.
The collection of random variables { Z(t) } defines the resulting, time-continuous, properly re-scaled stochastic process. In this case, Var[Z(t)] = t Var[U(1)]. Also Z(t) has a Gaussian distribution, by construction and by virtue of the central limit theorem. This process is known as a Brownian motion. The initial random variables U(k) could be Gaussian, or uniform on {-1, +1}, or uniform on [-1, +1]. It does not matter. See illustration here.
This process is continuous everywhere but nowhere differentiable. Thus the idea to build processes that are derived from { Z(t) }, but smoother (differentiable everywhere.) We introduce two such types of processes that meet this goal:
Finally, we also define the inverse operation of integration as differentiation. The differentiated process of S(t) is Z(t). In practice, the smoother (integrated or moving average) process is easier to study and sometimes displays patterns that can't be identified in the original process. More on this in the last section.
2. Integrated, Moving Average and Differential Process
Here we define three operations to transform a stochastic process into another one, hopefully more useful for interpretation and decision making, than the original process: Integration, differentiation, and moving average. In all three cases, the construction follows the same principle: In the construction of the Brownian motion described in the previous section, replace X(n) = U(1) + ... + U(n) by X(n) = V(1) + ... + V(n), where V(k) is described below for each transformation. We also discuss some of the challenges to make this methodology mathematically robust.
Challenges
The general construction procedure described above needs to be further studied from a theoretical point of view, for the following reasons:
Proper Re-scaling and Variance Computation
The second step in the general framework (see first section) needs to be adjusted to get things right, when constructing differential, integrated, or moving average processes. In short, you must keep the variance of X(n) not depending on n as n tends to infinity. Let us show you how it works for the integrated Brownian motion. In this case, for the integrated process, we have:
Thus,
So, the proper re-scaling factor for the Y-axis, as n tends to infinity, is in this case Y(n) = X(n) / SQRT(n^3 / 3). Using similar arguments, one can easily prove that
Also, S(t) has a Gaussian distribution for the same reasons as described in the first section. The same logic applies to compute Var[M(t)]. The details are left as an exercise. A more complicated exercise consists of computing the covariance between S(t) and S(t - s) for s > 0, and proving that {S(t) } is NOT a Brownian motion itself (being differentiable everywhere unlike Brownian motions.)
Figure 1: Brownian motion realization (blue) and its moving average (red)
In Figures 1 and 2, the X-axis represents the time axis, between 0 and 1. The Brownian motion is continuous everywhere while differentiable nowhere. To the contrary, the moving average and integrated processes are both continuous and differentiable everywhere.
Figure 2: Brownian motion realization (blue) and its integrated process (green)
3. Application to Number Theory
The purpose here is to study pairs of large prime numbers p and q with p < q, used to design cryptographic keys m = pq. Some of these primes exhibit strong patterns that make them unsuitable for use in highly secure cryptographic systems, making it less difficult to factor m. Factoring a product of two large primes each with hundreds of digits, is usually an intractable problem if p and q are carefully chosen. In other words, we are looking for primes p and q that are not as "random" (or to put it differently, less strongly prime) than your typical large prime numbers. Factoring m allows you to break the key in cryptographic systems, something you want to avoid precisely by carefully choosing p and q.
We discuss here one example of such bad pair, namely p = 1879 and q = 3803. We use the same construction technique as outlined in the first section, resulting in a process that looks exactly like a Brownian motion up to some value of t, then suddenly exhibits two big jolts very early on the time scale, in this particular example.
The base process { U(k) } (see first section) is defined by
The brackets in the formula represent the integer part function. The resulting process { Z(t) } created using the technique described in the first section, is a Brownian motion up to the first jolt occurring at k = SQRT(m); the second jolt occurs at k = q. Note that for small k's, { U(k) } is reminiscent of number representations in various systems, as described in this article. In other examples, the interesting jolt occurs at k = p (the smaller prime.) In most cases, no obvious patterns are found.
Figure 3: Brownian motion up to first jolt; second jolt used to break cryptographic key
Figure 3 is based on the first 25,000 observations. The base process { U(k) } does not display these jolts. They are only visible in the process { Z(t) } pictured in Figure 3. Detecting these jolts (the second one) is of particular interest since it allows you to factor m to retrieve the two large prime factors p and q. To do this, one solution consists of using sampled values of U(k) together with interpolation techniques. Note that as k becomes large (higher than 12,000) the U(k)'s become so heavily auto-correlated that the process { Z(t) } is no longer a Brownian motion. In some ways, it could be used as a better model to simulate stock market prices, than standard Brownian motions.
For testing purposes, tables of large prime numbers are easy to find on the Internet, for instance here. Details about the computations are found in in this spreadsheet.
For related articles by the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn.
DSC Resources
Comment
"t" is said to be continuous, but I think "t" is rational not real (I illustrated this in comment section of Part 1). The problem is that Y(t' / n, n) = X(t') is undefined for a real "t = t' / n".
Hi Marco, thanks for pointing out this mistake. I corrected it, you are right.
Great article, but when you write
"Standardize X(n) so that its variance does not depend on n, that is, introduce Y(n) = X(n) / SQRT(Var[U(1)]). This step consists of re-scaling the Y-axis."
should it not read Y(n) = X(n) / SQRT(n) ?
This way
Var(Y(n)) = Var(X(n) / SQRT(n)) = 1/n Var(X(n)) = Var(U(1)).
Stochastic process, indeed, is used a lot in Finance, especially in derivative pricing. Well-known Black-Scholes (and Merton) model for European Call option assumes stock prices follow Geometric BrownianMotion.
Just to name a few:
Interest rates:
Equities/FX:
Credit:
Best regards,
Zhongmin
© 2019 Data Science Central ® Powered by
Badges | Report an Issue | Privacy Policy | Terms of Service
Most Popular Content on DSC
To not miss this type of content in the future, subscribe to our newsletter.
Technical
Non Technical
Articles from top bloggers
Other popular resources
Archives: 2008-2014 | 2015-2016 | 2017-2019 | Book 1 | Book 2 | More
Most popular articles
You need to be a member of Data Science Central to add comments!
Join Data Science Central