<p><em>Vincent Granville's Posts - Data Science Central</em></p>
<p><span style="font-size: 14pt;"><strong>Are Data Scientists Becoming Obsolete?</strong></span><br /><em>Posted 2021-09-13 by <a href="https://www.datasciencecentral.com/profile/VincentGranville" target="_blank" rel="noopener">Vincent Granville</a></em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9561199101?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9561199101?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>This question is raised on occasion. Salaries are not increasing as fast as they used to, though this is natural for any discipline reaching some maturity. Some job seekers claim it is not that easy anymore to find a job as a data scientist. Some employers have complained about the costs associated with a data science team, and ROI expectations not being met. And some employees, especially those with a PhD, complain that the job can be boring.</p>
<p>I believe there is some truth to all of this, but my opinion is more nuanced. Data scientist is too generic a job title, and often not even related to science. I myself, about 20 years ago, experienced some disillusion with my job title as a statistician. There were so many promising paths, but the statistical community, in part because of the major statistical associations and the academic training of the time, missed some big opportunities: it focused more and more on narrow areas such as epidemiology or census data, while failing to embrace serious programming (beyond SAS and R) and algorithms. I was working on digital image processing at the time, and I saw the field of statistics miss out on machine learning, and on operations research in particular. I eventually called myself a computational statistician: that is what I was doing, and it was drifting further and further from what my peers were doing. I am sure that by now, statistics curricula have caught up and include more machine learning and programming.</p>
<p>More recently, I called myself a data scientist, but today I think that title does not represent well what I do. Computational or algorithmic data scientist would be a much better description, and I think this applies to many data scientists. Some, focusing more on the data side, could call themselves data science engineers or data science architects. Others may find the title business data scientist more appropriate. Junior ones are probably better described as analysts.</p>
<p>Some progress has certainly been made in the last five years. Applicants are better trained, hiring managers are more knowledgeable about the field and have clearer requirements, and applicants have a better idea of whether an advertised position is as interesting as the description makes it sound. Indeed, many jobs are filled without a job ad ever being posted, by directly contacting potential candidates the hiring manager is familiar with, even if only by word of mouth. There is still no large, highly recognized professional association or comprehensive certification for data scientists, as there is for actuaries (and I don't think one is needed), but there are clearer paths to excellence in the profession, both for companies and for employees. A physicist familiar with data could easily succeed with little on-the-job practice, and some companies are open to hiring people from various backgrounds, which broadens the possibilities. Given the number of poorly solved problems (they pop up faster than they can properly be solved), the future looks bright. Examples include counting the actual number of people ever infected by Covid (requiring imputation methods), which might be twice as high as official numbers; assessing the efficiency of various Covid vaccines versus natural immunization; better detection of fake reviews, recommendations, and fake news; and optimizing driving directions in Google Maps by including more criteria in the algorithm, such as HOV lanes, air quality, scarcity of gas stations, and peak commute times (more on this in my next article, about a 3,000-mile road trip using Google navigation).</p>
<p><a href="https://en.wikipedia.org/wiki/Renaissance_Technologies" target="_blank" rel="noopener">Renaissance Technologies</a> is a good example: they have been working on quantitative trading since 1982, developing black-box strategies for high-frequency trading and mastering trading-cost optimization. Often they had no idea, and did not care, why their automated self-learning trading system made some obscure trades (leveraging volatile patterns undetectable by humans, or unused by competitors), yet it is by far the most successful hedge fund of all time, returning more than 66 percent annualized (that is, per year, on average) for about 30 years. They never hired traditional quants or data scientists; some of their top executives came from IBM, with a background in computational linguistics, and many core employees had backgrounds in astronomy, physics, dynamical systems, and even pure number theory, but not in finance.</p>
<p>Incidentally, I have used many machine learning techniques and computational data science, processing huge volumes of multivariate data (integers and real numbers) with efficient algorithms, to try to pierce some of the deepest secrets of number theory. So I can easily imagine that a math background, especially one with strong experimental, probabilistic, or computational number theory, where you routinely uncover and leverage hard-to-find patterns in an ocean of seemingly very noisy data behaving worse than many messy business data sets (these are, after all, chaotic processes), would be helpful in quantitative finance, and certainly elsewhere, such as fraud detection or risk management. I came to call these chaotic environments gentle or controlled chaos, because in the end they are less chaotic than they appear at first glance. I am sure many people in the business world can relate to that.</p>
<p><span style="font-size: 14pt;"><strong>Conclusion</strong></span></p>
<p>The job title <em>data scientist</em> might not be a great one, as it means so many things to different people. Better job titles include data science engineer, algorithmic data scientist, mathematical data scientist, computational data scientist, business data scientist, or analyst, reflecting the various fields that data science covers. There are still many unsolved problems, with the list growing faster than the list of solved ones, so the future looks bright. Some problems, such as spam detection and perhaps even automated translation, have seen considerable progress. Employers and employees have become better at matching with each other, though pay scales may not increase much more. Some tasks, such as data cleaning, may disappear in the future, replaced by robots; even coding might be absent from some jobs, or partially automated. For instance, the Data Science Central article that you are reading now sits on a platform created in 2008 (by me, actually) without a single line of code. This will open more possibilities, as it frees up a lot of time for the data scientist to focus on higher-level tasks.</p>
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, and eBay. Vincent is also a self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central, acquired by TechTarget). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>. A selection of the most recent ones can be found on <a href="https://www.vgranville.com/" target="_blank" rel="noopener">vgranville.com</a>.</em></span></p>
<p><span style="font-size: 14pt;"><strong>Machine Learning Perspective on the Twin Prime Conjecture</strong></span><br /><em>Posted 2021-09-07 by Vincent Granville</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9525260066?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9525260066?profile=RESIZE_710x" width="720" class="align-center"/></a></p>
<p>This article focuses on the machine learning aspects of the problem, and on the use of pattern recognition techniques leading to interesting new findings about twin primes. Twin primes are prime numbers <em>p</em> such that <em>p</em> + 2 is also prime. For instance, 3 and 5, or 29 and 31. A famous, old, unsolved mathematical conjecture states that there are infinitely many such primes, but a proof remains elusive to this day. Twin primes are far rarer than primes: there are infinitely more primes than there are twin primes, in the same way that there are infinitely more integers than there are prime integers.</p>
<p>Here I discuss the results of my experimental math research, based on big data, algorithms, machine learning, and pattern discovery. The level is accessible to all machine learning practitioners. I first discuss my experiments in section 1, then how they relate to the twin prime conjecture in section 2. In section 3, I discuss a generalization. Mathematicians may be interested as well, as this leads to a potential new path to proving the conjecture. Machine learning readers with little time, and not curious about the mathematical aspects, can read section 1 and skip section 2.</p>
<p>I do not prove the twin prime conjecture (yet). Rather, based on data analysis, I provide compelling evidence (the strongest I have ever seen), supporting the fact that it is very likely to be true. It is not based on heuristic or probabilistic arguments (unlike <a href="https://en.wikipedia.org/wiki/Twin_prime#First_Hardy%E2%80%93Littlewood_conjecture" target="_blank" rel="noopener">this version</a> dating back to around 1920), but on hard counts and strong patterns.</p>
<p>This is no different from analyzing data and finding that smoking is strongly correlated with lung cancer: the relationship may not be causal, as there might be confounding factors. To prove causality, more than data analysis is needed (in the case of smoking, of course, causality was firmly established long ago).</p>
<p><span style="font-size: 14pt;"><strong>1. The Machine Learning Experiment</strong></span></p>
<p>We start with the following sieve-like algorithm. Let <em>S<span style="font-size: 8pt;">N</span></em> = { 1, 2, ..., <em>N</em> } be the finite set consisting of the first <em>N</em> strictly positive integers, and let <em>p</em> be a prime number. Let <em>A<span style="font-size: 8pt;">p</span></em> be a strictly positive integer smaller than <em>p</em>. Remove from <em>S<span style="font-size: 8pt;">N</span></em> all the elements of the form <em>A<span style="font-size: 8pt;">p</span></em>, <em>p</em> + <em>A<span style="font-size: 8pt;">p</span></em>, 2<em>p</em> + <em>A<span style="font-size: 8pt;">p</span></em>, 3<em>p</em> + <em>A<span style="font-size: 8pt;">p</span></em>, 4<em>p</em> + <em>A<span style="font-size: 8pt;">p</span></em>, and so on. After this step, the number of elements left will be very close to <em>N</em> (<em>p</em> - 1) / <em>p</em> = <em>N</em> (1 - 1/<em>p</em>). Now remove all elements of the form <em>p</em> - <em>A<span style="font-size: 8pt;">p</span></em>, 2<em>p</em> - <em>A<span style="font-size: 8pt;">p</span></em>, 3<em>p</em> - <em>A<span style="font-size: 8pt;">p</span></em>, 4<em>p</em> - <em>A<span style="font-size: 8pt;">p</span></em>, and so on. After this second step, the number of elements left will be very close to <em>N</em> (1 - 2/<em>p</em>). Now pick another prime number <em>q</em> and repeat the same procedure. The number of elements left will then be very close to <em>N</em> (1 - 2/<em>p</em>) (1 - 2/<em>q</em>), because <em>p</em> and <em>q</em> are co-prime (being prime to begin with).</p>
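<p>The removal procedure just described can be sketched in a few lines of Python. This is an illustrative sketch of my own (the helper name <code>sieve_step</code> and the toy parameters are not from the article): it removes the residues +<em>A<span style="font-size: 8pt;">p</span></em> and -<em>A<span style="font-size: 8pt;">p</span></em> modulo one prime, then modulo a second prime, and compares the count of survivors to the product formula.</p>

```python
def sieve_step(s, p, a):
    """Remove from the set s all integers of the form k*p + a and k*p - a (k = 0, 1, 2, ...)."""
    top = max(s)
    for k in range(top // p + 2):
        s.discard(k * p + a)
        s.discard(k * p - a)

# Toy run with N = 100: first remove +/- 2 modulo 5, then +/- 3 modulo 7.
N = 100
s = set(range(1, N + 1))
sieve_step(s, 5, 2)   # leaves close to N * (1 - 2/5) elements
sieve_step(s, 7, 3)   # leaves close to N * (1 - 2/5) * (1 - 2/7) elements
print(len(s), N * (1 - 2 / 5) * (1 - 2 / 7))   # 43 survivors vs. a predicted 42.86
```

The agreement improves as <em>N</em> grows, as long as only a few primes are used, which is exactly the regime where the product approximation below is excellent.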
<p>If you repeat this step for all prime numbers <em>p</em> between <em>p</em> = 5 and <em>p</em> = <em>M</em> (assuming <em>M</em> is a fixed prime number much smaller than <em>N</em>, <em>N</em> is extremely large, and you let <em>N</em> tend to infinity), you will be left with a number of elements, denoted as <em>P</em>(<em>M</em>, <em>N</em>), that is still very close to</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9518419873?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9518419873?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>where the product is over prime numbers only.</p>
<p>Let us introduce the following notations:</p>
<ul>
<li><em>S</em>(<em>M</em>, <em>N</em>) is the set left after removing all the specified elements, using the above algorithm, from <em>S<span style="font-size: 8pt;">N</span></em></li>
<li><em>C</em>(<em>M</em>, <em>N</em>) is the actual number of elements in <em>S</em>(<em>M</em>, <em>N</em>) </li>
<li><em>D</em>(<em>M</em>, <em>N</em>) = <em>P</em>(<em>M</em>, <em>N</em>) - <em>C</em>(<em>M</em>, <em>N</em>)</li>
<li><em>R</em>(<em>M</em>, <em>N</em>) = <em>P</em>(<em>M</em>, <em>N</em>) / <em>C</em>(<em>M</em>, <em>N</em>)</li>
</ul>
<p>In the context of the twin prime conjecture, the issue is that <em>M</em> is a function of <em>N</em>, and the above very good approximation, that is, replacing <em>C</em>(<em>M</em>, <em>N</em>) by <em>P</em>(<em>M</em>, <em>N</em>), is no longer good. More specifically, in that context, <em>M</em> = 6 SQRT(<em>N</em>) and <em>A<span style="font-size: 8pt;">p</span></em> = INT(<em>p</em>/6 + 1/2), where INT is the integer part function. The ratio <em>R</em>(<em>M</em>, <em>N</em>) would still be very close to 1 for most choices of <em>A<span style="font-size: 8pt;">p</span></em>, assuming <em>M</em> is not too large compared to <em>N</em>; unfortunately, <em>A<span style="font-size: 8pt;">p</span></em> = INT(<em>p</em>/6 + 1/2) is one of the very few choices for which the approximation fails. On the plus side, it is also one of the very few that leads to a smooth, predictable behavior for <em>R</em>(<em>M</em>, <em>N</em>). This is what makes me think it could lead to a proof of the twin prime conjecture. Note that if <em>M</em> is very large, much larger than <em>N</em>, say <em>M</em> = 6<em>N</em>, then <em>C</em>(<em>M</em>, <em>N</em>) = 0 and thus <em>R</em>(<em>M</em>, <em>N</em>) is infinite.</p>
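<p>Here is a minimal Python sketch of the full experiment (my code, with hypothetical helper names <code>primes_up_to</code> and <code>experiment</code>; the article's own computations were done at much larger scale). It computes <em>P</em>(<em>M</em>, <em>N</em>), <em>C</em>(<em>M</em>, <em>N</em>), and the surviving set for the toy case <em>N</em> = 100 at the critical value <em>M</em> = 6 SQRT(<em>N</em>) = 60:</p>

```python
def primes_up_to(n):
    """All primes <= n, via the sieve of Eratosthenes."""
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, f in enumerate(flags) if f]

def experiment(N, M):
    """Sieve with A_p = INT(p/6 + 1/2) for primes 5 <= p <= M; return P(M,N), C(M,N), survivors."""
    s = set(range(1, N + 1))
    P = float(N)
    for p in primes_up_to(M):
        if p < 5:
            continue
        a = (p + 3) // 6              # equals INT(p/6 + 1/2)
        P *= 1 - 2 / p
        for k in range(N // p + 2):   # remove k*p + a and k*p - a
            s.discard(k * p + a)
            s.discard(k * p - a)
    return P, len(s), sorted(s)

N = 100
M = int(6 * N ** 0.5)                 # critical value M1 = 6 SQRT(N) = 60
P, C, survivors = experiment(N, M)
print(P, C, P / C, survivors[:5])     # R(M, N) = P(M, N) / C(M, N)
```

In this toy case the 20 survivors turn out to be exactly the values <em>q</em> for which 6<em>q</em> - 1 and 6<em>q</em> + 1 are twin primes larger than <em>M</em>, which matches the connection to twin primes described in section 2.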
<p>Below is a plot displaying <em>D</em>(<em>M</em>, <em>N</em>) at the top, and <em>R</em>(<em>M</em>, <em>N</em>) at the bottom, on the Y-axis, for <em>N</em> = 400,000 and <em>M</em> between 5 and 3,323 on the X-axis. Only prime values of <em>M</em> are included, and <em>A<span style="font-size: 8pt;">p</span></em> = INT(<em>p</em>/6 + 1/2).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9515503474?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9515503474?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>It shows the following patterns:</p>
<ul>
<li>For small values of <em>M</em>, <em>R</em>(<em>M</em>, <em>N</em>) is very close to 1.</li>
<li>Then as <em>M</em> increases, <em>R</em>(<em>M</em>, <em>N</em>) experiences a small dip, followed by a maximum at some location <em>M</em><span style="font-size: 8pt;">0</span> on the X-axis. Then it smoothly decreases well beyond the critical value <em>M</em><span style="font-size: 8pt;">1</span> = 6 SQRT(<em>N</em>). It reaches a minimum at some location <em>M</em><span style="font-size: 8pt;">2</span> (not shown in the plot) followed by a rebound, increasing again until <em>M</em><span style="font-size: 8pt;">3</span> = 6<em>N</em>, where <em>R</em>(<em>M</em>, <em>N</em>) is infinite. The value of <em>M</em><span style="font-size: 8pt;">0</span> is approximately 3 SQRT(<em>N</em>) / 2.</li>
</ul>
<p>To prove the twin prime conjecture, all that is left is the following: proving that <em>M</em><span style="font-size: 8pt;">0</span> < <em>M</em><span style="font-size: 8pt;">1</span> (that is, the peak always takes place before <em>M</em><span style="font-size: 8pt;">1</span>, regardless of <em>N</em>), and that <em>R</em>(<em>M</em><span style="font-size: 8pt;">0</span>, <em>N</em>), as a function of <em>N</em>, does not grow too fast. The growth seems to be logarithmic, but even if <em>R</em>(<em>M</em><span style="font-size: 8pt;">0</span>, <em>N</em>) grew as fast as <em>N</em> / (log <em>N</em>)^3, that would be slow enough to prove the twin prime conjecture. Detailed explanations are provided in section 2.</p>
<p>The same patterns are also present for other values of <em>N</em>. I tested various values, ranging from <em>N</em> = 200 to <em>N</em> = 3,000,000: the higher <em>N</em>, the smoother the curve and the stronger the patterns. The patterns also occur with a few other peculiar choices of <em>A<span style="font-size: 8pt;">p</span></em>, such as <em>A<span style="font-size: 8pt;">p</span></em> = INT(<em>p</em>/2 + 1/2) or <em>A<span style="font-size: 8pt;">p</span></em> = INT(<em>p</em>/3 + 1/2), but not in general, and not even for <em>A<span style="font-size: 8pt;">p</span></em> = INT(<em>p</em>/5 + 1/2).</p>
<p>It is surprising that the curve is so smooth, given that we are working with prime numbers, which behave somewhat chaotically. There has to be a mechanism causing this unexpected smoothness, and that mechanism could be the key to proving the twin prime conjecture. More about this in section 2.</p>
<p><span style="font-size: 14pt;"><strong>2. Connection to the Twin Prime Conjecture</strong></span></p>
<p>If <em>M</em> = 6 SQRT(<em>N</em>) and <em>A<span style="font-size: 8pt;">p</span></em> = INT(<em>p</em>/6 + 1/2), then the set <em>S</em>(<em>M</em>, <em>N</em>) defined in section 1 contains only elements <em>q</em> such that 6<em>q</em> - 1 and 6<em>q</em> + 1 are twin primes. This fact is easy to prove, see <a href="https://math.stackexchange.com/q/4204121" target="_blank" rel="noopener">here</a>. The set misses a few of the twin primes (the smaller ones), but this is not an issue, since we need to prove that <em>S</em>(<em>M</em>, <em>N</em>), as <em>N</em> tends to infinity, contains infinitely many elements. The number of elements in <em>S</em>(<em>M</em>, <em>N</em>) is denoted as <em>C</em>(<em>M</em>, <em>N</em>).</p>
<p>Let us define <em>R</em><span style="font-size: 8pt;">1</span>(<em>N</em>) = <em>R</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) and <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) = <em>R</em>(<em>M</em><span style="font-size: 8pt;">0</span>, <em>N</em>). Here <em>M</em><span style="font-size: 8pt;">1</span> = 6 SQRT(<em>N</em>) and <em>M</em><span style="font-size: 8pt;">0</span> is defined in section 1, just below the plot. To prove the twin prime conjecture, one has to prove that <em>R</em><span style="font-size: 8pt;">1</span>(<em>N</em>) < <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) and that <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) does not grow too fast, as <em>N</em> tends to infinity.</p>
<p>The relationship <em>R</em><span style="font-size: 8pt;">1</span>(<em>N</em>) < <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) can be written as <em>P</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) / <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) < <em>C</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>). If the number of twin primes is infinite, then <em>C</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) tends to infinity as <em>N</em> tends to infinity. Thus if <em>P</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) / <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) also tends to infinity, that is, if <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) / <em>P</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) tends to zero, then it would prove the twin prime conjecture. Note that <em>P</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) is asymptotically equivalent (up to a factor not depending on <em>N</em>) to <em>N</em> / (log <em>M</em><span style="font-size: 8pt;">1</span>)^2, that is, to <em>N</em> / (log <em>N</em>)^2. So if <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) grows more slowly than (say) <em>N</em> / (log <em>N</em>)^3, it would prove the twin prime conjecture. Empirical evidence suggests that <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) grows like log <em>N</em> at most, so it looks promising.</p>
<p>The big challenge is that the observed patterns (found in section 1 and used in the above paragraph), however strong they are, may be very difficult to prove formally. Indeed, my argument still leaves open the possibility that there are only finitely many twin primes: this could happen if <em>R</em><span style="font-size: 8pt;">0</span>(<em>N</em>) grows too fast.</p>
<p>The next step would be to look at small values of <em>N</em>, say <em>N</em> = 100, and try to understand, from a theoretical point of view, what causes the observed patterns, then generalize to larger <em>N</em>, in the hope that the patterns can be formally explained via a mathematical proof.</p>
<p>The table below summarizes the main results of my computations. It is available <a href="https://storage.ning.com/topology/rest/1.0/file/get/9525882885?profile=original" target="_blank" rel="noopener">here</a>.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9525882495?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9525882495?profile=RESIZE_710x" width="450" class="align-center"/></a></p>
<p>Note that if <em>M</em><span style="font-size: 8pt;">1</span> = 6 SQRT(<em>N</em>), then the set <em>S</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) is a subset of the following sequence: <a href="https://oeis.org/A002822" target="_blank" rel="noopener">A002822</a>. In particular, if <em>N</em> = 3,068,200, then <em>S</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) contains all the 99,998 elements of A002822 up to 3,068,165 (mapping to the first 99,998 twin primes if you ignore {3, 5}), except for the first 215 entries. Thus <em>C</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) = 99,998 - 215 = 99,783, as shown in the above table. If <em>M</em><span style="font-size: 8pt;">1</span> < 6 SQRT(<em>N</em>), then <em>S</em>(<em>M</em><span style="font-size: 8pt;">1</span>, <em>N</em>) not only misses more elements of A002822, but also includes elements that are not in A002822. Hence the name <em>critical point</em> for <em>M</em><span style="font-size: 8pt;">1</span> = 6 SQRT(<em>N</em>). The last element <em>q</em> = 3,068,165 corresponds to the twin primes 6<em>q</em> - 1 = 18,408,989 and 6<em>q</em> + 1 = 18,408,991. See also <a href="https://www.google.com/search?q=is+18%2C408%2C991+a+prime+number" target="_blank" rel="noopener">here</a>.</p>
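<p>As a quick sanity check (my snippet, not from the article), the first terms of <a href="https://oeis.org/A002822" target="_blank" rel="noopener">A002822</a> can be generated directly from its definition, namely the numbers <em>q</em> such that 6<em>q</em> - 1 and 6<em>q</em> + 1 are twin primes:</p>

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Numbers q <= 59 such that (6q - 1, 6q + 1) is a twin prime pair.
a002822 = [q for q in range(1, 60) if is_prime(6 * q - 1) and is_prime(6 * q + 1)]
print(a002822)
# → [1, 2, 3, 5, 7, 10, 12, 17, 18, 23, 25, 30, 32, 33, 38, 40, 45, 47, 52, 58]
```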
<p><span style="font-size: 14pt;"><strong>3. Generalization</strong></span></p>
<p>The concepts discussed here also apply to cousin primes, sexy primes, the primes themselves, and other prime constellations. This section is still under construction. In the meantime, I invite you to check my latest update on this topic on MathOverflow, <a href="https://mathoverflow.net/questions/403372/counting-primes-twin-primes-cousin-primes-unusual-approach-connection-to-som" target="_blank" rel="noopener">here</a>.</p>
<p></p>
<p><span style="font-size: 14pt;"><strong>The Inverse Problem in Random Dynamical Systems</strong></span><br /><em>Posted 2021-08-27 by Vincent Granville</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9484111501?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9484111501?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><em>Dynamical system used in weather prediction (see <a href="https://math.duke.edu/research/pde-and-dynamical-systems" target="_blank" rel="noopener">here</a>)</em></p>
<p>We are dealing here with random variables recursively defined by <em>X</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>g</em>(<em>X</em><span style="font-size: 8pt;"><em>n</em></span>), with <em>X</em><span style="font-size: 8pt;">1</span> being the initial condition. The examples discussed here are simple, discrete, and one-dimensional: the purpose is to illustrate the concepts so that they can be understood by, and useful to, a large audience, not just mathematicians. I have written many articles about dynamical systems, see for example <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a>. What is original in this article is that the systems discussed are now random, as <em>X</em><span style="font-size: 8pt;">1</span> is a random variable. Applications include the design of non-periodic pseudorandom number generators, and cryptography. Such systems, especially more complex ones such as fully stochastic dynamical systems, are also routinely used in the financial modeling of commodity prices.</p>
<p>We focus on mappings <em>g</em> of the fixed interval [0, 1] onto itself. That is, the support domain of <em>X<span style="font-size: 8pt;">n</span></em> is [0, 1], and <em>g</em> is a many-to-one mapping onto [0, 1]. The most trivial example, known as the dyadic or Bernoulli map, is <em>g</em>(<em>x</em>) = 2<em>x</em> - INT(2<em>x</em>) = { 2<em>x</em> }, where the curly brackets represent the fractional part function (see <a href="https://en.wikipedia.org/wiki/Fractional_part" target="_blank" rel="noopener">here</a>). This is sometimes denoted as <em>g</em>(<em>x</em>) = 2<em>x</em> mod 1. The best-known and possibly oldest example is the logistic map (see <a href="https://www.datasciencecentral.com/profiles/blogs/logistic-map-chaos-randomness-and-quantum-algorithms" target="_blank" rel="noopener">here</a>), with <em>g</em>(<em>x</em>) = 4<em>x</em>(1 - <em>x</em>).</p>
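<p>Both maps are one-liners in any language; the snippet below (illustrative, not from the article) iterates the logistic map from an arbitrary starting point. One practical caveat worth knowing before simulating: in double-precision floating point, every number has a finite binary expansion, so iterating the Bernoulli map numerically collapses to 0 after a few dozen steps; the logistic map does not suffer from this particular degeneracy.</p>

```python
def bernoulli(x):
    """Dyadic (Bernoulli) map: g(x) = {2x}, the fractional part of 2x."""
    return (2 * x) % 1.0

def logistic(x):
    """Logistic map: g(x) = 4x(1 - x)."""
    return 4 * x * (1 - x)

x = 0.2021                 # arbitrary starting point in [0, 1]
orbit = [x]
for _ in range(5):
    x = logistic(x)
    orbit.append(x)
print([round(v, 4) for v in orbit])
```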
<p>We start with a simple exercise that requires very little mathematical knowledge, but a good amount of out-of-the-box thinking. The solution is provided. The discussion is about a specific, original problem, referred to as the inverse problem, and introduced in section 2. The reasons for being interested in the inverse problem are also discussed. Finally, I provide an Excel spreadsheet with all my simulations, for replication purposes. Before discussing the inverse problem, we discuss the standard problem in section 1.</p>
<p><span style="font-size: 14pt;"><strong>1. The standard problem</strong></span></p>
<p>One of the main problems in dynamical systems is to determine whether the distribution of <em>X<span style="font-size: 8pt;">n</span></em> converges, and to find the limit, called the invariant measure, invariant distribution, fixed-point distribution, or attractor. The attractor, depending on <em>g</em>, is typically the same regardless of the initial condition <em>X</em><span style="font-size: 8pt;">1</span>, except for some special initial conditions causing problems (this set of bad initial conditions has Lebesgue measure zero, and we ignore it here). As an example, with the Bernoulli map <em>g</em>(<em>x</em>) = { 2<em>x</em> }, all rational numbers (and many other numbers) are bad initial conditions. They are, however, far outnumbered by good initial conditions. It is typically very difficult to determine whether a specific initial condition is a good one. <span>Proving that <em>π</em>/4 is a good initial condition for the Bernoulli map would be a major accomplishment, making you instantly famous in the mathematical community, as it would prove that the digits of <em>π</em> in base 2 behave exactly like independent, identically distributed Bernoulli random variables. Good initial conditions for the Bernoulli map are called <a href="https://en.wikipedia.org/wiki/Normal_number" target="_blank" rel="noopener">normal numbers</a> in base 2.</span></p>
<p>It is also assumed that the dynamical system is <a href="https://en.wikipedia.org/wiki/Ergodicity" target="_blank" rel="noopener">ergodic</a>: all systems investigated here are ergodic; I won't elaborate on this concept, but the curious, math-savvy reader can check the meaning on Wikipedia. Finding the attractor is a difficult problem, and it usually requires solving a stochastic integral equation. Except on rare occasions (discussed <a href="https://www.datasciencecentral.com/profiles/blogs/number-representation-systems-explained-in-one-picture" target="_blank" rel="noopener">here</a> and in my book, <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>), no exact solution is known, and one needs to use numerical methods to find an approximation. This is illustrated in section 1.1, with the attractor found (approximately) using simulations in Excel. In section 2, we focus on the much easier inverse problem, which is the main topic of this article.</p>
<p><strong>1.1. Standard problem: example</strong></p>
<p>Let's start with <em>X</em><span style="font-size: 8pt;">1</span> defined as follows: <em>X</em><span style="font-size: 8pt;">1</span> = <em>U</em> / (1 - <em>U</em>)^<em><span>α</span></em>, where <em>U</em> is a uniform deviate on [0, 1], <em>α</em> = 0.25, and ^ denotes the power operator (2^3 = 8). We use <em>g</em>(<em>x</em>) = { 4<em>x</em>(1 - <em>x</em>) }, where { } denotes the fractional part function. Essentially, this is the logistic map. I produced 10,000 deviates for <em>X</em><span style="font-size: 8pt;">1</span>, and then applied the mapping <em>g</em> iteratively to each of these deviates, up to <em>X<span style="font-size: 8pt;">n</span></em> with <em>n</em> = 53. The scatterplot below represents the empirical percentile distribution function (PDF), respectively for <em>X</em><span style="font-size: 8pt;">3</span> in blue, and <em>X</em><span style="font-size: 8pt;">53</span> in orange. These PDF's, for <em>X</em><span style="font-size: 8pt;">2</span>, <em>X</em><span style="font-size: 8pt;">3</span>, and so on, slowly converge to a limit, corresponding to the attractor. The orange S-curve (<em>n</em> = 53) is extremely close to the limiting PDF, and additional iterations (that is, increasing <em>n</em>) barely provide any change. So we found the limit (approximately) using simulations. Note that the cumulative distribution function (CDF) is the inverse of the PDF. All this was done with Excel alone.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9483753458?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9483753458?profile=RESIZE_710x" width="250" class="align-center"/></a></p>
<p><span style="font-size: 14pt;"><strong>2. The inverse problem</strong></span></p>
<p>The inverse problem consists of finding <em>g</em>, assuming the attractor distribution (the orange curve in the above figure) is known. Typically, there are many possible solutions. One of the possible reasons for solving the inverse problem is to get a sequence of random variables <em>X</em><span style="font-size: 8pt;">1</span>, <em>X</em><span style="font-size: 8pt;">2</span>, and so on, that exhibits little or no auto-correlation. For instance, the lag-1 auto-correlation (between <em>X<span style="font-size: 8pt;">n</span></em> and <em>X</em><span style="font-size: 8pt;"><em>n</em>+1</span>) for the Bernoulli map is 1/2, which may be far too high, depending on the application you have in mind. It is important in cryptography applications to remove these auto-correlations. The solution proposed here also satisfies the following property: <em>X</em><span style="font-size: 8pt;">2</span> = <em>g</em>(<em>X</em><span style="font-size: 8pt;">1</span>), <em>X</em><span style="font-size: 8pt;">3</span> = <em>g</em>(<em>X</em><span style="font-size: 8pt;">2</span>), <em>X</em><span style="font-size: 8pt;">4</span> = <em>g</em>(<em>X</em><span style="font-size: 8pt;">3</span>), and so on, all have the same pre-specified attractor distribution, regardless of the (non-singular) distribution of <em>X</em><span style="font-size: 8pt;">1</span>.</p>
<p><strong>2.1. Exercise</strong></p>
<p>Before diving into a solution, if you have time, I ask you to solve the following simple inverse problem. </p>
<p>Find a mapping <em>g</em> such that if <em>X</em><span style="font-size: 8pt;">n+1</span> = <em>g</em>(X<span style="font-size: 8pt;">n</span>), the attractor distribution is uniform on [0, 1]. Can you find one yielding very low auto-correlations between the successive <em>X<span style="font-size: 8pt;">n</span></em>'s? Hint: <em>g</em> may not be continuous. </p>
<p><strong>2.2. A general solution to the inverse problem</strong></p>
<p>A potential solution to the problem in section 2.1 is <em>g</em>(<em>x</em>) = { <em>bx</em> } where <em>b</em> is an integer larger than 1. This is because the uniform distribution on [0, 1] is the attractor for this map. The case <em>b</em> = 2 corresponds to the Bernoulli map discussed earlier. Regardless of <em>b</em>, INT(<em>bX<span style="font-size: 8pt;">n</span></em>) represents the <em>n</em>-th digit of <em>X</em><span style="font-size: 8pt;">1</span>, in base <em>b</em>. The lag-1 autocorrelation between <em>X<span style="font-size: 8pt;">n</span></em> and <em>X</em><span style="font-size: 8pt;"><em>n</em>+1</span> is then equal to 1 / <em>b</em>. Thus, the higher <em>b</em>, the better. Note that if you run the simulations in double-precision binary arithmetic (which Excel uses), avoid even integer values for <em>b</em>: when <em>b</em> is even, each iteration of { <em>bx</em> } discards trailing bits of the binary representation, and the sequence degenerates to zero, making your simulations meaningless after <em>n</em> = 45 iterations or so.</p>
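<p>The 1/<em>b</em> autocorrelation is easy to check by simulation. Below is a sketch (Python/NumPy, my construction): since the uniform distribution is the invariant one, we can estimate corr(<em>X<span style="font-size: 8pt;">n</span></em>, <em>X</em><span style="font-size: 8pt;"><em>n</em>+1</span>) from many independent uniform draws instead of one long orbit:</p>

```python
import numpy as np

# Under the invariant (uniform) distribution of the map g(x) = { b x },
# the lag-1 autocorrelation can be estimated from independent uniform draws.
rng = np.random.default_rng(0)
b = 3
x = rng.random(200_000)          # X_n ~ Uniform[0, 1]
y = (b * x) % 1.0                # X_{n+1} = { b X_n }
r = np.corrcoef(x, y)[0, 1]
print(f"lag-1 autocorrelation for b={b}: {r:.4f} (theory: {1/b:.4f})")
```

A short computation confirms the theory for <em>b</em> = 3: E[<em>XY</em>] = 5/18, so Cov = 5/18 - 1/4 = 1/36, and dividing by Var(<em>X</em>) = 1/12 gives exactly 1/3.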
<p>Now, a general solution offered here, for any pre-specified attractor and any non-singular distribution for <em>X</em><span style="font-size: 8pt;">1</span>, is based on a result proved <a href="https://mathoverflow.net/questions/402341/invariant-distributions-for-iterated-random-variables-stochastic-dynamical-syst" target="_blank" rel="noopener">here</a>. If <em>g</em> is the solution in question, then all <em>X<span style="font-size: 8pt;">n</span></em> (with <em>n</em> > 1) have the same distribution as the pre-specified attractor. I provide an Excel spreadsheet showing how it works for a specific example.</p>
<p>First, let's assume that <em>g</em>* is a solution when the attractor is the uniform distribution on [0, 1]. For instance <em>g</em>*(<em>x</em>) = { <em>bx</em> } as discussed earlier. Let <em>F</em> be the CDF of the target attractor, and assume its support domain is [0, 1]. Then a solution <em>g</em> is given by</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9484094872?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9484094872?profile=RESIZE_710x" width="150" class="align-center"/></a></p>
<p>For instance, if <em>F</em>(<em>x</em>) = <em>x</em>^2, with <em>x</em> in [0, 1], then <em>g</em>(<em>x</em>) = SQRT( { <em>bx</em>^2 } ) works, assuming <em>b</em> is an integer larger than 1. The scatterplot below shows the empirical CDF of <em>X</em><span style="font-size: 8pt;">2</span> (blue dots, based on 10,000 deviates) versus the CDF of the target attractor with distribution <em>F</em> (red curve): they are almost indistinguishable. I used <em>b</em> = 3, and for <em>X</em><span style="font-size: 8pt;">1</span>, I used the same distribution as in section 1.1. The detailed computations are available in my spreadsheet, <a href="http://datashaping.com/Recur3.xlsx" target="_blank" rel="noopener">here</a> (13 MB download).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9484100060?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9484100060?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>The summary statistics and the above plot are found in columns BD to BH, in my spreadsheet.</p>
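<p>As a quick sanity check of this construction outside of Excel (a sketch in Python/NumPy, mine rather than the author's spreadsheet): if <em>X</em><span style="font-size: 8pt;">1</span> is drawn from <em>F</em> itself (here <em>X</em><span style="font-size: 8pt;">1</span> = SQRT(<em>U</em>), since the quantile function of <em>F</em>(<em>x</em>) = <em>x</em>^2 is the square root), then every iterate of <em>g</em>(<em>x</em>) = SQRT( { <em>bx</em>^2 } ) should again follow <em>F</em>:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
b = 3

def F(x):
    """Target attractor CDF on [0, 1]."""
    return x ** 2

def g(x):
    """g(x) = F^{-1}( { b F(x) } ) = sqrt( { b x^2 } )."""
    return np.sqrt((b * x ** 2) % 1.0)

X = np.sqrt(rng.random(100_000))   # X_1 ~ F, by inverse transform sampling
for _ in range(9):                 # iterate up to X_10
    X = g(X)

# Empirical CDF of X_10 versus the target F at a few points:
for t in (0.3, 0.6, 0.9):
    print(t, round(float(np.mean(X <= t)), 4), "target:", F(t))
```

The design choice here is to seed the chain with the attractor distribution itself, so the check is exact: { <em>b</em> <em>F</em>(<em>X</em><span style="font-size: 8pt;">1</span>) } is then uniform at every step, and <em>F</em> is preserved.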
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span style="font-size: 14pt;"><strong>Orbits of Non-periodic Fourier Series: Simple Introduction, Cool Applications</strong></span></p>
<p><em>Vincent Granville, 2021-08-18</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9439966865?profile=original" rel="noopener" target="_blank"><img class="align-center" src="https://storage.ning.com/topology/rest/1.0/file/get/9439966865?profile=RESIZE_710x" width="600"></img></a></p>
<p>These Fourier series can be considered as bivariate time series (<em>X</em>(<em>t</em>), <em>Y</em>(<em>t</em>)) where <em>t</em> is the time, <em>X</em>(<em>t</em>) is a weighted sum of cosine terms of arbitrary periods, and <em>Y</em>(<em>t</em>) is the same sum, except that cosine is replaced by sine. The orbit at time <em>t</em> is…</p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9439966865?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9439966865?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p>These Fourier series can be considered as bivariate time series (<em>X</em>(<em>t</em>), <em>Y</em>(<em>t</em>)) where <em>t</em> is the time, <em>X</em>(<em>t</em>) is a weighted sum of cosine terms of arbitrary periods, and <em>Y</em>(<em>t</em>) is the same sum, except that cosine is replaced by sine. The orbit at time <em>t</em> is</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9438550076?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9438550076?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>where <em>n</em> can be finite or infinite, and <em>A<span style="font-size: 8pt;">k</span></em>, <em>B<span style="font-size: 8pt;">k</span></em> are the coefficients or weights. The shape of the orbit varies greatly depending on the coefficients: it can be periodic, smooth or chaotic, exhibit holes (or not), or fill dense areas of the plane. For instance, if <em>B<span style="font-size: 8pt;">k</span></em> = <em>k</em> - 1, we are dealing with standard Fourier series, and the orbit is periodic. Also, <em>X</em>(<em>t</em>) and <em>Y</em>(<em>t</em>) can be viewed respectively as the real and imaginary parts of a function taking values in the complex plane, as in one of the examples discussed here.</p>
<p>The goal of this article is to feature two interesting applications, focusing on exploratory analysis rather than advanced mathematics, and to provide beautiful visualizations. There is no attempt at categorizing these orbits: this would be the subject of an entire book. Finally, a number of interesting, off-the-beaten-path exercises are provided, ranging from simple to very difficult.</p>
<p>The orbit is always symmetric with respect to the X-axis, since <em>X</em>(-<em>t</em>) = <em>X</em>(<em>t</em>) and <em>Y</em>(-<em>t</em>) = -<em>Y</em>(<em>t</em>).</p>
<p><span style="font-size: 14pt;"><strong>1. Application in astronomy</strong></span></p>
<p>We are interested in the center of gravity (centroid) of <em>n</em> planets <em>P</em><span style="font-size: 8pt;">1</span>, ..., <em>P<span style="font-size: 8pt;">n</span></em> of various masses, rotating at various speeds, around a star located at the origin (0, 0), in a two-dimensional framework (the ecliptic plane). In this model, celestial bodies are assumed to be points, and gravitational forces between the planets are ignored. Also, for simplification, the orbit of each planet is circular rather than elliptic. Planet <em>P<span style="font-size: 8pt;">k</span></em> has mass <em>M<span style="font-size: 8pt;">k</span></em>, and its orbit is circular with radius <em>R<span style="font-size: 8pt;">k</span></em>. Its rotation period is 2<em>π</em> / <em>B<span style="font-size: 8pt;">k</span></em>. Also, at <em>t</em> = 0, all the planets are aligned on the X-axis. Let <em>M</em> = <em>M</em><span style="font-size: 8pt;">1</span> + ... + <em>M<span style="font-size: 8pt;">n</span></em>. Then the orbit of the centroid has the same formula as above, with <em>A<span style="font-size: 8pt;">k</span></em> = <em>R<span style="font-size: 8pt;">k</span> M<span style="font-size: 8pt;">k</span></em> / <em>M</em> for <em>k</em> = 1, ..., <em>n</em>.</p>
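<p>Under these assumptions the centroid orbit is just the partial Fourier sum above. A sketch (Python/NumPy, my implementation) with the parameters of figure 1 below (<em>n</em> = 100 equal masses, <em>B<span style="font-size: 8pt;">k</span></em> = <em>k</em> + 1, <em>R<span style="font-size: 8pt;">k</span></em> = 1 / (<em>k</em> + 1)^0.7):</p>

```python
import numpy as np

# Centroid orbit with the figure-1 parameters (equal masses).
n = 100
k = np.arange(1, n + 1)
B = k + 1.0                      # planet k has rotation period 2*pi / B_k
R = 1.0 / (k + 1) ** 0.7         # circular orbit radius R_k
M = np.ones(n)                   # equal masses
A = R * M / M.sum()              # A_k = R_k M_k / M

def centroid(t):
    """Centroid orbit (X(t), Y(t)); t may be a scalar or a NumPy array."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    return (A * np.cos(B * t)).sum(axis=1), (A * np.sin(B * t)).sum(axis=1)

t = np.linspace(0.0, 1000.0, 20_001)
x, y = centroid(t)
# At t = 0 all planets are aligned on the X-axis, so Y(0) = 0 and X(0) = sum of A_k:
print(x[0], y[0])
```

Plotting <code>x</code> against <code>y</code> (with any plotting library) reproduces the left part of figure 1; extending <code>t</code> to 10,000 gives the right part.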
<p>In the figures below, the left part represents the orbit of the centroid between <em>t</em> = 0 and <em>t</em> = 1,000 while the right part represents the orbit between <em>t</em> = 0 and <em>t</em> = 10,000.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9439141491?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9439141491?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9439256471?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9439256471?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9439260853?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9439260853?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong></p>
<p><strong>In figure 1</strong>, we have <em>n</em> = 100 planets, all the planets have the same mass, <em>B<span style="font-size: 8pt;">k</span></em> = <em>k</em> + 1, and <em>R<span style="font-size: 8pt;">k</span></em> = 1 / (<em>k</em> + 1)^0.7 [ that is, 1 / (<em>k</em> + 1) at power 0.7]. The orbit is periodic because the <em>B<span style="font-size: 8pt;">k</span></em>'s are integers, though the period involves numerous little loops due to the large number of planets. The periodicity is masked by the thickness of the blue curve, but would be obvious to the naked eye on the right part of figure 1, if we only had 10 planets. I chose 100 planets because it creates a more beautiful, original plot.</p>
<p><strong>Figure 2</strong> is the same as figure 1, except that planet <em>P</em><span style="font-size: 8pt;">50</span> has a mass 100 times larger than that of any other planet. You would think that the orbit of the centroid should be close to the orbit of the dominant planet, and thus close to a circle. However, this is not the case: you need a much bigger "outlier planet" to get an orbit (for the centroid) close to a circle.</p>
<p><strong>In figure 3</strong>, <em>n</em> = 50, <em>M<span style="font-size: 8pt;">k</span></em> = 1 / SQRT(<em>k</em>+1), <em>A<span style="font-size: 8pt;">k</span></em> = 1.75^(<em>k</em>+1), and <em>B<span style="font-size: 8pt;">k</span></em> = log(<em>k</em>+1). This time, the orbit is non periodic. The area in blue on the right side becomes truly dense when <em>t</em> becomes infinite; it is not a visual effect. Note that in all our examples, there is a hole encompassing the origin. In many other examples (not shown here), there is no hole. Figure 3 is related to our discussion in section 2.</p>
<p>None of the above examples is realistic, as they violate both Kepler's third law (see <a href="https://en.wikipedia.org/wiki/Kepler%27s_laws_of_planetary_motion" target="_blank" rel="noopener">here</a>), which specifies the period of a planet given <em>R<span style="font-size: 8pt;">k</span></em> (thus determining <em>B<span style="font-size: 8pt;">k</span></em>), and the Titius-Bode law (see <a href="https://en.wikipedia.org/wiki/Titius%E2%80%93Bode_law" target="_blank" rel="noopener">here</a>), which specifies the distance <em>R<span style="font-size: 8pt;">k</span></em> between the star and its <em>k</em>-th planet. In other words, the model applies either to a universe governed by laws other than gravity, or to the early process of planet formation, when individual planet orbits are not yet in equilibrium. It would be an easy exercise to input the correct values of <em>A<span style="font-size: 8pt;">k</span></em> and <em>B<span style="font-size: 8pt;">k</span></em> corresponding to the solar system, and see the resulting non periodic orbit for the centroid of the planets.</p>
<p><span style="font-size: 14pt;"><strong>2. The Riemann Hypothesis</strong></span></p>
<p>The <a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">Riemann hypothesis</a> is one of the most famous unsolved mathematical conjectures. It states that the Riemann Zeta function has no zero in a certain area of the (complex) plane, or in other words, that there is a hole around the origin in its orbit, depending on the parameter <em>s</em>, just like in Figures 1, 2 and 3. Its orbit corresponds to <em>A<span style="font-size: 8pt;">k</span></em> = 1 / <em>k</em>^<em>s</em>, <em>B<span style="font-size: 8pt;">k</span></em> = log <em>k</em>, and <em>n</em> infinite. Unfortunately, the cosine and sine series <em>X</em>(<em>t</em>), <em>Y</em>(<em>t</em>) diverge if <em>s</em> is equal to or less than 1. So in practice, instead of working with the Riemann Zeta function, one works with its sister called the Dirichlet Eta function, replacing <em>X</em>(<em>t</em>) and <em>Y</em>(<em>t</em>) by their alternating version, that is <em>A<span style="font-size: 8pt;">k</span></em> = (-1)^(<em>k</em>+1) / <em>k</em>^<em>s</em>. Then we have convergence in the critical strip 0.5 < <em>s</em> < 1. Proving that there is a hole around the origin if 0.5 < <em>s</em> < 1 amounts to proving the Riemann Hypothesis. The non periodic orbit in question can be seen <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">in this article</a> as well as in figure 4.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9439782681?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9439782681?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong></p>
<p>Figure 4 shows the orbit, when <em>n</em> = 1,000. The right part seems to indicate that the orbit eventually fills the hole surrounding the origin, as <em>t</em> becomes large. However this is caused by using only <em>n</em> = 1,000 terms in the cosine and sine series. These series converge very slowly and in a chaotic way. Interestingly, if <em>n</em> = 4, there is a well defined hole, see figure 5. For larger values of <em>n</em>, the hole disappears, but it starts reappearing as <em>n</em> becomes very large, as shown in the left part of figure 4.</p>
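<p>The partial sums are easy to sample (a sketch in Python/NumPy, my implementation), with <em>A<span style="font-size: 8pt;">k</span></em> = (-1)^(<em>k</em>+1) / <em>k</em>^<em>s</em> and <em>B<span style="font-size: 8pt;">k</span></em> = log <em>k</em>. Since convergence in the critical strip is slow, it is worth checking the code first at a value where the limit is known: at <em>t</em> = 0 and <em>s</em> = 2, <em>X</em>(0) is the alternating series Eta(2) = <em>π</em>^2 / 12.</p>

```python
import numpy as np

def eta_orbit(s, t, n=10_000):
    """Partial sums X(t), Y(t) with A_k = (-1)^(k+1) / k^s and B_k = log k."""
    k = np.arange(1, n + 1)
    A = (-1.0) ** (k + 1) / k ** s
    logk = np.log(k)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    return (A * np.cos(logk * t)).sum(axis=1), (A * np.sin(logk * t)).sum(axis=1)

# Convergence check at t = 0, s = 2: the alternating series equals pi^2 / 12,
# and the partial-sum error is bounded by the first omitted term, 1/(n+1)^2.
x0, y0 = eta_orbit(2.0, 0.0, n=100_000)
print(x0[0], np.pi ** 2 / 12)
```

With the check passed, calling <code>eta_orbit(0.75, t)</code> over a grid of <em>t</em> values (and a large <code>n</code>) produces the kind of orbit shown in figure 4, to be interpreted qualitatively given the slow convergence.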
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9439867880?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9439867880?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 5</strong></p>
<p>If <em>n</em> = 4 (corresponding to three planets in section 1, since the first term is constant here), a well defined hole appears, although it does not encompass the origin (see figure 5). Proving the existence of a non-vanishing hole encompassing the origin, regardless of how large <em>t</em> gets and regardless of <em>s</em> in (0.5, 1), when <em>n</em> is infinite, would prove the Riemann hypothesis.</p>
<p>Note the resemblance between the left parts of figure 3 and 4. This could suggest two possible paths to proving the Riemann Hypothesis:</p>
<ul>
<li>Approximating the orbit of figure 4 by an orbit like that of figure 3, and obtaining a bound on the approximation error. If the bound is small enough, it will result in a smaller hole in figure 4, but possibly one still large enough to encompass the origin.</li>
<li>Find a topological mapping between the orbits of figure 3 and 4: one that preserves the existence of the hole, and preserves the fact that the hole encompasses the origin. </li>
</ul>
<p> <span style="font-size: 14pt;"><strong>3. Exercises</strong></span></p>
<p>Here are a few questions for further exploration. They are related to section 1.</p>
<ul>
<li>In section 1, all the planets are aligned when <em>t</em> = 0. Can this happen again in the future if <em>n</em> = 3? What if <em>n</em> = 4? Assume that the orbit of the centroid is non periodic, and <em>n</em> is the number of planets.</li>
<li>What are the necessary and sufficient conditions to make the orbit of the centroid non periodic?</li>
<li>At the initial condition (<em>t</em> = 0), is the centroid always inside the limit domain of oscillations (the right part on each figure, colored in blue)? Or can the orbit permanently drift away from its location at <em>t</em> = 0, depending on the <em>A<span style="font-size: 8pt;">k</span></em>'s and <em>B<span style="font-size: 8pt;">k</span></em>'s?</li>
<li>Find an orbit that has no hole. </li>
<li>Make a video, showing the planets moving around the star, as well as the orbital movement of the centroid of the planets. Make it interactive (like an API), allowing the users to input some parameters.</li>
<li>Can you compute the shape of the hole if <em>n</em> = 3, and prove its existence?</li>
<li>Try to categorize all possible orbits when <em>n</em> = 3 or <em>n</em> = 4.</li>
</ul>
<p></p>
<p><span style="font-size: 14pt;"><strong>A Simple Regression Problem</strong></span></p>
<p><em>Vincent Granville, 2021-07-29</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9326895084?profile=original" rel="noopener" target="_blank"><img class="align-center" src="https://storage.ning.com/topology/rest/1.0/file/get/9326895084?profile=RESIZE_710x" width="600"></img></a></p>
<p>This article is part of a new series featuring problems with solution, to help you hone your machine learning and pattern recognition skills. Try to solve this problem by yourself first, before looking at the solution. Today's problem also has an intriguing mathematical appeal and solution: this allows you to check if your solution found using machine…</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9326895084?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9326895084?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p>This article is part of a new series featuring problems with solutions, to help you hone your machine learning and pattern recognition skills. Try to solve this problem by yourself first, before looking at the solution. Today's problem also has an intriguing mathematical appeal and solution: this allows you to check whether the solution you found using machine learning techniques is correct. The level is for beginners.</p>
<p>The problem is as follows. Let <em>X</em><span style="font-size: 8pt;">1</span>, <em>X</em><span style="font-size: 8pt;">2</span>, <em>X</em><span style="font-size: 8pt;">3</span> and so on be a sequence recursively defined by <em>X</em><span style="font-size: 8pt;"><em>n</em>+1</span> = Stdev(<em>X</em><span style="font-size: 8pt;">1</span>, ..., <em>X<span style="font-size: 8pt;">n</span></em>). Here <em>X</em><span style="font-size: 8pt;">1</span>, the initial condition, is a positive real number or random variable. Thus,</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9326797280?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9326797280?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>It is clear that <em>X<span style="font-size: 8pt;">n</span></em> = <em>A<span style="font-size: 8pt;">n</span> X<span style="font-size: 8pt;">1</span></em>, where <em>A<span style="font-size: 8pt;">n</span></em> is a number that does not depend on <em>X</em><span style="font-size: 8pt;">1</span>. So we can assume, without loss of generality, that <em>X</em><span style="font-size: 8pt;">1</span> = 1. For instance, <em>A</em><span style="font-size: 8pt;">1</span> = 1 and <em>A</em><span style="font-size: 8pt;">2</span> = 0. The purpose here is to study the behavior of <em>A<span style="font-size: 8pt;">n</span></em> (for large <em>n</em>) using simple model fitting techniques. I plotted the first few values of <em>A<span style="font-size: 8pt;">n</span></em> in the figure below, where the X-axis represents <em>n</em>, and the Y-axis represents <em>A<span style="font-size: 8pt;">n</span></em>. The question is: how to approximate <em>A<span style="font-size: 8pt;">n</span></em> as a simple function of <em>n</em>? Of course, a linear regression won't work. What about a polynomial regression?</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9326801281?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9326801281?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>The first 600 values of <em>A<span style="font-size: 8pt;">n</span></em> are available <a href="http://datashaping.com/stdv.txt" target="_blank" rel="noopener">here</a>, as a text file.</p>
<p><span style="font-size: 14pt;"><strong>Solution</strong></span></p>
<p>A tool as basic as Excel is good enough to find the solution. However, if you use Excel, keep in mind that the built-in function Stdev computes the sample standard deviation (with the <em>n</em> - 1 correction factor), which needs to be taken care of. Alternatively, you can just use the values of <em>A<span style="font-size: 8pt;">n</span></em> available in my text file mentioned above, to avoid this problem.</p>
<p>If you use Excel, you can try various types of trend lines to approximate the blue curve, and even compute the regression coefficients and the R-squared for each tested model. You will find very quickly that the power trend line is the best model by far, that is, <em>A<span style="font-size: 8pt;">n</span></em> is very well approximated (for large values of <em>n</em>) by <em>A<span style="font-size: 8pt;">n</span></em> = <i>b</i> <em>n</em>^<em>c</em>. Here <em>n</em>^<em>c</em> stands for <em>n</em> at power <em>c</em>; also, <em>b</em> and <em>c</em> are the regression coefficients. In other words, log <em>A<span style="font-size: 8pt;">n</span></em> = log <em>b</em> + <em>c</em> log <em>n</em> (approximately). </p>
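<p>The same fit can be reproduced in Python/NumPy (a sketch; I regenerate <em>A<span style="font-size: 8pt;">n</span></em> with the population standard deviation, which is an assumption about the convention — the sample version changes the first few values but not the asymptotic exponent):</p>

```python
import numpy as np

def A_sequence(N):
    """A_1 = 1, A_{n+1} = stdev(A_1, ..., A_n); population convention (an assumption)."""
    a = [1.0]
    for _ in range(N - 1):
        a.append(float(np.std(a)))   # np.std defaults to ddof=0 (population stdev)
    return np.array(a)

A = A_sequence(600)
print(A[:4])                         # 1, 0, 0.5, 0.40824...

# Log-log fit of the tail: the slope estimates the exponent c in A_n ~ b n^c.
n = np.arange(1, 601)
mask = n >= 100                      # keep only the tail, where the power law holds
c_hat, log_b = np.polyfit(np.log(n[mask]), np.log(A[mask]), 1)
print("fitted exponent:", round(float(c_hat), 4))
```

The fitted slope should land in the neighborhood of the exact exponent discussed below; the remaining discrepancy comes from the O(1/<em>n</em>) correction terms.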
<p>What is very interesting, is that using some mathematics, you can actually compute the exact value of <em>c</em>. Indeed, <em>c</em> is solution of the equation <em>c</em>^2 = (2<em>c</em> + 1) (<em>c</em> + 1)^2, see <a href="https://math.stackexchange.com/questions/4190405/asymptotic-behavior-of-recurrence-x-n1-mboxstdevx-1-dots-x-n" target="_blank" rel="noopener">here</a>. This is a polynomial equation of degree 3, so the exact value of <em>c</em> can be computed. The approximation is <em>c</em> = -0.3522011. It is however very hard to get the exact value of <em>b</em>. </p>
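<p>Expanding <em>c</em>^2 = (2<em>c</em> + 1)(<em>c</em> + 1)^2 gives the cubic 2<em>c</em>^3 + 4<em>c</em>^2 + 4<em>c</em> + 1 = 0, whose unique real root can be computed numerically (a sketch; the two other roots form a complex pair):</p>

```python
import numpy as np

# c solves c^2 = (2c + 1)(c + 1)^2, i.e. 2c^3 + 4c^2 + 4c + 1 = 0.
roots = np.roots([2, 4, 4, 1])
c = roots[np.abs(roots.imag) < 1e-9].real[0]   # keep the unique real root
print(round(float(c), 7))                      # ~ -0.3522011
```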
<p>It would be interesting to plot the residual error for each estimated value of <em>A<span style="font-size: 8pt;">n</span></em>, and see if it shows some pattern. This could lead to a better approximation: <em>A<span style="font-size: 8pt;">n</span></em> = <em>b</em> <em>n</em>^<em>c</em> (1 + <em>d </em>/ <em>n</em>), with three parameters: <em>b</em>, <em>c</em> (unchanged) and <em>d</em>.</p>
<p></p>
<div class="postbody"><div class="xg_user_generated"><p><span style="font-size: 12pt;"><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span style="font-size: 12pt;"><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> He recently opened <a href="https://www.parisrestaurantandbar.com/" target="_blank" rel="noopener">Paris Restaurant</a>, in Anacortes. You can access Vincent's articles and books, <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
</div>
</div>Covid-19: Fundamental Statistics that are Ignoredtag:www.datasciencecentral.com,2021-07-19:6448529:BlogPost:10579882021-07-19T04:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9266959092?profile=original" rel="noopener" target="_blank"><img class="align-full" src="https://storage.ning.com/topology/rest/1.0/file/get/9266959092?profile=RESIZE_710x" width="720"></img></a></p>
<p>This is not a discussion as to whether the data is flawed or not, or whether we are comparing apples to oranges or not (the way statistics are gathered in different countries). These are of course fundamental questions, but here I will only use data (provided by Google) that everyone seem to more or less agree with, and I am not questioning it…</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9266959092?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9266959092?profile=RESIZE_710x" width="720" class="align-full"/></a></p>
<p>This is not a discussion as to whether the data is flawed, or whether we are comparing apples to oranges (the way statistics are gathered differs across countries). These are of course fundamental questions, but here I will only use data (provided by Google) that everyone seems to more or less agree with, and I am not questioning it here.</p>
<p>The discussion is about why some of that data makes the news every day, while other critical parts of the same public data set are nowhere mentioned. I will focus here on data from the United Kingdom, which epitomizes the trend that all media outlets cover on a daily basis: a new spike in Covid infections. It is less pronounced in most other countries, though they could take the same path in the future.</p>
<p>The three charts below summarize the situation, but only the first chart is discussed at length. Look at these three charts, and see if you can find the elephant in the room. If you do, no need to read the rest of my article! The data comes from <a href="https://news.google.com/covid19/map?hl=en-US&mid=%2Fm%2F07ssc&gl=US&ceid=US%3Aen" target="_blank" rel="noopener">this source</a>. You can do the same research for any country that provides reliable data.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9266616088?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9266616088?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9266617472?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9266617472?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9266618061?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9266618061?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>Of course, what nobody talks about is the low ratio of hospitalizations per case, which is down by an order of magnitude compared to previous waves. Even lower is the number of deaths per case. Clearly, hospitalizations are up, so some worsening really is taking place. And deaths take 2 to 3 weeks to show up in the data. This is why I selected the United Kingdom: the new wave started a while back, yet deaths are not materializing (thankfully!).</p>
<p>This raises a number of questions:</p>
<ul>
<li>Are more people getting tested because they are flying again around the world and vacationing, or asked to get tested by their employer?</li>
<li>Are vaccinated people testing positive but not getting sick, aside from 24 hours of feeling unwell just after vaccination?</li>
<li>Are people who recovered from Covid testing positive again but, like vaccinated people, experiencing a milder case, possibly explaining the low death rate?</li>
</ul>
<p>It is argued that 99% of those hospitalized today are unvaccinated. Among the hospitalized, how many are getting Covid for the first time? How many are getting Covid for the second time? Maybe the latter group behaves like vaccinated people, that is, very few need medical assistance. And overall, what proportion of the population is either vaccinated or recovered (or both)? At some point, most of the unvaccinated who haven't been infected yet will catch the virus. But no one seems to know what proportion of the population fits in that category. At least I don't. Some sources say that the number of new cases is probably much higher than reported, missing as much as 90% of new cases (see <a href="https://www.cbsnews.com/news/scott-gottlieb-delta-variant-covid-19-vaccines/" target="_blank" rel="noopener">here</a>) as many young people may not experience symptoms strong enough to get tested. If that is the case, we would reach herd immunity faster than expected, with fewer deaths than expected. It would be interesting to make a comparison with the <a href="https://en.wikipedia.org/wiki/Spanish_flu" target="_blank" rel="noopener">Spanish Flu</a>, though vaccination technology was less advanced back then.</p>
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p></p>Central Limit Theorem for Non-Independent Random Variablestag:www.datasciencecentral.com,2021-07-16:6448529:BlogPost:10572412021-07-16T04:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9256085655?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9256085655?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p>The original version of the central limit theorem (CLT) assumes <em>n</em> independently and identically distributed (i.i.d.) random variables <em>X</em><span style="font-size: 8pt;">1</span>, ..., <em>X<span style="font-size: 8pt;">n</span></em>, with finite variance. Let <em>S<span style="font-size: 8pt;">n</span></em> = <em>X</em><span style="font-size: 8pt;">1</span> + ... + <em>X<span style="font-size: 8pt;">n</span></em>. Then the CLT states that</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9255975469?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9255975469?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>that is, it follows a normal distribution with zero mean and unit variance, as <em>n</em> tends to infinity. Here <em>μ</em> is the expectation of <em>X</em><span style="font-size: 8pt;">1</span>.</p>
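<p>The classic i.i.d. statement is easy to verify by simulation. The sketch below (illustrative, not part of the original article; function name and parameters are my own) standardizes sums of uniform variables, which have <em>μ</em> = 1/2 and variance 1/12:</p>

```python
import random
import statistics

def standardized_sums(n=1000, trials=2000, seed=42):
    """Simulate S_n for i.i.d. Uniform(0,1) variables and standardize.

    Uniform(0,1) has mean mu = 1/2 and variance sigma^2 = 1/12, so the
    CLT says (S_n - n*mu) / (sigma * sqrt(n)) is approximately N(0, 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.5, (1 / 12) ** 0.5
    out = []
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        out.append((s - n * mu) / (sigma * n ** 0.5))
    return out

z = standardized_sums()
# The standardized sums should have mean close to 0 and standard
# deviation close to 1.
print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))
```

<p>One could go further and compare the histogram of <code>z</code> to the standard normal density, as done for the correlated case below.</p>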
<p>Various generalizations have been discovered, including for weakly correlated random variables. Note that the absence of correlation is not enough for the CLT to apply (see counterexamples <a href="https://math.stackexchange.com/questions/2730696/central-limit-theorem-for-dependent-random-variables-with-covariance-condition" target="_blank" rel="noopener">here</a>). Likewise, even in the presence of correlations, the CLT can still be valid under certain conditions. If auto-correlations are decaying fast enough, some results are available, see <a href="https://en.wikipedia.org/wiki/Central_limit_theorem#CLT_under_weak_dependence" target="_blank" rel="noopener">here</a>. The theory is somewhat complicated. Here our goal is to show a simple example to help you understand the mechanics of the CLT in that context. The example involves observations <em>X</em><span style="font-size: 8pt;">1</span>, ..., <em>X<span><span style="font-size: 8pt;">n</span></span></em> that behave like a simple type of time series: AR(1), also known as autoregressive time series of order one, a well studied process (see section 3.2 in <a href="https://www.datasciencecentral.com/profiles/blogs/new-approach-to-linear-algebra-in-machine-learning" target="_blank" rel="noopener">this article</a>).</p>
<p><span style="font-size: 14pt;"><strong>1. Example</strong></span></p>
<p>The example in question consists of observations governed by the following time series model: <em>X</em><span style="font-size: 8pt;"><em>k</em>+1</span> = <span><em>ρX</em><span style="font-size: 8pt;"><em>k</em></span> + <em>Y</em><span style="font-size: 8pt;"><em>k</em>+1</span>, with <em>X</em><span style="font-size: 8pt;">1</span> = <em>Y</em><span style="font-size: 8pt;">1</span>, and <em>Y</em><span style="font-size: 8pt;">1</span>, ..., <em>Y</em><span style="font-size: 8pt;"><em>n</em></span> are i.i.d. with zero mean and unit variance. We assume that |<em>ρ</em>| < 1. It is easy to establish the following:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/9255974463?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9255974463?profile=RESIZE_710x" width="300" class="align-center"/></a></span></p>
<p>Here "~" stands for "asymptotically equal to" as <em>n</em> tends to infinity. Note that the lag-<em>k</em> autocorrelation in the time series of observations <em>X</em><span style="font-size: 8pt;">1</span>, ..., <em>X</em><span style="font-size: 8pt;"><em>n</em></span> is asymptotically equal to <span><em>ρ</em>^<em>k</em> (<em>ρ</em> at power <em>k</em>), so autocorrelations are decaying exponentially fast. Finally, the adjusted CLT (the last formula above) now includes a factor 1 - <em>ρ</em>. Of course, if <em>ρ</em> = 0, it corresponds to the classic CLT when expected values are zero.</span></p>
<p><strong>1.1. More examples</strong></p>
<p>Let <em>X</em><span style="font-size: 8pt;">1</span> be uniform on [0, 1] and <em>X</em><span style="font-size: 8pt;"><em>k</em>+1</span> = FRAC(<em>bX<span style="font-size: 8pt;">k</span></em>) where <em>b</em> is an integer strictly larger than one, and FRAC is the <a href="https://en.wikipedia.org/wiki/Fractional_part" target="_blank" rel="noopener">fractional part function</a>. Then it is known that <em>X<span style="font-size: 8pt;">k</span></em> also has (almost surely, see <a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">here</a>) a uniform distribution on [0, 1], but the <em>X<span style="font-size: 8pt;">k</span></em>'s are autocorrelated with exponentially decaying lag-<em>k</em> autocorrelations equal to 1 / <em>b</em>^<em>k</em>. So I expect that the CLT would apply to this case. </p>
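<p>As a sanity check on the claimed lag-<em>k</em> autocorrelation 1/<em>b</em>^<em>k</em>, one can estimate it empirically (a sketch, not from the original article). Note that iterating FRAC(<em>bx</em>) in floating point quickly exhausts the 53-bit mantissa, so the sketch uses many independent starting points and only a few iterations of the map:</p>

```python
import random

def frac_map_autocorr(b=2, lag=2, samples=200_000, seed=7):
    """Estimate the lag-k autocorrelation of X_{k+1} = FRAC(b * X_k)
    with X_1 uniform on [0, 1]. The claimed value is 1 / b**lag."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(samples):
        x = rng.random()
        y = x
        for _ in range(lag):
            y = (b * y) % 1.0  # fractional part of b*y
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / samples, sum(ys) / samples
    cov = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / samples
    vx = sum((a - mx) ** 2 for a in xs) / samples
    vy = sum((c - my) ** 2 for c in ys) / samples
    return cov / (vx * vy) ** 0.5

r = frac_map_autocorr()
print(r)  # should be close to 1 / b**lag = 0.25 for b = 2, lag = 2
```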
<p>Now let <em>X</em><span style="font-size: 8pt;">1</span> be uniform on [0, 1] and <em>X</em><span style="font-size: 8pt;"><em>k</em>+1</span> = FRAC(<em>b</em>+<em>X<span style="font-size: 8pt;">k</span></em>) where <em>b</em> is a positive irrational number. Again, <em>X<span style="font-size: 8pt;">k</span></em> is uniform on [0, 1]. However this time we have strong, long-range autocorrelations, see <a href="https://www.datasciencecentral.com/profiles/blogs/long-range-correlation-in-time-series-tutorial-and-case-study" target="_blank" rel="noopener">here</a>. I will publish results about this case (as to whether or not CLT still applies) in a future article.</p>
<p><strong>1.2. Note</strong></p>
<p>In the first example in section 1.1, with <em>X</em><span style="font-size: 8pt;"><em>k</em>+1</span> = FRAC(<em>bX<span style="font-size: 8pt;">k</span></em>), let <em>Z<span style="font-size: 8pt;">k</span></em> = INT(<em>bX<span style="font-size: 8pt;">k</span></em>), where INT is the <a href="https://en.wikipedia.org/wiki/Floor_and_ceiling_functions" target="_blank" rel="noopener">integer part function</a>. Now <em>Z<span style="font-size: 8pt;">k</span></em> is the <em>k</em>-th digit of <em>X</em><span style="font-size: 8pt;">1</span> in base <em>b</em>. Surprisingly, if <em>X</em><span style="font-size: 8pt;">1</span> is uniformly distributed on [0, 1], then all the <em>Z<span style="font-size: 8pt;">k</span></em>'s, even though they all depend exclusively on <em>X</em><span style="font-size: 8pt;">1</span>, are (almost surely) independent. Indeed, if you pick a number at random, there is a 100% chance that its successive digits are independent. However, there are infinitely many exceptions: for instance, the sequence of digits of a rational number is periodic, thus violating the independence property. But such numbers are so rare (despite constituting a dense set in [0, 1]) that the probability of stumbling upon one by chance is 0%. In any case, this explains why I used the word "almost surely".</p>
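<p>The independence of digits can be illustrated empirically (a hypothetical sketch, not from the article): with <em>b</em> = 2, the joint frequencies of the first two binary digits of a uniform random number should each be close to 1/<em>b</em>² = 1/4, exactly what independence of two fair bits predicts:</p>

```python
import random

def digit_pair_frequency(b=2, samples=100_000, seed=5):
    """Empirical joint frequency of the first two base-b digits of a
    uniform random number; independence predicts 1/b**2 per pair."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(samples):
        x = rng.random()
        z1 = int(b * x)        # first digit: INT(b*x)
        x = (b * x) % 1.0      # shift: FRAC(b*x)
        z2 = int(b * x)        # second digit
        counts[(z1, z2)] = counts.get((z1, z2), 0) + 1
    return {k: v / samples for k, v in counts.items()}

freq = digit_pair_frequency()
print(freq)  # four pairs, each with frequency near 0.25
```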
<p><span style="font-size: 14pt;"><strong>2. Results based on simulations</strong></span></p>
<p>The simulation consisted of generating 100,000 time series <em>X</em><span style="font-size: 8pt;">1</span>, ..., <em>X<span style="font-size: 8pt;">n</span></em> as in section 1, with <em>ρ</em> = 1/2, each with <em>n</em> = 10,000 observations, computing <em>S<span style="font-size: 8pt;">n</span></em> for each of them, and standardizing <em>S<span style="font-size: 8pt;">n</span></em> to see if it follows a <em>N</em>(0, 1) distribution. The empirical density follows a normal law with zero mean and unit variance very closely, as shown in the figure below. We used uniform variables with zero mean and unit variance to generate the deviates <em>Y<span style="font-size: 8pt;">k</span></em>.</p>
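<p>A scaled-down version of this simulation can be sketched as follows (smaller <em>n</em> and fewer trials than in the article, to keep it fast; the standardization divides <em>S<span style="font-size: 8pt;">n</span></em> by √<em>n</em>/(1 - <em>ρ</em>), per the adjusted CLT with unit-variance innovations):</p>

```python
import random
import statistics

def standardized_ar1_sums(rho=0.5, n=2000, trials=1500, seed=3):
    """Simulate AR(1) series X_{k+1} = rho*X_k + Y_{k+1} with uniform
    unit-variance innovations, and return S_n standardized using the
    adjusted CLT scaling: S_n * (1 - rho) / sqrt(n)."""
    rng = random.Random(seed)
    h = 3 ** 0.5  # Uniform(-sqrt(3), sqrt(3)) has mean 0, variance 1
    out = []
    for _ in range(trials):
        x = rng.uniform(-h, h)  # X_1 = Y_1
        s = x
        for _ in range(n - 1):
            x = rho * x + rng.uniform(-h, h)
            s += x
        out.append(s * (1 - rho) / n ** 0.5)
    return out

z = standardized_ar1_sums()
# Standardized sums should have mean close to 0, std dev close to 1.
print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))
```

<p>Setting <code>rho=0</code> recovers the classic i.i.d. case, so the same function lets you compare both regimes.</p>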
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9256083059?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9256083059?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>Below is one instance (realization) of these simulated time series, featuring the first <em>n</em> = 150 observations. The Y-axis represents <em>X<span style="font-size: 8pt;">k</span></em>, the X-axis represents <em>k</em>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9256076454?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9256076454?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>The speed at which convergence to the normal distribution takes place has been studied. Typically, this is done using the Berry-Esseen theorem, see <a href="https://en.wikipedia.org/wiki/Berry%E2%80%93Esseen_theorem" target="_blank" rel="noopener">here</a>. A version of this theorem, for weakly correlated variables, also exists: see <a href="https://arxiv.org/pdf/1606.01617.pdf%20berry%20essen%20weak%20correlation" target="_blank" rel="noopener">here</a>. The Berry–Esseen theorem specifies the rate at which this convergence takes place by giving a bound on the maximal error of approximation between the normal distribution and the true distribution of the scaled sample mean. The approximation is measured by the Kolmogorov–Smirnov distance. It would be interesting to see, using simulations, if the convergence is slower when <span><em>ρ</em> is different from zero.</span></p>
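<p>The Kolmogorov–Smirnov distance mentioned above can be computed numerically. The sketch below (illustrative, i.i.d. setting only) uses Exp(1) variables, whose nonzero skewness makes the Berry–Esseen error visible, and shows the distance shrinking as <em>n</em> grows:</p>

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance_of_means(n, trials=20000, seed=11):
    """Kolmogorov-Smirnov distance between the standardized sum of n
    Exp(1) variables (mean 1, variance 1) and the standard normal."""
    rng = random.Random(seed)
    z = sorted(
        (sum(-math.log(1.0 - rng.random()) for _ in range(n)) - n)
        / math.sqrt(n)
        for _ in range(trials)
    )
    d = 0.0
    for i, v in enumerate(z, start=1):
        c = normal_cdf(v)
        d = max(d, abs(i / trials - c), abs((i - 1) / trials - c))
    return d

d_small, d_large = ks_distance_of_means(4), ks_distance_of_means(100)
print(round(d_small, 3), round(d_large, 3))  # distance shrinks with n
```

<p>The same experiment, run on the AR(1) series of section 1, would show whether convergence is slower when <em>ρ</em> is different from zero.</p>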
<p></p>
Salary Trends for Data Scientists and Machine Learning Professionalstag:www.datasciencecentral.com,2021-07-08:6448529:BlogPost:10561922021-07-08T04:49:58.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9219427086?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9219427086?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: <a href="https://www.burtchworks.com/2020/08/26/2020-salaries-and-demographic-trends-for-data-scientists-analytics-pros/" target="_blank" rel="noopener">here</a></em></p>
<p>If you are wondering how much a data scientist earns, whether you are a hiring manager or looking for a job, there are plenty of websites providing rather detailed information, broken down by area, seniority, and skills. Here I focus on the United States, offering a summary based on various trusted websites.</p>
<p>A starting point is LinkedIn. Sometimes, the salary attached to a position is listed, and LinkedIn will tell you how many people viewed the job ad, and how well you fit based on skill matching and experience. LinkedIn will even tell you which of your connections work for the company in question, so you may contact the most relevant ones. Positions with fewer views that are two weeks old are less competitive (though maybe less attractive too); if you don't have much experience, they could be worth applying to. You probably receive such job ads in your mailbox, from LinkedIn, every week. If not, you need to work on your LinkedIn profile (or maybe you don't want to receive such emails).</p>
<p>Popular websites with detailed information include PayScale, GlassDoor, and Indeed. GlassDoor, based on 17,000 reported salaries (see <a href="https://www.glassdoor.com/Salaries/data-scientist-salary-SRCH_KO0,14.htm" target="_blank" rel="noopener">here</a>), mentions a range from $82k to $165k, with an average of $116k per year for a level-2 data scientist. It climbs to $140k for level-3. You can do a search by city or company. Some companies listed include:</p>
<ul>
<li><strong>Facebook</strong>: $153,000 based on 1,006 salaries. The range is $55K - $226K.</li>
<li><strong>Quora</strong>: $122,875 based on 509 salaries. The range is $113K - $164K.</li>
<li><strong>Oracle</strong>: $148,396 based on 457 salaries. The range is $88K - $178K.</li>
<li><strong>IBM</strong>: $130,546 based on 382 salaries. The range is $58K - $244K.</li>
<li><strong>Google</strong>: $148,560 based on 246 salaries. The range is $23K - $260K.</li>
<li><strong>Microsoft</strong>: $134,042 based on 204 salaries. The range is $13K - $292K.</li>
<li><strong>Amazon</strong>: $125,704 based on 190 salaries. The range is $60K - $235K.</li>
<li><strong>Booz Allen Hamilton</strong>: $90,000 based on 186 salaries. The range is $66K - $215K.</li>
<li><strong>Walmart</strong>: $108,937 based on 185 salaries. The range is $78K - $186K.</li>
<li><strong>Cisco</strong>: $157,228 based on 166 salaries. The range is $79K - $186K.</li>
<li><strong>Uber</strong>: $143,661 based on 137 salaries. The range is $56K - $200K.</li>
<li><strong>Intel</strong>: $125,936 based on 129 salaries. The range is $58K - $180K.</li>
<li><strong>Apple</strong>: $153,885 based on 128 salaries. The range is $60K - $210K.</li>
<li><strong>Airbnb</strong>: $180,569 based on 122 salaries. The range is $99K - $242K.</li>
</ul>
<p>These are base salaries and do not include bonus, stock options, or other perks. Companies with many employees in the Bay Area offer bigger salaries due to the cost of living. These statistics may be somewhat biased as very senior employees are less likely to provide their salary information. A chief data scientist typically makes well above $200k a year, not including bonuses, and an $800k salary, at that level, at companies such as Microsoft or Deloitte (based on my experience), is not uncommon. On the low end, you have interns and part-time workers. If you visit Glassdoor, you can get much more granular data.</p>
<p>Below are statistics, this time from Indeed (see <a href="https://www.indeed.com/career/data-scientist/salaries?salaryType=YEARLY&from=careers_serp" target="_blank" rel="noopener">here</a>). They offer a different perspective, with a breakdown by type of expertise and area. The top 5 cities with the highest salaries are San Francisco ($157,041), Santa Clara ($156,284), New York ($140,262), Austin ($133,562) and San Diego ($124,679). Surprisingly, the pay is lower in Seattle than in Houston. Note that if you work remotely for a company in the Bay Area, you may get a lower salary if you live in an area with a lower cost of living. Still, you would be financially better off than your peers in San Francisco.</p>
<p>The kinds of experience commanding the highest salaries (20 to 40% above average) are Cloud Architecture, DevOps, CI/CD (<span>continuous integration and continuous delivery or deployment)</span>, Microservices, and Performance Marketing. Finally, Indeed also displays salaries for related occupations, with the following averages:</p>
<ul>
<li><strong>Data Analyst</strong>, 27017 openings, $70,416</li>
<li><strong>Machine Learning Engineer</strong>, 27196 openings, $150,336</li>
<li><strong>Data Engineer</strong>, 10527 openings, $128,157</li>
<li><strong>Statistician</strong>, 1733 openings, $96,661</li>
<li><strong>Statistical Analyst</strong>, 15060 openings, $66,175</li>
<li><strong>Principal Scientist</strong>, 1644 openings, $143,266</li>
</ul>
<p>The average for Data Scientist is $119,444 according to Indeed. This number is similar to the one coming from Glassdoor. Note that some well-funded startups can offer large salaries. My highest salary was as chief scientist / co-founder at a company with fewer than 20 employees. And my highest compensation was for a company I created and funded myself, though I was not on a payroll and did not assign myself a job title.</p>
<p></p>
More Fun Math Problems for Machine Learning Practitionerstag:www.datasciencecentral.com,2021-06-19:6448529:BlogPost:10542292021-06-19T16:26:56.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/9114842491?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9114842491?profile=RESIZE_710x" width="600" class="align-center"/></a></span></p>
<p><span>This is part of a series featuring the following aspects of machine learning:</span></p>
<ul>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/fun-mathematics-for-machine-learning-practitioners" target="_blank" rel="noopener">Mathematics</a>, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span>Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/more-machine-learning-tricks-recipes-and-statistical-models" target="_blank" rel="noopener">Methods, principles, rules of thumb, recipes, tricks</a></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-1" target="_blank" rel="noopener">Business analytics</a> </span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-2" target="_blank" rel="noopener">Core Techniques</a> </span></li>
</ul>
<p><span>This issue focuses on cool math problems that come with data sets, source code, and algorithms. See previous article <a href="https://www.datasciencecentral.com/profiles/blogs/fun-mathematics-for-machine-learning-practitioners" target="_blank" rel="noopener">here</a>. Many have a statistical, probabilistic or experimental flavor, and some deal with dynamical systems. They can be used to extend your math knowledge, practice your machine learning skills on original problems, or for curiosity. My articles, posted on Data Science Central, are always written in simple English and accessible to professionals with typically one year of calculus or statistical training, at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. The style is compact, for people who do not have a lot of free time.</span></p>
<p><span>Despite these restrictions, state-of-the-art, off-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed.</span></p>
<p><span><strong>1. Fun Math Problems for Machine Learning Practitioners</strong></span></p>
<p><span>These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.</span></p>
<ol>
<li><a href="https://www.datasciencecentral.com/forum/topics/new-mathematical-conjecture" target="_blank" rel="noopener">New Mathematical Conjecture?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/cool-problems-in-probabilistic-number-theory" target="_blank" rel="noopener">Cool Problems in Probabilistic Number Theory and Set Theory</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/weird-mathematical-object-fractional-exponential" target="_blank" rel="noopener">Fractional Exponentials - Dataset to Benchmark Statistical Tests</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/two-beautiful-mathematical-results-part-2" target="_blank" rel="noopener">Two Beautiful Mathematical Results - Part 2</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/two-beautiful-mathematical-results" target="_blank" rel="noopener">Two Beautiful Mathematical Results</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/four-interesting-math-problems" target="_blank" rel="noopener">Four Interesting Math Problems</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/number-theory-nice-generalization-of-the-waring-conjecture" target="_blank" rel="noopener">Number Theory: Nice Generalization of the Waring Conjecture</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/amazing-random-sequences-with-cool-applications" target="_blank" rel="noopener">Fascinating Chaotic Sequences with Cool Applications</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/new-representation-of-numbers-with-very-fast-converging-fractions" target="_blank" rel="noopener">Representation of Numbers with Incredibly Fast Converging Fractions</a></li>
<li><a href="http://www.analyticbridge.com/forum/topics/self-replicating-programs" target="_blank" rel="noopener">Yet Another Interesting Math Problem - The Collatz Conjecture</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/simple-proof-of-prime-number-theorem" target="_blank" rel="noopener">Simple Proof of the Prime Number Theorem</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/factoring-massive-numbers-a-new-machine-learning-approach" target="_blank" rel="noopener">Factoring Massive Numbers: Machine Learning Approach</a></li>
<li><a href="http://www.datasciencecentral.com/forum/topics/challenge-representation-of-numbers-as-infinite-products" target="_blank" rel="noopener">Representation of Numbers as Infinite Products</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/a-beautiful-probability-theorem" target="_blank" rel="noopener">A Beautiful Probability Theorem</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/prime-numbers-interesting-distribution-and-density-results" target="_blank" rel="noopener">Fascinating Facts and Conjectures about Primes and Other Special Nu...</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/four-original-math-challenges" target="_blank" rel="noopener">Three Original Math and Proba Challenges, with Tutorial</a></li>
<li><a href="http://www.datasciencecentral.com/group/resources/forum/topics/best-kept-secret-about-data-science-competitions" target="_blank" rel="noopener">Challenges of the week</a></li>
</ol>
<p><span><strong>2. Free books</strong></span></p>
<ul>
<li><span><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span>The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span><b>Applied Stochastic Processes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters.) This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span>It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> He recently opened <a href="https://www.parisrestaurantandbar.com/" target="_blank" rel="noopener">Paris Restaurant</a>, in Anacortes. You can access Vincent's articles and books, <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>The Machine Learning Process in 7 Stepstag:www.datasciencecentral.com,2021-06-13:6448529:BlogPost:10533822021-06-13T04:00:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9084500862?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9084500862?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p></p>
<p>In this article, I describe the various steps involved in managing a machine learning process from beginning to end. Depending on which company you work for, you may or may not be involved in all the steps. In larger companies, you typically focus on one or two specialized aspects of a project. In small companies, you may be involved in all the steps. Here the focus is on large projects, such as developing a taxonomy, as opposed to ad-hoc or one-time analyses. I also mention all the people involved, besides machine learning professionals.</p>
<p><span style="font-size: 14pt;"><strong>Steps involved in machine learning projects</strong></span></p>
<p>In chronological order, here are the main steps. Sometimes it is necessary to recognize errors in the process, move back, and start again at an earlier step. This is by no means a linear process; it is more like trial-and-error experimentation. </p>
<p><strong>1</strong>. <strong>Defining the problem</strong> and the metrics (also called features) that we want to track. Assessing the data available (internal and third party sources) or the databases that need to be created, as well as database architecture for optimum storing and processing. Discuss cloud architectures to choose from, data volume (potential future scaling issues), and data flows. Do we need real-time data? How much can safely be outsourced? Do we need to hire some staff? Discuss costs, ROI, vendors, and timeframe. Decision makers and business analysts are heavily involved, and data scientists and engineers may participate in the discussion.</p>
<p><strong>2. Defining goals</strong> and types of analyses to be performed. Can we monetize the data? Are we going to use the data for segmentation, customer profiling and better targeting, to optimize some processes such as pricing or supply chain, for fraud detection, taxonomy creation, to increase sales, for competitive or marketing intelligence, or to improve user experience for instance via a recommendation engine or better search capabilities? What are the most relevant goals? Who will be the main users?</p>
<p><strong>3. Collecting the data</strong>. Assessing who has access to the data (and which parts of the data, such as summary tables versus live databases), and in what capacity. Here privacy and security issues are also discussed. The IT team, legal team and data engineers are typically involved. Dashboard design is also discussed, with the purpose of designing good dashboards for end-users such as decision makers, the product or marketing team, or customers. </p>
<p><strong>4. Exploratory data analysis</strong>. Here data scientists are more heavily involved, though this step should be automated as much as possible. You need to detect missing data and decide how to handle it (using imputation methods), identify outliers and what they mean, summarize and visualize the data, find erroneously coded data and duplicates, find correlations, perform preliminary analyses, and find the best predictive features and optimum binning techniques (see section 4 <a href="https://www.datasciencecentral.com/profiles/blogs/decomposition-of-statistical-distributions-using-mixture-models-a" target="_blank" rel="noopener">in this article</a>). This could lead to the discovery of data flaws, and may force you to revisit and start again from a previous step, to fix any significant issue.</p>
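<p>As a minimal illustration of two of these checks (missing-value imputation and outlier detection), here is a sketch using only Python's standard library; the data and the 5-MAD threshold are made up for the example:</p>

```python
import statistics

# Toy feature with missing values (None) and one suspicious entry.
ages = [34, 29, None, 41, 38, None, 250, 33, 36]

# Impute missing values with the median of the observed values.
observed = [a for a in ages if a is not None]
median_age = statistics.median(observed)
imputed = [median_age if a is None else a for a in ages]

# Flag outliers with a robust rule of thumb: any value more than
# 5 median absolute deviations (MAD) away from the median.
mad = statistics.median(abs(a - median_age) for a in imputed)
outliers = [a for a in imputed if abs(a - median_age) > 5 * mad]

print(median_age, outliers)   # 36 [250]
```

<p>A MAD-based rule is used here rather than the classic "3 standard deviations from the mean," because the mean and standard deviation are themselves distorted by the very outliers one is trying to detect.</p>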
<p><strong>5. The true machine learning / modeling step</strong>. At this point, we assume that the data collected is stable enough, and can be used for its original purpose. Predictive models are being tested; neural networks and other algorithms / models are being trained, with goodness-of-fit tests and cross-validation. The data is available for various analyses, such as post-mortem, fraud detection, or proof of concept. Algorithms are prototyped, automated and eventually implemented in production mode. Output data is stored in auxiliary tables for further use, such as email alerts or to populate dashboards. External data sources may be added and integrated. At this point, major data issues have been fixed.</p>
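<p>To make the cross-validation part of this step concrete, here is a bare-bones sketch of 5-fold cross-validation, using a deliberately trivial model (predict the training mean) and synthetic data so it runs with Python's standard library alone:</p>

```python
import random

random.seed(0)
# Synthetic data: y is roughly 2x plus noise (for illustration only).
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]
random.shuffle(data)

def mse(pairs, predict):
    """Mean squared error of a predictor on a list of (x, y) pairs."""
    return sum((y - predict(x)) ** 2 for x, y in pairs) / len(pairs)

k = 5
fold = len(data) // k
scores = []
for i in range(k):
    test = data[i * fold:(i + 1) * fold]             # held-out fold
    train = data[:i * fold] + data[(i + 1) * fold:]  # remaining folds
    mean_y = sum(y for _, y in train) / len(train)   # "training" step
    scores.append(mse(test, lambda x, m=mean_y: m))  # out-of-sample fit

print(sum(scores) / k)   # average error across the k folds
```

<p>In practice the trivial mean-predictor would be replaced by a real model; the point is that the train/test split and the out-of-sample scoring loop stay the same whatever the model.</p>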
<p><strong>6. Creation of end-user platform</strong>. Typically, it comes as dashboards featuring visualizations and summary data that can be exported in standardized formats, even spreadsheets. This provides the insights that can be acted upon by decision makers. The platform can be used for A/B testing. It can also come as a system of email alerts sent to decision makers, customers, or anyone who needs to be informed.</p>
<p><strong>7. Maintenance</strong>. The models need to be adapted to changing data, changing patterns, or changing definitions of core metrics. Some satellite database tables must be updated, for instance every six months. Maybe a new platform able to store more data is needed, and data migration must be planned. Audits are performed to keep the system sound. New metrics may be introduced, as new sources of data are collected. Old data may be archived. Now we should get a good idea of the long-term yield (ROI) of the project, what works well and what needs to be improved. </p>
<p></p>
The Pros and Cons of Working for a Startuptag:www.datasciencecentral.com,2021-06-04:6448529:BlogPost:10524882021-06-04T03:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9032472853?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9032472853?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p></p>
<p>As a machine learning professional, I have worked for several startups ranging from zero to 600 employees, as well as companies such as eBay, Wells Fargo, Visa and Microsoft. Here I share my experience. A brief summary can be found in my conclusions, at the bottom of this article.</p>
<p>It is not easy to define what a startup is. The first one I worked for was NBCi, a spinoff of CNET, and had 600 employees almost on day one, and nearly half a billion dollars in funding, from GE. The pay was not great, especially for San Francisco. I had stock options, but the company went the way many startups go: it was shut down after two years when the Internet bubble popped, so I was only able to cash in one year's worth of salary from my stock options. Still not bad, but a far cry from what most people imagine. I was essentially the only statistician in the company, though they had a big IT team, with many data engineers, and were collecting a lot of data. I quickly learned that my best allies were in the IT department, and I was the bridge between the IT and the marketing department. I was probably the only "neutral" employee who could talk to both departments, as they were at war against each other (my boss was the lead of the marketing department). I also interacted a lot with the sales, product, and finance teams, and executives. I really liked that situation though, and the fact that there was high turnover, allowing me to work with many new people (thus new connections and friends) on a regular basis, and on many original projects. The drawback: I was the only statistician. It was not an issue for me.</p>
<p>When people think about startups, many think about a company starting from scratch, with 20 employees, and funded with VC money. I also experienced that situation, and again, I was the only statistician (actually chief scientist and co-founder) though we also had a strong IT team. It lasted a few years until the 2008 crash; I had a great salary, and great stock options that never materialized. But they eventually bought one of my patents. I was hired as co-founder because I was (back then) the top expert in my field: click fraud detection, and scoring Internet traffic for advertisers and publishers. Again, I was the only machine learning guy, and not involved with live databases other than to set the rules, analyze the data, and conceptually design the dashboard platform for our customers. I was interacting with various people from various small teams, occasionally even with clients, and prototyping solutions and working on proofs of concept - some helped us win a few large customers. I was in all the big meetings involving large, new clients, sometimes flying to the client's location. This is one of the benefits of working as a sole data scientist. Another one, especially if you have specialized, hard-to-find skills (mine were earned by running small businesses on the side), is that I was able to work remotely, from home. </p>
<p>Yet another startup, the last one I co-founded, structured as an S-corp, had zero employees, no payroll, no funding, no CEO, and no office or headquarters (the official address, needed for tax purposes, was listed as my home address). It had no home-made Internet platform or database: this was inexpensively outsourced. We were working with people in different countries, our IT team (a one-man operation) was in Eastern Europe. This is the one that was acquired recently by a tech publisher, and my most successful exit. It still continues to grow very nicely today, despite (or thanks to) Covid. Unlike the other ones, it started bare-bones, with 50% profit margins, making its survival more likely. However, people working with us were well paid, offered a lot of flexibility, and of course everyone was always working from home. We only met face-to-face when visiting a client. No stock options were ever issued; I made money in a different way. I was interacting mostly with sales, and also contributing content and automatically growing our membership using proprietary techniques of my own that outsmarted all the competitors.</p>
<p>As for the big companies I worked for, I will say this. At Wells Fargo, I was part of a small group (about 100 people) with an open office, relatively low hierarchy, and all the feelings of working for a startup. I was told that this was a special Wells Fargo experiment that the company reluctantly tried, in order to hire a different type of talent. It is unusual to be in such a working environment at Wells Fargo. By contrast, Visa looked more like a big corporation, with many machine learning people each working on very specialized tasks, and a heavier hierarchy. Still, I loved the place, and it really helped grow my career. The data sets were quite big, which pleased me. One of the benefits of working for such a company is the career opportunities that it provides. Finally, it is possible to work for a startup within a big company, in what is called a corporate startup. My first example, NBCi, illustrates this concept; in the end I was indirectly working for GE or NBC, and even met with the GE auditing team and their six-sigma philosophy. Many of the folks they brought to the company were actually GE and NBC internal employees. </p>
<p><strong>Conclusion</strong></p>
<p>Finding a job at a startup may be easier than applying for positions at big companies. If you have solid expertise, the salary might even be better. Stock options could prove to be elusive. The job is usually more flexible and requires creativity; you might be the only machine learning employee in the company, interacting with various teams and even with clients. Projects can potentially be more varied and interesting, and the environment is usually fast-paced. Working from home is usually an option. You may report directly to the CEO; the hierarchy is typically less heavy. It requires adaptation and may not be a good fit for everyone. You can also work for a startup within a big corporation: it is called a corporate startup. Working for a big company may be a better move for your career, especially if your plan is to work for big companies in the future. Of course, startups also try to attract talent from big companies. </p>
<p></p>
Simple Introduction to Public-Key Cryptography and Cryptanalysis: Illustration with Random Permutationstag:www.datasciencecentral.com,2021-06-02:6448529:BlogPost:10520642021-06-02T04:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9022337069?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9022337069?profile=RESIZE_710x" width="720" class="align-full"/></a></p>
<p></p>
<p>In this article, I illustrate the concept of asymmetric key with a simple example. Rather than discussing algorithms such as RSA (still widely used, for instance to set up a secure website), I focus on a system easier to understand, based on random permutations. I discuss how to generate these random permutations and compound them, and how to enhance such a system using steganography techniques. I also explain why permutation-based cryptography is not good for public key encryption. In particular, I show how such a system can be reverse-engineered, no matter how sophisticated it is, using cryptanalysis methods. This article also features some nontrivial, interesting asymptotic properties of permutations (not usually taught in math classes) as well as the connection with a specific kind of matrix, using simple English rather than advanced math, so that this article can be understood by a wide audience.</p>
<p><span style="font-size: 14pt;"><strong>1. Description of my public key encryption system</strong></span></p>
<p>Here <em>x</em> is the original message created by the sender, and <em>y</em> is the encrypted version that the receiver gets. The original message can be described as a sequence of bits (zeros and ones). This is the format in which it is internally encoded on a computer or when traveling through the Internet, be it encrypted or not, as computers only deal with bits (we are not talking about quantum computers or the quantum Internet here, which operate differently). </p>
<p>The general system can be broken down into three main components:</p>
<ul>
<li>Pre-processing: blurring the message to make it appear like random noise</li>
<li>Encryption via bit-reshuffling </li>
<li>Decryption</li>
</ul>
<p>We now explain these three steps. Note that the whole system processes information by blocks, each block (say 2048 bits) being processed separately.</p>
<p><strong>1.1. Blurring the message</strong></p>
<p>This step consists of adding random bits at the end of each block (sometimes referred to as <em>padding</em>), then performing a XOR to further randomize the message. The bits to be added consist of zeros and ones in such a proportion that the resulting, extended block contains roughly 50 percent zeros and 50 percent ones. For instance, if the original block contains 2048 bits, the extended block may contain up to 4096 bits.</p>
<p>Then, use a random string of bits, for instance 4096 binary digits of the square root of two, and do a bitwise XOR (see <a href="https://en.wikipedia.org/wiki/Exclusive_or" target="_blank" rel="noopener">here</a>) with the 4096 bits obtained in the previous step. The resulting bit string is the input for the next step. </p>
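<p>Here is a minimal sketch of this blurring step on a tiny 8-bit block, with the round trip showing it is reversible. For simplicity the pad bits and the XOR mask are produced with Python's <code>random</code> module (with arbitrary seeds) rather than with digits of the square root of two:</p>

```python
import random

def blur(block_bits, target_len, mask):
    """Pad a bit list to target_len, then XOR it with a mask of the same length."""
    rng = random.Random(42)
    ones = sum(block_bits)
    padded = list(block_bits)
    while len(padded) < target_len:
        # Bias the padding toward whichever bit is under-represented, so the
        # extended block ends up with roughly 50 percent zeros and ones.
        p_one = 1 - (ones / len(padded))
        b = 1 if rng.random() < p_one else 0
        ones += b
        padded.append(b)
    return [a ^ b for a, b in zip(padded, mask)]

def unblur(blurred, mask, original_len):
    """Reverse the blurring: XOR is its own inverse, then strip the padding."""
    unmasked = [a ^ b for a, b in zip(blurred, mask)]
    return unmasked[:original_len]

rng = random.Random(7)
mask = [rng.randint(0, 1) for _ in range(16)]
x = [1, 1, 1, 1, 0, 1, 1, 1]   # mostly ones: far from a 50/50 mix
y = blur(x, 16, mask)
assert unblur(y, mask, len(x)) == x
```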
<p><strong>1.2. Actual encryption step</strong></p>
<p>The block to be encoded is still denoted as <em>x</em>, though it is assumed to be the output of the previous step discussed in section 1.1, not part of the original message. The encryption step transforms <em>x</em> into <em>y</em>, and the general transformation can be described by</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9021486485?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9021486485?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>Here * is an <a href="https://en.wikipedia.org/wiki/Associative_property" target="_blank" rel="noopener">associative operator</a>, typically the matrix multiplication or the <a href="https://en.wikipedia.org/wiki/Function_composition" target="_blank" rel="noopener">composition operator</a> between two functions, the latter one usually denoted as o as in (<em>f</em> o <em>g</em>)(<em>x</em>) = <em>f</em>(<em>g</em>(<em>x</em>)). The transforms <em>K</em> and <em>L</em> can be seen as <a href="https://en.wikipedia.org/wiki/Permutation_matrix" target="_blank" rel="noopener">permutation matrices</a>. In our case they are actual permutations whose purpose is to reshuffle the bits of <em>x</em>, but permutations can be represented by matrices. The crucial element here is that <em>L</em> * <em>K</em> = <em>L</em>^<em>n</em> = <em>I</em> (that is, <em>L</em> at power <em>n</em> is the identity operator): this allows us to easily decrypt the message. Indeed, <em>x</em> = <em>L</em> * <em>y</em>. We need to be very careful in our choice of <em>L</em>, so that the smallest <em>n</em> satisfying <em>L</em>^<em>n</em> = <em>I</em> is very large. More on this in section 2. This is related to the mathematical theory of finite groups, but the reader does not need to be familiar with <a href="https://en.wikipedia.org/wiki/Group_theory" target="_blank" rel="noopener">group theory</a> to understand the concept. It is enough to know that permutations can be multiplied (composed), elevated to any power, or inverted, just like matrices. More about this can be found <a href="https://en.wikipedia.org/wiki/Permutation_group" target="_blank" rel="noopener">here</a>.</p>
<p>That said, the public and private keys are:</p>
<ul>
<li><strong>Public key</strong>: <em>K</em> (this is all the sender needs to know to encrypt the block <em>x</em> as <em>y</em> = <em>K</em> * <em>x</em>)</li>
<li><strong>Private keys</strong>: <em>n</em> and <em>L</em> (kept secret by the recipient); the decrypted block is <em>x</em> = <em>L</em> * <em>y</em></li>
</ul>
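<p>To make the key setup concrete, here is a toy sketch of the full round trip, using the five-element permutation that serves as the example in section 2.1 (a real block would use thousands of bits). Permutations are written as lists of 1-based destinations, and <em>K</em> = <em>L</em>^(<em>n</em>-1) is computed here by naive repeated composition; the fast version is discussed in section 2:</p>

```python
def apply_perm(perm, bits):
    """perm[i] is the (1-based) destination of position i+1."""
    out = [None] * len(bits)
    for i, dest in enumerate(perm):
        out[dest - 1] = bits[i]
    return out

def compose(p, q):
    """(p o q): apply q first, then p."""
    return [p[q[i] - 1] for i in range(len(q))]

def perm_power(perm, k):
    result = list(range(1, len(perm) + 1))   # identity permutation
    for _ in range(k):
        result = compose(perm, result)
    return result

L = [5, 4, 1, 2, 3]        # private key; its order is n = 6
K = perm_power(L, 6 - 1)   # public key K = L^(n-1)

x = [1, 0, 1, 1, 0]        # the (pre-processed) block
y = apply_perm(K, x)       # sender encrypts with the public key
assert apply_perm(L, y) == x   # recipient decrypts: L * y = L^n * x = x
```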
<p><strong>1.3. Decryption step</strong></p>
<p>I explained how to retrieve the block <em>x</em> in section 1.2 when you actually receive <em>y</em>. Once a block is decrypted, you still need to reverse the step described in section 1.1. This is accomplished by applying to <em>x</em> the same XOR as in section 1.1, then by removing the padding (the extra bits that were added to pre-process the message).</p>
<p><span style="font-size: 14pt;"><strong>2. About the random permutations</strong></span></p>
<p>Many algorithms are available to reshuffle the bits of <em>x</em>, see for instance <a href="https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle" target="_blank" rel="noopener">here</a>. Our focus is to explain the simplest one, and to discuss some interesting background about permutations, in order to reverse-engineer our encryption system (see section 3).</p>
<p><strong>2.1. Permutation algebra: basics</strong></p>
<p>Let's begin with basic definitions. A permutation <em>L</em> of <em>m</em> elements can be represented by an <em>m</em>-dimensional vector. For instance <em>L</em> = (5, 4, 1, 2, 3) means that the first element of your bitstream is moved to position 5, the second one to position 4, the third one to position 1, and so forth. This can be written as <em>L</em>(1) = 5, <em>L</em>(2) = 4, <em>L</em>(3) = 1, <em>L</em>(4) = 2, and <em>L</em>(5) = 3. Now the square of <em>L</em> is simply <em>L</em>(<em>L</em>), and the <em>n</em>-th power is <em>L</em>(<em>L</em>(...<em>L</em>...)) where <em>L</em> appears <em>n</em> times in that expression. The <strong><em>order</em></strong> of a permutation (see <a href="http://mathonline.wikidot.com/the-order-of-a-permutation" target="_blank" rel="noopener">here</a>) is the smallest <em>n</em> such that <em>L</em>^<em>n</em> is the identity permutation.</p>
<p>Each permutation is made up of a number of usually small sub-cycles, themselves treated as sub-permutations. For instance, in our example, <em>L</em>(1) = 5, <em>L</em>(5) = 3, <em>L</em>(3) = 1. This constitutes a sub-cycle of length 3. The other cycle, of length 2, is <em>L</em>(2) = 4, <em>L</em>(4) = 2. To compute the order of a permutation, compute the orders of each sub-cycle. The least common multiple of these orders is the order of your permutation. If <em>K</em> is a power of <em>L</em>, then the order of <em>K</em> divides the order of <em>L</em>; in particular, if <em>L</em> has order <em>n</em>, then both <em>L</em>^<em>n</em> and <em>K</em>^<em>n</em> are the identity permutation, and since <em>n</em>-1 and <em>n</em> share no common factor, <em>K</em> = <em>L</em>^(<em>n</em>-1) has order exactly <em>n</em>. This fact is of crucial importance to reverse-engineer this encryption system. </p>
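<p>A short sketch of this computation (sub-cycle extraction, then the least common multiple of the cycle lengths), applied to the example permutation above; <code>math.lcm</code> requires Python 3.9+:</p>

```python
from math import lcm

def cycles(perm):
    """Decompose a permutation (list of 1-based destinations) into sub-cycles."""
    seen, result = set(), []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:           # follow the cycle until it closes
            seen.add(i)
            cycle.append(i)
            i = perm[i - 1]
        result.append(cycle)
    return result

def order(perm):
    """Order of a permutation = lcm of its sub-cycle lengths."""
    n = 1
    for c in cycles(perm):
        n = lcm(n, len(c))
    return n

L = [5, 4, 1, 2, 3]
print(cycles(L))   # [[1, 5, 3], [2, 4]]
print(order(L))    # 6
```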
<p>Finally, the power of a permutation can be computed very fast, using the <a href="https://en.wikipedia.org/wiki/Exponentiation_by_squaring" target="_blank" rel="noopener">exponentiation by squaring algorithm</a>, applied to permutations. Thus even if the order <em>n</em> is very large, it is easy to compute <em>K</em> (the public key). Unfortunately, the same algorithm can be used by a hacker to discover the private key <em>L</em>, and the order <em>n</em> (kept secret) of the permutation in question, once she has discovered the sub-cycles of <em>K</em> (which is easy to do, as illustrated in my example). For the average length of a sub-cycle in a random permutation, see <a href="https://math.stackexchange.com/questions/1409862/average-length-of-a-cycle-in-a-n-permutation" target="_blank" rel="noopener">this article</a>.</p>
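<p>Here is a sketch of exponentiation by squaring applied to permutations: it computes <em>L</em>^<em>k</em> with O(log <em>k</em>) compositions instead of <em>k</em>, which is what makes the public key computable even when the order is astronomically large:</p>

```python
def compose(p, q):
    """(p o q)(i) = p(q(i)), with permutations as lists of 1-based destinations."""
    return [p[q[i] - 1] for i in range(len(q))]

def perm_pow(perm, k):
    """Raise a permutation to the k-th power by repeated squaring."""
    result = list(range(1, len(perm) + 1))   # identity permutation
    base = list(perm)
    while k > 0:
        if k & 1:                  # current bit of k is set:
            result = compose(base, result)   # multiply it into the result
        base = compose(base, base)           # square the base
        k >>= 1
    return result

L = [5, 4, 1, 2, 3]                      # order 6 (see the example above)
assert perm_pow(L, 6) == [1, 2, 3, 4, 5]   # L^n is the identity
```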
<p><strong>2.2. Main asymptotic result</strong></p>
<p>The expected order <em>n</em> of a random permutation of length <em>m</em> (that is, when reshuffling <em>m</em> bits) is</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9022071859?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9022071859?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>For details, see <a href="https://en.wikipedia.org/wiki/Random_permutation_statistics#Order_of_a_random_permutation" target="_blank" rel="noopener">here</a>. For instance, if <em>m</em> = 4,096 then <em>n</em> is approximately equal to 6 x 10^10. If <em>m</em> = 65,536, then <em>n</em> is approximately equal to 2 x 10^37. It is possible to add many bits all equal to zero to the block being encrypted, to increase its size <em>m</em> and thus <em>n</em>, without increasing too much the size of the encrypted message after compression. However, if used with a public key, this encryption system has a fundamental flaw discussed in section 3, no matter how large <em>n</em> is.</p>
<p><strong>2.3. Random permutations</strong></p>
<p>The easiest way to produce a random permutation of <em>m</em> elements is as follows.</p>
<ul>
<li>Generate <em>L</em>(1) as a pseudo random integer between 1 and <em>m</em>. If <em>L</em>(1) = 1, repeat until <em>L</em>(1) is different from 1.</li>
<li>Assume that <em>L</em>(1), ..., <em>L</em>(<em>k</em>-1) have been generated. Generate <em>L</em>(<em>k</em>) as a pseudo random integer between 1 and <em>m</em>. If <em>L</em>(<em>k</em>) is equal to one of the previous <em>L</em>(1), ..., <em>L</em>(<em>k</em>-1), or if it is equal to <em>k</em>, repeat until this is no longer the case.</li>
<li>Stop after generating the last entry, <em>L</em>(<em>m</em>).</li>
</ul>
<p>I use binary digits of irrational numbers, stored in a large table, to simulate random integers, but there are better (faster) solutions. Also, the Fisher-Yates algorithm (see <a href="https://en.wikipedia.org/wiki/Random_permutation#Fisher-Yates_shuffles" target="_blank" rel="noopener">here</a>) is more efficient. </p>
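<p>The rejection construction above can be sketched as follows. Note that because <em>L</em>(<em>k</em>) = <em>k</em> is always rejected, it produces a permutation with no fixed points (a derangement, so no sub-cycle of length 1) rather than a uniform draw from all permutations; also, the last position can reach a dead end (when only <em>m</em> itself remains for position <em>m</em>), in which case the sketch restarts from scratch:</p>

```python
import random

def random_derangement(m, rng):
    """Build L(1), ..., L(m) one entry at a time: draw a uniform integer in
    1..m, reject it if already used or equal to the current position k.
    If a position has no admissible value left, restart the construction."""
    while True:
        L, taken = [], set()
        ok = True
        for k in range(1, m + 1):
            remaining = [v for v in range(1, m + 1) if v not in taken and v != k]
            if not remaining:
                ok = False      # dead end at the last position: restart
                break
            while True:
                cand = rng.randint(1, m)
                if cand not in taken and cand != k:
                    break
            L.append(cand)
            taken.add(cand)
        if ok:
            return L

rng = random.Random(2021)
L = random_derangement(8, rng)
assert sorted(L) == list(range(1, 9))        # a genuine permutation
assert all(L[k] != k + 1 for k in range(8))  # with no fixed points
```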
<p><span style="font-size: 14pt;"><strong>3. Reverse-engineering the system: cryptanalysis</strong></span></p>
<p>To reverse-engineer my system, you need to be able to decrypt the encrypted block <em>y</em> if you only know the public key <em>K</em>, but not the private key <em>L</em> nor <em>n</em>. As discussed in section 2, the first step is to identify all the sub-cycles in the permutation <em>K</em>. This is easily done, see the example in section 2.1. Once this is accomplished, compute the order of each of these sub-cycles, and take the least common multiple of these orders. Again, this is easy to do, and it allows you to retrieve <em>n</em> even though it was kept secret. Now you know that <em>K</em>^<em>n</em> is the identity permutation. Compute <em>K</em> at power <em>n</em>-1, and apply this new permutation to the encrypted block <em>y</em>. Since <em>y</em> = <em>K</em> * <em>x</em>, you get the following:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/9022314263?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/9022314263?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>Now you've found <em>x</em>, problem solved. You can compute <em>K</em> at the power <em>n</em>-1 very fast even if <em>n</em> is very large, using the exponentiation by squaring algorithm mentioned in section 2.1. Of course you also need to undo the step discussed in section 1.1 to really fully decrypt the message, but that is another problem. The goal here was simply to break the step described in section 1.2.</p>
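<p>The attack can be sketched end-to-end on the toy example from section 2.1. The attacker below sees only the public key <em>K</em> and the intercepted block <em>y</em>, yet recovers both <em>n</em> and the plaintext block:</p>

```python
from math import lcm

def compose(p, q):
    return [p[q[i] - 1] for i in range(len(q))]

def perm_pow(perm, k):
    """Exponentiation by squaring, applied to permutations."""
    result = list(range(1, len(perm) + 1))
    base = list(perm)
    while k > 0:
        if k & 1:
            result = compose(base, result)
        base = compose(base, base)
        k >>= 1
    return result

def apply_perm(perm, bits):
    out = [None] * len(bits)
    for i, dest in enumerate(perm):
        out[dest - 1] = bits[i]
    return out

def recover_order(K):
    """Recover n from the public key alone: walk the sub-cycles, take the lcm."""
    n, seen = 1, set()
    for start in range(1, len(K) + 1):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = K[i - 1]
        n = lcm(n, length)
    return n

# The attacker's view: only K and the intercepted ciphertext y.
K = [3, 4, 5, 2, 1]                  # public key (L^5 for the section 2.1 example)
y = apply_perm(K, [1, 0, 1, 1, 0])   # intercepted encrypted block
n = recover_order(K)                 # n = 6, even though it was kept secret
x = apply_perm(perm_pow(K, n - 1), y)
print(x)   # [1, 0, 1, 1, 0]: the original block, without the private key
```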
<p>In order to make a secure system, one must choose a transform <em>K</em> that is very difficult to invert, and permutations or permutation matrices (which can be hacked using the same technique) do not fit the bill. Permutation-based encryption may still be a good idea for symmetric key systems, that is, when no public key is involved.</p>
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span style="font-size: 14pt;"><strong>Could Machine Learning Practitioners Prove Deep Math Conjectures?</strong></span><br/><em>May 26, 2021, by <a href="https://www.datasciencecentral.com/profile/VincentGranville" target="_blank" rel="noopener">Vincent Granville</a></em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8980996868?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8980996868?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p></p>
<p>Many of us have solid foundations in math or have an interest in learning more, and are passionate about solving difficult problems during our free time. Of course, most of us are not professional mathematicians, but we may bring some value to help solve some of the most challenging mathematical conjectures, especially the ones that can be stated in rather simple words. In my opinion, the less math-trained you are (up to a point), the more likely you are to come up with original, creative solutions. Not that we could end up proving the Riemann hypothesis or other problems of the same caliber and popularity: the short answer is no. But we might think of a different path, a potential new approach to tackle these problems, and discover new theories, models and techniques along the way, some applicable to data analysis and real business problems. And sharing our ideas with professional mathematicians could have benefits for them and for us. Working on these problems during our leisure time could also benefit our machine learning career, if anything. In this article, I elaborate on these various points.</p>
<p><strong>The less math you have learned, the more creative you can be</strong></p>
<p>Of course, this is true only up to a point. You need to know much more than just high-school math. When I started my PhD studies and asked my mentor whether I should attend some classes or learn material that I knew was missing from my education, his answer was no: he said that the more you learn, the more you can get stuck in one particular way of thinking, and that can hurt creativity. He meant that acquiring deep vertical knowledge too fast may not help; acquiring horizontal knowledge in various relevant fields, on the other hand, broadens your horizon and can be very useful. That said, you still need to know a minimum (that is, deep enough vertical knowledge about the problem you are trying to solve), and these days it is very easy to self-learn advanced math by reading articles, using tools such as <a href="https://oeis.org/" target="_blank" rel="noopener">OEIS</a> or <a href="https://www.wolframalpha.com/" target="_blank" rel="noopener">Wolfram Alpha</a> (Mathematica), and posting questions on websites such as MathOverflow (see my profile and my posted questions <a href="https://mathoverflow.net/users/140356/vincent-granville" target="_blank" rel="noopener">here</a>), which are frequented by professional, research-level mathematicians. The drawback of not reading the classics (you should read them) is that you are bound to reinvent the wheel time and again, though in my case, that is the best way I learn new things. In addition to reinventing the wheel, your knowledge will have big gaps, and it will show.</p>
<p>Professionals with a background in physics, computer science, probability theory, statistics, pure math, or quantitative finance may have a competitive advantage. Most importantly, you need to be passionate about your own private research, have a lot of modesty, perseverance, and patience as you will face many disappointments, and not expect fame or financial rewards - in short, no different from starting a PhD program. Some companies like Google may allow you to work on pet projects, and experimental research in number theory geared towards applications may fit the bill. After all, some of the people who computed trillions of digits of the number Pi (and analyzed them) did it during their tenure at Google, and in the process contributed to the development of high-performance computing. Some of them also helped deepen the field of number theory.</p>
<p>In my case, it was never my goal to prove any big conjecture. I stumbled upon them time and again while working on otherwise unrelated math projects. They piqued my interest, and over time, I spent a lot of energy trying to understand the depth of these conjectures and why they might be true. And I got more and more interested in trying to pierce their mystery. This is true for the Riemann hypothesis (RH), a tantalizing conjecture with many implications if true, and relatively easy to understand. Even quantum physicists have worked on it, and obtained promising results. I know I will never prove RH, but if I can find a new direction to prove it, that is all I am asking for. Then, if my scenario for a proof is worth exploring, I will work with mathematicians who know much more than I do, and enroll them to build on my foundations (likely to involve brand-new math). The hope is that they can finish work that I started myself but cannot complete due to my somewhat limited mathematical knowledge.</p>
<p>After all, many top mathematicians made stellar discoveries in their thirties, outperforming peers 30 years their senior despite the more limited knowledge that comes with youth. This is another example showing that knowing too much does not necessarily help you.</p>
<p>Note that to get a job, "the less you know, the better" does not work, as employers expect you to know everything that is needed to work properly in their company. You can and should continue to learn a lot on the job, but you must master the basics just to be offered a job, and to be able to keep it. </p>
<p><strong>What I learned from working on these math projects: the benefits</strong></p>
<p>To begin with, not being affiliated with a professional research lab or academia has some benefits: you don't have to publish, you choose your research projects yourself, you work at your own pace (ideally much faster than in academia), you don't have to deal with politics, and you don't have to teach. Yet you have access to similar resources (computing power, literature, and so on). You can even teach if you want to; in my case I don't really teach, but I write a lot of tutorials to get more people interested in the subject, and I will probably self-publish books in the future, which could become a source of revenue. My math questions on MathOverflow get a lot of criticism and some great answers too, which serves as peer review, and readers even point me to literature that I should read, as well as new, state-of-the-art, yet unpublished research results. On occasion, I correspond with well-known university professors, which further helps me avoid heading in the wrong direction. </p>
<p>The top benefit I've found in working on these problems is the incredible opportunity they offer to hone your machine learning skills. The biggest data sets I ever worked on come from these math projects. They allow you to test and benchmark various statistical models, discover new probability distributions with applications to real-world problems (see <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">this example</a>) and new visualizations (see <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>), develop new statistical tests of randomness and new probabilistic games (see <a href="https://www.datasciencecentral.com/profiles/blogs/data-science-foundations-for-a-new-stock-market" target="_blank" rel="noopener">here</a>), and even discover interesting, sometimes truly original math theory: for instance, complex random variables with applications (see <a href="https://www.datasciencecentral.com/profiles/blogs/introduction-to-complex-random-variables-with-applications" target="_blank" rel="noopener">here</a>), the distribution of lattice points in the infinite-dimensional simplex (yet unpublished), advanced matrix algebra asymptotics (infinite matrices, yet unpublished, but similar to <a href="https://arxiv.org/abs/1511.08154" target="_blank" rel="noopener">this article</a>), and a new type of Dirichlet functions. Still, 90% of my research never gets published. I only share peer-reviewed, usually new results; the rest is discarded, which is always the case when you do research. For those interested, much of what I wrote and consider worth sharing can be found in the math section, <a href="http://datashaping.com/free-articles.html" target="_blank" rel="noopener">here</a>.</p>
<p></p>
<p><span style="font-size: 14pt;"><strong>Fun Math Problems for Machine Learning Practitioners</strong></span><br/><em>May 20, 2021, by <a href="https://www.datasciencecentral.com/profile/VincentGranville" target="_blank" rel="noopener">Vincent Granville</a></em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8947380099?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8947380099?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p><span>This is part of a series featuring the following aspects of machine learning:</span></p>
<ul>
<li><span>Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span>Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/more-machine-learning-tricks-recipes-and-statistical-models" target="_blank" rel="noopener">Methods, principles, rules of thumb, recipes, tricks</a></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-1" target="_blank" rel="noopener">Business analytics</a> </span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-2" target="_blank" rel="noopener">Core Techniques</a> </span></li>
</ul>
<p><span>This issue focuses on cool math problems that come with data sets, source code, and algorithms. Many have a statistical, probabilistic, or experimental flavor, and some deal with dynamical systems. They can be used to extend your math knowledge, practice your machine learning skills on original problems, or simply for curiosity. My articles, posted on Data Science Central, are always written in simple English and accessible to professionals with typically one year of calculus or statistical training at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. The style is compact, suited to people who do not have a lot of free time. </span></p>
<p><span>Despite these restrictions, state-of-the-art, off-the-beaten-path results, as well as machine learning trade secrets and research material, are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed. </span></p>
<p><span><strong>1. Fun Math Problems for Machine Learning Practitioners</strong></span></p>
<p><span>These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.</span></p>
<ol>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/introduction-to-complex-random-variables-with-applications">Fascinating Facts About Complex Random Variables and the Riemann Hypothesis</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener">More Surprising Math Images</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/beautiful-mathematical-images" target="_blank" rel="noopener">Beautiful Mathematical Images</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">Deep visualizations to Help Solve Riemann's Conjecture</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">Spectacular Visualization: The Eye of the Riemann Zeta Function</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-probabilistic-approach-to-factoring-big-numbers" target="_blank" rel="noopener">New Probabilistic Approach to Factoring Big Numbers</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/simple-trick-to-dramatically-improve-speed-of-convergence" target="_blank" rel="noopener">Simple Trick to Dramatically Improve Speed of Convergence</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/state-of-the-art-statistical-science-to-address-famous-number-the" target="_blank" rel="noopener">State-of-the-Art Statistical Science to Tackle Famous Number Theory Conjectures</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-fermat-s-last-theorem" target="_blank" rel="noopener">New Perspective on Fermat's Last Theorem</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/math-fun-infinite-nested-radicals-of-random-variables" target="_blank" rel="noopener">Fun Math: Infinite Nested Radicals of Random Variables</a> - Connection with Fractals and Brownian Motions</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/surprising-uses-of-synthetic-random-data-sets" target="_blank" rel="noopener">Surprising Uses of Synthetic Random Data Sets</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/two-new-deep-conjectures-in-probabilistic-number-theory" target="_blank" rel="noopener">Two New Deep Conjectures in Probabilistic Number Theory</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/extreme-events-modeling-using-continued-fractions" target="_blank" rel="noopener">Extreme Events Modeling Using Continued Fractions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">A Strange Family of Statistical Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/some-fun-with-the-golden-ratio-time-series-and-number-theory" target="_blank" rel="noopener">Some Fun with Gentle Chaos, the Golden Ratio, and Stochastic Number Theory</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">Fascinating New Results in the Theory of Randomness</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/from-infinite-matrices-to-new-integration-formula" target="_blank" rel="noopener">From Infinite Matrices to New Integration Formula</a></li>
</ol>
<p><span><strong>2. Free books</strong></span></p>
<ul>
<li><span><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span>The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span><b>Applied Stochastic Processes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters). This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In about 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span>It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p></p>
<p><span style="font-size: 14pt;"><strong>Fascinating Facts About Complex Random Variables and the Riemann Hypothesis</strong></span><br/><em>May 9, 2021, by <a href="https://www.datasciencecentral.com/profile/VincentGranville" target="_blank" rel="noopener">Vincent Granville</a></em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8907977684?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8907977684?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><em>Orbit of the Riemann zeta function in the complex plane (see also <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>)</em></p>
<p>Despite my long statistical and machine learning career both in academia and in the industry, I never heard of complex random variables until recently, when I stumbled upon them by chance while working on some number theory problem. However, I learned that they are used in several applications, including signal processing, quadrature amplitude modulation, information theory and actuarial sciences. See <a href="https://en.wikipedia.org/wiki/Complex_random_variable" target="_blank" rel="noopener">here</a> and <a href="https://www.casact.org/sites/default/files/database/forum_15fforum_halliwell_complex.pdf" target="_blank" rel="noopener">here</a>. </p>
<p>In this article, I provide a short overview of the topic, with application to understanding why the Riemann hypothesis (arguably the most famous unsolved mathematical conjecture of all time) might be true, using probabilistic arguments. State-of-the-art, recent developments about this conjecture are discussed in a way that most machine learning professionals can understand. The style of my presentation is very compact, with numerous references provided as needed. It is my hope that this will broaden the horizon of the reader, offering new modeling tools for her arsenal, and an off-the-beaten-path read. The level of mathematics is rather simple, and you need to know very little (if anything) about complex numbers. After all, these random variables can be understood as bivariate vectors (<em>X</em>, <em>Y</em>), with <em>X</em> representing the real part and <em>Y</em> the imaginary part. They are typically denoted as <em>Z</em> = <em>X</em> + <em>iY</em>, where the complex number <em>i</em> (whose square is equal to -1) is the <a href="https://en.wikipedia.org/wiki/Imaginary_unit" target="_blank" rel="noopener">imaginary unit</a>. There are some subtle differences with bivariate real variables, and the interested reader can find more details <a href="https://en.wikipedia.org/wiki/Complex_random_variable" target="_blank" rel="noopener">here</a>. The complex Gaussian variable (see <a href="https://en.wikipedia.org/wiki/Complex_normal_distribution" target="_blank" rel="noopener">here</a>) is of course the most popular case.</p>
<p><span style="font-size: 14pt;"><strong>1. Illustration with damped complex random walks</strong></span></p>
<p>Let (<em>Z<span style="font-size: 8pt;">k</span></em>) be an infinite sequence of identically and independently distributed random variables, with <em>P</em>(<em>Z<span style="font-size: 8pt;">k</span></em> = 1) = <em>P</em>(<em>Z<span style="font-size: 8pt;">k</span></em> = -1) = 1/2. We define the damped sequence as </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8906629896?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8906629896?profile=RESIZE_710x" width="120" class="align-center"/></a></p>
<p>The originality here is that <em>s</em> = <em>σ</em> + <em>it</em> is a complex number. The above sequence clearly converges if the real part of <em>s</em> (the real number <em>σ</em>) is strictly above 1. The computation of the variance (first for the real part of <em>Z</em>(<em>s</em>), then for the imaginary part, then the full variance) yields:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8906638864?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8906638864?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Here <span><em>ζ</em> is the <a href="https://en.wikipedia.org/wiki/Riemann_zeta_function" target="_blank" rel="noopener">Riemann zeta function</a>. See also <a href="https://www.datasciencecentral.com/page/search?q=riemann+zeta" target="_blank" rel="noopener">here</a>. So we are dealing with a Riemann-zeta type of distribution; other examples of such distributions are found in one of my previous articles, <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">here</a>. The core result is that the damped sequence not only converges if <em>σ</em> > 1 as announced earlier, but even if <em>σ</em> > 1/2 when you look at the variance: <em>σ</em> > 1/2 keeps the variance of the infinite sum <em>Z</em>(<em>s</em>) finite. This result, due to the fact that we are manipulating complex rather than real numbers, will be of crucial importance in the next section, focusing on an application. </span></p>
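<p>The variance result is easy to check by simulation. The hypothetical sketch below takes <em>s</em> real (<em>σ</em> = 3/4, <em>t</em> = 0 for simplicity), so the variance of the truncated sum is the partial sum of 1/<em>k</em>^(2<em>σ</em>), which converges to <em>ζ</em>(3/2) ≈ 2.612:</p>

```python
import random

def sample_Z(sigma, n_terms, rng):
    """One draw of the truncated sum: sum over k of Z_k / k^sigma, Z_k = +/-1."""
    return sum((1 if rng.random() < 0.5 else -1) / k**sigma
               for k in range(1, n_terms + 1))

rng = random.Random(7)
sigma, n_terms, n_samples = 0.75, 5000, 400
samples = [sample_Z(sigma, n_terms, rng) for _ in range(n_samples)]
mean = sum(samples) / n_samples
emp_var = sum((z - mean) ** 2 for z in samples) / (n_samples - 1)

# Theoretical variance of the truncated sum: partial sum of zeta(2*sigma)
theo_var = sum(1 / k**(2 * sigma) for k in range(1, n_terms + 1))
print(f"empirical variance       {emp_var:.3f}")
print(f"truncated zeta(2*sigma)  {theo_var:.3f}")  # zeta(1.5) = 2.612...
```

<p>The two numbers agree up to Monte Carlo noise, while for <em>σ</em> &le; 1/2 the theoretical sum would diverge as more terms are added.</p>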
<p><span>It is possible to plot the distribution of <em>Z</em>(<em>s</em>) depending on the complex parameter <em>s</em> (or equivalently, depending on two real parameters <em>σ</em> and <em>t</em>), using simulations. You can also compute its distribution numerically, using the inverse Fourier transform of its characteristic function. The characteristic function computed for <em>τ</em> being a real number, is given by the following surprising product:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8906825294?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8906825294?profile=RESIZE_710x" width="250" class="align-center"/></a></span></p>
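<p>For real <em>s</em> (that is, <em>t</em> = 0), the product is easy to verify numerically: each independent term <em>Z<span style="font-size: 8pt;">k</span></em> / <em>k</em>^<em>σ</em> has characteristic function cos(<em>τ</em> / <em>k</em>^<em>σ</em>), and independence turns the characteristic function of the sum into the product of those of the terms. A hypothetical sketch (parameters chosen for illustration):</p>

```python
import math, random

sigma, tau, n_terms, n_samples = 0.75, 1.0, 1000, 2000
rng = random.Random(1)

def sample_Z():
    """One draw of the truncated sum: sum over k of Z_k / k^sigma, Z_k = +/-1."""
    return sum((1 if rng.random() < 0.5 else -1) / k**sigma
               for k in range(1, n_terms + 1))

# Monte Carlo estimate of E[cos(tau * Z(s))]; the sine part vanishes by symmetry
empirical = sum(math.cos(tau * sample_Z()) for _ in range(n_samples)) / n_samples

# Product formula, truncated at the same number of terms
product = 1.0
for k in range(1, n_terms + 1):
    product *= math.cos(tau / k**sigma)

print(f"Monte Carlo estimate  {empirical:.4f}")
print(f"cosine product        {product:.4f}")
```

<p>Both quantities estimate the same truncated characteristic function, so they match up to sampling noise.</p>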
<p><strong>1.1. Smoothed random walks and distribution of runs</strong></p>
<p><span>This sub-section is useful for the application discussed in section 2, and also for its own sake. If you don't have much time, you can skip it, and come back to it later.</span></p>
<p><span>The sum of the first <em>n</em> terms of the series defining <em>Z</em>(<em>s</em>) represents a random walk, assuming <em>n</em> represents the time. If <em>s</em> = 0 (the classic random walk), it has zero mean and variance equal to <em>n</em> (thus growing indefinitely with <em>n</em>); it can take on positive or negative values, and can stay positive (or negative) for a very long time, though it will eventually oscillate infinitely many times between positive and negative values (see <a href="https://mathworld.wolfram.com/PolyasRandomWalkConstants.html" target="_blank" rel="noopener">here</a>). We define the smoothed version <em>Z*</em>(<em>s</em>) as follows:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8906746501?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8906746501?profile=RESIZE_710x" width="300" class="align-center"/></a></span></p>
<p><span>A <em>run</em> of length <em>m</em> is defined as a maximal subsequence <em>Z<span style="font-size: 8pt;">k</span></em><span style="font-size: 8pt;">+1</span>, ..., <em>Z<span style="font-size: 8pt;">k</span></em><span style="font-size: 8pt;">+<em>m</em></span> all having the same sign: that is, <em>m</em> consecutive values all equal to +1, or all equal to -1. The probability for a run to be of length <em>m</em> > 0 in the original sequence (<em>Z<span style="font-size: 8pt;">k</span></em>) is equal to 1 / 2^<em>m</em>. Here 2^<em>m</em> means 2 to the power <em>m</em>. In the smoothed sequence (<em>Z*<span style="font-size: 8pt;">k</span></em>), after removing the zeroes, that probability is now 2 / 3^<em>m</em>. While by construction the <em>Z<span style="font-size: 8pt;">k</span></em>'s are independent, note that the <em>Z*<span style="font-size: 8pt;">k</span></em>'s are no longer independent. After removing all the zeroes (representing 50% of the <em>Z*<span style="font-size: 8pt;">k</span></em>'s), the runs in the sequence (<em>Z*<span style="font-size: 8pt;">k</span></em>) tend to be much shorter than those in (<em>Z<span style="font-size: 8pt;">k</span></em>). This implies that the associated random walk (now actually less random) based on the <em>Z*<span style="font-size: 8pt;">k</span></em>'s is better controlled, and can't go up and up (or down and down) for so long, unlike the original random walk based on the <em>Z<span style="font-size: 8pt;">k</span></em>'s. A classic result, known as the <a href="https://en.wikipedia.org/wiki/Law_of_the_iterated_logarithm" target="_blank" rel="noopener">law of the iterated logarithm</a>, states that</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8906801290?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8906801290?profile=RESIZE_710x" width="200" class="align-center"/></a></span></p>
<p><span>almost surely (that is, with probability 1). The definition of "lim sup" can be found <a href="https://en.wikipedia.org/wiki/Limit_inferior_and_limit_superior" target="_blank" rel="noopener">here</a>. Of course, this is no longer true for the sequence (<em>Z*<span style="font-size: 8pt;">k</span></em>) even after removing the zeroes.</span></p>
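<p>The 1 / 2^<em>m</em> law for run lengths is easy to confirm empirically. The hypothetical sketch below simulates only the original sequence (<em>Z<span style="font-size: 8pt;">k</span></em>); the smoothed sequence is defined by the displayed formula above and is left as an exercise:</p>

```python
import random
from collections import Counter

rng = random.Random(123)
N = 200_000
seq = [1 if rng.random() < 0.5 else -1 for _ in range(N)]

# Tabulate the lengths of maximal runs of equal consecutive values
runs, length = Counter(), 1
for prev, cur in zip(seq, seq[1:]):
    if cur == prev:
        length += 1
    else:
        runs[length] += 1
        length = 1
runs[length] += 1  # the final, unterminated run

total = sum(runs.values())
for m in (1, 2, 3, 4):
    print(f"P(run length = {m}): empirical {runs[m]/total:.4f}, theory {2**-m:.4f}")
```

<p>With 200,000 flips, the empirical frequencies match 1/2, 1/4, 1/8, 1/16 to within a fraction of a percent.</p>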
<p><span style="font-size: 14pt;"><strong>2. Application: heuristic proof of the Riemann hypothesis</strong></span></p>
<p><span>The Riemann hypothesis, one of the most famous unsolved mathematical problems, is discussed <a href="https://en.wikipedia.org/wiki/Riemann_hypothesis" target="_blank" rel="noopener">here</a>, and in the DSC article entitled <a href="https://www.datasciencecentral.com/profiles/blogs/will-bigdata-solve-the-riemann-hypothesis" target="_blank" rel="noopener">Will Big Data Solve the Riemann Hypothesis?</a> We approach this problem using a function <em>L</em>(<em>s</em>) that behaves (to some extent) like the <em>Z</em>(<em>s</em>) defined in section 1. We start with the following definitions:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8908523659?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8908523659?profile=RESIZE_710x" width="450" class="align-center"/></a></span></p>
<p>where</p>
<ul>
<li><span><em>Ω</em>(<em>k</em>) is the <a href="https://en.wikipedia.org/wiki/Prime_omega_function" target="_blank" rel="noopener">prime omega function</a>, counting the number of prime factors of <em>k</em>, with multiplicity,</span></li>
<li><span><em>λ</em>(<em>k</em>) is the <a href="https://en.wikipedia.org/wiki/Liouville_function" target="_blank" rel="noopener">Liouville function</a></span><span>,</span></li>
<li><span><em>p</em><span style="font-size: 8pt;">1</span>, <em>p</em><span style="font-size: 8pt;">2</span>, and so on (with <em>p</em><span style="font-size: 8pt;">1</span> = 2) are the prime numbers.</span></li>
</ul>
<p><span>Note that <em>L</em>(<em>s</em>, 1) = <em>ζ</em>(<em>s</em>) is the Riemann zeta function, and <em>L</em>(<em>s</em>) = <em>ζ</em>(2<em>s</em>) / <em>ζ</em>(<em>s</em>). Again, <em>s</em> = <em>σ</em> + <em>it</em> is a complex number. We also define <em>L<span style="font-size: 8pt;">n</span></em> = <em>L<span style="font-size: 8pt;">n</span></em>(0) and <em>ρ</em> = <em>L</em>(0, 1/2). We have <em>L</em>(1) = 0. The series for <em>L</em>(<em>s</em>) is guaranteed to converge if <em>σ</em> > 1.</span></p>
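<p>As a quick numerical sanity check on these definitions, the Python sketch below (trial-division factoring; the helper names and truncation level are my own, not from the article) computes <em>λ</em>(<em>k</em>) = (-1)^<em>Ω</em>(<em>k</em>) and verifies that the partial sums of the series for <em>L</em>(<em>s</em>) at <em>s</em> = 2, where convergence is guaranteed, approach <em>ζ</em>(4) / <em>ζ</em>(2) = π²/15.</p>

```python
import math

def big_omega(n):
    """Ω(n): number of prime factors of n, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

def liouville(n):
    """Liouville function λ(n) = (-1)^Ω(n)."""
    return -1 if big_omega(n) % 2 else 1

# Partial sum of L(s) = sum of λ(k)/k^s at s = 2; the limit is
# ζ(2s)/ζ(s) = ζ(4)/ζ(2) = π²/15 ≈ 0.657974.
N = 50000
partial = sum(liouville(k) / k**2 for k in range(1, N + 1))
print(partial, math.pi**2 / 15)
```

The truncation error is roughly bounded by the tail of Σ 1/k², i.e. about 1/<em>N</em>, so the two printed values agree to several decimal places.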
<p><strong>2.1. How to prove the Riemann hypothesis?</strong></p>
<p><span>Any of the following conjectures, if proven, would make the Riemann hypothesis true:</span></p>
<ul>
<li><span>The series for <em>L</em>(<em>s</em>) also converges if <em>σ</em> > 1/2: this is what we investigate in section 2.2. If it were to converge only when <em>σ</em> is larger than (say) <em>σ</em><span style="font-size: 8pt;">0</span> = 0.65, it would imply that the zero-free region guaranteed by the Riemann Hypothesis (RH) holds only in <em>σ</em><span style="font-size: 8pt;">0</span> < <em>σ</em> < 1, not in the full critical strip 1/2 < <em>σ</em> < 1. It would still be a major victory, allowing us to obtain much more precise estimates of the distribution of prime numbers than those currently known. RH is equivalent to the statement that <em>ζ</em>(<em>s</em>) has no zero if 1/2 < <em>σ</em> < 1.</span></li>
<li><span>The number <em>ρ</em> is a <a href="https://en.wikipedia.org/wiki/Normal_number" target="_blank" rel="noopener">normal number</a> in base 2 (this would prove the much stronger Chowla conjecture, see <a href="https://mathoverflow.net/questions/391736/normal-numbers-liouville-function-and-the-riemann-hypothesis" target="_blank" rel="noopener">here</a>)</span></li>
<li><span>The sequence (<em>λ</em>(<em>k</em>)) is ergodic (this would also prove the much stronger Chowla conjecture, see <a href="https://arxiv.org/abs/1611.09338" target="_blank" rel="noopener">here</a>)</span></li>
<li><span>The sequence <em>x</em>(<em>n</em>+1) = 2<em>x</em>(<em>n</em>) - INT(2<em>x</em>(<em>n</em>)), with <em>x</em>(0) = (1 + <em>ρ</em>) / 2, is ergodic. This is equivalent to the previous statement. Here INT stands for the integer part function, and the <em>x</em>(<em>n</em>)'s are iterates of the <a href="https://en.wikipedia.org/wiki/Dyadic_transformation" target="_blank" rel="noopener">Bernoulli map</a>, one of the simplest chaotic discrete dynamical systems (see Update 2 <a href="https://mathoverflow.net/questions/391736/normal-numbers-liouville-function-and-the-riemann-hypothesis" target="_blank" rel="noopener">in this post</a>), with its main invariant distribution being uniform on [0, 1]</span></li>
<li><span>The function 1 / <em>L</em>(<em>s</em>) = <em>ζ</em>(<em>s</em>) / <em>ζ</em>(2<em>s</em>) has no zero if 1/2 < <em>σ </em> < 1</span></li>
<li><span>The numbers <em>λ</em>(<em>k</em>) behave in a way that is random enough so that, for any <em>ε</em> > 0, we have (see <a href="https://mathoverflow.net/questions/391736/normal-numbers-liouville-function-and-the-riemann-hypothesis" target="_blank" rel="noopener">here</a>):</span><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8906956661?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8906956661?profile=RESIZE_710x" width="250" class="align-center"/></a></span></li>
</ul>
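<p>To make the Bernoulli map bullet point concrete: each iteration of <em>x</em> → 2<em>x</em> - INT(2<em>x</em>) reads off one binary digit of the seed, so ergodicity or normality of the seed is a statement about the randomness of its binary expansion. A minimal sketch follows (exact rational arithmetic is used because binary floating point collapses this map to 0 within about 53 iterations; the seed 1/3 is purely illustrative, not the constant (1 + <em>ρ</em>) / 2):</p>

```python
from fractions import Fraction

def bernoulli_map_digits(x0, n):
    """Iterate x -> 2x - INT(2x) and record the bit INT(2x) at each step.
    For x0 in [0, 1), these bits are exactly the binary digits of x0."""
    x, bits = x0, []
    for _ in range(n):
        y = 2 * x
        b = int(y)        # 0 or 1
        bits.append(b)
        x = y - b
    return bits

# Binary expansion of 1/3 is 0.01010101...
print(bernoulli_map_digits(Fraction(1, 3), 8))  # → [0, 1, 0, 1, 0, 1, 0, 1]
```

Feeding the (unknown) seed (1 + <em>ρ</em>) / 2 into this map would generate the ±1 signs of the <em>λ</em>(<em>k</em>) sequence, which is why ergodicity of the map iterates is equivalent to the previous bullet.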
<p>Note that the last statement is weaker than the law of the iterated logarithm mentioned in section 1.1. The coefficient <em>λ</em>(<em>k</em>) plays the same role as <em>Z<span style="font-size: 8pt;">k</span></em> in section 1; however, because <em>λ</em>(<i>mn</i>) = <em>λ</em>(<i>m</i>)<em>λ</em>(<i>n</i>), the <em>λ</em>(<em>k</em>)'s can't be independent, not even <a href="https://projecteuclid.org/download/pdf_1/euclid.lnms/1215465639" target="_blank" rel="noopener">asymptotically independent</a>, unlike the <em>Z<span style="font-size: 8pt;">k</span></em>'s. Clearly, the sequence (<em>λ</em>(<em>k</em>)) has weak dependencies. That in itself does not prevent the law of the iterated logarithm from applying (see examples <a href="https://projecteuclid.org/journals/annals-of-probability/volume-5/issue-3/A-Functional-Law-of-the-Iterated-Logarithm-for-Empirical-Distribution/10.1214/aop/1176995795.full" target="_blank" rel="noopener">here</a>), nor does it prevent <em>ρ</em> from being a normal number (see <a href="https://arxiv.org/abs/1804.02844" target="_blank" rel="noopener">here</a> why). But it is conjectured that the law of the iterated logarithm does not apply to the sequence (<em>λ</em>(<em>k</em>)), due to another conjecture by <span>Gonek (see <a href="https://arxiv.org/abs/math/0310381" target="_blank" rel="noopener">here</a>).</span></p>
<p><strong>2.2. Probabilistic arguments in favor of the Riemann hypothesis</strong></p>
<p><span>The deterministic </span>sequence (<span><em>λ</em>(<em>k</em>)), consisting of +1 and -1 in a 50/50 ratio, appears to behave rather randomly (judging by its limiting empirical distribution), just like the sequence (<em>Z<span style="font-size: 8pt;">k</span></em>) in section 1 behaves perfectly randomly. Thus, one might think that the series defining <em>L</em>(<em>s</em>) would also converge for <em>σ </em> > 1/2, not just for <em>σ </em> > 1. This could be true because the same thing happens to <em>Z</em>(<em>s</em>) in section 1, for the same reason. And if it is true, then the Riemann hypothesis is true, because of the first statement in the bullet list in section 2.1. Remember, <em>s</em> = <em>σ </em>+ <em>it</em>; in other words, <em>σ </em>is the real part of the complex number <em>s</em>. </span></p>
<p><span>However, there is a big caveat, which perhaps could be addressed to make the argument more convincing. This is the purpose of this section. As noted at the bottom of section 2.1, the sequence (<em>λ</em>(<em>k</em>)), even though it passes all the randomness tests that I have tried, is much less random than it appears to be. It obviously has weak dependencies, since the function <em>λ</em> is multiplicative: <em>λ</em>(<i>mn</i>) = <em>λ</em>(<i>m</i>)<em>λ</em>(<i>n</i>). This is related to the fact that prime numbers are not perfectly randomly distributed. Another disturbing fact is that <em>L<span style="font-size: 8pt;">n</span></em>, the equivalent of the random walk defined in section 1, seems biased towards negative values. For instance, except for <em>n</em> = 1, it is negative up to <em>n</em> = 906,150,257, a fact proved in 1980 that disproved Pólya's conjecture (see <a href="https://en.wikipedia.org/wiki/P%C3%B3lya_conjecture" target="_blank" rel="noopener">here</a>). One way to address this is to work with Rademacher multiplicative random functions instead of (<em>Z<span style="font-size: 8pt;">k</span></em>) in section 1: see <a href="https://londmathsoc.onlinelibrary.wiley.com/doi/full/10.1112/jlms.12421" target="_blank" rel="noopener">here</a> for an example that would make the last item in the bullet list in section 2.1 true. Or see <a href="https://www.ams.org/journals/proc/2013-141-02/S0002-9939-2012-11332-2/" target="_blank" rel="noopener">here</a> for an example that preserves the law of the iterated logarithm (which itself would also imply the Riemann Hypothesis). </span></p>
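<p>The negative bias of <em>L<span style="font-size: 8pt;">n</span></em> is easy to observe numerically. The sketch below (plain Python with trial-division factoring; the range tested and the names are my own choices) computes the partial sums of <em>λ</em>(<em>k</em>) and confirms that, apart from <em>n</em> = 1, they stay at or below zero over the whole range tested, consistent with Pólya's (ultimately false) conjecture.</p>

```python
def big_omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

# Partial sums L_n = λ(1) + ... + λ(n), with λ(k) = (-1)^Ω(k).
N = 10000
partial_sums, s = [], 0
for k in range(1, N + 1):
    s += -1 if big_omega(k) % 2 else 1
    partial_sums.append(s)

# Apart from n = 1, every partial sum in this range is <= 0.
print(max(partial_sums[1:]), min(partial_sums))
```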
<p>Finally, working with a smoothed version of <em>L</em>(<em>s</em>) or <em>L<span style="font-size: 8pt;">n</span></em>, using the smoothing technique described in section 1.1, may lead to results that are easier to obtain, and could bring new insights into the original series <em>L</em>(<em>s</em>). The smoothed version <em>L</em>*(<em>s</em>) is defined, using the same technique as in section 1.1, as</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8908556871?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8908556871?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>The function <em>η</em>(<em>s</em>) is the <a href="https://en.wikipedia.org/wiki/Dirichlet_eta_function" target="_blank" rel="noopener">Dirichlet eta function</a>, and <em>L</em>*(<em>s</em>) can be computed in Mathematica using (DirichletEta[s] + Zeta[2s] / Zeta[s]) / 2. Mathematica uses the <a href="https://en.wikipedia.org/wiki/Analytic_continuation" target="_blank" rel="noopener">analytic continuation</a> of the <em>ζ</em> function if <em>σ</em> < 1. For instance, see the computation of <em>L</em>*(0.7) = -0.237771..., <a href="https://www.wolframalpha.com/input/?i=%3D%28DirichletEta%5B0.7%5D%2BZeta%5B1.4%5D%2FZeta%5B0.7%5D%29%2F2" target="_blank" rel="noopener">here</a>. A table of the first million values of the Liouville function <em>λ</em>(<i>k</i>) can be produced in Mathematica in just a few seconds, using the command Table[LiouvilleLambda[n], {n, 1, 1000000}]. For convenience, I stored them in a text file, <a href="http://www.datashaping.com/Liouville4b.txt" target="_blank" rel="noopener">here</a>. It would be interesting to see how good (or bad) they are as the basis for a pseudorandom number generator.</p>
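<p>The value <em>L</em>*(0.7) quoted above can also be reproduced without Mathematica. The sketch below (plain Python; it uses the identity <em>ζ</em>(<em>s</em>) = <em>η</em>(<em>s</em>) / (1 - 2^(1-<em>s</em>)) for real <em>s</em> > 0 instead of full analytic continuation, and the truncation level is an arbitrary choice of mine) evaluates (<em>η</em>(<em>s</em>) + <em>ζ</em>(2<em>s</em>) / <em>ζ</em>(<em>s</em>)) / 2 at <em>s</em> = 0.7:</p>

```python
def dirichlet_eta(s, terms=10**6):
    """η(s) = sum of (-1)^(k+1) / k^s, convergent for real s > 0."""
    total, sign = 0.0, 1.0
    for k in range(1, terms + 1):
        total += sign / k ** s
        sign = -sign
    return total

def zeta(s):
    """ζ(s) via the eta function: ζ(s) = η(s) / (1 - 2^(1-s)), for real s > 0, s != 1."""
    return dirichlet_eta(s) / (1 - 2 ** (1 - s))

s = 0.7
L_star = (dirichlet_eta(s) + zeta(2 * s) / zeta(s)) / 2
print(L_star)  # close to -0.237771, the value computed by Mathematica
```

For an alternating series, the truncation error is bounded by the first omitted term, so with 10^6 terms the result matches the Mathematica value to roughly four decimal places.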
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> He recently opened <a href="https://www.parisrestaurantandbar.com/" target="_blank" rel="noopener">Paris Restaurant</a>, in Anacortes. You can access Vincent's articles and books, <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>What I Learned From 25 Years of Machine Learningtag:www.datasciencecentral.com,2021-05-04:6448529:BlogPost:10490942021-05-04T06:00:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8890975463?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8890975463?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: <a href="https://www.zeolearn.com/magazine/what-is-machine-learning" target="_blank" rel="noopener">here</a></em></p>
<p>Here is what I learned from practicing machine learning in business settings for over two decades, and prior to that in academia. Back in the nineties, it was known as computational statistics in some circles, and some problems such as image analysis were already popular. Of course a lot of progress has been made since then, thanks in part to the power of modern computers, the cloud, and large data sets now being ubiquitous. The trend has evolved towards more robust, model-free, data-driven techniques, sometimes designed as black boxes: for instance, deep neural networks. Text analysis (NLP) has also seen substantial progress. I hope that the advice I provide below will be helpful in your data science job. </p>
<p><span style="font-size: 14pt;"><strong>11 pieces of advice</strong></span></p>
<ul>
<li>The biggest achievement in my career was to automate most of the data cleaning / data massaging / outlier detection and exploratory analysis, allowing me to focus on tasks that truly justified my salary. I had to write a few reusable scripts to take care of that, but it was well worth the effort. </li>
<li>Be friends with the IT department. In one company, much of my job consisted of producing and blending various reports for decision makers. I got it all automated (which required direct access via Perl code to sensitive databases) and I even told my boss about it. He said that I did not work a lot (compared to hard workers) but understood, and was happy to always receive the reports on time, automatically delivered to his mailbox, even when I was on vacation.</li>
<li>Leverage APIs. In one company, a big project consisted of creating and maintaining a list of the top 95% of keywords searched for on the web, and attaching a value / yield to each of them. The list had about one million keywords. I started by querying internal databases, then scraping the web, and developing yield models. There was a lot of NLP involved. Until I found out that I could get all that information from Google and Microsoft by accessing their APIs. It was not free, but not expensive either, and initially I used my own credit card to pay for the services, which saved me a lot of time. Eventually my boss adopted my idea, and the company reimbursed me for these paid API calls. They continued to use them, under my own personal accounts, long after I was gone. </li>
<li>Document your code, your models, and every core task you do, with enough detail, and in such a way that other people understand your documentation. Without it, you might not even remember what a piece of your own code is doing 3 years down the road, and have to re-write it from scratch. Use simple English as much as possible. It is also good practice, as it will help you train your replacement when you leave.</li>
<li>When blending data from different sources, adjust the metrics accordingly, for each data source; metrics are likely not to be fully compatible, or some of them may be missing, as things are probably measured in different ways depending on the source. Even over time, the same metric in the same database can evolve to the point of no longer being compatible with historical data. I actually have a patent that addresses this issue.</li>
<li>Be wary of job interviews for a supposedly wonderful data science job requiring a lot of creativity. I was misled quite a few times: the job eventually turned out to be a coding job. It can be a dead-end, boring job. I like doing the job of a software engineer, but only as long as it helps me automate and optimize my tasks.</li>
<li>Working remotely can have many rewards, especially financial ones. Sometimes it also means less time spent in corporate meetings. I had to travel every single week between Seattle and San Francisco, for years. I did not like it, but I saved a lot of money (not least because there is no employment tax in Washington state, and real estate is much less expensive). Also, walking from your hotel to your workplace is less painful than commuting, and it saves a lot of time. Nowadays telecommuting makes it even easier. </li>
<li>Embrace simple models. Use synthetic or simulated data to test them. For instance, I implemented various statistical tests, and used artificial data (many times from number theory experiments) to fine-tune and assess the validity of my tests / models on datasets for which the exact answer is known. It was a win-win: working on a topic I love (experimental and probabilistic number theory) and at the same time producing good models and algorithms with applications to real business processes.</li>
<li>Being a generalist rather than a specialist offers more career opportunities, within your company (horizontal move) or anywhere. You still need to be an expert in at least one or two areas. As a generalist, it will be easier for you to become a consultant or start your own company, should you decide to go that route. Also, it may help you understand the real problems that decision makers are facing in your company, and have a better, closer relationship with them. Or with any department (sales, finance, marketing, IT).</li>
<li>In data we trust. I disagree with that statement. I remember a job at Wells Fargo where I was analyzing user sessions of corporate clients doing online transactions. The sessions were extremely short. I decided to have my boss do a simulated session with multiple transactions, and analyze it the next day. It turned out that the session was broken down into multiple sessions, as the tracking service (powered by Tealeaf back then) started a new session anytime an HTTP request (by the same user) came from a different server (that is, pretty much for every user request). The Tealeaf issue was fixed when notified by Wells Fargo, and I am sure this was my most valuable contribution at the bank. In a different company, reports from a third party were totally erroneous, missing most page views in their count: it turned out that their software was truncating every URL that contained a comma, a glitch caused by bad programming by some software engineer at that third-party company, combined with the fact that 95% of our URLs contained commas. If you miss those massive glitches (even though in some ways it is not your job to detect them), your analyses will be totally worthless. One way to detect these glitches is to rely on more than a single data source.</li>
<li>Get very precise definitions of the metrics you are dealing with. The fact that there is so much fake news nowadays is probably because the concept of fake news has never been properly defined, rather than because of a data / modeling issue.</li>
</ul>
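<p>The advice about validating models and tests on synthetic data with a known answer can be illustrated in a few lines. The sketch below (plain Python; the chi-square uniformity check and the artificial "biased" stream are my own illustration, not taken from the article) runs the same test on digits known to be uniform and on digits known to be biased, so the expected verdict is known in advance:</p>

```python
import random

def chi_square_uniform(digits, bins=10):
    """Chi-square statistic for testing that digits 0..bins-1 are uniform.
    Large values indicate departure from uniformity (df = bins - 1)."""
    counts = [0] * bins
    for d in digits:
        counts[d] += 1
    expected = len(digits) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

random.seed(1)
good = [random.randrange(10) for _ in range(10000)]   # truly uniform digits
bad = [k % 10 if k % 3 else 7 for k in range(10000)]  # heavily biased toward 7

# The statistic should be near its df (9) for 'good', and huge for 'bad'.
print(chi_square_uniform(good), chi_square_uniform(bad))
```

Because the ground truth is known for both streams, any implementation bug in the test itself shows up immediately, which is the whole point of benchmarking on synthetic data.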
<p></p>
<p></p>More Machine Learning Tricks, Recipes, and Statistical Modelstag:www.datasciencecentral.com,2021-04-30:6448529:BlogPost:10491602021-04-30T03:57:32.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8873246459?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8873246459?profile=RESIZE_710x" width="500" class="align-center"/></a></span></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://www.forbes.com/sites/kalevleetaru/2019/01/15/why-machine-learning-needs-semantics-not-just-statistics" target="_blank" rel="noopener">here</a></em></p>
<p><span>The first part of this list was published <a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-tricks-recipes-and-statistical-mod" target="_blank" rel="noopener">here</a>. These are articles that I wrote in the last few years. The whole series will feature articles related to the following aspects of machine learning:</span></p>
<ul>
<li><span>Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span>Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><span>Methods, principles, rules of thumb, recipes, tricks</span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-1" target="_blank" rel="noopener">Business analytics</a> </span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-2" target="_blank" rel="noopener">Core Techniques</a> </span></li>
</ul>
<p><span>My articles are always written in simple English and accessible to professionals with typically one year of calculus or statistical training at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, geared towards people who do not have a lot of free time. </span></p>
<p><span>Despite these restrictions, state-of-the-art, off-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed. </span></p>
<p><span><strong>1. Machine Learning Tricks, Recipes and Statistical Models</strong></span></p>
<p><span>These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.</span></p>
<ol>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/one-trillion-random-digits" target="_blank" rel="noopener">One Trillion Random Digits</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics" target="_blank" rel="noopener">New Perspective on the Central Limit Theorem and Statistical Testing</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/feature-selection-a-simple-solution?xg_source=activity" target="_blank" rel="noopener">Simple Solution to Feature Selection Problems</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/scale-invariant-clustering-and-regression" target="_blank" rel="noopener">Scale-Invariant Clustering and Regression</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/deep-dive-into-polynomial-regression-and-overfitting" target="_blank" rel="noopener">Deep Dive into Polynomial Regression and Overfitting</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/stochastic-processes-new-tests-for-randomness-application-to-numb" target="_blank" rel="noopener">Stochastic Processes and New Tests of Randomness</a> - Application to Cool Number Theory Problem</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-simple-introduction-to-complex-stochastic-processes-part-2" target="_blank" rel="noopener">A Simple Introduction to Complex Stochastic Processes - Part 2</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-simple-introduction-to-complex-stochastic-processes" target="_blank" rel="noopener">A Simple Introduction to Complex Stochastic Processes</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/high-precision-computing-benchmark-examples-and-tutorial" target="_blank" rel="noopener">High Precision Computing: Benchmark, Examples, and Tutorial</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/logistic-map-chaos-randomness-and-quantum-algorithms" target="_blank" rel="noopener">Logistic Map, Chaos, Randomness and Quantum Algorithms</a></li>
<li><a href="https://www.bigdatanews.datasciencecentral.com/profiles/blogs/graph-theory-six-degrees-of-separation-problem" target="_blank" rel="noopener">Graph Theory: Six Degrees of Separation Problem</a></li>
<li><a href="http://www.analyticbridge.datasciencecentral.com/profiles/blogs/interesting-probability-problem-for-serious-geeks" target="_blank" rel="noopener">Interesting Problem for Serious Geeks: Self-correcting Random Walks</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/9-off-th-beaten-path-statistical-science-topics" target="_blank" rel="noopener">9 Off-the-beaten-path Statistical Science Topics with Interesting Applications</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/data-science-method-to-discover-large-prime-numbers" target="_blank" rel="noopener">Data Science Method to Discover Large Prime Numbers</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/nice-generalization-of-the-k-nn-clustering-algorithm" target="_blank" rel="noopener">Nice Generalization of the K-NN Clustering Algorithm</a> - Also Useful for Data Reduction</li>
<li><a href="http://www.analyticbridge.datasciencecentral.com/profiles/blogs/mysterious-sequences-that-look-random-with-surprising-properties" target="_blank" rel="noopener">How to Detect if Numbers are Random or Not</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/how-and-why-decorrelate-time-series" target="_blank" rel="noopener">How and Why: Decorrelate Time Series</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/distribution-of-arrival-times-of-extreme-events" target="_blank" rel="noopener">Distribution of Arrival Times of Extreme Events</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/why-zipf-s-law-explains-so-many-big-data-and-physics-phenomenons" target="_blank" rel="noopener">Why Zipf's law explains so many big data and physics phenomenons</a></li>
</ol>
<p><span><strong>2. Free books</strong></span></p>
<ul>
<li><span><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span>The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span><b>Applied Stochastic Processes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters). This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span>It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p></p>
Unusual Opportunities for AI, Machine Learning, and Data Scientiststag:www.datasciencecentral.com,2021-04-20:6448529:BlogPost:10480192021-04-20T01:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8812460479?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8812460479?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>Here are some off-the-beaten-path options to consider when looking for a first job, a new job, or extra income by leveraging your machine learning experience. Many were offers that came to my mailbox at some point in the last 10 years, mostly from people who found my LinkedIn profile. Hence the importance of growing your network and visibility: write blogs, and show the world some of your portfolio and accomplishments (code posted on GitHub, etc.). If you do it right, after a while you will never have to apply for a job again: hiring managers and other opportunities will come to you, rather than the other way around.</p>
<p><span style="font-size: 14pt;"><strong>1. For beginners</strong></span></p>
<p>Participate in Kaggle and other competitions. Teach for one of the many online education companies or data camps, such as Coursera. Write, self-publish, and sell your own books: an example is Jason Brownlee (see <a href="https://machinelearningmastery.com/" target="_blank" rel="noopener">here</a>), who found his niche selling tutorials that explain data science in simple words to software engineers. I am moving in the same direction as well, see <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>. Another option is to develop an API, for instance one offering trading signals (buy/sell) to investors who pay a fee to subscribe to your service; I did this in the past and it earned me a bit of income, more than I had expected. I also created a website where recruiters can post data science job ads for a fee: it still exists (see <a href="https://www.analytictalent.com/" target="_blank" rel="noopener">here</a>), though it was acquired. You need to aggregate jobs from multiple websites, build a large mailing list of data scientists, and charge a fee only for <em>featured jobs</em>. Many of these ideas require that you promote your services for free using social media: this is the hard part. A starting point is to create and grow your own groups on social networks. All this can be done while holding a full-time job.</p>
<p>You can also become a contributor or writer for various news outlets, though initially you may have to do it for free. As you gain experience and notoriety, it can become a full-time, lucrative job. Finally, you can raise money with a partner to start your own company.</p>
<p><span style="font-size: 14pt;"><strong>2. For mid-career and seasoned professionals</strong></span></p>
<p>You can offer consulting services, especially to your former employers to begin with. Here are some unusual opportunities I was offered. I did not accept all of them, but I was still able to maintain a full time job while getting decent side income.</p>
<ul>
<li>Expert witness - get paid by big law firms to show up in court and help them win big money for their clients (and for themselves, and you along the way.) Or you can work for a company specializing in statistical litigation, such as <a href="https://www.wecker.com/" target="_blank" rel="noopener">this one</a>.</li>
<li>Become a part-time, independent recruiter. Some machine learning recruiters are former machine learning experts. You can still keep your full-time job.</li>
<li>Get involved in patent reviews (pertaining to machine learning problems that you know very well.)</li>
<li>Help Venture Capital companies do their due diligence on startups they could potentially fund, or help them find new startups worth investing in. The last VC firm that contacted me offered $1,000 per report, each requiring 2-3 hours of work.</li>
<li>I was once contacted to be the data scientist for an Indian Tribe. Other unusual job offers came from the adult industry (they needed an expert to fight advertising fraud on their websites) and even from the casino industry. I eventually created my own unique lottery system, see <a href="https://www.datasciencecentral.com/profiles/blogs/data-science-foundations-for-a-new-stock-market" target="_blank" rel="noopener">here</a>. I plan to either sell the intellectual property or work with existing lottery operators (governments or casinos) to make it happen and monetize it. If you own some IP (intellectual property), think about monetizing it if you can.</li>
</ul>
<p>There are of course plenty of other opportunities, such as working for a consulting firm or a government to uncover tax fraudsters via data mining techniques, to give just one example. Another idea is to obtain a realtor certification if you own properties: you can save a lot of money by selling them yourself without a third party, and use your analytic acumen to buy low and sell high at the right times. Working from home in (say) Nevada for an employer in the Bay Area can also save you a lot of money.</p>
<p></p>
<p><span style="font-size: 14pt;"><strong>Simple Machine Learning Approach to Testing for Independence</strong></span> (Vincent Granville, April 8, 2021)</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8771488658?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8771488658?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>We describe here a methodology that applies to any statistical test, illustrated in the context of assessing independence between successive observations in a data set. After reviewing a few standard approaches, we discuss our methodology, its benefits, and its drawbacks. The data used here for illustration purposes has known theoretical auto-correlations, so it can be used to benchmark various statistical tests. Our methodology also applies to data with high volatility, in particular to time series models with undefined autocorrelations. Such models (see for instance Figure 1 <a href="https://www.datasciencecentral.com/profiles/blogs/defining-and-measuring-chaos-in-data-sets-why-and-how-in-simple-w" target="_blank" rel="noopener">in this article</a>) are usually ignored by practitioners, despite their interesting properties.</p>
<p>Independence is a stronger concept than all autocorrelations being equal to zero. In particular, some non-linear functional relationships between successive data points may result in zero autocorrelation even though the observations exhibit strong auto-dependencies. A classic example is points randomly located on a circle centered at the origin: the correlation between the <em>X</em> and <em>Y</em> coordinates is zero, but of course <em>X</em> and <em>Y</em> are not independent.</p>
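<p>The circle example is easy to verify numerically. Below is a minimal sketch (my own illustration, not from the article): it samples points uniformly on the unit circle and computes the sample correlation between the two coordinates.</p>

```python
import math
import random

# Points uniformly distributed on a circle centered at the origin:
# Y is determined by X up to sign, so X and Y are strongly dependent,
# yet their theoretical correlation is exactly zero.
random.seed(42)
angles = [random.uniform(0.0, 2.0 * math.pi) for _ in range(100_000)]
xs = [math.cos(t) for t in angles]
ys = [math.sin(t) for t in angles]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
corr = cov / (sx * sy)  # close to 0 despite the exact functional dependency
```

<p>With 100,000 points the sample correlation lands within a few thousandths of zero, even though knowing <em>X</em> pins <em>Y</em> down to just two possible values.</p>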
<p><span style="font-size: 14pt;"><strong>1. Testing for independence: classic methods</strong></span></p>
<p>The best-known test is the Chi-Squared test, see <a href="http://mlwiki.org/index.php/Chi-Squared_Test_of_Independence" target="_blank" rel="noopener">here</a>. It is used to test independence in contingency tables or between two time series. In the latter case, it requires binning the data, and works only if each bin has enough observations, usually more than 5. Under the assumption of independence, its statistic has a known distribution: Chi-Squared, itself well approximated by a normal distribution for moderately sized data sets, see <a href="https://en.wikipedia.org/wiki/Chi-square_distribution#Asymptotic_properties" target="_blank" rel="noopener">here</a>. </p>
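<p>For readers who want to try this, here is a hedged pure-Python sketch of the binned Chi-Squared independence test just described; the bin count, sample sizes, and function name are my own choices, not from the article or any specific library.</p>

```python
import random

def chi_squared_independence(x, y, bins=5):
    """Chi-Squared statistic for two samples in [0, 1], binned into a
    bins x bins contingency table. Under independence it is approximately
    Chi-Squared distributed with (bins - 1)**2 degrees of freedom."""
    n = len(x)
    table = [[0] * bins for _ in range(bins)]
    for a, b in zip(x, y):
        i = min(int(a * bins), bins - 1)  # clamp a == 1.0 into the last bin
        j = min(int(b * bins), bins - 1)
        table[i][j] += 1
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(bins)) for j in range(bins)]
    stat = 0.0
    for i in range(bins):
        for j in range(bins):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

random.seed(1)
u = [random.random() for _ in range(2000)]
v = [random.random() for _ in range(2000)]
stat_indep = chi_squared_independence(u, v)  # near its expectation of 16
stat_dep = chi_squared_independence(u, u)    # huge: second column copies the first
```

<p>With identical columns the statistic explodes, while for independent columns it hovers around its expectation of (bins - 1)<sup>2</sup> = 16.</p>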
<p>Another test is based on the Kolmogorov-Smirnov statistic. It is typically used to measure <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test" target="_blank" rel="noopener">goodness of fit</a>, but can be adapted to assess independence between two variables (or columns in a data set), see <a href="https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-8/issue-2/A-Kolmogorov-Smirnov-type-test-for-independence-between-marks-and/10.1214/14-EJS961.full" target="_blank" rel="noopener">here</a>. Convergence to the exact distribution is slow. Our test, described in section 2, is somewhat similar, but it is entirely data-driven and model-free: our confidence intervals are based on re-sampling techniques, not on tabulated values of known statistical distributions. It was first discussed in section 2.3 of a previous article entitled <em>New Tests of Randomness and Independence for Sequences of Observations</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/a-new-test-of-independence" target="_blank" rel="noopener">here</a>. In section 2 of this article, a better, simplified version is presented, suitable for big data. In addition, we discuss how to build confidence intervals in a simple way that will appeal to machine learning professionals.</p>
<p>Finally, rather than testing for independence in successive observations (say, a time series) one can look at the square of the observed autocorrelations of lag-1, lag-2 and so on, up to lag-<em>k</em> (say <em>k</em> = 10). The absence of autocorrelations does not imply independence, but this test is easier to perform than a full independence test. The Ljung-Box and the Box-Pierce tests are the most popular ones used in this context, with Ljung-Box converging faster to the limiting (asymptotic) Chi-Squared distribution of the test statistic, as the sample size increases. See <a href="https://en.wikipedia.org/wiki/Ljung%E2%80%93Box_test" target="_blank" rel="noopener">here</a>.</p>
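<p>A pure-Python sketch of the Ljung-Box statistic follows; this is my own implementation of the textbook formula <em>Q</em> = <em>n</em>(<em>n</em>+2) Σ <em>r<sub>k</sub></em><sup>2</sup>/(<em>n</em>-<em>k</em>), and the demo series is illustrative.</p>

```python
import random

def ljung_box(xs, lags=10):
    """Ljung-Box Q statistic: n(n+2) times the sum over k = 1..lags of the
    squared sample autocorrelation r_k^2 divided by (n - k). Under
    independence Q is approximately Chi-Squared with `lags` degrees of
    freedom."""
    n = len(xs)
    mean = sum(xs) / n
    c0 = sum((v - mean) ** 2 for v in xs)
    q = 0.0
    for k in range(1, lags + 1):
        ck = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k))
        rk = ck / c0
        q += rk * rk / (n - k)
    return n * (n + 2) * q

random.seed(7)
iid = [random.random() for _ in range(1000)]
smoothed = [(a + b) / 2 for a, b in zip(iid, iid[1:])]  # lag-1 autocorrelation ~ 0.5
q_iid = ljung_box(iid)        # moderate, near its mean of 10
q_dep = ljung_box(smoothed)   # large: serial correlation is detected
```

<p>The smoothed series (each term averages two consecutive uniforms) has lag-1 autocorrelation of about 0.5, so its Q value is an order of magnitude above the Chi-Squared(10) range seen for the iid series.</p>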
<p><span style="font-size: 14pt;"><strong>2. Our Test</strong></span></p>
<p>The data consists of a time series <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ..., <em>x</em><span style="font-size: 8pt;"><em>n</em></span>. We want to test whether successive observations are independent, that is, whether <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ..., <em>x</em><span style="font-size: 8pt;"><em>n</em>-1</span> and <em>x</em><span style="font-size: 8pt;">2</span>, <em>x</em><span style="font-size: 8pt;">3</span>, ..., <em>x</em><span style="font-size: 8pt;"><em>n</em></span> are independent or not. The test can be generalized to a broader test of independence (see section 2.3 <a href="https://www.datasciencecentral.com/profiles/blogs/a-new-test-of-independence" target="_blank" rel="noopener">here</a>) or to bivariate observations: <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ..., <em>x<span style="font-size: 8pt;">n</span></em> versus <em>y</em><span style="font-size: 8pt;">1</span>, <em>y</em><span style="font-size: 8pt;">2</span>, ..., <em>y</em><span style="font-size: 8pt;"><em>n</em></span>. For the sake of simplicity, we assume that the observations are in [0, 1].</p>
<p><strong>2.1. Step #1: Computing some probabilities</strong></p>
<p>The first step of the test consists of computing the following statistics:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8779418488?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8779418488?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>for <em>N</em> vectors (<em><span lang="el" title="Greek-language text" xml:lang="el">α</span></em>, <span lang="el" title="Greek-language text" xml:lang="el"><em>β</em></span>), where <em><span lang="el" title="Greek-language text" xml:lang="el">α</span></em> and <span lang="el" title="Greek-language text" xml:lang="el"><em>β</em></span> are randomly sampled or equally spaced values in [0, 1], and <em>χ</em> is the indicator function: <em>χ</em>(<em>A</em>) = 1 if <em>A</em> is true, otherwise <em>χ</em>(<em>A</em>) = 0. The idea behind the test is intuitive: if <em>q</em>(<em><span lang="el" title="Greek-language text" xml:lang="el">α</span></em>, <span lang="el" title="Greek-language text" xml:lang="el"><em>β</em></span>) is statistically different from zero for one or more of the randomly chosen (<em><span lang="el" title="Greek-language text" xml:lang="el">α</span></em>, <span lang="el" title="Greek-language text" xml:lang="el"><em>β</em></span>)'s, then successive observations cannot possibly be independent; in other words, <em>x<span style="font-size: 8pt;">k</span></em> and <em>x</em><span style="font-size: 8pt;"><em>k</em>+1</span> are not independent.</p>
<p>In practice, I chose <em>N</em> = 100 vectors (<em><span lang="el" title="Greek-language text" xml:lang="el">α</span></em>, <span lang="el" title="Greek-language text" xml:lang="el"><em>β</em></span>) evenly distributed on the unit square [0, 1] x [0, 1], assuming that the <em>x<span style="font-size: 8pt;">k</span></em>'s take values in [0, 1] and that <em>n</em> is much larger than <em>N</em>, say <em>n</em> = 25 <em>N</em>.</p>
<p><strong>2.2. Step #2: The statistic associated with the test</strong></p>
<p>Two natural statistics for the test are</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8779295860?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8779295860?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>The first one, <em>S</em>, once standardized, should asymptotically follow a Kolmogorov-Smirnov distribution. The second one, <em>T</em>, once standardized, should asymptotically follow a normal distribution, despite the fact that the various <em>q</em>(<em><span lang="el" title="Greek-language text" xml:lang="el">α</span></em>, <span lang="el" title="Greek-language text" xml:lang="el"><em>β</em></span>)'s are never independent. However, we do not care about the theoretical (asymptotic) distribution, thus moving away from the classic statistical approach; instead, we use a methodology typical of machine learning, described in section 2.3.</p>
<p>Nevertheless, the principle is the same in both cases: the higher the value of <em>S</em> or <em>T</em> computed on the data set, the more likely we are to reject the assumption of independence. Of the two statistics, <em>T</em> has less volatility than <em>S</em> and may be preferred, but <em>S</em> is better at detecting very small departures from independence.</p>
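<p>The exact formulas for <em>q</em>, <em>S</em>, and <em>T</em> are given in the images above; the sketch below is my own plausible reading of them, not the article's code. I assume that <em>q</em>(α, β) is the empirical joint CDF of successive pairs minus the product of the two marginal empirical CDFs, that <em>S</em> is the maximum of |<em>q</em>| over the <em>N</em> grid points, and that <em>T</em> is their average.</p>

```python
import random

def q_stat(xs, alpha, beta):
    """q(alpha, beta): empirical joint CDF of (x_k, x_{k+1}) minus the product
    of the two marginal empirical CDFs; its expectation is zero when
    successive observations are independent (an assumed reading of the
    formula shown as an image in the article)."""
    pairs = list(zip(xs[:-1], xs[1:]))
    n = len(pairs)
    joint = sum(1 for a, b in pairs if a <= alpha and b <= beta) / n
    m1 = sum(1 for a, _ in pairs if a <= alpha) / n
    m2 = sum(1 for _, b in pairs if b <= beta) / n
    return joint - m1 * m2

def s_t_statistics(xs, N=100):
    """S and T over N (alpha, beta) vectors evenly spaced on the unit square."""
    side = int(round(N ** 0.5))
    grid = [((i + 0.5) / side, (j + 0.5) / side)
            for i in range(side) for j in range(side)]
    qs = [abs(q_stat(xs, a, b)) for a, b in grid]
    return max(qs), sum(qs) / len(qs)  # S, T (hedged definitions)

random.seed(3)
iid = [random.random() for _ in range(2500)]
s_iid, t_iid = s_t_statistics(iid)          # both small under independence
s_dep, t_dep = s_t_statistics(sorted(iid))  # sorting creates strong serial dependence
```

<p>Sorting the series makes each observation almost equal to its successor, so both statistics jump well above their values on the iid series.</p>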
<p><strong>2.3. Step #3: Assessing statistical significance</strong></p>
<p>The technique described here is generic, intuitive, and simple. It applies to any statistical test of hypotheses, not just tests of independence, and is somewhat similar to cross-validation. It consists of reshuffling the observations in various ways (see the <a href="https://en.wikipedia.org/wiki/Resampling_(statistics)" target="_blank" rel="noopener">resampling entry</a> in Wikipedia for details) and computing <em>S</em> (or <em>T</em>) for each of 10 differently reshuffled time series. After reshuffling, one can assume that any serial, pairwise dependence has been destroyed, so you get an idea of the distribution of <em>S</em> (or <em>T</em>) under independence. Now compute <em>S</em> on the original time series. Is it higher than the 10 values you computed on the reshuffled time series? If yes, you have a 90% chance that the original time series exhibits serial, pairwise dependency. </p>
<p>A better but more complicated method consists of computing the empirical distribution of the <em>x<span style="font-size: 8pt;">k</span></em>'s, then generating 10<em>n</em> independent deviates with that distribution. This constitutes 10 time series, each with <em>n</em> independent observations. Compute <em>S</em> for each of these time series, and compare with the value of <em>S</em> computed on the original time series. If the value computed on the original time series is higher, then you have a 90% chance that the original time series exhibits serial, pairwise dependency. This is the preferred method if the original time series has strong, long-range autocorrelations.</p>
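<p>The reshuffling recipe is straightforward to code. The sketch below is my own; it uses the squared lag-1 autocorrelation as a stand-in test statistic, whereas in practice you would plug in the <em>S</em> or <em>T</em> of section 2.2.</p>

```python
import random

def lag1_autocorr_sq(xs):
    """Squared lag-1 sample autocorrelation, used here as the test statistic."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in xs)
    return (num / den) ** 2

def reshuffle_test(xs, stat=lag1_autocorr_sq, n_shuffles=10, seed=0):
    """Compare the statistic on the original series against n_shuffles
    reshuffled copies; reshuffling destroys any serial dependence."""
    rng = random.Random(seed)
    observed = stat(xs)
    beaten = 0
    for _ in range(n_shuffles):
        ys = xs[:]
        rng.shuffle(ys)
        if observed > stat(ys):
            beaten += 1
    return observed, beaten  # beaten == n_shuffles suggests serial dependence

random.seed(5)
u = [random.random() for _ in range(1000)]
dependent = [(a + b) / 2 for a, b in zip(u, u[1:])]  # lag-1 correlated series
obs, beaten = reshuffle_test(dependent)
# the original beating all 10 reshuffled values ~ 90% confidence of dependence
```

<p>For this dependent demo series the observed statistic exceeds all 10 reshuffled values, matching the 90%-confidence reading given above.</p>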
<p><strong>2.4. Test data set and results</strong></p>
<p>I tested the methodology on an artificial data set (a discrete dynamical system) created as follows: <em>x</em><span style="font-size: 8pt;">1</span> = log(2) and <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>b</em> <em>x<span style="font-size: 8pt;">n</span></em> - INT(<em>b x<span style="font-size: 8pt;">n</span></em>). Here <em>b</em> is an integer larger than 1, and INT is the integer part function. The data generated behaves like any real time series, and has the following properties.</p>
<ul>
<li>The theoretical distribution of the <em>x<span style="font-size: 8pt;">k</span></em>'s is uniform on [0, 1]</li>
<li>The lag-<em>k</em> autocorrelation is known and equal to 1 / <em>b</em>^<em>k</em> (<em>b</em> at power <em>k</em>)</li>
</ul>
<p>It is thus easy to test for independence and to benchmark various statistical tests: the larger <em>b</em>, the closer we are to independence. With a pseudo-random number generator, one can generate a time series of independently and identically distributed deviates, uniform on [0, 1], to check the distribution of <em>S</em> (or <em>T</em>) and its expectation under true independence, and compare it with values of <em>S</em> (or <em>T</em>) computed on the artificial data for various values of <em>b</em>. In this test, with <em>N</em> = 100, <em>n</em> = 2500, and <em>b</em> = 4 (corresponding to an autocorrelation of 0.25), the value of <i>S</i> is 6 times larger than the one obtained under full independence. For <em>b</em> = 8 (corresponding to an autocorrelation of 0.125), <i>S</i> is 3 times larger than the one obtained under full independence. This validates the test described here, at least on this kind of dataset, as it correctly detects the lack of independence by yielding abnormally high values of <em>S</em> when the independence assumption is violated.</p>
<p><strong>Note</strong>: Another interesting feature of the dataset used here is this: using <em>b</em>^<em>k</em> (<em>b</em> at power <em>k</em>) instead of <em>b</em> is equivalent to checking lag-<em>k</em> independence, that is, independence between <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ... and <em>x</em><span style="font-size: 8pt;">1+<em>k</em></span>, <em>x</em><span style="font-size: 8pt;">2+<em>k</em></span>, ... in the original time series corresponding to <em>b</em>. The reason is that in the original series (corresponding to <em>b</em>), we have <em>x</em><span style="font-size: 8pt;"><em>n</em>+<em>k</em></span> = <em>b</em>^<em>k</em> <em>x<span style="font-size: 8pt;">n</span></em> - INT(<em>b</em>^<em>k</em> <em>x<span style="font-size: 8pt;">n</span></em>).</p>
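<p>One practical caveat when simulating this dynamical system (my observation, not in the article): in double precision, each multiplication by <em>b</em> mod 1 discards bits, so the orbit degenerates to 0 after a few dozen iterations. The sketch below works around this with the standard-library <code>decimal</code> module at high precision; the precision formula and sample size are my own choices.</p>

```python
import math
from decimal import Decimal, getcontext

def shift_map_series(b=4, n=2000):
    """x_1 = log(2), x_{m+1} = b*x_m - INT(b*x_m). Each step loses about
    log10(b) decimal digits of accuracy, so the working precision must be
    large enough to survive all n iterations."""
    getcontext().prec = int(n * math.log10(b)) + 50
    x = Decimal(2).ln()  # x_1 = log(2), at full working precision
    series = []
    for _ in range(n):
        series.append(float(x))
        x = (b * x) % 1  # multiply by b, keep the fractional part
    return series

def lag1_autocorr(xs):
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in xs)
    return num / den

xs = shift_map_series(b=4, n=2000)
r1 = lag1_autocorr(xs)  # theoretical lag-1 autocorrelation is 1/b = 0.25
```

<p>The empirical lag-1 autocorrelation comes out close to the theoretical value 1/<em>b</em> = 0.25, and the sample mean is close to 0.5, consistent with the uniform distribution on [0, 1] stated above.</p>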
<p></p>
<p><span style="font-size: 14pt;"><strong>A Plethora of Machine Learning Tricks, Recipes, and Statistical Models</strong></span> (Vincent Granville, April 6, 2021)</p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8760416479?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8760416479?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: See article #5, in section 1</em></p>
<p><span>Part 2 of this short series focused on fundamental techniques, see <a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-2" target="_blank" rel="noopener">here</a>. In this Part 3, you will find several machine learning tricks and recipes, many with a statistical flavor. These are articles that I wrote in the last few years. The whole series will feature articles related to the following aspects of machine learning:</span></p>
<ul>
<li><span>Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span>Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><span>Methods, principles, rules of thumb, recipes, tricks</span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-1" target="_blank" rel="noopener">Business analytics</a> </span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-2" target="_blank" rel="noopener">Core Techniques</a> </span></li>
</ul>
<p><span>My articles are always written in simple English and accessible to professionals with typically one year of calculus or statistical training at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, geared towards people who do not have a lot of free time. </span></p>
<p><span>Despite these restrictions, state-of-the-art, off-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed. </span></p>
<p><span><strong>1. Machine Learning Tricks, Recipes and Statistical Models</strong></span></p>
<p><span>These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.</span></p>
<ol>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/defining-and-measuring-chaos-in-data-sets-why-and-how-in-simple-w">Defining and Measuring Chaos in Data Sets: Why and How, in Simple Words</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions">Hurwitz-Riemann Zeta And Other Special Probability Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/maximum-runs-in-bernoulli-trials">Maximum runs in Bernoulli trials: simulations and results</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/moving-averages-natural-weights-iterated-convolutions-and-central" target="_blank" rel="noopener">Moving Averages: Natural Weights, Iterated Convolutions, and Central Limit Theorem</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/things-you-did-not-know-you-could-do-with-excel" target="_blank" rel="noopener">Amazing Things You Did Not Know You Could Do in Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-new-test-of-independence" target="_blank" rel="noopener">New Tests of Randomness and Independence for Sequences of Observations</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/interesting-application-of-the-poisson-binomial-distribution" target="_blank" rel="noopener">Interesting Application of the Poisson-Binomial Distribution</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/alternative-to-the-arithmetic-geometric-and-harmonic-means" target="_blank" rel="noopener">Alternative to the Arithmetic, Geometric, and Harmonic Means</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/bernouilli-lattice-models-connection-to-poisson-processes" target="_blank" rel="noopener">Bernouilli Lattice Models - Connection to Poisson Processes</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/simulating-distributions-with-one-line-of-code" target="_blank" rel="noopener">Simulating Distributions with One-Line Formulas, even in Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/simplified-logistic-regression" target="_blank" rel="noopener">Simplified Logistic Regression</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/simple-trick-to-normalize-correlations-r-squared-and-so-on" target="_blank" rel="noopener">Simple Trick to Normalize Correlations, R-squared, and so on</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/simple-trick-to-remove-serial-correlation-in-regression-models" target="_blank" rel="noopener">Simple Trick to Remove Serial Correlation in Regression Models</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-beautiful-result-in-probability-theory" target="_blank" rel="noopener">A Beautiful Result in Probability Theory</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/long-range-correlation-in-time-series-tutorial-and-case-study" target="_blank" rel="noopener">Long-range Correlations in Time Series: Modeling, Testing, Case Study</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-correlation-and-regression-in-statistics" target="_blank" rel="noopener">Difference Between Correlation and Regression in Statistics</a></li>
</ol>
<p><span><strong>2. Free books</strong></span></p>
<ul>
<li><span><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span>The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span><b>Applied Stochastic Processes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters). This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In just over 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span>It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p></p>
<p><span style="font-size: 14pt;"><strong>Defining and Measuring Chaos in Data Sets: Why and How, in Simple Words</strong></span> (Vincent Granville, March 29, 2021)</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8735877694?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8735877694?profile=RESIZE_710x" width="720" class="align-full"/></a></p>
<p>There are many ways to define chaos, each scientific field and each expert having their own definitions. We share here a few of the most common metrics used to quantify the level of chaos in univariate time series or data sets. We also introduce a new, simple definition based on metrics that are familiar to everyone. Generally speaking, chaos measures how unpredictable a system is, be it the weather, stock prices, economic time series, medical or biological indicators, earthquakes, or anything that has some level of randomness. </p>
<p>In most applications, various statistical models (or data-driven, model-free techniques) are used to make predictions. Model selection and comparison can be based on testing various models, each one with its own level of chaos. Sometimes, a time series does not have an autocorrelation function because of the high variability of the observations: for instance, when the theoretical variance of the model is infinite. An example, used to model extreme events, is provided in section 2.2 <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">in this article</a> (see picture below). In this case, chaos is a handy metric, and it allows you to build and use models that are otherwise ignored or unknown by practitioners. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8725268092?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8725268092?profile=RESIZE_710x" width="450" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>Time series with undefined autocorrelation; instead, chaos is used to measure predictability</em></p>
<p>Below are various definitions of chaos, depending on the context in which they are used. References explaining how to compute these metrics are provided in each case.</p>
<p><strong>Hurst exponent</strong></p>
<p>The <a href="https://en.wikipedia.org/wiki/Hurst_exponent" target="_blank" rel="noopener">Hurst exponent</a> <em>H</em> is used to measure the level of smoothness in time series, and in particular, the level of long-term memory. <em>H</em> takes on values between 0 and 1, with <em>H</em> = 1/2 corresponding to the Brownian motion, and <em>H</em> = 0 corresponding to pure white noise. Higher values correspond to smoother time series, and lower values to more rugged data. Examples of time series with various values of <em>H</em> are found <a href="https://www.datasciencecentral.com/profiles/blogs/long-range-correlation-in-time-series-tutorial-and-case-study" target="_blank" rel="noopener">in this article</a>, see picture below. In the same article, the relation to the <em>detrending moving average</em> (another metric to measure chaos) is explained. Also, <em>H</em> is related to the fractal dimension. Applications include stock price modeling.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8725551894?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8725551894?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Time series with H = 1/2 (top), and H close to 1 (bottom)</em></p>
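To make this concrete, here is a minimal Python sketch (added for illustration, not from the original article) estimating <em>H</em> from the scaling of lagged differences, std(<em>x</em>[<em>t</em>+lag] − <em>x</em>[<em>t</em>]) ~ lag^<em>H</em>. This is one common heuristic estimator among several (rescaled range is another); the function name <code>hurst_exponent</code> is ours:

```python
import numpy as np

def hurst_exponent(ts, max_lag=100):
    """Estimate the Hurst exponent H from the scaling law
    std(ts[t+lag] - ts[t]) ~ lag^H (slope of a log-log regression)."""
    lags = np.arange(2, max_lag)
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(10_000))   # Brownian motion: H should be near 1/2
print(round(hurst_exponent(walk), 2))
```

Smoother (trending) series push the estimate above 1/2, while mean-reverting series push it below.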
<p><strong>Lyapunov exponent</strong></p>
<p>In dynamical systems, the Lyapunov exponent is used to quantify how sensitive a system is to initial conditions. Intuitively, the more sensitive to initial conditions, the more chaotic the system is. For instance, the system <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> = 2<em>x<span style="font-size: 8pt;">n</span></em> - INT(2<em>x<span style="font-size: 8pt;">n</span></em>), where INT represents the integer part function, is very sensitive to the initial condition <em>x</em><span style="font-size: 8pt;">0</span>. A very small change in the value of <em>x</em><span style="font-size: 8pt;">0</span> results in values of <em>x<span style="font-size: 8pt;">n</span></em> that are totally different even for <em>n</em> as low as 45. See how to compute the Lyapunov exponent, <a href="https://en.wikipedia.org/wiki/Lyapunov_exponent" target="_blank" rel="noopener">here</a>.</p>
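As an illustration (a sketch added here, not part of the original article), the Lyapunov exponent of the well-known logistic map <em>T</em>(<em>x</em>) = <em>λx</em>(1−<em>x</em>) can be estimated as the orbit average of log |<em>T</em>′(<em>x<span style="font-size: 8pt;">n</span></em>)|; a positive value indicates chaos:

```python
import numpy as np

def lyapunov_logistic(lam, x0=0.3, n=100_000, burn_in=1_000):
    """Estimate the Lyapunov exponent of the logistic map T(x) = lam*x*(1-x)
    as the orbit average of log|T'(x_n)|, where T'(x) = lam*(1 - 2x)."""
    x = x0
    for _ in range(burn_in):                    # discard the transient
        x = lam * x * (1 - x)
    total = 0.0
    for _ in range(n):
        # clamp avoids log(0) if the orbit lands exactly on x = 1/2
        total += np.log(max(abs(lam * (1 - 2 * x)), 1e-300))
        x = lam * x * (1 - x)
    return total / n

print(lyapunov_logistic(4.0))   # theory: ln 2, about 0.693 (chaotic regime)
print(lyapunov_logistic(2.5))   # negative: orbit settles on a stable fixed point
```

The sign flip between the two parameter values is exactly the chaos/no-chaos distinction described above.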
<p><strong>Fractal dimension</strong></p>
<p>A one-dimensional curve can be defined parametrically by a system of two equations. For instance, <em>x</em>(<em>t</em>) = sin(<em>t</em>), <em>y</em>(<em>t</em>) = cos(<em>t</em>) represents a circle of radius 1, centered at the origin. Typically, <em>t</em> is referred to as the time, and the curve itself is called an orbit. In some cases, as <em>t</em> increases, the orbit fills more and more space in the plane, sometimes covering a dense area to the point that it seems to be an object with a dimension strictly between 1 and 2. An example is provided in section 2 <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">in this article</a>, and pictured below. A formal definition of fractal dimension can be found <a href="https://en.wikipedia.org/wiki/Fractal_dimension" target="_blank" rel="noopener">here</a>.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8725489684?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8725489684?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>Example of a curve filling a dense area (fractal dimension > 1)</em></p>
<p>The picture in figure 3 is related to the Riemann hypothesis. Any meteorologist who sees the connection to hurricanes and their eye could shed some light on how to solve this infamous mathematical conjecture, based on the physical laws governing hurricanes. Conversely, this picture (and the underlying mathematics) could also be used as a statistical model for hurricane modeling and forecasting. </p>
<p><strong>Approximate entropy</strong></p>
<p>In statistics, the approximate entropy is a metric used to quantify regularity and predictability in time series fluctuations. Applications include medical data, finance, physiology, human factors engineering, and climate sciences. See the Wikipedia entry, <a href="https://en.wikipedia.org/wiki/Approximate_entropy" target="_blank" rel="noopener">here</a>.</p>
<p>It should not be confused with <a href="https://en.wikipedia.org/wiki/Entropy" target="_blank" rel="noopener">entropy</a>, which measures the amount of information attached to a specific probability distribution (with the uniform distribution on [0, 1] achieving maximum entropy among all continuous distributions on [0, 1], and the normal distribution achieving maximum entropy among all continuous distributions defined on the real line, with a specific variance). Entropy is used to compare the efficiency of various encryption systems, and has been used in feature selection strategies in machine learning, see <a href="https://www.datasciencecentral.com/profiles/blogs/feature-selection-a-simple-solution" target="_blank" rel="noopener">here</a>.</p>
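Below is a minimal implementation of approximate entropy (our illustration), following the standard ApEn(<em>m</em>, <em>r</em>) recipe with Chebyshev distances between embedded vectors; the default tolerance <em>r</em> = 0.2 × std is a common rule of thumb, chosen here for convenience:

```python
import numpy as np

def approx_entropy(ts, m=2, r=None):
    """Approximate entropy ApEn(m, r): low values indicate regular,
    predictable fluctuations; higher values indicate irregularity."""
    ts = np.asarray(ts, dtype=float)
    if r is None:
        r = 0.2 * np.std(ts)                  # common rule of thumb
    def phi(m):
        n = len(ts) - m + 1
        emb = np.array([ts[i:i + m] for i in range(n)])        # embedded vectors
        # Chebyshev distance between every pair of embedded vectors
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = np.sum(dist <= r, axis=1) / n     # self-match included, so c > 0
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

rng = np.random.default_rng(1)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))   # very regular signal
noisy = rng.standard_normal(500)                    # white noise
print(approx_entropy(regular), approx_entropy(noisy))
```

The sine wave yields a much lower value than white noise, matching the intuition that regularity means predictability.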
<p><strong>Independence metric </strong></p>
<p>Here I discuss some metrics that are of interest in the context of dynamical systems, offering an alternative to the Lyapunov exponent to measure chaos. While the Lyapunov exponent deals with sensitivity to initial conditions, the classic statistics mentioned here measure predictability for a single instance (an observed time series) of a dynamical system. However, they are most useful to compare the level of chaos between two different dynamical systems with similar properties. A dynamical system is a sequence <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>T</em>(<em>x<span style="font-size: 8pt;">n</span></em>), with initial condition <em>x</em><span style="font-size: 8pt;">0</span>. Examples are provided in my last two articles, <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a> and <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">here</a>. See also <a href="https://www.datasciencecentral.com/profiles/blogs/beautiful-mathematical-images" target="_blank" rel="noopener">here</a>. </p>
<p>A natural metric to measure chaos is the maximum autocorrelation in absolute value, between the sequence (<em>x<span style="font-size: 8pt;">n</span></em>), and the shifted sequences (<em>x</em><span style="font-size: 8pt;"><em>n</em>+<em>k</em></span>), for <em>k</em> = 1, 2, and so on. Its value is maximum and equal to 1 in case of periodicity, and minimum and equal to 0 for the most chaotic cases. However, some sequences attached to dynamical systems, such as the digit sequence pictured in Figure 1 in this article, do not have theoretical autocorrelations: these autocorrelations don't exist because the underlying expectation or variance is infinite or does not exist. A possible solution with positive sequences is to compute the autocorrelations on <em>y<span style="font-size: 8pt;">n</span></em> = log(<em>x<span style="font-size: 8pt;">n</span></em>) rather than on the <em>x<span style="font-size: 8pt;">n</span></em>'s.</p>
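The maximum-absolute-autocorrelation metric just described can be sketched as follows (illustrative code; the test sequences are our own choices, a period-4 sequence versus i.i.d. uniform noise as a stand-in for a fully chaotic orbit):

```python
import numpy as np

def max_abs_autocorr(x, max_lag=50):
    """Maximum absolute autocorrelation over lags 1..max_lag:
    close to 1 for periodic sequences, close to 0 for highly chaotic ones."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acfs = [abs(np.dot(x[:-k], x[k:]) / denom) for k in range(1, max_lag + 1)]
    return max(acfs)

periodic = np.tile([0.0, 1.0, 0.5, 0.2], 250)   # period-4 sequence, length 1000
rng = np.random.default_rng(2)
chaotic = rng.random(1000)                      # i.i.d. uniform proxy for chaos
print(max_abs_autocorr(periodic))   # near 1 (lag 4 matches the period)
print(max_abs_autocorr(chaotic))    # near 0
```

For heavy-tailed positive sequences whose autocorrelations do not exist, one would apply this to log(<em>x<span style="font-size: 8pt;">n</span></em>), as suggested above.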
<p>In addition, there may be strong non-linear dependencies, and thus high predictability, for a sequence (<em>x<span style="font-size: 8pt;">n</span></em>) even if all autocorrelations are zero. Hence the desire to build a better metric. In my next article, I will introduce a metric measuring the level of independence, as a proxy for quantifying chaos. It will be similar in some ways to the Kolmogorov-Smirnov metric used to test independence and illustrated <a href="https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-8/issue-2/A-Kolmogorov-Smirnov-type-test-for-independence-between-marks-and/10.1214/14-EJS961.full" target="_blank" rel="noopener">here</a>; however, it will involve little theory, relying essentially on a machine learning approach and data-driven, model-free techniques to build confidence intervals and compare the amount of chaos in two dynamical systems: one fully chaotic versus one not fully chaotic. Some of this is discussed <a href="https://math.stackexchange.com/questions/4079669/question-about-a-special-test-of-independence-autocorrelation" target="_blank" rel="noopener">here</a>.</p>
<p>I did not include the variance as a metric to measure chaos, as the variance can always be standardized by a change of scale, unless it is infinite.</p>
<p><strong>Hurwitz-Riemann Zeta And Other Special Probability Distributions</strong> (published 2021-03-22 by <a href="https://www.datasciencecentral.com/profile/VincentGranville" target="_blank" rel="noopener">Vincent Granville</a>)</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691835652?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691835652?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: <a href="https://www.datasciencecentral.com/profiles/blogs/babar-mimou" target="_blank" rel="noopener">here</a></em></p>
<p>In my previous article <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a>, I discussed a simple way to solve complex optimization problems in machine learning. This was illustrated in the case of complex dynamical systems, involving non-linear equations in infinite dimensions, known as functional equations. These equations were solved using a fixed point algorithm, of which the Newton–Raphson method is a well known, widely used example.</p>
<p>These equations are typically solved numerically, as no theoretical solution is known in most cases. Nevertheless, in our case, a few examples have an exact, known solution. These examples are very useful, in the sense that they allow you to test your numerical algorithm and assess whether, and how fast, it converges. All the solutions were probability distributions, and in this article we introduce an even larger, generic class of problems (chaotic discrete dynamical systems) with known solution. The distributions presented here can thus be used as tests to benchmark optimization algorithms, but they also have their own interest for statistical modeling purposes, especially in risk management and extreme event modeling.</p>
<p>Each dynamical system discussed here (or in my previous article) comes with two distributions:</p>
<ul>
<li>A continuous one on [0, 1], known as the <em>invariant distribution</em>.</li>
<li>A discrete one taking on strictly positive integer values, known as the <em>digit distribution</em>.</li>
</ul>
<p>Besides, these distributions are very useful in number theory, though this will not be discussed here. The names Hurwitz and Riemann Zeta are just a reminder of the strong connection to number theory problems such as continued fractions, approximation of irrational numbers by rational ones, the construction and distribution of the digits of random numbers in various numeration systems, and the famous <a href="https://en.wikipedia.org/wiki/Riemann_hypothesis" target="_blank" rel="noopener">Riemann Hypothesis</a> that has a one million dollar prize attached to it. Some of this is discussed <a href="https://mathoverflow.net/questions/383925/about-generalized-continued-fractions" target="_blank" rel="noopener">here</a> and in some of my past MathOverflow questions. However, our focus here is applications in machine learning.</p>
<p><span style="font-size: 14pt;"><strong>1. The Hurwitz-Riemann Zeta distribution</strong></span></p>
<p>Without diving into the details, let me first briefly discuss other Riemann-related distributions invented by different authors. For a definition of the Hurwitz function, see <a href="https://en.wikipedia.org/wiki/Hurwitz_zeta_function" target="_blank" rel="noopener">here</a>. It generalizes the <a href="https://en.wikipedia.org/wiki/Riemann_zeta_function" target="_blank" rel="noopener">Riemann Zeta function</a>. The most well known probability distribution related to these functions is the discrete <a href="https://en.wikipedia.org/wiki/Zipf%27s_law" target="_blank" rel="noopener">Zipf distribution</a>. It is well known by machine learning practitioners, and used to model phenomena such as "the top 10 websites amount to (say) 95% of the Internet traffic". Another example, this time continuous over the set of all positive real numbers, can be found <a href="https://benthamopen.com/FULLTEXT/TOSPJ-7-53" target="_blank" rel="noopener">here</a>. The paper is entitled <em>A New Class of Distributions Based on Hurwitz Zeta Function with Applications for Risk Management</em>. The author defines a family of distributions that generalizes the exponential power, normal, gamma, Weibull, Rayleigh, Maxwell-Boltzmann and chi-squared distributions, with applications in actuarial sciences. Finally, there is also a well known example (for mathematicians) defined on the complex plane, see <a href="https://arxiv.org/pdf/1504.03438.pdf" target="_blank" rel="noopener">here</a>. The paper is entitled <em>A complete Riemann zeta distribution and the Riemann hypothesis</em>.</p>
<p>Our Hurwitz-Riemann Zeta distribution is yet another example arising this time from discrete dynamical systems, continuous on [0, 1]. It also has a sister discrete distribution attached to it, useful for statistical modeling. It is defined as follows.</p>
<p><strong>1.1. Our Hurwitz-Riemann Zeta distribution</strong></p>
<p>The distribution discussed here is the most basic example, from the generic family described in section 2. It depends on one parameter <em>s</em> > 0, and the support domain is [0, 1]. The construction mechanism is defined in section 2, for the general case. Our Hurwitz-Riemann zeta distribution has the following density:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8699635072?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8699635072?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>where <span><em>ζ</em>(<em>s</em>, <em>x</em>) is the Hurwitz function, see <a href="https://en.wikipedia.org/wiki/Hurwitz_zeta_function" target="_blank" rel="noopener">here</a>. It has the following two first moments:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691286058?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691286058?profile=RESIZE_710x" width="550" class="align-center"/></a></span></p>
<p>where <em>ζ</em>(<em>s</em>) = <em>ζ</em>(<em>s</em>, 1) is the Riemann Zeta function. This allows you to compute its variance. Higher moments can also be computed exactly. The cases <em>s</em> = 0, 1 or 2 are limiting cases, with the limit as <em>s</em> tends to zero corresponding to the uniform density on [0, 1]. Particular values (<em>s</em> = 1, 2), empirically verified, are:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691307680?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691307680?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Here <span><em>γ</em> = 0.57721... is the Euler-Mascheroni constant, see <a href="https://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant" target="_blank" rel="noopener">here</a>. </span></p>
<p><strong>1.2. The discrete version</strong></p>
<p>These systems also have a discrete distribution attached to them, called the digit distribution, and described in section 2. For the Hurwitz-Riemann case, the probability that a digit is equal to <em>k</em>, is </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691322267?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691322267?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>The expectation is finite only if <em>s</em> > 1. Likewise, the variance is finite only if <em>s</em> > 2. By contrast, the Zipf distribution has <em>P</em>(<em>k</em>) = (1 / <em>ζ</em>(<em>s</em>)) * 1 / <em>k</em>^<em>s</em>.</p>
<p><span style="font-size: 14pt;"><strong>2. A generic family of distributions, with applications</strong></span></p>
<p><span>We are dealing with a particular type of discrete dynamical system defined by </span><em>x</em><span><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>) - INT(<em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>)), where INT is the integer part function, and <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1] is the initial condition. The function <em>p</em>, defined for real numbers in [0, 1], is strictly decreasing and invertible, with <em>p</em>(1) = 1 and <em>p</em>(0) infinite. The results discussed here are valid for the vast majority of initial conditions; nevertheless, there are infinitely many exceptions, for instance <em>x</em><span style="font-size: 8pt;">0</span> = 0. These systems are discussed in detail in my previous article, <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a>. In this section, only the main results are presented. These systems have the following properties:</span></p>
<ul>
<li><span>The <em>n</em>-th digit of <em>x</em><span style="font-size: 8pt;">0</span> is <em>d<span style="font-size: 8pt;">n</span></em> = INT(<em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>)). These digits are called <a href="https://www.tandfonline.com/doi/abs/10.1080/026811199282100?journalCode=cdss19" target="_blank" rel="noopener">codewords</a> in the context of dynamical systems. The probability that a digit is equal to <em>k</em> (<em>k</em> = 1, 2, 3 and so on) is <em>F</em>(<em>q</em>(<em>k</em>)) - <em>F</em>(<em>q</em>(<em>k</em>+1)) where <em>F</em> and <em>q</em> are defined below. If you know the digits, you can retrieve <em>x</em><span style="font-size: 8pt;">0</span> using the algorithm described in my previous article. </span></li>
<li><span>The invariant distribution <em>F</em>, which is the limit of the empirical distribution of the <em>x<span style="font-size: 8pt;">n</span></em>'s, satisfies the following functional equation: <a href="https://storage.ning.com/topology/rest/1.0/file/get/8691388861?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691388861?profile=RESIZE_710x" width="250" class="align-center"/></a></span></li>
</ul>
<p><span>where <em>q</em> is the inverse of the function <em>p, q</em>' denotes the derivative of <em>q</em>, and <em>f</em> (the invariant density) is the derivative of <em>F</em>. We focus only on the results that are of interest to machine learning professionals. </span></p>
<p><span>Typically numerical methods are needed to solve the above functional equation, however here we are dealing with a large class of dynamical systems for which the theoretical solution is known. The purpose is to test numerical algorithms to check how well and how fast they can approach the exact solution, as discussed in section 2 <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">in my previous article</a>. The invariant distribution <em>F</em> discussed below is far more general than the ones described in my earlier article. </span></p>
<p><strong>2.1. Generalized Hurwitz-Riemann Zeta distribution</strong></p>
<p><span>One way to find a dynamical system with a known invariant distribution is to specify that distribution upfront, and then compute the resulting function <em>p</em>(<em>x</em>) that defines the system in question. Based on theory discussed <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">here</a>, one can proceed as follows. Start with a monotonically increasing function <em>r</em>(<em>x</em>) with <em>r</em>(2) = 1 + <em>r</em>(1). Let <em>F</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) - <em>r</em>(1), and <em>R</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) - <em>r</em>(<em>x</em>). Then <em>R</em>(<em>x</em>) = <em>F</em>(<em>q</em>(<em>x</em>)), that is, <em>R</em>(<em>p</em>(<em>x</em>)) = <em>F</em>(<em>x</em>) since <em>q</em>(<em>p</em>(<em>x</em>)) = <em>x</em>. You can retrieve <em>p</em>(<em>x</em>) by inverting <em>R</em>(<em>x</em>). </span></p>
<p><span>A simple but generic example is </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691691652?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691691652?profile=RESIZE_710x" width="190" class="align-center"/></a></span></p>
<p><span>where <em>ψ</em> is a strictly decreasing function with <em>ψ</em>(∞) = 0, <em>ψ</em>(1) = 1, and <em>ψ</em>(0) = ∞. Then you have</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691705091?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691705091?profile=RESIZE_710x" width="280" class="align-center"/></a></span></p>
<p><span>It is easy to show that <em>R</em>(<em>x</em>) = <em>ψ</em>(<em>x</em>), thanks to a careful choice for the function <em>r</em>(<em>x</em>). This explains why the system has a simple theoretical solution; it was indeed built for that purpose. As a consequence, the probability for a digit to be equal to <em>k</em> (<em>k</em> = 1, 2, and so on) is simply equal to <em>P</em>(<em>k</em>) = <em>ψ</em>(<i>k</i>) - <em>ψ</em>(<i>k</i>+1). For more details, see Example 5 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>, in the section <em>Appendix 1: Exact solution for various 1-D dynamical systems</em>.</span></p>
<p><span>The Hurwitz-Riemann particular case in section 1.1 corresponds to</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691709297?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691709297?profile=RESIZE_710x" width="300" class="align-center"/></a></span></p>
<p>Another particular case corresponds to <span><em>ψ</em>(<em>x</em>) = log<span style="font-size: 8pt;">2</span>(1 + 1/<em>x</em>), where log<span style="font-size: 8pt;">2</span> represents the logarithm in base 2. The associated dynamical system is known as the Gauss map and related to continued fractions. Its digits are the coefficients of continued fractions, and are known to follow a <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution" target="_blank" rel="noopener">Gauss-Kuzmin distribution</a>. Also, <em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1/<em>x</em>. It is discussed <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">in my previous article</a>. See also Example 2 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>, in the section <em>Appendix 1: Exact solution for various 1-D dynamical systems</em>.</span></p>
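The Gauss map case can be checked empirically with a short Python sketch (ours, not from the article). One caveat baked into the code: floating-point iterates of the Gauss map lose precision quickly, so only a few digits are extracted per random starting point. The empirical frequency of digit 1 should approach the Gauss-Kuzmin probability <em>P</em>(1) = log<span style="font-size: 8pt;">2</span>(4/3) ≈ 0.415:

```python
import numpy as np

rng = np.random.default_rng(3)

# Iterate the Gauss map x -> 1/x - INT(1/x); the digits INT(1/x_n) are the
# continued fraction coefficients of x_0, asymptotically Gauss-Kuzmin distributed.
digits = []
for x in rng.random(2000):
    for step in range(15):          # few steps per seed: doubles lose precision fast
        if x <= 1e-12:              # (numerically) rational point: stop iterating
            break
        d = int(1 / x)
        if step >= 3:               # short burn-in toward the stationary regime
            digits.append(d)
        x = 1 / x - d

digits = np.array(digits)
freq1 = np.mean(digits == 1)
print(round(freq1, 3), round(np.log2(4 / 3), 3))  # empirical vs theoretical P(1)
```

The same simulation scheme applies to any system of this family once <em>p</em>(<em>x</em>) is specified.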
<p><strong>2.2. Application</strong></p>
<p><span>Besides being useful to test optimization algorithms against the exact solution (such as solving the above functional equation), the digits of the system have applications in simulations, encoding, random number generation, and statistical modeling. In particular, below is a picture featuring the typical behavior of the first 2,000 values of <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>), starting with <em>x</em><span style="font-size: 8pt;">0</span> = 0.5. Depending on the choice of the function <em>ψ</em>,<em> </em>these values may or may not be highly autocorrelated, and in some cases expectation and/or variance are infinite, which implies that the autocorrelation does not exist. The picture below features the Hurwitz-Riemann case with <em>s</em> = 2 (expectation for the digits is finite and equal to <em>ζ</em>(2) = π^2 / 6, but variance is infinite).</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691827873?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691827873?profile=RESIZE_710x" width="500" class="align-center"/></a></span></p>
<p><span>Other special distributions are discussed in my previous articles:</span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions" target="_blank" rel="noopener">New Family of Generalized Gaussian Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/interesting-application-of-the-poisson-binomial-distribution" target="_blank" rel="noopener">Interesting Application of the Poisson-Binomial Distribution</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">A Strange Family of Statistical Distributions</a></li>
</ul>
<p><strong>An Easy Way to Solve Complex Optimization Problems in Machine Learning</strong> (published 2021-03-08 by <a href="https://www.datasciencecentral.com/profile/VincentGranville" target="_blank" rel="noopener">Vincent Granville</a>)</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641667893?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641667893?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: <a href="https://www.wikiwand.com/en/Test_functions_for_optimization" target="_blank" rel="noopener">here</a></em></p>
<p>There are numerous examples in machine learning, statistics, mathematics and deep learning, requiring an algorithm to solve some complicated equations: for instance, maximum likelihood estimation (think about logistic regression or the EM algorithm) or gradient methods (think about stochastic or swarm optimization). Here we are dealing with even more difficult problems, where the solution is not a set of optimal parameters (a finite dimensional object), but a function (an infinite dimensional object).</p>
<p>The context is discrete, chaotic dynamical systems, with applications to weather forecasting, population growth models, complex econometric systems, image encryption, chemistry (mixtures), physics (how matter reaches an equilibrium temperature), astronomy (how man-made or natural celestial bodies end up in stable or unstable orbits), or stock market prices, to name a few. These are referred to as complex systems.</p>
<p>The problems discussed here require numerical methods, as usually no exact solution is known. The type of equation to be solved is called a <em>functional equation</em> or <em>stochastic integral equation</em>. We explore a few cases where the exact solution is actually known: this helps assess the efficiency, accuracy and speed of convergence of the numerical methods discussed in this article. These methods are based on the fixed-point algorithm applied to infinite dimensional problems.</p>
<p><span style="font-size: 14pt;"><strong>1. The general problem</strong></span></p>
<p>We are dealing with a discrete dynamical system defined by <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <i>T</i>(<em>x<span style="font-size: 8pt;">n</span></em>), where <i>T</i> is a real-valued function, and <em>x</em><span style="font-size: 8pt;">0</span> is the initial condition. For the sake of simplicity, we restrict ourselves to the case where <em>x<span style="font-size: 8pt;">n</span></em> is in [0, 1]. Generalizations, for instance with <em>x<span style="font-size: 8pt;">n</span></em> being a vector, are described <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">here</a>. The best-known example is the <a href="https://en.wikipedia.org/wiki/Logistic_map" target="_blank" rel="noopener">logistic map</a>, with <i>T</i>(<em>x</em>) = <em>λx</em>(1-<em>x</em>), exhibiting chaotic behavior or not, depending on the value of the parameter <em><span>λ</span></em>.</p>
<p>In our case, the function <i>T</i>(<em>x</em>) takes the following form: <i>T</i>(<em>x</em>) = <em>p</em>(<em>x</em>) - INT(<em>p</em>(<em>x</em>)), where INT denotes the integer part function, and <em>p</em>(<em>x</em>) is positive, continuous and decreasing (thus bijective) with <em>p</em>(1) = 1 and <em>p</em>(0) infinite. For instance, <em>p</em>(<em>x</em>) = 1 / <em>x</em> corresponds to the Gauss map associated with continued fractions; it is the most fundamental example, and I discuss it <a href="https://mathoverflow.net/questions/383925/about-generalized-continued-fractions" target="_blank" rel="noopener">here</a> as well as below in this article. Another example is the Hurwitz-Riemann map, discussed <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">here</a>. </p>
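<p>To make the setup concrete, here is a minimal Python sketch (my own illustration, not code from this article) that iterates the two maps just mentioned: the logistic map <em>T</em>(<em>x</em>) = <em>λx</em>(1-<em>x</em>), and the Gauss map obtained from <em>p</em>(<em>x</em>) = 1 / <em>x</em>:</p>

```python
import math

def logistic_map(lmbda):
    """Logistic map T(x) = lambda * x * (1 - x)."""
    return lambda x: lmbda * x * (1.0 - x)

def gauss_map(x):
    """T(x) = p(x) - INT(p(x)) for p(x) = 1/x (the Gauss map)."""
    y = 1.0 / x
    return y - math.floor(y)

def orbit(T, x0, n):
    """First n points x_0, ..., x_{n-1} of the system x_{k+1} = T(x_k)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(T(xs[-1]))
    return xs

xs_logistic = orbit(logistic_map(4.0), 0.2, 100)  # lambda = 4: chaotic regime
xs_gauss = orbit(gauss_map, math.pi - 3, 100)     # continued-fraction dynamics
```

<p>Both orbits stay in [0, 1]; only the choice of <em>T</em> differs.</p>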
<p><strong>1.1. Invariant distribution and ergodicity</strong></p>
<p>The <em>invariant distribution</em> of the system is the one followed by the successive <em>x<span style="font-size: 8pt;">n</span></em>'s, or in other words, the limit of the empirical distribution attached to the <em>x<span style="font-size: 8pt;">n</span></em>'s, given an initial condition <em>x</em><span style="font-size: 8pt;">0</span>. A lot of interesting properties can be derived if the invariant density <em>f</em>(<em>x</em>) (the derivative of the invariant distribution) is known, assuming it exists. This only works with <a href="https://en.wikipedia.org/wiki/Ergodicity" target="_blank" rel="noopener">ergodic systems</a>. All systems under consideration here are <em>ergodic</em>. The invariant distribution applies to almost all initial conditions <em>x</em><span style="font-size: 8pt;">0</span>, though some <em>x</em><span style="font-size: 8pt;">0</span>'s, called exceptions, violate the law. This is a typical feature of all these systems. For some systems (the <a href="https://en.wikipedia.org/wiki/Dyadic_transformation" target="_blank" rel="noopener">Bernoulli map</a> for instance), the <em>x</em><span style="font-size: 8pt;">0</span>'s that are not exceptions are called <a href="https://en.wikipedia.org/wiki/Normal_number" target="_blank" rel="noopener">normal numbers</a>. </p>
<p>By ergodic, I mean that for almost any initial condition <em>x</em><span style="font-size: 8pt;">0</span>, the sequence (<em>x<span style="font-size: 8pt;">n</span></em>) eventually visits all parts of [0, 1], in a uniform and random sense. This implies that the average behavior of the system can be deduced from the trajectory of a "typical" sequence (<em>x<span style="font-size: 8pt;">n</span></em>) attached to an initial condition <em>x</em><span style="font-size: 8pt;">0</span>. Equivalently, a sufficiently large collection of random instances of the process (also called orbits) can represent the average statistical properties of the entire process.</p>
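<p>Ergodicity can be illustrated numerically: the time average of a quantity along one orbit should match its space average under the invariant density. The sketch below (my own check, using the Gauss map and its known invariant density 1 / ((1+<em>x</em>) log 2), discussed in section 2) compares the two averages of <em>g</em>(<em>x</em>) = <em>x</em>; note that floating-point orbits are only pseudo-orbits, but their statistics match closely in practice:</p>

```python
import math

def gauss_map(x):
    """Gauss map: T(x) = 1/x - INT(1/x)."""
    y = 1.0 / x
    return y - math.floor(y)

# Time average of g(x) = x along a single orbit of the Gauss map.
x, n, total = math.pi - 3, 100_000, 0.0
for _ in range(n):
    total += x
    x = gauss_map(x)
time_avg = total / n

# Space average of g(x) = x under the invariant density 1/((1+x) log 2):
# the integral over [0, 1] equals (1 - log 2) / log 2, about 0.4427.
space_avg = (1.0 - math.log(2.0)) / math.log(2.0)
```

<p>The two averages agree to within sampling noise, as the ergodic property predicts.</p>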
<p>Invariant distributions are also called equilibrium or attractor distributions in probability theory.</p>
<p><strong>1.2. The functional equation to be solved</strong></p>
<p>Let us assume that the invariant distribution <em>F</em>(<em>x</em>) can be written as <em>F</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) − <em>r</em>(1) for some function <i>r</i>. The support domain for <em>F</em>(<em>x</em>) is [0, 1], thus <em>F</em>(0) = 0, <em>F</em>(1) = 1, <em>F</em>(<em>x</em>) = 0 if <em>x</em> &lt; 0, and <em>F</em>(<em>x</em>) = 1 if <em>x</em> &gt; 1. Define <em>R</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) − <em>r</em>(<em>x</em>). Then we can retrieve <em>p</em>(<em>x</em>) (under some conditions) using the formula</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641305083?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641305083?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>Thus <em>r</em>(<em>x</em>) must be increasing on [1,2] and <em>r</em>(2) = 1 + <em>r</em>(1). Not every function can be an invariant distribution.</p>
<p>In practice, you know <em>p</em>(<em>x</em>) and you try to find the invariant distribution <em>F</em>(<em>x</em>). So the above formula is not useful, except that it helps you create a table of dynamical systems, defined by their function <em>p</em>(<em>x</em>), with known invariant distribution. Such a table is available <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">here</a>, see Appendix 1 in that article, in particular example 5 featuring a Riemann zeta system. Such a table is useful for testing the fixed-point algorithm described in section 2 against cases where the exact solution is known. </p>
<p>If you only know <em>p</em>(<em>x</em>), to retrieve <em>F</em>(<em>x</em>) or its derivative <em>f</em>(<em>x</em>), you need to solve the following functional equation, whose unknown is the function <em>f</em>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641363282?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641363282?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>where <em>q</em> is the inverse of the function <em>p</em>. Note that <em>R</em>(<em>x</em>) = <em>F</em>(<em>q</em>(<em>x</em>)) or alternatively, <em>R</em>(<em>p</em>(<em>x</em>)) = <em>F</em>(<em>x</em>), with <em>p</em>(<em>q</em>(<em>x</em>)) = <em>q</em>(<em>p</em>(<em>x</em>)) = <em>x</em>. Also, here <em>x</em> is in [0, 1]. In practice, you get a good approximation if you use the first 1,000 terms in the sum. Typically, the invariant density <em>f</em> is bounded, and the weights |<em>q</em>'(<em>x</em>+<em>k</em>)| are decaying relatively fast as <em>k</em> increases. </p>
<p>The theory behind this is beyond the scope of this article. It is based on the <a href="https://en.wikipedia.org/wiki/Transfer_operator" target="_blank" rel="noopener">transfer operator</a>, and also briefly discussed in one of my previous articles, <a href="https://mathoverflow.net/questions/383925/about-generalized-continued-fractions/383997#383997" target="_blank" rel="noopener">here</a>: see section "Functional equation for <em>f</em>". The invariant density is the eigenfunction of the transfer operator, corresponding to the eigenvalue 1. Also, if <em>x</em> is replaced by a vector (for instance, if working with bivariate dynamical systems), the above formula can be generalized, involving two variables <em>x</em>, <em>y</em>, and the derivative of the (joint) distribution is replaced by a Jacobian. </p>
<p><span style="font-size: 14pt;"><strong>2. Numerical solution via the fixed point algorithm</strong></span></p>
<p>The last formula in section 1.2. suggests a simple iterative algorithm to solve this type of equation. You need to start with an initial function <em>f</em><span style="font-size: 8pt;">0</span>, and in this case, the uniform distribution on [0, 1] is usually a good starting point. That is, <span style="font-size: 12pt;"><em>f</em></span><span style="font-size: 8pt;">0</span>(<em>x</em>) = 1 if <em>x</em> is in [0, 1], and 0 elsewhere. The iterative step is as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641383454?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641383454?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>with <em>x</em> in [0, 1]. Each iteration <em>n</em> generates a whole new function <em>f<span style="font-size: 8pt;">n</span></em> on [0, 1], and the hope is that the algorithm converges as <em>n</em> tends to infinity. If convergence occurs, the limiting function must be the invariant density of the system. This is an example of the <a href="https://en.wikipedia.org/wiki/Fixed-point_iteration" target="_blank" rel="noopener">fixed point algorithm</a>, in infinite dimension.</p>
<p>In practice, you compute <em>f</em>(<em>x</em>) for only (say) 10,000 values of <em>x</em> evenly spaced between 0 and 1. If, for instance, <em>f</em><span style="font-size: 8pt;"><em>n</em>+1</span>(0.5) requires the computation of (say) <em>f<span style="font-size: 8pt;">n</span></em>(0.879237...) and the closest value in your array is <em>f<span style="font-size: 8pt;">n</span></em>(0.8792), you replace <em>f<span style="font-size: 8pt;">n</span></em>(0.879237...) by <em>f<span style="font-size: 8pt;">n</span></em>(0.8792), or you use interpolation techniques. This is more efficient than using a function defined recursively in a programming language. Surprisingly, the convergence is very fast: in the examples tested, the error between the true solution and the one obtained after 3 iterations is very small; see the picture below.</p>
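<p>Here is a minimal Python sketch of this grid-based fixed-point iteration for the Gauss map (<em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1 / <em>x</em>), using linear interpolation between grid points. This is my own illustration, not the article's source code:</p>

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 2001)   # evenly spaced evaluation points in [0, 1]
f = np.ones_like(grid)               # f_0: the uniform density

def fixed_point_step(f, grid, terms=500):
    """One iteration f_{n+1}(x) = sum_k |q'(x+k)| f_n(q(x+k)), here with
    q(y) = 1/y: weights 1/(x+k)^2, arguments 1/(x+k).  Values of f_n off
    the grid are obtained by linear interpolation (np.interp)."""
    new = np.zeros_like(f)
    for k in range(1, terms + 1):
        y = grid + k
        new += np.interp(1.0 / y, grid, f) / (y * y)
    return new

for _ in range(3):                   # three iterations, as in the picture
    f = fixed_point_step(f, grid)

exact = 1.0 / ((1.0 + grid) * np.log(2.0))   # known solution, for comparison
max_err = float(np.abs(f - exact).max())
```

<p>After three iterations the grid values are already close to the exact density, in line with the fast convergence described above.</p>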
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641440290?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641440290?profile=RESIZE_710x" width="400" class="align-center"/></a>In the above picture, <em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1 / <em>x</em>, and the invariant density is known: <em>f</em>(<em>x</em>) = 1 / ((1+<em>x</em>)(log 2)). It is pictured in red, and it is related to the <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution" target="_blank" rel="noopener">Gauss-Kuzmin distribution</a>. Note that we started with the uniform distribution <em>f</em><span style="font-size: 8pt;">0</span> pictured in black (the flat line). The first iterate <em>f</em><span style="font-size: 8pt;">1</span> is in green, the second one <em>f</em><span style="font-size: 8pt;">2</span> is in grey, and the third one <em>f</em><span style="font-size: 8pt;">3</span> is in orange, almost indistinguishable from the exact solution in red (I need magnifying glasses to see it). Source code for these computations is available <a href="http://datashaping.com/solve2b.txt" target="_blank" rel="noopener">here</a>. In the source code, there are two extra parameters <span><em>α</em>, <em>λ</em>. When <em>α</em> = <em>λ</em> = 1, it corresponds to the classic case <em>p</em>(<em>x</em>) = 1 / <em>x</em>.</span></p>
<p><span style="font-size: 14pt;"><strong>3. Applications</strong></span></p>
<p>One interesting concept associated with these dynamical systems is that of <em>digit</em>. The <em>n</em>-th digit <em>d<span style="font-size: 8pt;">n</span></em> is defined as INT(<em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>)), where INT is the integer part function. I call it "digit" because all these systems have a numeration system attached to them, generalizing standard numeration systems, which are just a particular case. If you know the digits attached to an initial condition <em>x</em><span style="font-size: 8pt;">0</span>, you can retrieve <em>x</em><span style="font-size: 8pt;">0</span> with a simple algorithm. Start with <em>n</em> = <em>N</em> large enough and <em>x</em><span style="font-size: 8pt;"><em>N</em>+1</span> = 0 (you will get about <em>N</em> digits of accuracy for <em>x</em><span style="font-size: 8pt;">0</span>), and compute <em>x<span style="font-size: 8pt;">n</span></em> iteratively backward from <em>n</em> = <em>N</em> to <em>n</em> = 0, using the recursion <em>x<span style="font-size: 8pt;">n</span></em> = <em>q</em>(<em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> + <em>d<span style="font-size: 8pt;">n</span></em>) - INT(<em>q</em>(<em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> + <em>d<span style="font-size: 8pt;">n</span></em>)). These digits can be used in encryption systems.</p>
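<p>A short Python sketch of both directions for the continued fraction system (<em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1 / <em>x</em>); my own illustration. Note that in double precision the forward digits of π - 3 are only reliable for about a dozen terms, since the error grows with the square of the convergent denominators, so the sketch stops at 12 digits:</p>

```python
import math

def digits(x0, n):
    """First n digits d_k = INT(p(x_k)), with p(x) = 1/x and
    x_{k+1} = p(x_k) - INT(p(x_k))."""
    ds, x = [], x0
    for _ in range(n):
        y = 1.0 / x
        d = math.floor(y)
        ds.append(d)
        x = y - d
    return ds

def reconstruct(ds):
    """Retrieve x0 from its digits, backward: start with x_{N+1} = 0, then
    x_k = q(x_{k+1} + d_k) - INT(q(x_{k+1} + d_k)), with q(y) = 1/y."""
    x = 0.0
    for d in reversed(ds):
        y = 1.0 / (x + d)
        x = y - math.floor(y)
    return x

ds = digits(math.pi - 3, 12)      # 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14
x0_approx = reconstruct(ds)       # close to pi - 3
```

<p>With 12 digits, the reconstruction already recovers π - 3 to roughly 13 decimal places, consistent with the "about <em>N</em> digits of accuracy" claim.</p>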
<p>This will be described in detail in my upcoming book <em>Gentle Introduction to Discrete Dynamical Systems</em>. However, the interesting part discussed here is related to statistical modeling. As a starter, let's look at the digits of <em>x</em><span style="font-size: 8pt;">0</span> = <span>π - 3 in two different dynamical systems:</span></p>
<ul>
<li><span><strong>Continued fractions</strong>. Here <em>p</em>(<em>x</em>) = 1 / <em>x</em>. The first 20 digits are 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, see <a href="https://oeis.org/A001203" target="_blank" rel="noopener">here</a>. </span></li>
<li><strong>A less chaotic dynamical system</strong>. Here <em>p</em>(<em>x</em>) = (-1 + SQRT(5 + 4/<em>x</em>)) / 2. <span>The first 20 digits are </span>2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 26, 1, 3, 1, 10, 1, 1. We also have <em>F</em>(<em>x</em>) = 2<em>x</em> / (<em>x</em>+1).</li>
</ul>
<p>The distribution of the digits is known in both cases. For continued fractions, it is the <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution" target="_blank" rel="noopener">Gauss-Kuzmin distribution</a>. For the second system, the probability that a digit is equal to <em>k</em> is 4 / (<em>k</em>(<em>k</em>+1)(<em>k</em>+2)), see Example 1 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>. In general, the probability in question is equal to <em>F</em>(<em>q</em>(<em>k</em>)) - <em>F</em>(<em>q</em>(<em>k</em>+1)) for <em>k</em> = 1, 2, and so on. Clearly, the distribution of these digits can be used to quantify the level of chaos in the system. For continued fractions, the expected value of an arbitrary digit is infinite (though it is finite and well known for the logarithm of a digit, see <a href="https://en.wikipedia.org/wiki/Khinchin%27s_constant" target="_blank" rel="noopener">here</a>), while it is finite (equal to 2) for the second system. Yet each system, given enough time, will eventually produce arbitrarily large digits. Another way to quantify chaos in a dynamical system is to look at the auto-correlation structure of the sequence (<em>x<span style="font-size: 8pt;">n</span></em>). Auto-correlations very close to zero, decaying very fast, are associated with highly chaotic systems. In the case of continued fractions, the lag-1 auto-correlation, defined as the limit of the empirical auto-correlation on a sequence starting with (say) <em>x</em><span style="font-size: 8pt;">0</span> = <span>π - 3, is </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641579290?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641579290?profile=RESIZE_710x" width="250" class="align-center"/></a></span></p>
<p><span>where <em>γ</em> is the <a href="https://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant" target="_blank" rel="noopener">Euler–Mascheroni constant</a>, see Appendix 2 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>. This is probably a new result, never published before.</span></p>
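<p>The digit distributions mentioned above can be checked directly. For continued fractions, the Gauss-Kuzmin probabilities are P(<em>d</em> = <em>k</em>) = log<sub>2</sub>(1 + 1/(<em>k</em>(<em>k</em>+2))); for the second system, inverting <em>p</em> gives <em>q</em>(<em>y</em>) = 1/(<em>y</em><sup>2</sup> + <em>y</em> - 1), and <em>F</em>(<em>q</em>(<em>k</em>)) - <em>F</em>(<em>q</em>(<em>k</em>+1)) simplifies to 4/(<em>k</em>(<em>k</em>+1)(<em>k</em>+2)), with expected digit value exactly 2. A sketch verifying the normalization and the mean (my own check):</p>

```python
import math

def p_gauss_kuzmin(k):
    """P(digit = k) for continued fractions (Gauss-Kuzmin)."""
    return math.log2(1.0 + 1.0 / (k * (k + 2)))

def p_smooth(k):
    """P(digit = k) = F(q(k)) - F(q(k+1)) for the second system,
    with F(x) = 2x/(x+1) and q(y) = 1/(y*y + y - 1)."""
    return 4.0 / (k * (k + 1) * (k + 2))

K = 100_000
total_gk = sum(p_gauss_kuzmin(k) for k in range(1, K + 1))   # should be near 1
total_sm = sum(p_smooth(k) for k in range(1, K + 1))         # should be near 1
mean_sm = sum(k * p_smooth(k) for k in range(1, K + 1))      # expected digit, near 2
```

<p>Both sums telescope, so the truncation errors vanish as <em>K</em> grows; the corresponding partial sums of <em>k</em>·P(<em>d</em> = <em>k</em>) diverge in the Gauss-Kuzmin case, reflecting the infinite expected digit.</p>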
<p><span>Below is a picture featuring the successive values of <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>) for the smoother dynamical system mentioned above. These values are close to the digits <em>d<span style="font-size: 8pt;">n</span></em>. The initial condition is <em>x</em><span style="font-size: 8pt;">0</span> = π - 3. In my next article, I will further discuss a new way to define and measure chaos in these various systems.</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641636094?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641636094?profile=RESIZE_710x" width="500" class="align-center"/></a></span></p>
<p><span>The first 5,500 values of <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>), for <em>n</em> = 0, 1, 2 and so on, are featured in the above picture. Think about what business, natural or industrial process could be modeled by such kinds of time series! The possibilities are endless. For instance, it could represent meteorite hits over a large time period, with a few large values representing massive impacts. Clearly, it can be used in outlier detection, extreme event analysis, and risk modeling. </span></p>
<p>Finally, here is another example, this time based on an unrelated bivariate dynamical system on a grid (the cat map), used for image encryption. This is the<span> map applied to a picture of a pair of cherries. The image is 74 pixels wide, and takes 114 iterations to be restored, although it appears upside-down at the halfway point (the 57th iteration). Source: <a href="https://en.wikipedia.org/wiki/Arnold%27s_cat_map" target="_blank" rel="noopener">here</a>. </span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641638058?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641638058?profile=RESIZE_710x" class="align-center"/></a></p>
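<p>The restoration period can be verified by computing the order of the cat map's matrix [[2, 1], [1, 1]] modulo the image width; the sketch below (my own check, assuming a square 74 × 74 pixel grid) recovers the 114-iteration period quoted above:</p>

```python
def cat_map_period(n):
    """Order of the cat map matrix [[2, 1], [1, 1]] modulo n, i.e. the
    number of iterations of (x, y) -> (2x + y, x + y) mod n needed to
    bring every pixel of an n x n image back to its starting position."""
    m, period = (2, 1, 1, 1), 1       # m holds the current matrix power, row-major
    while m != (1, 0, 0, 1):          # stop once m is the identity mod n
        p, q, r, s = m
        m = ((2 * p + r) % n, (2 * q + s) % n, (p + r) % n, (q + s) % n)
        period += 1
    return period

period_74 = cat_map_period(74)        # period for the 74-pixel cherries example
```

<p>Since the matrix has determinant 1, it is invertible modulo any <em>n</em>, so the loop always terminates.</p>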
<p><strong>A Plethora of Machine Learning Articles: Part 2</strong> (March 4, 2021, by Vincent Granville)</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8629159091?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8629159091?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<div class="xg_module_body"><div class="postbody"><div class="xg_user_generated"><p style="text-align: center;"><em>Source: see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener">here</a></em></p>
<p><span>Part 1 of this short series focused on the business analytics / BI / operational research aspects, see <a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-1" target="_blank" rel="noopener">here</a>. In this Part 2, you will find the most interesting machine learning and statistics articles that I wrote in the last few years, focusing on core technical aspects. The whole series will feature articles related to the following aspects of machine learning:</span></p>
<ul>
<li><span>Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span>Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><span>Methods, principles, rules of thumb, recipes, tricks</span></li>
<li><span>Business analytics (Part 1)</span></li>
</ul>
<p><span>My articles are always written in simple English and accessible to professionals with typically one year of calculus or statistical training, at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, geared towards people who do not have a lot of free time. </span></p>
<p><span>Despite these restrictions, state-of-the-art, off-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed. </span></p>
<p><span style="font-size: 14pt;"><strong>1. Core techniques</strong></span></p>
<p><span>These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.</span></p>
<ol>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">Introducing an All-purpose, Robust, Fast, Simple Non-linear Regression</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/chaos-attractors-in-machine-learning-systems" target="_blank" rel="noopener">Variance, Attractors and Behavior of Chaotic Statistical Systems</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions" target="_blank" rel="noopener">New Family of Generalized Gaussian Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-approach-to-linear-algebra-in-machine-learning" target="_blank" rel="noopener">Gentle Approach to Linear Algebra, with Machine Learning Applications</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/confidence-intervals-without-pain" target="_blank" rel="noopener">Confidence Intervals Without Pain</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/modern-re-sampling-and-statistical-recipes" target="_blank" rel="noopener">Re-sampling: Amazing Results and Applications</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-automatically-determine-the-number-of-clusters-in-your-dat" target="_blank" rel="noopener">How to Automatically Determine the Number of Clusters in your Data</a> - and more</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/decomposition-of-statistical-distributions-using-mixture-models-a" target="_blank" rel="noopener">New Perspectives on Statistical Distributions and Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-original-underused-statistical-tests" target="_blank" rel="noopener">A Plethora of Original, Not Well-Known Statistical Tests</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/pattern-recognition-techniques-application-to-new-decimal-systems?xg_source=activity" target="_blank" rel="noopener">New Decimal Systems - Great Sandbox for Data Scientists and Mathematicians</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/are-the-digits-of-pi-truly-random" target="_blank" rel="noopener">Are the Digits of Pi Truly Random?</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/data-science-and-machine-learning-without-mathematics" target="_blank" rel="noopener">Data Science and Machine Learning Without Mathematics</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel" target="_blank" rel="noopener">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/state-of-the-art-machine-learning-automation-with-hdt" target="_blank" rel="noopener">State-of-the-Art Machine Learning Automation with HDT</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/building-outiler-resistant-centroids-in-any-dimension" target="_blank" rel="noopener">Tutorial: Neutralizing Outliers in Any Dimension</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/the-fundamental-statistics-theorem-revisited" target="_blank" rel="noopener">The Fundamental Statistics Theorem Revisited</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/variance-clustering-test-of-hypotheses-and-density-estimation-rev" target="_blank" rel="noopener">Variance, Clustering, and Density Estimation Revisited</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/the-death-of-the-statistical-test-of-hypothesis" target="_blank" rel="noopener">The Death of the Statistical Tests of Hypotheses</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/5-easy-steps-to-structure-highly-unstructured-big-data" target="_blank" rel="noopener">4 Easy Steps to Structure Highly Unstructured Big Data, via Automated Indexation</a> </li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/the-best-kept-secret-about-linear-and-logistic-regression" target="_blank" rel="noopener">The best kept secret about linear and logistic regression</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/black-box-confidence-intervals-excel-and-perl-implementations-det" target="_blank" rel="noopener">Black-box Confidence Intervals: Excel and Perl Implementation</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/comparing-linear-regression-with-the-jackknife-method" target="_blank" rel="noopener">Jackknife and linear regression in Excel: implementation and comparison</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/jackknife-logistic-and-linear-regression" target="_blank" rel="noopener">Jackknife logistic and linear regression for clustering and predictions</a></li>
</ol>
<p><span style="font-size: 14pt;"><strong>2. Free books</strong></span></p>
<ul>
<li><span><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span>The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span><b>Applied Stochastic Processes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters.) This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span>It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p></p>
</div>
</div>
</div>A Plethora of Machine Learning Articles: Part 1tag:www.datasciencecentral.com,2021-02-21:6448529:BlogPost:10343672021-02-21T23:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8582358874?profile=original" target="_blank" rel="noopener"><img width="400" class="align-center" src="https://storage.ning.com/topology/rest/1.0/file/get/8582358874?profile=RESIZE_710x"/></a></p>
<p><em>Source: see <a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener">here</a></em></p>
<p><span style="font-size: 12pt;">In Part 1 of this short series, I have included the most interesting articles that I wrote in the last few years. This part focuses on the business analytics / BI / operational research aspects. The next parts will focus on</span></p>
<ul>
<li><span style="font-size: 12pt;">Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span style="font-size: 12pt;">Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><span style="font-size: 12pt;">Methods, principles, rules of thumb, recipes, tricks</span></li>
</ul>
<p><span style="font-size: 12pt;">My articles are always written in simple English and accessible to professionals with typically one year of calculus or statistical training at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, for readers who do not have a lot of free time. </span></p>
<p><span style="font-size: 12pt;">Despite these restrictions, state-of-the-art, off-the-beaten-path results, as well as machine learning trade secrets and research material, are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed. </span></p>
<p><span style="font-size: 12pt;">Before starting, let me mention in section 1 two books that I wrote recently, available to all Data Science Central members.</span></p>
<p><span style="font-size: 14pt;"><strong>1. Free books</strong></span></p>
<ul>
<li><span style="font-size: 12pt;"><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span style="font-size: 12pt;">Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span style="font-size: 12pt;">The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span style="font-size: 12pt;"><b>Applied Stochastic Processes</b></span><p><span style="font-size: 12pt;">Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters.) This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span style="font-size: 12pt;">It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p><span style="font-size: 14pt;"><strong>2. Business related articles</strong></span></p>
<p><span style="font-size: 12pt;">These articles focus on business applications and other matters relevant to being a data scientist working in industry. They are accessible to a wide audience, in the sense that they are less technical than many of my 200+ other articles.</span></p>
<ol>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-foundations-for-a-new-stock-market" target="_blank" rel="noopener">New Stock Trading and Lottery Game Rooted in Deep Math</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-wizardry" target="_blank" rel="noopener">Time series, Growth Modeling and Data Science Wizardry</a> </span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-stabilize-data-to-avoid-decay-in-model-performance" target="_blank" rel="noopener">How to Stabilize Data Systems, to Avoid Decay in Model Performance</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/10-differences-between-junior-and-senior-data-scientist" target="_blank" rel="noopener">22 Differences Between Junior and Senior Data Scientists</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/the-first-things-you-should-learn-as-a-data-scientist-not-what-yo" target="_blank" rel="noopener">The First Things you Should Learn as a Data Scientist - Not what you Think</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning" target="_blank" rel="noopener">Difference between Machine Learning, Data Science, AI, Deep Learning, and Statistics</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/20-data-science-systems-used-by-amazon-to-operate-its-business" target="_blank" rel="noopener">21 data science systems used by Amazon to operate its business</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/life-cycle-of-data-science-projects" target="_blank" rel="noopener">Life Cycle of Data Science Projects</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/40-techniques-used-by-data-scientists" target="_blank" rel="noopener">40 Techniques Used by Data Scientists</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/helping-facebook-design-better-machine-learning-algorithms" target="_blank" rel="noopener">Designing better algorithms: 5 case studies</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/the-data-science-zoo" target="_blank" rel="noopener">Architecture of Data Science Projects</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/24-uses-of-statistical-modeling-part-ii" target="_blank" rel="noopener">24 Uses of Statistical Modeling (Part II)</a> | <a href="http://www.datasciencecentral.com/profiles/blogs/top-20-uses-of-statistical-modeling" target="_blank" rel="noopener">(Part I)</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/the-abcd-s-of-business-optimization" target="_blank" rel="noopener">The ABCD's of Business Optimization</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/is-data-science-a-sin-against-the-norms-of-statisticians" target="_blank" rel="noopener">What you won't learn in stats classes</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/biased-vs-unbiased-debunking-statistical-myths" target="_blank" rel="noopener">Biased vs Unbiased: Debunking Statistical Myths</a></span></li>
</ol>
<p></p>
<p></p>Maximum runs in Bernoulli trials: simulations and resultstag:www.datasciencecentral.com,2021-02-16:6448529:BlogPost:10293412021-02-16T08:00:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8561683465?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8561683465?profile=RESIZE_710x" width="720" class="align-full"/></a></p>
<p>Bernoulli trials are <span>random</span><span> experiments with two possible outcomes: "yes" and "no" (in the case of polls), </span><span> "success" and "failure" (in the case of gambling or clinical trials). The trials are independent of each other: for instance, tossing a coin multiple times, or testing the success of a new drug against a specific medical condition on multiple patients: improvement for a specific patient is viewed as a success, lack of improvement as a failure. </span></p>
<p><span>Here we are interested in maximum runs of successes (also called record runs), when they are expected to occur, and their expected length or duration. While the classical application is in games of chance, we will discuss an exciting application in number theory, more specifically, very good approximations of irrational numbers by rational numbers, and numeration systems with a non-integer base. We will also consider the case where the trials are not independent, and where there are more than two outcomes. For instance, if throwing a die rather than a coin, there are six rather than two outcomes.</span></p>
<p><span>The data used here is simulated and allows us to get some good approximations for a number of interesting statistics. It is based on an unusual pseudo-random number generator that is very relevant to the problem being studied. A more theoretical approach can be found <a href="https://www.csun.edu/~hcmth031/tspolr.pdf" target="_blank" rel="noopener">here</a>, with connections to extreme value theory and the Gumbel distribution. See also my previous article <em>Distribution of Arrival Times for Extreme Events</em>, posted <a href="https://www.datasciencecentral.com/profiles/blogs/distribution-of-arrival-times-of-extreme-events" target="_blank" rel="noopener">here</a>. </span></p>
<p><span style="font-size: 14pt;"><strong>1. Simulations and theoretical results</strong></span></p>
<p>Bernoulli trials with <em>b</em> potential outcomes, each with the same probability of occurring, can be simulated using the following system. Start with some irrational number <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1], say <em>x</em><span style="font-size: 8pt;">0</span> = log 2 (called the <em>seed</em>), and use the following iterations:</p>
<p style="text-align: center;"><em>a<span style="font-size: 8pt;">n</span></em> = INT(<em>b x<span style="font-size: 8pt;">n</span></em>)</p>
<p style="text-align: center;"><em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> = <em>b x<span style="font-size: 8pt;">n</span></em> - INT(<em>b x<span style="font-size: 8pt;">n</span></em>).</p>
<p>INT represents the integer part function. The result of the <em>n</em>-th trial is <em>a<span style="font-size: 8pt;">n</span></em>: it is a coding integer between 0 and <em>b</em> - 1 inclusive, representing for instance the result of throwing a die with <em>b</em> sides labeled 0, ..., <em>b</em> - 1. Also, <em>a<span style="font-size: 8pt;">n</span></em> is the <em>n</em>-th digit of <em>x</em><span style="font-size: 8pt;">0</span> in base <em>b</em>. These digits are strongly conjectured to be independent of each other, and have the same probability 1 / <em>b</em> to take on any of the <em>b</em> potential values. Thus this scheme can be used to simulate the Bernoulli trials in question. Also, unlike traditional pseudorandom number generators, it does not produce periodic sequences. Such a system can be viewed as a chaotic dynamical system, just like the sine map discussed in my previous article, <a href="https://www.datasciencecentral.com/profiles/blogs/beautiful-mathematical-images" target="_blank" rel="noopener">here</a>. </p>
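<p>The iteration above is easy to implement directly. A minimal sketch (not my original simulation code; the natural logarithm is assumed for the seed log 2): since each step consumes about log10(<em>b</em>) decimal digits of precision, standard floats degrade after a few dozen iterations, so this snippet uses Python's <code>decimal</code> module with extended precision.</p>

```python
from decimal import Decimal, getcontext

def base_b_digits(x0, b, n):
    """First n base-b digits of x0 in [0, 1), via the iteration
    a_k = INT(b * x_k),  x_{k+1} = b * x_k - INT(b * x_k)."""
    x, digits = x0, []
    for _ in range(n):
        a = int(b * x)          # the k-th trial outcome, in {0, ..., b-1}
        digits.append(a)
        x = b * x - a           # keep only the fractional part
    return digits

getcontext().prec = 200         # extended precision; plain floats degrade quickly
x0 = Decimal(2).ln()            # seed log 2 (natural log assumed), as in the article
digits = base_b_digits(x0, 3, 50)
print(digits[:20])
```

<p>Summing the digits back as <em>a<span style="font-size: 8pt;">k</span></em> / <em>b</em><span style="font-size: 8pt;">k+1</span> recovers the seed to within <em>b</em><span style="font-size: 8pt;">-n</span>, a fact used again in section 2.</p>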
<p>The Bernoulli trials generated with <em>x</em><span style="font-size: 8pt;">0</span>, that is, the sequence <em>a</em><span style="font-size: 8pt;">0</span>, <em>a</em><span style="font-size: 8pt;">1</span>, and so on, constitute just one instance of a Bernoulli experiment. If you try with <em>N</em> different seeds (the number <em>x</em><span style="font-size: 8pt;">0</span>), then you end up with <em>N</em> different, independent instances of Bernoulli experiments sharing the same dynamics, and things start to become interesting.</p>
<p><strong>1.1. Simulations</strong></p>
<p>I performed <em>N</em> = 200 simulations, each representing a Bernoulli experiment starting with a different seed <em>x</em><span style="font-size: 8pt;">0</span> each time, each consisting of 1,000,000 trials, with <em>b</em> = 3. Possible outcomes of each trial are 0, 1 or 2. I looked at successive record runs of zeros. For one of these experiments (a typical case), I've found this:</p>
<ul>
<li>One isolated zero (the first occurrence of zero) starts at position <em>n</em> = 3</li>
<li>The first run of 2 zeros starts at position 13 in the digits expansion</li>
<li>The next longer run consists of 3 zeros, starting at position 69</li>
<li>The next longer one (4 zeros) starts at position 132</li>
<li>Then we have 5 zeros starting at position 670, then 6 starting at position 743, 8 starting at position 13411, 10 starting at position 58454, and 12 starting at position 384100.</li>
</ul>
<p>The observations can be summarized by the following bivariate sequence:</p>
<p style="text-align: center;">(3,1), (13,2), (69,3), (132,4), (670,5), (743,6), (13411,8), (58454,10), (384100,12), …</p>
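<p>Such (position, length) pairs can be extracted from a digit sequence with a short scan. A minimal sketch (the function name is mine, and positions are reported 1-based, matching the convention in the list above):</p>

```python
def record_runs(digits, symbol=0):
    """(position, length) pairs for record runs of `symbol`: a pair is
    emitted each time a run strictly longer than all previous runs appears."""
    records, best = [], 0
    i, n = 0, len(digits)
    while i < n:
        if digits[i] == symbol:
            j = i
            while j < n and digits[j] == symbol:
                j += 1                          # scan to the end of the current run
            if j - i > best:
                best = j - i
                records.append((i + 1, best))   # 1-based start position
            i = j
        else:
            i += 1
    return records

print(record_runs([1, 0, 1, 0, 0, 2, 0, 0, 0, 1]))   # [(2, 1), (4, 2), (7, 3)]
```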
<p>If you blend all the sequences of vectors (<em>X</em>, <em>Y</em>) together, from the 200 experiments, you get the following: </p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558262452?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558262452?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>Record runs of Y zeros vs the position X at which they occur in a Bernoulli experiment</em></p>
<p>Note that in Figure 1, the plot represents <em>Y</em> versus log(<em>X</em>), and <em>b</em> = 3. A record run equal to <em>Y</em> means that starting at position <em>X</em>, we observe the first instance of a (record) run consisting of <em>Y</em> consecutive zeros, in at least one of the <em>N</em> experiments. In Figure 2 featuring aggregated data, you can see the average log(<em>X</em>) computed across the <em>N</em> = 200 experiments, for any record run of length <em>Y</em> = 0, 1, 2, and so on (up to <em>Y</em> = 13). The chart speaks for itself; in the linear fit in Figure 2, the slope approaches log <em>b</em> as <em>N</em> tends to infinity.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558364664?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558364664?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Same as Figure 1, with log(X) averaged across the N = 200 experiments</em></p>
<p><strong>1.2. Theory</strong></p>
<p>A lot of theoretical results are known for maximum runs. We present a few of them here, with additional references. Note that in my article, I focus on record runs, which are different from maximum runs: in any Bernoulli experiment, maximum runs correspond to the first occurrence of a run of length 2, 3, 4, and so on. Record runs, as in the example outlined at the beginning of section 1, do not necessarily increase by unit increments: in my example, the first run of length 7 (not a record) occurs after the first (record) run of length 8. In short, you see a run of length 8 before you see one of length 7.</p>
<p>The main theoretical results, provided by <a href="https://mathoverflow.net/questions/383353/distribution-of-the-first-occurrence-of-a-maximum-record-run-of-zeros-in-the-d/383388#383388" target="_blank" rel="noopener">Yuval Peres</a>, are:</p>
<ul>
<li>Let <em>R<span style="font-size: 8pt;">n</span></em> be the length of the longest run in the first <em>n</em> digits. Then <em>R<span style="font-size: 8pt;">n</span></em> log(<em>b</em>) / log(<em>n</em>) tends to 1 almost surely as <em>n</em> tends to infinity. It was first proved by Renyi, see the discussion in reference [1].</li>
<li>The waiting times <em>T<span style="font-size: 8pt;">k</span></em> for the occurrence of a run of length <em>k</em> satisfy that <em>T<span style="font-size: 8pt;">k</span></em> / E(<em>T<span style="font-size: 8pt;">k</span></em>) is asymptotically exponentially distributed with mean 1. See references [2] - [4]. We also have (see reference [5] and [7]): <a href="https://storage.ning.com/topology/rest/1.0/file/get/8558795279?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558795279?profile=RESIZE_710x" width="100" class="align-center"/></a></li>
</ul>
<p>All references are in section 3. Note that these theoretical results apply to any run, not just runs of zeros. </p>
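<p>The first asymptotic result is easy to probe numerically. The sketch below uses Python's built-in random generator as a stand-in for the digit-based generator (purely for speed), and averages <em>R<span style="font-size: 8pt;">n</span></em> log(<em>b</em>) / log(<em>n</em>) over a few experiments; the average lands close to 1, as the theorem predicts:</p>

```python
import math
import random

def longest_run(digits, symbol=0):
    """Length of the longest run of `symbol` in the sequence."""
    best = cur = 0
    for d in digits:
        cur = cur + 1 if d == symbol else 0
        best = max(best, cur)
    return best

random.seed(1)                       # reproducible experiments
b, n, trials = 3, 100_000, 20
ratios = [longest_run(random.randrange(b) for _ in range(n)) * math.log(b) / math.log(n)
          for _ in range(trials)]
avg = sum(ratios) / trials
print(avg)                           # close to 1, per Renyi's result
```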
<p><span style="font-size: 14pt;"><strong>2. Application and generalization</strong></span></p>
<p>If you replace the integer <em>b</em> by a non-integer (strictly larger than 1), then the Bernoulli trials will inherit the properties of that unusual numeration system:</p>
<ul>
<li>The number of potential outcomes, for any trial, is INT(<em>b</em>), the integer part of <em>b</em></li>
<li>The trials are no longer independent: the <em>n</em>-th outcome <em>a<span style="font-size: 8pt;">n</span></em> is correlated with <em>a<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span></li>
<li>Outcomes have different probabilities: P(<em>a<span style="font-size: 8pt;">n</span></em> = 0) is not the same as P(<em>a<span style="font-size: 8pt;">n</span></em> = 1)</li>
</ul>
<p>Nevertheless, one can still perform the same simulations to estimate the statistics of interest. If <em>b</em> is a quadratic irrational, the corresponding successive outcomes (the <em>a<span style="font-size: 8pt;">n</span></em>'s) follow a Markov chain model. See <a href="https://www.jstage.jst.go.jp/article/jmath1948/26/1/26_1_33/_pdf" target="_blank" rel="noopener">here</a> for the theoretical details.</p>
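<p>The digit iteration from section 1 runs unchanged with a non-integer base. As an illustrative sketch, take <em>b</em> equal to the golden ratio <em>φ</em> (a quadratic irrational): the outcomes are 0 or 1, and the correlation between consecutive outcomes shows up as a hard constraint, namely that a 1 is always immediately followed by a 0 (because after emitting a 1, the new <em>x</em> is below <em>φ</em> - 1 = 1/<em>φ</em>). Plain floats suffice for a short run:</p>

```python
import math

def beta_digits(x0, beta, n):
    """Greedy beta-expansion: a_k = INT(beta * x_k), x_{k+1} = beta * x_k - a_k,
    where beta may be non-integer (strictly larger than 1)."""
    x, digits = x0, []
    for _ in range(n):
        a = int(beta * x)
        digits.append(a)
        x = beta * x - a
    return digits

phi = (1 + math.sqrt(5)) / 2          # INT(phi) = 1, so outcomes are 0 or 1
digits = beta_digits(1 / math.pi, phi, 60)
print("".join(map(str, digits)))      # the pattern "11" never appears
```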
<p>Regardless of whether <em>b</em> is an integer or not, the application we are interested in is the approximation of irrational numbers by a specific class of numbers. This is usually done using continued fractions if the class of numbers in question consists of the rational numbers, and there is an abundant literature on this topic, see for instance <a href="https://mathoverflow.net/questions/383142/algebraic-and-rational-parts-of-a-real-number" target="_blank" rel="noopener">here</a>. However, we focus instead on best approximations of an irrational number <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1] by a rational number <em>β<span style="font-size: 8pt;">n</span></em>, where</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558617871?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558617871?profile=RESIZE_710x" width="120" class="align-center"/></a></p>
<p>Note that <em>β<span style="font-size: 8pt;">n</span></em><span> can be expressed as </span><em>p<span style="font-size: 8pt;">n</span></em> / <em>q<span style="font-size: 8pt;">n</span></em>, a quotient of two integers if <em>b</em> is an integer, with <em>q<span style="font-size: 8pt;">n</span></em> being equal to <em>b</em> at the power <em>n</em>. The best approximation is obtained when the <em>a<span style="font-size: 8pt;">k</span></em>'s are the successive outcomes of the Bernoulli experiment with seed <em>x</em><span style="font-size: 8pt;">0</span>, or in other words, the first <em>n</em> digits of <em>x</em><span style="font-size: 8pt;">0</span> in base <em>b</em>. The approximation is exceptionally good if the last digit <em>a<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">-1</span> is not zero, and it is followed by a record run of digits equal to zero. The length of that run is expected to be asymptotically of the order of (log <em>n</em>) / (log <em>b</em>). It cannot be better than that, for a fixed <em>n</em>. Therefore, I propose the following conjecture, based on the probability distributions associated with extreme (record) runs discussed in section 1.</p>
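<p>To make this concrete, here is a small illustrative computation (exact arithmetic via <code>fractions</code>; the digit sequence is contrived for the example, not taken from the simulations): truncating just before a run of <em>r</em> zeros makes the error of <em>β<span style="font-size: 8pt;">n</span></em> of order <em>b</em><span style="font-size: 8pt;">-(n+r)</span> instead of the generic <em>b</em><span style="font-size: 8pt;">-n</span>:</p>

```python
from fractions import Fraction

def truncation(digits, b, n):
    """beta_n = p_n / q_n with q_n = b**n, built from the first n base-b digits."""
    p = 0
    for d in digits[:n]:
        p = p * b + d               # p_n read off in base b
    return Fraction(p, b**n)

# A rational x0 in base 3 whose digits have a run of r = 4 zeros after position n = 4:
digits = [2, 0, 0, 1, 0, 0, 0, 0, 2]
x0 = sum(Fraction(d, 3**(k + 1)) for k, d in enumerate(digits))
err = abs(x0 - truncation(digits, 3, 4))
print(err)                          # 2/19683, i.e. 2 * 3**-9, far below the generic 3**-4
```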
<p><strong>Conjecture</strong></p>
<p>For most numbers <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1], and for any <span><em>ε</em> > 0,</span> if <em>p</em> / <em>q</em> is an approximation of <em>x</em><span style="font-size: 8pt;">0</span>, with <em>p</em>, <em>q</em> co-prime positive integers, we have</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558683066?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558683066?profile=RESIZE_710x" width="150" class="align-center"/></a></p>
<p>The details of how I came to this conjecture are outlined in the section<em> Connection with approximations of irrationals by rational numbers</em>, in <a href="https://mathoverflow.net/questions/383353/distribution-of-the-first-occurrence-of-a-maximum-record-run-of-zeros-in-the-d/" target="_blank" rel="noopener">this article</a>. While this is beyond the scope of this article, a discussion of best approximations by continued fractions leads to a similar conclusion. In particular, if <em>p<span style="font-size: 8pt;">n</span></em> / <em>q<span style="font-size: 8pt;">n</span></em> is the <em>n</em>-th convergent of the number <em>x</em>, we have the following result, see last theorem in <a href="https://math.colorado.edu/~rohi1040/expository/ergodicthysimplecontfracs.pdf" target="_blank" rel="noopener">this article</a>, pictured below. In short, it says that if <span><em>ε</em> = 0, then only some proportion of all numbers <em>x</em><span style="font-size: 8pt;">0</span> will satisfy the above inequality. With <em>ε</em> > 0, almost all <em>x</em><span style="font-size: 8pt;">0</span> will. </span></p>
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8585977467?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8585977467?profile=RESIZE_710x" width="600" class="align-center"/></a></span></p>
<p></p>
<p>Finally, record runs in Bernoulli trials are a topic of combinatorial analysis, and thus relevant to machine learning, with numerous applications. Also, you can learn more about non-integer bases in <a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">this article</a>. A summary table is available <a href="https://www.datasciencecentral.com/profiles/blogs/number-representation-systems-explained-in-one-picture" target="_blank" rel="noopener">here</a>.</p>
<p><span style="font-size: 14pt;"><strong>3. References</strong></span></p>
<p>[1] Schilling, Mark F. <em>The longest run of heads</em>. The College Mathematics Journal 21, no. 3 (1990): 196-207.</p>
<p>[2] Aldous, David. <em>Probability approximations via the Poisson clumping heuristic</em>. Vol. 77. Springer Science & Business Media, 2013.</p>
<p>[3] Földes, A. <em>The limit distribution of the length of the longest head-run</em>. Periodica Mathematica Hungarica 10 (1979): 301–310.</p>
<p>[4] Godbole, Anant P. <em>Poisson approximations for runs and patterns of rare events</em>. Advances in applied probability (1991): 851-865.</p>
<p>[5] Feller, William. <em>An introduction to probability theory and its applications</em>. 1957.</p>
<p>[6] Gerber, Hans U., and Shuo-Yen Robert Li. <em>The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain</em>. Stochastic Processes and their Applications 11, no. 1 (1981): 101-108.</p>
<p>[7] Li, Shuo-Yen Robert. <em>A martingale approach to the study of occurrence of sequence patterns in repeated experiments</em>. Annals of Probability 8, no. 6 (1980): 1171-1176.</p>
<p></p>
More Surprising Math Imagestag:www.datasciencecentral.com,2021-02-08:6448529:BlogPost:10226702021-02-08T04:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><em>To zoom in on any picture, click on the image to get a higher resolution.</em></p>
<p>This is a follow-up to my previous article <a href="https://www.datasciencecentral.com/profiles/blogs/beautiful-mathematical-images" target="_blank" rel="noopener">here</a>, where you can find additional, very different images, the theory behind them, and their relevance to machine learning techniques. What is surprising is that all these images were produced by a formula with a single parameter <em>λ</em>, yet they look very different depending on the value of <em>λ</em>. More precisely, they are generated using the following recursion:</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span><span> </span>=<span> </span><em>x<span style="font-size: 8pt;">n</span></em><span> </span>+ <em>λ</em><span> </span>sin(<em>y<span style="font-size: 8pt;">n</span></em>),</p>
<p style="text-align: center;"><em>y</em><span style="font-size: 8pt;"><em>n</em>+1</span><span> </span>=<span> </span><em>x<span style="font-size: 8pt;">n</span></em><span> </span>+ <em>λ</em><span> </span>sin(<em>x<span style="font-size: 8pt;">n</span></em>),</p>
<p>with initial conditions <em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>. </p>
<p>Seven different groups of three images are displayed. In each group, the leftmost image, a scatterplot (in blue), corresponds to the orbit of (<em>x<span style="font-size: 8pt;">n</span></em>, <em>y<span style="font-size: 8pt;">n</span></em>) in two dimensions, given the initial conditions. The central image features <em>x<span style="font-size: 8pt;">n</span></em> and <em>y<span style="font-size: 8pt;">n</span></em> as two time series, with <em>x<span style="font-size: 8pt;">n</span></em> in blue and <em>y<span style="font-size: 8pt;">n</span></em> in red. In both cases, 20,000 iterations are used. The rightmost image is the same as the leftmost one, except that only the first 25 iterations are displayed, and a green curve connects the 25 dots, to show what the orbit looks like at the beginning. The initial vector (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>) is not included in that image.</p>
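<p>For readers who want to reproduce the figures, a minimal sketch of the orbit computation (plotting left out; the update rule is taken verbatim from the formulas above, and the function name is mine):</p>

```python
import math

def orbit(x0, y0, lam, n):
    """Iterate x_{k+1} = x_k + lam*sin(y_k),  y_{k+1} = x_k + lam*sin(x_k)."""
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(n):
        # simultaneous update: both new values use the previous (x, y)
        x, y = x + lam * math.sin(y), x + lam * math.sin(x)
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = orbit(1, 4, 0.04, 25)   # parameters of Figure 1, first 25 iterations
```

<p>Feeding the same function 20,000 iterations and scatter-plotting (<em>x<span style="font-size: 8pt;">n</span></em>, <em>y<span style="font-size: 8pt;">n</span></em>) reproduces the leftmost panel of each figure.</p>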
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530324885?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530324885?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>x<span style="font-size: 8pt;">0</span> = 1, y<span style="font-size: 8pt;">0</span> = 4, λ = 0.04</em></p>
<p style="text-align: center;"></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530326887?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530326887?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>x<span style="font-size: 8pt;">0</span> = 1, y<span style="font-size: 8pt;">0</span> = 4, λ = 0.06</em></p>
<p style="text-align: center;"></p>
<p style="text-align: center;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530323258?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530323258?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>x<span style="font-size: 8pt;">0</span> = 3, y<span style="font-size: 8pt;">0</span> = 4, λ = 1.5</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530331493?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530331493?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong>: <em>x<span style="font-size: 8pt;">0</span> = 56, y<span style="font-size: 8pt;">0</span> = 4, λ = 0.04</em></p>
<p style="text-align: center;"></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530366692?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530366692?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 5</strong>: <em>x<span style="font-size: 8pt;">0</span> = 2, y<span style="font-size: 8pt;">0</span> = 4, λ = 10</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530385678?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530385678?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 6</strong>: <em>x<span style="font-size: 8pt;">0</span> = 1, y<span style="font-size: 8pt;">0</span> = 4, λ = 2.5</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530386883?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530386883?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 7</strong>: <em>x<span style="font-size: 8pt;">0</span> = 3, y<span style="font-size: 8pt;">0</span> = 4, λ = 2</em></p>
<p></p>
<p>As a bonus, here is another picture produced with a different type of chaotic dynamical system. It is discussed <a href="https://mathoverflow.net/questions/352967/is-this-a-new-strange-attractor" target="_blank" rel="noopener">here</a>. </p>
<p></p>
<p><em><a href="https://storage.ning.com/topology/rest/1.0/file/get/8582320259?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8582320259?profile=RESIZE_710x" width="400" class="align-center"/></a></em></p>
<p></p>
<p>Another interesting one can be found <a href="https://arxiv.org/pdf/1508.07814.pdf" target="_blank" rel="noopener">here</a> (page 21):</p>
<p><em><a href="https://storage.ning.com/topology/rest/1.0/file/get/8609092274?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8609092274?profile=RESIZE_710x" width="400" class="align-center"/></a></em></p>
<p></p>
<p><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></p>
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> You can access Vincent's articles and books,<span> </span><a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></p>Beautiful Mathematical Imagestag:www.datasciencecentral.com,2021-02-02:6448529:BlogPost:10185032021-02-02T19:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><em>To zoom in on any picture, click on the image to get a higher resolution.</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8505475867?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8505475867?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>The pillow basins (see section 3)</em></p>
<p style="text-align: center;"></p>
<p style="text-align: left;">The topic discussed here is closely related to optimization techniques in machine learning. I will also talk about dynamical systems, especially discrete chaotic ones, in two dimensions. This is a fascinating branch of quantitative science, with numerous applications. This article provides you with an opportunity to gain exposure to this discipline, which is usually overlooked by data scientists but well studied by mathematicians and physicists. The images presented here are selected not just for their beauty, but most importantly for their intrinsic value: the practical insights that can be derived from them, and the implications for machine learning professionals.</p>
<p style="text-align: left;"></p>
<p><span style="font-size: 14pt;"><strong>1. Introduction to dynamical systems</strong></span></p>
<p>A discrete dynamical system is a sequence <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>f</em>(<em>x<span style="font-size: 8pt;">n</span></em>) where <em>n</em> is an integer, starting with <em>n</em> = 0 (the initial condition) and where <em>f</em> is a real-valued function. In the continuous version (not discussed here), the index <em>n</em> (also called time) is a real number. The function <em>f</em> is called the <em>map</em> of the system; the system itself is also called a <em>mapping</em>. The most studied one is the logistic map, defined by <em>f</em>(<em>x</em>) = <span><em>ρ</em></span><em>x</em> (1 - <em>x</em>), with <em>x</em> in [0, 1]. When <span><em>ρ</em> = 4, it is fully chaotic. </span>The sequence (<em>x<span style="font-size: 8pt;">n</span></em>) for a specific initial condition <em>x</em><span style="font-size: 8pt;">0</span> is called the <em>orbit</em>. </p>
<p>Another example of chaotic mapping is the digits in base <em>b</em> of an irrational number <em>z</em> in [0,1]. In this case, <em>x</em><span style="font-size: 8pt;">0</span> = <em>z</em>, <em>f</em>(<em>x</em>) = <em>bx</em> - INT(<em>bx</em>) and the <em>n</em>-th digit of <em>z</em> is INT(<em>bx<span style="font-size: 8pt;">n</span></em>). Here INT is the integer part function. It is studied in detail in my book <em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems</em><span>, </span><span>available for free, <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>. See also the second, large appendix in my free book </span><span><em>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a>. Applications include the design of non-periodic pseudo-random number generators, cryptography, and even a new concept of number guessing (gambling or simulated stock market) where the winning numbers can be computed in advance with a public algorithm that requires trillions of years of computing time, while a fast, private algorithm is kept secret. See <a href="https://www.datasciencecentral.com/profiles/blogs/data-science-foundations-for-a-new-stock-market" target="_blank" rel="noopener">here</a>. </span></p>
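<p>The digit mapping translates directly into code. Here is a small Python sketch (my own helper, not from the book): it applies <em>f</em>(<em>x</em>) = <em>bx</em> - INT(<em>bx</em>) repeatedly and records INT(<em>bx<span style="font-size: 8pt;">n</span></em>) at each step. Note that with standard floating-point arithmetic, the chaos amplifies round-off errors, so only the first few dozen digits are reliable; long digit sequences require high-precision arithmetic.</p>

```python
import math

def digits(z, b, n):
    """First n base-b digits of z in [0, 1), via the map x -> b*x - INT(b*x)."""
    x, out = z, []
    for _ in range(n):
        d = math.floor(b * x)   # INT(b * x_n) is the next digit
        out.append(d)
        x = b * x - d           # fractional part becomes the next state
    return out

print(digits(0.625, 2, 3))   # binary expansion of 0.625 = 0.101 -> [1, 0, 1]
```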
<p>The concept easily generalizes to two dimensions. In this case <em>x<span style="font-size: 8pt;">n</span></em> is a vector or a complex number. Mappings in the complex plane are known to produce beautiful fractals; they have been used in fractal compression algorithms to compress images. In one dimension, once in chaotic mode, they produce Brownian-like orbits, with applications in Fintech.</p>
<p><strong>1.1. The sine map</strong></p>
<p>Moving forward, we focus exclusively on a particular case of the <em>sine mapping</em>, both in one and two dimensions. This is one of the simplest nonlinear mappings, yet it is very versatile and produces a large number of varied and intriguing configurations. In one dimension, it is defined as follows:</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = -<em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>).</p>
<p>In two dimensions, it is defined as</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = -<em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>y<span style="font-size: 8pt;">n</span></em>),</p>
<p style="text-align: center;"><em>y</em><span style="font-size: 8pt;"><em>n</em>+1</span> = -<em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>).</p>
<p></p>
<p>This system is governed by two real parameters: <span><em>ρ</em> and</span> <span><em>λ</em>. Some of its properties and references are discussed <a href="https://mathoverflow.net/questions/382610/strange-behavior-of-x-n1-x-n-lambda-sin-x-n" target="_blank" rel="noopener">here</a>. </span></p>
<p><span style="font-size: 14pt;"><strong>2. Connection to machine learning optimization algorithms</strong></span></p>
<p>I need to introduce two more concepts before getting down to the interesting stuff. The first one is the <em>fixed point</em>. A root is simply a value <em>x</em>* such that <em>f</em>(<em>x</em>*) = 0. Some systems don't have any root, some have one, some have several, and some have infinitely many, depending on the values of the parameters (in our case, depending on <em>ρ</em> and<em> λ</em>, see section 1.1). Some or all roots can be found using the following <em>fixed point</em> recursion: <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> + <em>f</em>(<em>x<span style="font-size: 8pt;">n</span></em>). In our case, this translates into the following algorithm.</p>
<p><strong>2.1. Fixed point algorithm</strong></p>
<p>For our sine mapping defined in section 1.1, proceed as follows</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> - <em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>)</p>
<p>in one dimension, or </p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> - <em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>y<span style="font-size: 8pt;">n</span></em>),</p>
<p style="text-align: center;"><em>y</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> - <em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>),</p>
<p>in two dimensions. If the sequences converge to some <em>x</em>* (one dimension) or (<em>x</em>*, <em>y</em>*) (two dimensions), then that limit is a fixed point of the system. To find as many fixed points as possible, you need to try many different initial conditions. Some initial conditions lead to one fixed point, some lead to another fixed point, some lead to nowhere. Some fixed points can never be reached no matter what initial conditions you use. This is illustrated later in this article. </p>
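<p>A minimal sketch of the two-dimensional fixed point iteration follows (the function name, tolerance, and iteration budget are my own choices; the parameters <em>λ</em> = 0.5, <em>ρ</em> = 0 are those of the octopus basins in section 3):</p>

```python
import math

def fixed_point(x0, y0, lam, rho, max_iter=1000, tol=1e-9):
    """Run x_{n+1} = x_n - rho*x_n + lam*sin(y_n),
           y_{n+1} = x_n - rho*x_n + lam*sin(x_n).
    Returns (x*, y*) if the iteration settles, None otherwise."""
    x, y = x0, y0
    for _ in range(max_iter):
        xn = x - rho * x + lam * math.sin(y)
        yn = x - rho * x + lam * math.sin(x)
        if abs(xn - x) < tol and abs(yn - y) < tol:
            return xn, yn
        x, y = xn, yn
    return None

root = fixed_point(1.0, 1.0, lam=0.5, rho=0.0)
```

<p>Starting on the diagonal at (1, 1), this converges to the root (<em>π</em>, <em>π</em>); other initial conditions reach other roots or never settle, which is exactly what the basin pictures in section 3 visualize.</p>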
<p><strong>2.2. Connection to optimization algorithms</strong></p>
<p>Optimization techniques are widely used in machine learning and statistical science, for instance in deep neural networks, or if you want to find a maximum likelihood estimator.</p>
<p>When looking for the maxima or minima of a function <em>f</em>, you try to find the roots of the derivative of <em>f</em> (in one dimension) or the points where its gradient vanishes (in two dimensions). This is typically done using the Newton-Raphson method, which is a particular type of fixed point algorithm, with quadratic convergence.</p>
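<p>As a toy illustration (my own example, not from the article): to minimize <em>f</em>(<em>x</em>) = cos(<em>x</em>), Newton-Raphson is applied to the derivative <em>f</em> '(<em>x</em>) = -sin(<em>x</em>), giving the iteration <em>x</em> - <em>f</em> '(<em>x</em>)/<em>f</em> ''(<em>x</em>) = <em>x</em> - tan(<em>x</em>):</p>

```python
import math

def newton_minimize_cos(x0, n_iter=8):
    """Newton-Raphson on f'(x) = -sin(x) to find a critical point of cos(x):
    x_{n+1} = x_n - f'(x_n)/f''(x_n) = x_n - tan(x_n)."""
    x = x0
    for _ in range(n_iter):
        x = x - math.tan(x)
    return x

x_star = newton_minimize_cos(3.0)   # converges to pi, the minimum of cos on (0, 2*pi)
```

<p>The fast (better than quadratic, in this particular case) convergence is typical of Newton-type fixed point schemes, in contrast with the slower linear convergence of the plain recursion of section 2.1.</p>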
<p><strong>2.3. Basins of attraction</strong></p>
<p>The second concept I introduce is <em>basins of attraction</em>. A basin of attraction is the full set of initial conditions such that, when applying the fixed point algorithm in section 2.1, the fixed point iterations always converge to the same root <em>x</em>* of the system.</p>
<p>Let me illustrate this with the one-dimensional sine mapping, with <em>ρ</em> = 0 and <em>λ </em>= 1. The roots of the system are solutions to sin(<em>x</em>) = 0, that is, <em>x</em>* = <em>k</em><span><em>π</em>, where <em>k</em> is any integer. If the initial condition <em>x</em><span style="font-size: 8pt;">0</span> is anywhere in the open interval ]2<em>kπ</em>, 2(<em>k</em>+1)<em>π</em>[, then the fixed point algorithm always converges to the same <em>x</em>* = (2<em>k</em> + 1)<em>π</em>. So each of these intervals constitutes a distinct basin of attraction, and there are infinitely many of them. However, none of the roots <em>x</em>* = 2<em>kπ</em> can be reached regardless of the initial condition <em>x</em><span style="font-size: 8pt;">0</span>, unless <em>x</em><span style="font-size: 8pt;">0</span> = <em>x</em>* = 2<em>kπ</em> itself. </span></p>
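<p>This one-dimensional case is easy to check numerically. The sketch below (the helper name is my own) just iterates <em>x</em> → <em>x</em> + sin(<em>x</em>) and reports the root it settles on:</p>

```python
import math

def basin_root(x0, lam=1.0, n_iter=500):
    """Iterate x -> x + lam*sin(x); returns the root the orbit converges to
    (for lam = 1 the iterates always stay in the starting interval)."""
    x = x0
    for _ in range(n_iter):
        x = x + lam * math.sin(x)
    return x

# Every x0 in ]0, 2*pi[ lands on pi; every x0 in ]2*pi, 4*pi[ on 3*pi; and so on.
```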
<p><span>In two dimensions, the basins of attraction look beautiful when plotted. Some have fractal boundaries. I believe none of their boundaries have an explicit, closed-form equation, except in trivial cases. This is illustrated in section 3, featuring the beautiful images promised at the beginning. </span></p>
<p><strong>2.4. Final note about the one-dimensional sine map</strong></p>
<p><span>The sequence <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>) behaves as follows. Here we assume <em>λ</em> > 0 and <em>ρ</em> = 0.</span></p>
<ul>
<li><span>If <em>λ </em> < 1, it converges to a root <em>x</em>*</span></li>
<li><span>If <em>λ =</em> 4, it oscillates constantly in a narrow horizontal band, never converging</span></li>
<li><span>If <em>λ </em> > 6, it behaves chaotically, like a Brownian motion, unbounded, with the exception noted below</span></li>
</ul>
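<p>These regimes are easy to observe numerically. A quick sketch (my own helper; the chaotic case is only generated here, not characterized):</p>

```python
import math

def iterate(x0, lam, n_iter):
    """Trajectory of x_{n+1} = x_n + lam*sin(x_n)."""
    x = x0
    traj = [x]
    for _ in range(n_iter):
        x = x + lam * math.sin(x)
        traj.append(x)
    return traj

smooth = iterate(2.0, 0.5, 200)      # lam < 1: settles on the root pi
chaotic = iterate(2.0, 10.0, 200)    # lam > 6: erratic, Brownian-like excursions
```

<p>Plotting the two trajectories against the iteration index makes the contrast obvious: the first flattens onto <em>π</em> within a few dozen steps, the second wanders erratically.</p>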
<p><span>There is a very narrow interval around <em>λ =</em> 8, where behavior is non-chaotic. In that case, <em>x<span style="font-size: 8pt;">n</span></em> is asymptotically equivalent to +2<em>π n</em> or - 2<em>π n</em>, and the sign depends on the initial condition <em>x</em><span style="font-size: 8pt;">0</span>, and is very sensitive to it. In addition, for instance if <em>x</em><span style="font-size: 8pt;">0</span> = 2 and <em>λ </em>= 8, then <em>x</em><span style="font-size: 8pt;">2<em>n</em></span> - <em>x</em><span style="font-size: 8pt;">2<em>n</em>-1</span> gets closer and closer to <em>α</em> = 7.939712..., and <em>x</em><span style="font-size: 8pt;">2<em>n</em>-1</span> - <em>x</em><span style="font-size: 8pt;">2<em>n</em>-2</span> gets closer and closer to <em>β</em> = -1.65653... as <em>n</em> increases, with <em>α</em> + <em>β</em> = 2<em>π</em>. Furthermore, <em>α</em> satisfies the equation</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8505364456?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8505364456?profile=RESIZE_710x" width="300" class="align-center"/></a></span></p>
<p><span style="font-size: 12pt;">For details, see <a href="https://mathoverflow.net/questions/382610/strange-behavior-of-x-n1-x-n-lambda-sin-x-n" target="_blank" rel="noopener">here</a>. The phenomenon in question is pictured in Figure 2 below. </span></p>
<p><span style="font-size: 10pt;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507389694?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507391075?profile=RESIZE_710x" width="400" class="align-center"/></a></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507394283?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507394283?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><span style="font-size: 12pt;"><strong>Figure 2</strong>: <em>x<span style="font-size: 8pt;">n</span> for n = 0 to 20,000 (X-axis), with x<span style="font-size: 8pt;">0</span> = 2; λ = 8 (top), λ = 7.98 (bottom)</em></span></p>
<p><span style="font-size: 14pt;"><strong>3. Beautiful math images and their implications</strong></span></p>
<p>The first picture (Figure 1, at the top of the article) features part of the four non-degenerate basins of attraction in the 2-dimensional sine map, when <span><em>λ =</em> 2 and <em>ρ </em>= 0.75. This sine map has 49 = 7 x 7 roots (<em>x</em>*, <em>y</em>*) with <em>x</em>* one of the 7 solutions of <em>ρx</em> = <em>λ </em>sin(<em>λ</em> sin(<em>x</em>) / <em>ρ</em>), and <em>y</em>* also one of the 7 solutions of the same equation. Computations were performed using the fixed point algorithm described in section 2.1. Note that the white zone corresponds to initial conditions (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>) that do not lead to convergence of the fixed point algorithm. Each basin is assigned one color (other than white), and is made of sections of pillows with the same color, scattered across many pillows. I call it the pillow basins. It would be interesting to see if the basin boundaries can be represented by simple mathematical functions. One degenerate basin (the fifth basin) consisting of the diagonal line <em>x</em> = <em>y</em>, is not displayed in Figure 1.</span></p>
<p>The picture below (Figure 3) shows parts of 5 of the infinitely many basins of attraction corresponding to <span><em>λ</em></span> = 0.5 and <span><em>ρ</em></span> = 0, for the 2-dimensional sine map. As in Figure 1, the X-axis represents <em>x</em><span style="font-size: 8pt;">0</span>, the Y-axis represents <em>y</em><span style="font-size: 8pt;">0</span>. The range is from -4 to 4 both in Figure 1 and Figure 3. Each basin has its own color.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507115889?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507115889?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><span><strong>Figure 3</strong>: <em>The octopus basins</em></span></p>
<p><span>In this case, we have infinitely many roots (with <em>x</em>*, <em>y</em>* being a multiple of <em>π</em>) but only one-fourth of them can be reached by the fixed point algorithm. The more roots, the more basins, and as a result, the more interference between basins, making the image look noisy: a very small change in the initial conditions can lead to convergence to a different root, hence the apparent overlap between basins. </span></p>
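<p>Images like Figure 3 can be reproduced by sweeping a grid of initial conditions and coloring each pixel by the root it reaches. A sketch under the Figure 3 parameters (<em>λ</em> = 0.5, <em>ρ</em> = 0; grid size, tolerance, and labeling scheme are my own choices, and plotting is omitted):</p>

```python
import math

def converge(x0, y0, lam=0.5, rho=0.0, max_iter=2000, tol=1e-9):
    """Fixed point iteration of section 2.1; returns (x*, y*) or None."""
    x, y = x0, y0
    for _ in range(max_iter):
        xn = x - rho * x + lam * math.sin(y)
        yn = x - rho * x + lam * math.sin(x)
        if abs(xn - x) < tol and abs(yn - y) < tol:
            return xn, yn
        x, y = xn, yn
    return None

def basin_grid(lo=-4.0, hi=4.0, steps=41):
    """Label each grid point by round(x*/pi) of the root it reaches,
    or None when the iteration does not converge (white pixels)."""
    h = (hi - lo) / (steps - 1)
    labels = {}
    for i in range(steps):
        for j in range(steps):
            p = converge(lo + i * h, lo + j * h)
            labels[(i, j)] = None if p is None else round(p[0] / math.pi)
    return labels

labels = basin_grid()
```

<p>Mapping each label to a color (and None to white) reproduces the basin/white-zone structure of the figures; a finer grid reveals the fractal-looking boundaries.</p>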
<p><span>The takeaway from this is that when dealing with an optimization problem with many local maxima and minima, the solution you get is very sensitive to the initial conditions. In some cases, it matters, and in some cases it does not. If you are looking for a local optimum only, this is not an issue. This is further illustrated in Figure 4 below. It shows the orbits - that is, the locations of (<em>x<span style="font-size: 8pt;">n</span></em>, <em>y<span style="font-size: 8pt;">n</span></em>) - starting with four different initial conditions (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>), for the sine map featured in Figure 1. Each blue dot represents a root (<em>x</em>*, <em>y</em>*). Each orbit except the green one converges to a different root. The green one oscillates back and forth, never converging.</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507235501?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507235501?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><span><strong>Figure 4</strong>: <em>Four orbits corresponding to four initial conditions, for the case shown in Figure 1 </em></span></p>
<p><strong>Note:</strong> When the system is very sensitive to initial conditions and highly chaotic, orbits computed numerically may be all wrong, as round-off errors propagate exponentially fast as <em>n</em> increases. In that case, you need high-precision computing to get accurate orbits, see <a href="https://www.datasciencecentral.com/profiles/blogs/high-precision-computing-benchmark-examples-and-tutorial" target="_blank" rel="noopener">here</a>.</p>
<p><strong>3.1. Benchmarking clustering algorithms</strong></p>
<p><span>The basins of attraction can be used to benchmark supervised clustering algorithms. For instance, in Figure 1, if you group the red and black basins together, and the yellow and blue basins together, you end up with two well separated groups whose boundaries can be determined to arbitrary precision. One can sample points from the merged basins to create a training set with two groups, and check how well your clustering algorithm (based for instance on nearest neighbors or density estimation) can estimate the true boundaries. Another machine learning problem that you can test on these basins is boundary estimation: the problem consists of finding the boundary of a domain when you know which points are inside and which points are outside the domain. </span></p>
<p><strong>3.2. Interesting probability problem</strong></p>
<p><span>The case pictured in Figure 1 leads to an interesting question. If you randomly pick a vector of initial conditions (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>), what is the probability that it falls in (say) the red basin? It turns out that the probabilities are identical regardless of the basin. However, the probability of falling outside every basin (the white area) is different.</span></p>
<p><em>More beautiful images can be found in Part 2 of this article, <a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener">here</a>. To not miss them, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. See also <a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">this article</a>, featuring an image entitled "the eye of the Riemann Zeta function". See also the Wikipedia article about "Infinite Compositions of Analytic Functions", <a href="https://en.wikipedia.org/wiki/Infinite_compositions_of_analytic_functions#:~:text=In%20mathematics%2C%20infinite%20compositions%20of,convergence%2Fdivergence%20of%20these%20expansions." target="_blank" rel="noopener">here</a>. The picture below is from that article.</em></p>
<p></p>
<p><em><a href="https://storage.ning.com/topology/rest/1.0/file/get/8572990262?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8572990262?profile=RESIZE_710x" width="400" class="align-center"/></a></em></p>
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> He is also the founder and investor in<span> </span><a href="https://www.parisrestaurantandbar.com/blog" target="_blank" rel="noopener">Paris Restaurant</a><span> </span>in Anacortes, WA. You can access Vincent's articles and books,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>. </em></p>
<p></p>Can a Diploma from a Lower Ranking University Hurt your Data Science Career Prospects?tag:www.datasciencecentral.com,2021-01-29:6448529:BlogPost:10153502021-01-29T04:16:13.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p>Here I specifically discuss the case of a PhD degree from a third-tier university, though to some extent, it also applies to master's degrees. Many professionals joining companies such as Facebook, Microsoft, or Google in a role other than a programmer typically have a PhD degree, although there are many exceptions. It is still possible to learn data science on the job, especially if you have a quantitative background (say in physics or engineering) and have experience working with serious data: see <a href="https://www.datasciencecentral.com/profiles/blogs/is-it-still-possible-today-to-become-a-self-taught-data-scientist" target="_blank" rel="noopener">here</a>. After all, learning Python is not that hard and can be done via data camps. What is more difficult to acquire is the analytical maturity. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8492386293?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8492386293?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p style="text-align: center;"><em>University of Namur</em></p>
<p>In my case, I did my PhD at the University of Namur, a place that nobody has heard of. The topic of my research was computational statistics and image analysis. These were hot topics back then, and I was also lucky to work part-time in the corporate world for a state-of-the-art GIS (Geographic Information System) company, working with engineers on digital satellite images, as part of my PhD program, thanks to my mentor. Much of what I worked on is still very active these days, on a much bigger scale. It was the precursor of automated driving systems, and the math department in my alma mater was young and still very creative back then. This brings me to my first advice when choosing a PhD program.</p>
<p><strong>Advice #1</strong></p>
<ul>
<li>If you come from a poor background, your options might be more limited (this was my case), and you need to leverage everything you can. My parents did not have the money to send me to expensive schools, and I ended up attending the closest one to avoid spending a lot of money on rent. On the plus side, I did not accumulate student loans.</li>
<li>Before deciding on a PhD program, carefully choose your mentor. Mine was not known for his research, but he was well connected to the industry, managed to get money to fund his projects, and was working on exciting, applied projects. </li>
</ul>
<p>A side effect of my last piece of advice is that if your goal is to stay in Academia, you may have to rely on yourself to make your research worthy of publication and likely to land you a tenured position. The way I did it is summarized in my next advice. Ideally, you want to leave all doors open, both Academia and other options.</p>
<p><strong>Advice #2</strong></p>
<ul>
<li>Be proactive about reaching out to well-respected professors in your field. Attend conferences and meet peers from around the world. Take on roles such as reviewer. Start publishing in third-tier journals, move to second-tier ones, and then get a few in first-tier journals before completing your PhD. The one I published in <em>Journal of the Royal Statistical Society, Series B</em>, is what resulted in me being accepted as a postdoc at Cambridge University. Initially, when it was accepted, it only had my name on it. </li>
<li>It helps to be passionate about what you do. My very first paper was in <em>Journal of Number Theory</em>, during my first year as a PhD student. It happened because I had a passion for number theory that I developed during my middle-school and high-school years. I hated high-school math (repetitive, boring, mechanical exercises) but loved the math that I discovered and taught myself during those years, mostly through reading. I was the only student in my school to participate (and be a finalist) in the national Math Olympiads. When you are young, that is something good to have on your resume. </li>
</ul>
<p>So, to answer the original question - does it hurt coming from a low-ranking school - at this point you know that you can still succeed despite the odds. But it requires patience and perseverance, and you must be very good at what you do. Perhaps the biggest drawback is the lack of the great connections that top schools offer. You have to make up for that. Also, great schools have state-of-the-art equipment and labs (so you can learn the most modern stuff), but somehow my little math department didn't lack these, so I was not penalized. I also cultivated great relationships with the computer science department. In the end, my research was at the intersection of math, statistics, and computer science.</p>
<p>My last piece of advice is about what happens after completing your PhD. In my case, I started a postdoc at Cambridge, then moved to the corporate world (after failing a job interview for a tenured position) and eventually became an entrepreneur and VC-funded executive, and recently sold my last venture to a publicly traded company. I still do independent math research, even more than during my PhD years, and of higher caliber. </p>
<p><strong>Advice #3</strong></p>
<ul>
<li>Contact other successful professionals who came from a third-tier university and ask for their advice. In my math department, two other PhD students in my cohort went on to have stellar careers: Michel Bierlaire (postdoc at MIT after Namur) is now a full professor at EPFL; Didier Burton (also a postdoc at MIT after Namur) ended up as an executive at Yahoo. </li>
<li>If you can, leverage the fact that you are very applied and have no student loans: you can ask for a lower salary, be more competitive, and gain broad horizontal experience in many places while developing world-class expertise in a few areas. I eventually realized that working for myself (not as a consultant, but as an entrepreneur) was what I liked best.</li>
</ul>
<p>You may argue that you don't need any diploma to create your own self-funded company, not even elementary school, but in the end I believe I got the best I could out of my PhD. In my case, it also meant relocating several times, from Belgium (due to a lack of jobs) to the UK to the United States, and from the East Coast to the Bay Area and finally Seattle. I have been through various bubbles and market crashes; you can use your analytical skills to navigate them as best you can, selling and buying at the right time, understanding the markets, and emerging stronger each time. </p>
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, and eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central, acquired by TechTarget). He is also the founder of and an investor in <a href="https://www.parisrestaurantandbar.com/blog" target="_blank" rel="noopener">Paris Restaurant</a> in Anacortes, WA. You can access Vincent's articles and books <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
<p></p>Moving Averages: Natural Weights, Iterated Convolutions, and Central Limit Theoremtag:www.datasciencecentral.com,2021-01-26:6448529:BlogPost:10118062021-01-26T02:00:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p>Convolution is a concept well known to machine learning and signal processing professionals. In this article, we explain in plain English how a moving average is actually a discrete convolution, and we use this fact to build weighted moving averages with natural weights that, in the limit, exhibit Gaussian behavior guaranteed by the Central Limit Theorem. To signal processing experts, moving averages are nothing more than blurring filters, with a Gaussian-like kernel in the case discussed here. Inverting a moving average to recover the original signal consists of applying the inverse filter, known as a sharpening or enhancing filter. The inverse filter is used, for instance, in image analysis to remove noise or deblur an image, while the original filter (the moving average) does the opposite. This is discussed here for one-dimensional discrete signals, known as time series. Generalizations are also discussed, as well as an interesting application in number theory related to the famous unsolved Riemann Hypothesis.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8470334300?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8470334300?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>Bell-shaped distribution for re-scaled coefficients (the weights) discussed in section 1.1</em></p>
<p><span style="font-size: 14pt;"><strong>1. Weighted moving averages as convolutions</strong></span></p>
<p>Given a discrete time series with observations <em>X</em>(0), <em>X</em>(1), <em>X</em>(2), and so on, a weighted moving average can be defined by</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8469896293?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8469896293?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p>Here <em>Y</em>(<em>t</em>) is the smoothed signal and <em>h</em> is a discrete density function (thus summing to one), though negative values of <em>h</em>(<em>k</em>) are sometimes used, for instance in Spencer's 15-point moving average used by actuaries; see <a href="https://mathworld.wolfram.com/Spencers15-PointMovingAverage.html" target="_blank" rel="noopener">here</a>. We assume that <em>t</em> can take on negative integer values. Also, unless otherwise specified, we assume the weights to be symmetric, that is, <em>h</em>(<em>k</em>) = <em>h</em>(-<em>k</em>). The parameter <em>N</em> can be infinite, but typically the values <em>h</em>(<em>k</em>) decay rapidly as <em>k</em> moves away from 0. </p>
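<p>The formula above is a plain discrete convolution of the signal with the weight vector. Here is a minimal Python sketch using NumPy (the article's own downloadable code is in Perl; the function name below is my own, for illustration):</p>
<pre><code>
```python
import numpy as np

def weighted_moving_average(x, h):
    # Y(t) = sum_k h(k) X(t - k); h is symmetric and sums to one.
    # mode="same" keeps the output aligned with the input signal.
    return np.convolve(x, h, mode="same")

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
h = np.array([1/3, 1/3, 1/3])   # N = 1, uniform weights
y = weighted_moving_average(x, h)
```
</code></pre>
<p>Away from the boundaries, each output value is simply the average of three consecutive observations.</p>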
<p>The notation used by mathematicians to represent this transformation is as follows: <em>Y</em> = <em>T</em>(<em>X</em>) = <em>h</em> * <em>X</em> where * is the convolution operator. This notation is convenient because it easily allows us to define the iterated moving average as a self-composition of the operator <em>T</em>, acting on the time series <em>X </em>: Start with <em>Y</em><span style="font-size: 8pt;">0</span> = <em>X</em>, <em>Y</em><span style="font-size: 8pt;">1</span> = <em>Y</em>, and let <em>Y</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>T</em>(<em>Y<span style="font-size: 8pt;">n</span></em>) = <em>h</em> * <em>Y<span style="font-size: 8pt;">n</span></em>. Likewise, we can define <span style="font-size: 12pt;"><em>h<span style="font-size: 8pt;">n</span></em></span> (with <em>h</em><span style="font-size: 8pt;">1</span> = <em>h</em>) as <em>h</em> * <em>h</em> * ... * <em>h</em>, that is, an <em>n</em>-fold self-convolution of <em>h</em>. Of course, <em>Y<span style="font-size: 8pt;">n</span></em> = <em>h<span style="font-size: 8pt;">n</span></em> * <em>X</em> so that we have</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8469956688?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8469956688?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Note that the sum goes from -<em>N<span style="font-size: 8pt;">n</span></em> to <em>N<span style="font-size: 8pt;">n</span></em> this time, as each additional iteration increases the number of terms in the sum, so <em>N<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> > <em>N<span style="font-size: 8pt;">n</span></em>, with <em>N</em><span style="font-size: 8pt;">1</span> = <em>N</em>. This becomes clear in the following illustration.</p>
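<p>The iterated kernel <em>h<span style="font-size: 8pt;">n</span></em> = <em>h</em> * ... * <em>h</em> can be computed directly by repeated convolution; a minimal Python sketch (assuming NumPy, with a helper name of my own choosing):</p>
<pre><code>
```python
import numpy as np

def iterated_kernel(h, n):
    # n-fold self-convolution h_n = h * h * ... * h (n factors).
    hn = np.asarray(h)
    for _ in range(n - 1):
        hn = np.convolve(hn, h)
    return hn

h = np.array([1/3, 1/3, 1/3])   # N = 1
h2 = iterated_kernel(h, 2)      # supported on 2*2 + 1 = 5 points
```
</code></pre>
<p>Each iteration widens the support by 2<em>N</em> points, matching the growth of <em>N<span style="font-size: 8pt;">n</span></em> described above, while the weights keep summing to one.</p>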
<p><strong>1.1 Example</strong></p>
<p>The most basic case corresponds to <em>N</em> = 1, with <em>h</em>(-1) = <em>h</em>(0) = <em>h</em>(1) = 1/3. In this case, <em>N<span style="font-size: 8pt;">n</span></em> = <em>n</em>, and the average value of <em>h<span style="font-size: 8pt;">n</span></em>(<em>k</em>) is equal to 1 / (2<em>N<span style="font-size: 8pt;">n</span></em> +1). We have the following table:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8470110072?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8470110072?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>The above table shows how the weights are automatically determined, with no guesswork, rules of thumb, or fine-tuning required. Note that the sum of the elements in the <em>n</em>-th row is always equal to 3^<em>n</em> (3 to the power <em>n</em>). This is very similar to the binomial coefficient table, and the <em>h<span style="font-size: 8pt;">n</span></em>(<em>k</em>) are known as the trinomial coefficients; see <a href="https://oeis.org/search?q=1%2C6%2C21%2C50%2C90%2C126&language=english&go=Search" target="_blank" rel="noopener">here</a>. The difference is that for binomial coefficients, the sum of the elements in the <em>n</em>-th row is always equal to 2^<em>n</em>, and the <em>n</em>-th row has only <em>n</em> + 1 entries, versus 2<em>n</em> + 1 in our table. The values <em>h<span style="font-size: 8pt;">n</span></em>(<em>k</em>) corresponding to <em>n</em> = 100 are displayed in Figure 1, at the top of this article. They have been scaled by a factor equal to the square root of <em>N<span style="font-size: 8pt;">n</span></em>, since otherwise they would all tend to zero as <em>n</em> tends to infinity. </p>
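<p>The unnormalized rows of the table (the trinomial coefficients, before dividing by 3^<em>n</em>) are the coefficients of (1 + <em>x</em> + <em>x</em>²)^<em>n</em>, and can be generated by repeated convolution with [1, 1, 1]; a small Python sketch (assuming NumPy):</p>
<pre><code>
```python
import numpy as np

def trinomial_row(n):
    # Row n of the trinomial triangle: coefficients of (1 + x + x^2)^n.
    row = np.array([1])
    for _ in range(n):
        row = np.convolve(row, [1, 1, 1])
    return row

row4 = trinomial_row(4)   # 2*4 + 1 = 9 entries, summing to 3^4 = 81
```
</code></pre>
<p>Dividing row <em>n</em> by 3^<em>n</em> recovers the weights <em>h<span style="font-size: 8pt;">n</span></em>(<em>k</em>) in the table.</p>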
<p><strong>1.2 Link to the Central Limit Theorem</strong></p>
<p>The methodology developed here can be used to prove the central limit theorem in the most classic way. Indeed, the classic proof uses iterated self-convolutions, and the fact that the Fourier transform of a convolution is the product of the individual Fourier transforms. The Fourier transform is called the characteristic function in probability theory. Interestingly, this leads to Gaussian approximations for partial sums of coefficients such as those in the <em>n</em>-th row of the above table, when <em>n</em> is large and after proper rescaling. This is already well known for binomial coefficients (see <a href="http://www.ams.org/publicoutreach/feature-column/fcarc-normal" target="_blank" rel="noopener">here</a>), and it easily extends to the coefficients introduced here, as well as to many other types of mathematical coefficients. See also Figure 1.</p>
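<p>The convolution theorem underlying this proof is easy to check numerically: the discrete Fourier transform of <em>h<span style="font-size: 8pt;">n</span></em> equals the <em>n</em>-th power of the transform of <em>h</em>, once <em>h</em> is zero-padded to the support of <em>h<span style="font-size: 8pt;">n</span></em>. A small Python sketch (assuming NumPy):</p>
<pre><code>
```python
import numpy as np

h = np.array([1/3, 1/3, 1/3])
n = 4

# Build h_n by repeated convolution.
hn = h
for _ in range(n - 1):
    hn = np.convolve(hn, h)

# Zero-pad h to the length of h_n so circular and linear
# convolutions coincide, then compare transforms.
size = len(hn)
F_h = np.fft.fft(h, size)
F_hn = np.fft.fft(hn, size)
max_err = np.max(np.abs(F_hn - F_h**n))
```
</code></pre>
<p>The maximum discrepancy is at machine-precision level, confirming that transforming turns the <em>n</em>-fold convolution into an <em>n</em>-th power.</p>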
<p><span style="font-size: 14pt;"><strong>2. Inverting a moving average, and generalizations</strong></span></p>
<p>Inverting a moving average consists of retrieving the original time series or signal, by applying the inverse filter to the observed data to un-smooth it. It is usually not possible, though the true answer is somewhat more nuanced. It is certainly easier when <em>N</em> is small, though usually <em>N</em> is not known, and the weights are also unknown. However, if the observed data is the result of applying the simple convolution described in section 1.1 with <em>N</em> = 1, you only need to know the values of <em>X</em>(<em>t</em>) at two different times <em>t</em><span style="font-size: 8pt;">0</span> and <em>t</em><span style="font-size: 8pt;">1</span> to retrieve the original signal. This is easiest if you know <em>X</em>(<em>t</em>) at <em>t</em><span style="font-size: 8pt;">0</span> = 0 and at <em>t</em><span style="font-size: 8pt;">1</span> = 1: in this case, there is a simple inversion formula: </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8470438871?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8470438871?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>If you know <em>X</em>(0), <em>X</em>(1), and <em>Y</em>(<em>t</em>) for all <em>t</em>, you can iteratively retrieve <em>X</em>(2), <em>X</em>(3), and so on with the above recurrence formula. If you don't know <em>X</em>(0) and <em>X</em>(1), but instead know the variance and other higher moments of <em>X</em>(<em>t</em>) (assuming <em>X</em>(<em>t</em>) is stationary), then you may test various candidate pairs <em>X</em>(0), <em>X</em>(1) until you find one matching these moments when reconstructing the full sequence <em>X</em>(<em>t</em>) using the above recurrence formula. The solution may not be unique. Other parameters you know about <em>X</em>(<em>t</em>) may also be useful for the reconstruction: the period (if any), the slope of a linear trend (if any), and so on. </p>
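<p>Assuming the uniform <em>N</em> = 1 kernel of section 1.1, so that <em>Y</em>(<em>t</em>) = (<em>X</em>(<em>t</em>-1) + <em>X</em>(<em>t</em>) + <em>X</em>(<em>t</em>+1)) / 3, the recurrence amounts to <em>X</em>(<em>t</em>+1) = 3<em>Y</em>(<em>t</em>) - <em>X</em>(<em>t</em>) - <em>X</em>(<em>t</em>-1). A minimal Python sketch of the round trip (smooth, then reconstruct from <em>X</em>(0) and <em>X</em>(1)), under that assumption since the exact formula appears only in the image above:</p>
<pre><code>
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)

# Forward pass: Y(t) = (X(t-1) + X(t) + X(t+1)) / 3 for t = 1..48;
# y[t-1] holds Y(t).
y = (x[:-2] + x[1:-1] + x[2:]) / 3.0

# Inverse pass: given X(0), X(1) and Y, recover X(2), X(3), ...
x_rec = np.empty_like(x)
x_rec[0], x_rec[1] = x[0], x[1]
for t in range(1, len(x) - 1):
    x_rec[t + 1] = 3.0 * y[t - 1] - x_rec[t] - x_rec[t - 1]
```
</code></pre>
<p>The reconstruction is exact up to floating-point rounding, and the roots of the error recurrence lie on the unit circle, so rounding errors do not blow up.</p>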
<p><strong>2.1 Generalizations</strong></p>
<p>The moving averages discussed here rely on the classic arithmetic mean as the fundamental convolution operator, corresponding to <em>N</em> = 1. It is possible to use other means, such as the harmonic or geometric means, or even more general ones such as those defined <a href="https://www.datasciencecentral.com/profiles/blogs/alternative-to-the-arithmetic-geometric-and-harmonic-means" target="_blank" rel="noopener">in this article</a>. The method can be generalized to two or more dimensions, and to a time-continuous signal. For prediction or extrapolation, see <a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">this article</a>. For interpolation, that is, to estimate <em>X</em>(<em>t</em>) when <em>t</em> is not an integer, <a href="https://mathoverflow.net/questions/376081/infinite-partial-fraction-expansions-to-compute-fractional-iterations-and-recurr" target="_blank" rel="noopener">see this article</a>. </p>
<p><span style="font-size: 14pt;"><strong>3. Application and source code</strong></span></p>
<p>We applied the above methodology with <em>n</em> = 60 to the following time series, for integer values of <em>t</em> with 60 < <em>t</em> < 240:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8477710477?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8477710477?profile=RESIZE_710x" width="250" class="align-center"/></a></p>
<p>Figure 2 shows <em>Y<span style="font-size: 8pt;">n</span></em>(<em>t</em>) with <em>n</em> = 60 (the red curve), after shifting it and rescaling (multiplying) it by a factor of order sqrt(<em>n</em>). In this case, <em>X</em>(2<em>t</em>) represents the real part of the <a href="https://en.wikipedia.org/wiki/Dirichlet_eta_function" target="_blank" rel="noopener">Dirichlet eta function</a> <em>η</em> defined in the complex plane. If you replace the cosine by a sine in the definition of <em>X</em>(<em>t</em>), you get similar results for the imaginary part of <em>η</em>. What is spectacular here is that <em>Y<span style="font-size: 8pt;">n</span></em>(<em>t</em>) is very well approximated by a cosine function; see the bottom of Figure 2. The implication is that, thanks to the self-convolution used here, we can approximate the real and imaginary parts of <em>η</em> by a simple auto-regressive model. This in turn may have implications for solving the famous <a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">Riemann Hypothesis</a> (RH), which essentially consists of locating the values of <em>t</em> such that <em>X</em>(2<em>t</em>) = 0 simultaneously for the real and imaginary parts of <em>η</em>. RH states that there is no such <em>t</em> in our particular case, where the parameter 0.75 is used in the definition of <em>X</em>(<em>t</em>). It is conjectured to also be true if you replace 0.75 by any value strictly between 0.5 and 1. See more <a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/382043/incredibly-accurate-recursions-for-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8477209286?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8477209286?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>weighted moving average (WMA) with n = 60 (top), model fitting with cosine function (bottom)</em></p>
<p>Note that <em>X</em>(<em>t</em>), the blue curve, is non-periodic, while the red curve is almost perfectly periodic. If you use arbitrary moving averages instead of one based on the convolution <em>h<span style="font-size: 8pt;">n</span></em> * <em>X</em>, you won't get a perfect fit in the bottom part of Figure 2, and certainly not with a simple cosine function. <a href="https://storage.ning.com/topology/rest/1.0/file/get/8477213652?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8477213652?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>Same as the top part of Figure 2, but using a different X(t) for the blue curve</em></p>
<p>Also, the perfect fit cannot be achieved if you replace the logarithm in the definition of <em>X</em>(<em>t</em>) with a much faster-growing function. This is illustrated in Figure 3, where the logarithm in <em>X</em>(<em>t</em>) was replaced by a square root.</p>
<p>The source code can be downloaded <a href="https://storage.ning.com/topology/rest/1.0/file/get/8477763473?profile=original" target="_blank" rel="noopener">here</a> (convol2b.pl.txt). Since it deals with convolutions, it can be further optimized using the Fast Fourier Transform (FFT); see <a href="http://www.dspguide.com/ch18/2.htm" target="_blank" rel="noopener">here</a>. Finally, it would be interesting to treat this case assuming the time <em>t</em> is continuous, using continuous rather than discrete convolutions.</p>
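<p>The FFT speed-up works because pointwise multiplication in the frequency domain replaces the quadratic-cost direct convolution. A small Python sketch of the idea (assuming NumPy; the function name is mine), checked against the direct method:</p>
<pre><code>
```python
import numpy as np

def fft_convolve(a, b):
    # Linear convolution via FFT: zero-pad both inputs to the full
    # output length, multiply transforms, and transform back.
    # Cost is O((m + n) log(m + n)) instead of O(m * n).
    size = len(a) + len(b) - 1
    return np.real(np.fft.ifft(np.fft.fft(a, size) * np.fft.fft(b, size)))

rng = np.random.default_rng(1)
a = rng.standard_normal(200)
b = rng.standard_normal(200)
err = np.max(np.abs(fft_convolve(a, b) - np.convolve(a, b)))
```
</code></pre>
<p>The two methods agree to within floating-point precision; for repeated self-convolutions such as <em>h<span style="font-size: 8pt;">n</span></em>, one can also stay in the frequency domain and raise the transform to the <em>n</em>-th power before inverting.</p>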
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, and eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central, acquired by TechTarget). He is also the founder of and an investor in <a href="https://www.parisrestaurantandbar.com/blog" target="_blank" rel="noopener">Paris Restaurant</a> in Anacortes, WA. You can access Vincent's articles and books <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>