Vincent Granville's Posts - Data Science Central
2021-04-10T20:50:33Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
https://storage.ning.com/topology/rest/1.0/file/get/2800211702?profile=RESIZE_48X48&width=48&height=48&crop=1%3A1
https://www.datasciencecentral.com/profiles/blog/feed?user=3v6n5b6g08kgn&xn_auth=no
Simple Machine Learning Approach to Testing for Independence
tag:www.datasciencecentral.com,2021-04-08:6448529:BlogPost:1046622
2021-04-08T06:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8771488658?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8771488658?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>We describe here a methodology that applies to any statistical test, illustrated in the context of assessing independence between successive observations in a data set. After reviewing a few standard approaches, we discuss our methodology, its benefits, and its drawbacks. The data used here for illustration purposes has known theoretical auto-correlations, so it can be used to benchmark various statistical tests. Our methodology also applies to data with high volatility, in particular to time series models with undefined autocorrelations. Such models (see for instance Figure 1 <a href="https://www.datasciencecentral.com/profiles/blogs/defining-and-measuring-chaos-in-data-sets-why-and-how-in-simple-w" target="_blank" rel="noopener">in this article</a>) are usually ignored by practitioners, despite their interesting properties.</p>
<p>Independence is a stronger concept than all autocorrelations being equal to zero. In particular, some functional non-linear relationships between successive data points may result in zero autocorrelation even though the observations exhibit strong auto-dependencies: a classic example is points randomly located on a circle centered at the origin; the correlation between the <em>X</em> and <em>Y</em> variables may be zero, but of course <em>X</em> and <em>Y</em> are not independent.</p>
<p><span style="font-size: 14pt;"><strong>1. Testing for independence: classic methods</strong></span></p>
<p>The best known test is the Chi-Squared test; see <a href="http://mlwiki.org/index.php/Chi-Squared_Test_of_Independence" target="_blank" rel="noopener">here</a>. It is used to test independence in contingency tables or between two time series. In the latter case, it requires binning the data, and works only if each bin has enough observations, usually more than 5. Under the assumption of independence, its test statistic has a known distribution: Chi-Squared, itself well approximated by a normal distribution for moderately sized data sets; see <a href="https://en.wikipedia.org/wiki/Chi-square_distribution#Asymptotic_properties" target="_blank" rel="noopener">here</a>. </p>
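To make the mechanics concrete, here is a minimal sketch (not code from the article) of the Chi-Squared statistic for a contingency table; the two 2 x 2 tables are made-up examples:

```python
def chi_square_stat(table):
    """Chi-Squared statistic: sum of (observed - expected)^2 / expected,
    where expected counts are computed under row/column independence."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

independent = [[10, 20], [20, 40]]   # rows exactly proportional: statistic is 0
dependent = [[30, 0], [0, 30]]       # perfect association: large statistic
```

A statistic near 0 supports independence; larger values are compared against a Chi-Squared distribution with (rows - 1)(cols - 1) degrees of freedom.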
<p>Another test is based on the Kolmogorov-Smirnov statistic. It is typically used to measure <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test" target="_blank" rel="noopener">goodness of fit</a>, but can be adapted to assess independence between two variables (or columns in a data set); see <a href="https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-8/issue-2/A-Kolmogorov-Smirnov-type-test-for-independence-between-marks-and/10.1214/14-EJS961.full" target="_blank" rel="noopener">here</a>. Convergence to the exact distribution is slow. Our test, described in section 2, is somewhat similar, but it is entirely data-driven and model-free: our confidence intervals are based on re-sampling techniques, not on tabulated values of known statistical distributions. Our test was first discussed in section 2.3 of a previous article entitled <em>New Tests of Randomness and Independence for Sequences of Observations</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/a-new-test-of-independence" target="_blank" rel="noopener">here</a>. In section 2 of this article, a better and simplified version is presented, suitable for big data. In addition, we discuss how to build confidence intervals in a simple way that will appeal to machine learning professionals.</p>
<p>Finally, rather than testing for independence in successive observations (say, a time series) one can look at the square of the observed autocorrelations of lag-1, lag-2 and so on, up to lag-<em>k</em> (say <em>k</em> = 10). The absence of autocorrelations does not imply independence, but this test is easier to perform than a full independence test. The Ljung-Box and the Box-Pierce tests are the most popular ones used in this context, with Ljung-Box converging faster to the limiting (asymptotic) Chi-Squared distribution of the test statistic, as the sample size increases. See <a href="https://en.wikipedia.org/wiki/Ljung%E2%80%93Box_test" target="_blank" rel="noopener">here</a>.</p>
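The Ljung-Box statistic is simple enough to sketch directly. This is my own illustration on simulated data, not code from the article:

```python
import random

def ljung_box(x, max_lag=10):
    """Ljung-Box statistic Q = n(n+2) * sum over k of rho_k^2 / (n - k),
    where rho_k is the lag-k sample autocorrelation. Q is compared to a
    Chi-Squared distribution with max_lag degrees of freedom; a large Q
    suggests the series is autocorrelated."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum((x[i] - mean) * (x[i + k] - mean)
                    for i in range(n - k)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

random.seed(1)
iid = [random.random() for _ in range(2000)]   # independent deviates
ar1 = [0.0]                                    # AR(1)-style dependent series
for _ in range(1999):
    ar1.append(0.7 * ar1[-1] + random.random() - 0.5)

q_iid, q_ar1 = ljung_box(iid), ljung_box(ar1)
```

For the independent series, Q stays in the bulk of a Chi-Squared distribution with 10 degrees of freedom; for the autocorrelated series it is orders of magnitude larger.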
<p><span style="font-size: 14pt;"><strong>2. Our Test</strong></span></p>
<p>The data consists of a time series <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ..., <em>x</em><span style="font-size: 8pt;"><em>n</em></span>. We want to test whether successive observations are independent or not, that is, whether <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ..., <em>x</em><span style="font-size: 8pt;"><em>n</em>-1</span> and <em>x</em><span style="font-size: 8pt;">2</span>, <em>x</em><span style="font-size: 8pt;">3</span>, ..., <em>x</em><span style="font-size: 8pt;"><em>n</em></span> are independent or not. The test can be generalized to a broader test of independence (see section 2.3 <a href="https://www.datasciencecentral.com/profiles/blogs/a-new-test-of-independence" target="_blank" rel="noopener">here</a>) or to bivariate observations: <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ..., <em>x</em><span style="font-size: 8pt;"><em>n</em></span> versus <em>y</em><span style="font-size: 8pt;">1</span>, <em>y</em><span style="font-size: 8pt;">2</span>, ..., <em>y</em><span style="font-size: 8pt;"><em>n</em></span>. For the sake of simplicity, we assume that the observations are in [0, 1].</p>
<p><strong>2.1. Step #1</strong></p>
<p>The first step of the test consists of computing the following statistics:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8779418488?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8779418488?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>for <em>N</em> vectors (<em>α</em>, <em>β</em>), where <em>α</em>, <em>β</em> are randomly sampled or equally spaced values in [0, 1], and <em>χ</em> is the indicator function: <em>χ</em>(<em>A</em>) = 1 if <em>A</em> is true, otherwise <em>χ</em>(<em>A</em>) = 0. The idea behind the test is intuitive: if <em>q</em>(<em>α</em>, <em>β</em>) is statistically different from zero for one or more of the randomly chosen (<em>α</em>, <em>β</em>)'s, then successive observations cannot possibly be independent; in other words, <em>x</em><span style="font-size: 8pt;"><em>k</em></span> and <em>x</em><span style="font-size: 8pt;"><em>k</em>+1</span> are not independent. </p>
<p>In practice, I chose <em>N</em> = 100 vectors (<em>α</em>, <em>β</em>) evenly distributed on the unit square [0, 1] x [0, 1], assuming that the <em>x</em><span style="font-size: 8pt;"><em>k</em></span>'s take values in [0, 1] and that <em>n</em> is much larger than <em>N</em>, say <em>n</em> = 25 <em>N</em>.</p>
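The exact formula for q(α, β) is shown in the image above; one plausible reading of it (the gap between the joint empirical CDF of (x_k, x_(k+1)) and the product of its two marginal empirical CDFs) can be sketched as follows. The 10 x 10 grid, the sample size n = 2500, and the choice b = 3 for the dependent series are mine, for illustration only:

```python
import math
import random

def q_stat(x, alpha, beta):
    """Joint empirical CDF of (x_k, x_(k+1)) at (alpha, beta), minus the
    product of the two marginal empirical CDFs; near zero under independence."""
    pairs = list(zip(x[:-1], x[1:]))
    m = len(pairs)
    joint = sum(1 for a, b in pairs if a <= alpha and b <= beta) / m
    marg_a = sum(1 for a, _ in pairs if a <= alpha) / m
    marg_b = sum(1 for _, b in pairs if b <= beta) / m
    return joint - marg_a * marg_b

# N = 100 vectors (alpha, beta), evenly spread over the unit square
grid = [(i / 10, j / 10) for i in range(1, 11) for j in range(1, 11)]

random.seed(7)
iid = [random.random() for _ in range(2500)]   # independent series
dep = [math.log(2)]                            # b = 3 dynamical system (dependent)
for _ in range(2499):
    dep.append((3 * dep[-1]) % 1.0)

max_q_iid = max(abs(q_stat(iid, a, b)) for a, b in grid)
max_q_dep = max(abs(q_stat(dep, a, b)) for a, b in grid)
```

For the independent series, every |q| stays within sampling noise of zero; for the dependent series, several grid points produce clearly non-zero values.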
<p><strong>2.2. Step #2</strong></p>
<p>Two natural statistics for the test are</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8779295860?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8779295860?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>The first one, <em>S</em>, once standardized, should asymptotically have a Kolmogorov-Smirnov distribution. The second one, <em>T</em>, once standardized, should asymptotically have a normal distribution, despite the fact that the various <em>q</em>(<em>α</em>, <em>β</em>)'s are never independent. However, we do not care about the theoretical (asymptotic) distribution, thus moving away from the classic statistical approach. Instead, we use a methodology that is typical of machine learning, described in section 2.3.</p>
<p>Nevertheless, the principle is the same in both cases: the higher the value of <em>S</em> or <em>T</em> computed on the data set, the more likely we must reject the assumption of independence. Of the two statistics, <em>T</em> has less volatility than <em>S</em> and may be preferred, but <em>S</em> is better at detecting very small departures from independence.</p>
<p><strong>2.3. Step #3</strong></p>
<p>The technique described here is generic, intuitive, and simple. It applies to any statistical test of hypotheses, not just to testing independence, and it is somewhat similar to cross-validation. It consists of reshuffling the observations in various ways (see the <a href="https://en.wikipedia.org/wiki/Resampling_(statistics)" target="_blank" rel="noopener">resampling entry</a> in Wikipedia for how this works) and computing <em>S</em> (or <em>T</em>) for each of, say, 10 different reshuffled time series. Reshuffling destroys any serial, pairwise dependence, so these 10 values give you an idea of the distribution of <em>S</em> (or <em>T</em>) under independence. Now compute <em>S</em> on the original time series. Is it higher than the 10 values you computed on the reshuffled time series? If yes, you have a 90% chance that the original time series exhibits serial, pairwise dependency. </p>
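The reshuffling recipe can be sketched as follows, taking S to be the maximum of |q(α, β)| over a grid (my reading of the formula in section 2.2, consistent with its Kolmogorov-Smirnov-type behavior); the 5 x 5 grid and the b = 3 test series are my choices:

```python
import math
import random

def S_stat(x, grid):
    """S = max over (alpha, beta) in the grid of |q(alpha, beta)|, where q is
    the gap between the joint empirical CDF of (x_k, x_(k+1)) and the product
    of its two marginal empirical CDFs."""
    pairs = list(zip(x[:-1], x[1:]))
    m = len(pairs)
    best = 0.0
    for alpha, beta in grid:
        joint = sum(1 for a, b in pairs if a <= alpha and b <= beta) / m
        ma = sum(1 for a, _ in pairs if a <= alpha) / m
        mb = sum(1 for _, b in pairs if b <= beta) / m
        best = max(best, abs(joint - ma * mb))
    return best

grid = [(i / 5, j / 5) for i in range(1, 6) for j in range(1, 6)]

random.seed(42)
x = [math.log(2)]                  # serially dependent series (b = 3 map)
for _ in range(2499):
    x.append((3 * x[-1]) % 1.0)

s_orig = S_stat(x, grid)
s_shuffled = []
for _ in range(10):                # reshuffling destroys serial dependence
    y = x[:]
    random.shuffle(y)
    s_shuffled.append(S_stat(y, grid))
```

If the value on the original series beats all 10 reshuffled values, reject independence at roughly the 90% level.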
<p>A better but more complicated method consists of computing the empirical distribution of the <em>x<span style="font-size: 8pt;">k</span></em>'s, then generating 10 <em>n</em> independent deviates with that distribution. This constitutes 10 time series, each with <em>n</em> independent observations. Compute <em>S</em> for each of these time series, and compare with the value of <em>S</em> computed on the original time series. If the value computed on the original time series is higher, then you have a 90% chance that the original time series exhibits serial, pairwise dependency. This is the preferred method if the original time series has strong, long-range autocorrelations.</p>
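Drawing with replacement from the observed values is a simple way to generate deviates from the empirical distribution. In this sketch (mine, not from the article), the lag-1 autocorrelation stands in for S to keep the code short:

```python
import math
import random

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a series."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / denom

random.seed(5)
x = [math.log(2)]                      # serially dependent series (b = 3 map)
for _ in range(9999):
    x.append((3 * x[-1]) % 1.0)

# Sampling with replacement from the observed values draws iid deviates
# from the empirical distribution of the x_k's.
surrogates = [[random.choice(x) for _ in range(len(x))] for _ in range(10)]

r_orig = lag1_autocorr(x)                        # near 1/3 for this series
r_surr = [lag1_autocorr(s) for s in surrogates]  # near 0: dependence is gone
```

The surrogate series keep the marginal distribution of the data while destroying serial dependence, which is exactly what the null hypothesis requires.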
<p><strong>2.4. Test data set and results</strong></p>
<p>I tested the methodology on an artificial data set (a discrete dynamical system) created as follows: <em>x</em><span style="font-size: 8pt;">1</span> = log(2) and <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>b</em> <em>x<span style="font-size: 8pt;">n</span></em> - INT(<em>b x<span style="font-size: 8pt;">n</span></em>). Here <em>b</em> is an integer larger than 1, and INT is the integer part function. The data generated behaves like any real time series, and has the following properties.</p>
<ul>
<li>The theoretical distribution of the <em>x<span style="font-size: 8pt;">k</span></em>'s is uniform on [0, 1]</li>
<li>The lag-<em>k</em> autocorrelation is known and equal to 1 / <em>b</em>^<em>k</em> (<em>b</em> at power <em>k</em>)</li>
</ul>
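A minimal simulation of this dynamical system, checking both properties above. One caveat of my own: in double-precision floating point, a <em>b</em> that is a power of 2 makes the computed orbit collapse to zero after about 50 iterations (each step is then an exact bit shift), so this sketch uses b = 3; values such as b = 4 or b = 8 call for higher-precision arithmetic.

```python
import math

b, n = 3, 100000
x = [math.log(2)]                  # x_1 = log(2)
for _ in range(n - 1):
    x.append((b * x[-1]) % 1.0)    # x_(n+1) = b*x_n - INT(b*x_n)

mean = sum(x) / n                              # near 1/2 (uniform on [0, 1])
var = sum((v - mean) ** 2 for v in x) / n      # near 1/12, i.e. about 0.083
rho1 = sum((x[i] - mean) * (x[i + 1] - mean)
           for i in range(n - 1)) / ((n - 1) * var)   # near 1/b = 1/3
```

The empirical mean, variance, and lag-1 autocorrelation land close to the theoretical values 1/2, 1/12, and 1/b.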
<p>It is thus easy to test for independence and to benchmark various statistical tests: the larger <em>b</em>, the closer we are to independence. With a pseudo-random number generator, one can generate a time series of independently and identically distributed deviates, uniform on [0, 1], to check the distribution of <em>S</em> (or <em>T</em>) and its expectation under true independence, and compare it with values of <em>S</em> (or <em>T</em>) computed on the artificial data for various values of <em>b</em>. In this test, with <em>N</em> = 100 and <em>n</em> = 2500, the value of <i>S</i> for <em>b</em> = 4 (corresponding to an autocorrelation of 0.25) is 6 times larger than the one obtained under full independence. For <em>b</em> = 8 (corresponding to an autocorrelation of 0.125), <i>S</i> is 3 times larger than the one obtained under full independence. This validates the test described here, at least on this kind of dataset, as it correctly detects lack of independence by yielding abnormally high values of <em>S</em> when the independence assumption is violated.</p>
<p><strong>Note</strong>: Another interesting feature of the dataset used here is this: using <em>b</em>^<em>k</em> (<em>b</em> at power <em>k</em>) instead of <em>b</em> is equivalent to checking lag-<em>k</em> independence, that is, independence between <em>x</em><span style="font-size: 8pt;">1</span>, <em>x</em><span style="font-size: 8pt;">2</span>, ... and <em>x</em><span style="font-size: 8pt;">1+<em>k</em></span>, <em>x</em><span style="font-size: 8pt;">2+<em>k</em></span>, ... in the original time series corresponding to <em>b</em>. The reason is that in the original series (corresponding to <em>b</em>), we have <em>x</em><span style="font-size: 8pt;"><em>n</em>+<em>k</em></span> = <em>b</em>^<em>k</em> <em>x</em><span style="font-size: 8pt;"><em>n</em></span> - INT(<em>b</em>^<em>k</em> <em>x</em><span style="font-size: 8pt;"><em>n</em></span>).</p>
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
A Plethora of Machine Learning Tricks, Recipes, and Statistical Models
tag:www.datasciencecentral.com,2021-04-06:6448529:BlogPost:1046327
2021-04-06T03:59:22.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8760416479?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8760416479?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: See article #5, in section 1</em></p>
<p><span>Part 2 of this short series focused on fundamental techniques, see <a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-2" target="_blank" rel="noopener">here</a>. In this Part 3, you will find several machine learning tricks and recipes, many with a statistical flavor. These are articles that I wrote in the last few years. The whole series will feature articles related to the following aspects of machine learning:</span></p>
<ul>
<li><span>Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span>Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><span>Methods, principles, rules of thumb, recipes, tricks</span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-1" target="_blank" rel="noopener">Business analytics</a> </span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-2" target="_blank" rel="noopener">Core Techniques</a> </span></li>
</ul>
<p><span>My articles are always written in simple English and are accessible to professionals with typically one year of calculus or statistical training at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, geared towards people who do not have a lot of free time. </span></p>
<p><span>Despite these restrictions, state-of-the-art, off-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed. </span></p>
<p><span><strong>1. Machine Learning Tricks, Recipes and Statistical Models</strong></span></p>
<p><span>These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.</span></p>
<ol>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/defining-and-measuring-chaos-in-data-sets-why-and-how-in-simple-w">Defining and Measuring Chaos in Data Sets: Why and How, in Simple Words</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions">Hurwitz-Riemann Zeta And Other Special Probability Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/maximum-runs-in-bernoulli-trials">Maximum runs in Bernoulli trials: simulations and results</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/moving-averages-natural-weights-iterated-convolutions-and-central" target="_blank" rel="noopener">Moving Averages: Natural Weights, Iterated Convolutions, and Central Limit Theorem</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/things-you-did-not-know-you-could-do-with-excel" target="_blank" rel="noopener">Amazing Things You Did Not Know You Could Do in Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-new-test-of-independence" target="_blank" rel="noopener">New Tests of Randomness and Independence for Sequences of Observations</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/interesting-application-of-the-poisson-binomial-distribution" target="_blank" rel="noopener">Interesting Application of the Poisson-Binomial Distribution</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/alternative-to-the-arithmetic-geometric-and-harmonic-means" target="_blank" rel="noopener">Alternative to the Arithmetic, Geometric, and Harmonic Means</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/bernouilli-lattice-models-connection-to-poisson-processes" target="_blank" rel="noopener">Bernouilli Lattice Models - Connection to Poisson Processes</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/simulating-distributions-with-one-line-of-code" target="_blank" rel="noopener">Simulating Distributions with One-Line Formulas, even in Excel</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/simplified-logistic-regression" target="_blank" rel="noopener">Simplified Logistic Regression</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/simple-trick-to-normalize-correlations-r-squared-and-so-on" target="_blank" rel="noopener">Simple Trick to Normalize Correlations, R-squared, and so on</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/simple-trick-to-remove-serial-correlation-in-regression-models" target="_blank" rel="noopener">Simple Trick to Remove Serial Correlation in Regression Models</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-beautiful-result-in-probability-theory" target="_blank" rel="noopener">A Beautiful Result in Probability Theory</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/long-range-correlation-in-time-series-tutorial-and-case-study" target="_blank" rel="noopener">Long-range Correlations in Time Series: Modeling, Testing, Case Study</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-correlation-and-regression-in-statistics" target="_blank" rel="noopener">Difference Between Correlation and Regression in Statistics</a></li>
</ol>
<p><span><strong>2. Free books</strong></span></p>
<ul>
<li><span><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span>The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span><b>Applied Stochastic Processes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters). This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span>It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> You can access Vincent's articles and books, <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
Defining and Measuring Chaos in Data Sets: Why and How, in Simple Words
tag:www.datasciencecentral.com,2021-03-29:6448529:BlogPost:1045635
2021-03-29T00:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8735877694?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8735877694?profile=RESIZE_710x" width="720" class="align-full"/></a></p>
<p>There are many ways chaos is defined, with each scientific field and each expert having their own definitions. We share here a few of the most common metrics used to quantify the level of chaos in univariate time series or data sets. We also introduce a new, simple definition based on metrics that are familiar to everyone. Generally speaking, chaos measures how unpredictable a system is, be it the weather, stock prices, economic time series, medical or biological indicators, earthquakes, or anything that has some level of randomness. </p>
<p>In most applications, various statistical models (or data-driven, model-free techniques) are used to make predictions. Model selection and comparison can be based on testing various models, each one with its own level of chaos. Sometimes, time series do not have an auto-correlation function due to the high level of variability in the observations: for instance, the theoretical variance of the model is infinite. An example is provided in section 2.2 <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">in this article</a> (see picture below), used to model extreme events. In this case, chaos is a handy metric, and it allows you to build and use models that are otherwise ignored or unknown by practitioners. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8725268092?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8725268092?profile=RESIZE_710x" width="450" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>Time series with undefined autocorrelation; chaos is used instead to measure predictability</em></p>
<p>Below are various definitions of chaos, depending on the context they are used for. References about how to compute these metrics, are provided in each case.</p>
<p><strong>Hurst exponent</strong></p>
<p>The <a href="https://en.wikipedia.org/wiki/Hurst_exponent" target="_blank" rel="noopener">Hurst exponent</a> <em>H</em> is used to measure the level of smoothness in time series, and in particular, the level of long-term memory. <em>H</em> takes on values between 0 and 1, with <em>H</em> = 1/2 corresponding to the Brownian motion, and <em>H</em> = 0 corresponding to pure white noise. Higher values correspond to smoother time series, and lower values to more rugged data. Examples of time series with various values of <em>H</em> are found <a href="https://www.datasciencecentral.com/profiles/blogs/long-range-correlation-in-time-series-tutorial-and-case-study" target="_blank" rel="noopener">in this article</a>, see picture below. In the same article, the relation to the <em>detrending moving average</em> (another metric to measure chaos) is explained. Also, <em>H</em> is related to the fractal dimension. Applications include stock price modeling.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8725551894?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8725551894?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Time series with H = 1/2 (top), and H close to 1 (bottom)</em></p>
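As a rough illustration (my own sketch, not the classical rescaled-range procedure), H can be estimated from the diffusion scaling of increments, std(x[t + lag] - x[t]) ∝ lag^H:

```python
import math
import random

def hurst(series, max_lag=20):
    """Estimate H from the scaling std(x[t+lag] - x[t]) ~ lag^H,
    via a least-squares fit of log(std) against log(lag)."""
    pts = []
    for lag in range(2, max_lag):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        m = sum(diffs) / len(diffs)
        sd = math.sqrt(sum((d - m) ** 2 for d in diffs) / len(diffs))
        pts.append((math.log(lag), math.log(sd)))
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(20000)]   # white noise: H near 0
brownian = [0.0]                                     # cumulative sum: H near 1/2
for s in noise:
    brownian.append(brownian[-1] + s)

h_brownian, h_noise = hurst(brownian), hurst(noise)
```

The Brownian path comes out near H = 1/2 and the white-noise series near H = 0, matching the description above.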
<p><strong>Lyapunov exponent</strong></p>
<p>In dynamical systems, the Lyapunov exponent is used to quantify how sensitive a system is to initial conditions. Intuitively, the more sensitive to initial conditions, the more chaotic the system is. For instance, the system <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = 2<em>x</em><span style="font-size: 8pt;"><em>n</em></span> - INT(2<em>x</em><span style="font-size: 8pt;"><em>n</em></span>), where INT represents the integer part function, is very sensitive to the initial condition <em>x</em><span style="font-size: 8pt;">0</span>. A very small change in the value of <em>x</em><span style="font-size: 8pt;">0</span> results in values of <em>x</em><span style="font-size: 8pt;"><em>n</em></span> that are totally different, even for <em>n</em> as low as 45. See how to compute the Lyapunov exponent <a href="https://en.wikipedia.org/wiki/Lyapunov_exponent" target="_blank" rel="noopener">here</a>.</p>
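A tiny numerical illustration of this sensitivity, using the map x_(n+1) = 2x_n - INT(2x_n) (my example; double precision carries only about 52 bits of mantissa, so the horizon is kept short):

```python
def frac(v):
    """Fractional part: v - INT(v)."""
    return v - int(v)

x, y = 0.1, 0.1 + 1e-10   # two almost identical starting points
gaps = []                 # |x_n - y_n| after each iteration
for _ in range(36):
    x, y = frac(2 * x), frac(2 * y)
    gaps.append(abs(x - y))
```

The gap between the two orbits roughly doubles at every step: after 10 iterations it is still around 1e-7, but well before iteration 36 the orbits have fully decorrelated.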
<p><strong>Fractal dimension</strong></p>
<p>A one-dimensional curve can be defined parametrically by a system of two equations. For instance, <em>x</em>(<em>t</em>) = sin(<em>t</em>), <em>y</em>(<em>t</em>) = cos(<em>t</em>) represents a circle of radius 1, centered at the origin. Typically, <em>t</em> is referred to as the time, and the curve itself is called an orbit. In some cases, as <em>t</em> increases, the orbit fills more and more space in the plane, eventually filling a dense area, to the point that it seems to be an object with a dimension strictly between 1 and 2. An example is provided in section 2 <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">in this article</a>, and pictured below. A formal definition of fractal dimension can be found <a href="https://en.wikipedia.org/wiki/Fractal_dimension" target="_blank" rel="noopener">here</a>.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8725489684?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8725489684?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>Example of a curve filling a dense area (fractal dimension > 1)</em></p>
<p>The picture in Figure 3 is related to the Riemann hypothesis. Any meteorologist who sees the connection to hurricanes and their eye could shed some light on how to solve this infamous mathematical conjecture, based on the physical laws governing hurricanes. Conversely, this picture (and the underlying mathematics) could also be used as a statistical model for hurricane modeling and forecasting. </p>
<p><strong>Approximate entropy</strong></p>
<p>In statistics, the approximate entropy is a metric used to quantify regularity and predictability in time series fluctuations. Applications include medical data, finance, physiology, human factors engineering, and climate sciences. See the Wikipedia entry, <a href="https://en.wikipedia.org/wiki/Approximate_entropy" target="_blank" rel="noopener">here</a>.</p>
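<p>Below is a minimal, self-contained implementation of approximate entropy following Pincus' definition as described in the Wikipedia entry (the parameter defaults <em>m</em> = 2 and <em>r</em> = 0.2 are common illustrative choices, not prescribed by this article). A perfectly periodic series scores near zero; pure noise scores much higher.</p>

```python
import math, random

def approx_entropy(u, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of the series u (Pincus' definition)."""
    n = len(u)
    def phi(m):
        # All length-m windows of the series
        xs = [u[i:i + m] for i in range(n - m + 1)]
        # C_i = fraction of windows within Chebyshev distance r of window i
        cs = []
        for a in xs:
            count = sum(1 for b in xs
                        if max(abs(p - q) for p, q in zip(a, b)) <= r)
            cs.append(count / len(xs))
        return sum(math.log(c) for c in cs) / len(xs)
    return phi(m) - phi(m + 1)

random.seed(0)
regular = [1.0 if i % 2 == 0 else 2.0 for i in range(60)]  # perfectly periodic
noisy = [random.random() for i in range(60)]               # iid noise
print(approx_entropy(regular), approx_entropy(noisy))
# The periodic series is highly predictable (ApEn near 0); noise is not.
```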
<p>It should not be confused with <a href="https://en.wikipedia.org/wiki/Entropy" target="_blank" rel="noopener">entropy</a>, which measures the amount of information attached to a specific probability distribution (with the uniform distribution on [0, 1] achieving maximum entropy among all continuous distributions on [0, 1], and the normal distribution achieving maximum entropy among all continuous distributions defined on the real line, with a specific variance). Entropy is used to compare the efficiency of various encryption systems, and has been used in feature selection strategies in machine learning, see <a href="https://www.datasciencecentral.com/profiles/blogs/feature-selection-a-simple-solution" target="_blank" rel="noopener">here</a>.</p>
<p><strong>Independence metric </strong></p>
<p>Here I discuss some metrics that are of interest in the context of dynamical systems, offering an alternative to the Lyapunov exponent to measure chaos. While the Lyapunov exponent deals with sensitivity to initial conditions, the classic statistics mentioned here deal with measuring predictability for a single instance (an observed time series) of a dynamical system. However, they are most useful to compare the level of chaos between two different dynamical systems with similar properties. A dynamical system is a sequence <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>T</em>(<em>x<span style="font-size: 8pt;">n</span></em>), with initial condition <em>x</em><span style="font-size: 8pt;">0</span>. Examples are provided in my last two articles, <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a> and <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">here</a>. See also <a href="https://www.datasciencecentral.com/profiles/blogs/beautiful-mathematical-images" target="_blank" rel="noopener">here</a>. </p>
<p>A natural metric to measure chaos is the maximum autocorrelation in absolute value, between the sequence (<em>x<span style="font-size: 8pt;">n</span></em>) and the shifted sequences (<em>x</em><span style="font-size: 8pt;"><em>n</em>+<em>k</em></span>), for <em>k</em> = 1, 2, and so on. It reaches its maximum value of 1 for periodic sequences, and its minimum value of 0 in the most chaotic cases. However, some sequences attached to dynamical systems, such as the digit sequence pictured in Figure 1 in this article, do not have theoretical autocorrelations: these autocorrelations don't exist because the underlying expectation or variance is infinite or does not exist. A possible solution for positive sequences is to compute the autocorrelations on <em>y<span style="font-size: 8pt;">n</span></em> = log(<em>x<span style="font-size: 8pt;">n</span></em>) rather than on the <em>x<span style="font-size: 8pt;">n</span></em>'s.</p>
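<p>A minimal sketch of this metric (the helper name, lag range, and test sequences below are my own illustrative choices): a periodic sequence scores near 1, while the fully chaotic logistic map, whose lagged autocorrelations are zero in theory, scores near 0.</p>

```python
import math

def max_abs_autocorr(x, max_lag=20):
    """Largest |lag-k autocorrelation| of the sequence x, for k = 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    best = 0.0
    for k in range(1, max_lag + 1):
        cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / n
        best = max(best, abs(cov / var))
    return best

periodic = [math.sin(0.5 * i) for i in range(1000)]  # highly predictable
chaotic = []
x = 0.3
for _ in range(1000):          # logistic map in its fully chaotic regime
    x = 4 * x * (1 - x)
    chaotic.append(x)
print(max_abs_autocorr(periodic), max_abs_autocorr(chaotic))
```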
<p>In addition, there may be strong non-linear dependencies, and thus high predictability, in a sequence (<em>x<span style="font-size: 8pt;">n</span></em>) even if all autocorrelations are zero. Hence the need for a better metric. In my next article, I will introduce a metric measuring the level of independence, as a proxy for quantifying chaos. It will be similar in some ways to the Kolmogorov-Smirnov metric used to test independence and illustrated <a href="https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-8/issue-2/A-Kolmogorov-Smirnov-type-test-for-independence-between-marks-and/10.1214/14-EJS961.full" target="_blank" rel="noopener">here</a>; however, it involves little theory, relying instead on a machine learning approach and data-driven, model-free techniques to build confidence intervals and compare the amount of chaos in two dynamical systems: one fully chaotic versus one not fully chaotic. Some of this is discussed <a href="https://math.stackexchange.com/questions/4079669/question-about-a-special-test-of-independence-autocorrelation" target="_blank" rel="noopener">here</a>.</p>
<p>I did not include the variance as a metric to measure chaos, as the variance can always be standardized by a change of scale, unless it is infinite.</p>
<p></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
Hurwitz-Riemann Zeta And Other Special Probability Distributions
tag:www.datasciencecentral.com,2021-03-22:6448529:BlogPost:1044813
2021-03-22T05:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691835652?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691835652?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: <a href="https://www.datasciencecentral.com/profiles/blogs/babar-mimou" target="_blank" rel="noopener">here</a></em></p>
<p>In my previous article <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a>, I discussed a simple way to solve complex optimization problems in machine learning. This was illustrated in the case of complex dynamical systems, involving non-linear equations in infinite dimensions, known as functional equations. These equations were solved using a fixed point algorithm, of which the Newton–Raphson method is a well known, widely used example.</p>
<p>These equations are typically solved numerically, as no theoretical solution is known in most cases. Nevertheless, in our case, a few examples have an exact, known solution. These examples with known solution are very useful, in the sense that they allow you to test your numerical algorithm and assess how fast it converges, or not. All the solutions were probability distributions, and in this article we introduce an even larger, generic class of problems (chaotic discrete dynamical systems) with known solution. The distributions presented here can thus be used as tests to benchmark optimization algorithms, but they also have their own interest for statistical modeling purposes, especially in risk management and extreme event modeling.</p>
<p>Each dynamical system discussed here (or in my previous article) comes with two distributions:</p>
<ul>
<li>A continuous one on [0, 1], known as the <em>invariant distribution</em>.</li>
<li>A discrete one taking on strictly positive integer values, known as the <em>digit distribution</em>.</li>
</ul>
<p>Besides, these distributions are very useful in number theory, though this will not be discussed here. The names Hurwitz and Riemann zeta are just a reminder of the strong connection to number theory problems such as continued fractions, approximation of irrational numbers by rational ones, the construction and distribution of the digits of random numbers in various numeration systems, and the famous <a href="https://en.wikipedia.org/wiki/Riemann_hypothesis" target="_blank" rel="noopener">Riemann Hypothesis</a> that has a one million dollar prize attached to it. Some of this is discussed <a href="https://mathoverflow.net/questions/383925/about-generalized-continued-fractions" target="_blank" rel="noopener">here</a> and in some of my past MathOverflow questions. However, our focus here is applications in machine learning.</p>
<p><span style="font-size: 14pt;"><strong>1. The Hurwitz-Riemann Zeta distribution</strong></span></p>
<p>Without diving into the details, let me first briefly discuss other Riemann-related distributions invented by different authors. For a definition of the Hurwitz function, see <a href="https://en.wikipedia.org/wiki/Hurwitz_zeta_function" target="_blank" rel="noopener">here</a>. It generalizes the <a href="https://en.wikipedia.org/wiki/Riemann_zeta_function" target="_blank" rel="noopener">Riemann Zeta function</a>. The best-known probability distribution related to these functions is the discrete <a href="https://en.wikipedia.org/wiki/Zipf%27s_law" target="_blank" rel="noopener">Zipf distribution</a>. It is well known to machine learning practitioners, and used to model phenomena such as "the top 10 websites amount to (say) 95% of the Internet traffic". Another example, this time continuous over the set of all positive real numbers, can be found <a href="https://benthamopen.com/FULLTEXT/TOSPJ-7-53" target="_blank" rel="noopener">here</a>. The paper is entitled <em>A New Class of Distributions Based on Hurwitz Zeta Function with Applications for Risk Management</em>. The author defines a family of distributions that generalizes the exponential power, normal, gamma, Weibull, Rayleigh, Maxwell-Boltzmann and chi-squared distributions, with applications in actuarial sciences. Finally, there is also a well-known example (for mathematicians) defined on the complex plane, see <a href="https://arxiv.org/pdf/1504.03438.pdf" target="_blank" rel="noopener">here</a>. The paper is entitled <em>A complete Riemann zeta distribution and the Riemann hypothesis</em>.</p>
<p>Our Hurwitz-Riemann Zeta distribution is yet another example, this time arising from discrete dynamical systems; it is continuous on [0, 1]. It also has a sister discrete distribution attached to it, useful for statistical modeling. It is defined as follows.</p>
<p><strong>1.1. Our Hurwitz-Riemann Zeta distribution</strong></p>
<p>The distribution discussed here is the most basic example, from the generic family described in section 2. It depends on one parameter <em>s</em> > 0, and the support domain is [0, 1]. The construction mechanism is defined in section 2, for the general case. Our Hurwitz-Riemann zeta distribution has the following density:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8699635072?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8699635072?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>where <span><em>ζ</em>(<em>s</em>, <em>x</em>) is the Hurwitz function, see <a href="https://en.wikipedia.org/wiki/Hurwitz_zeta_function" target="_blank" rel="noopener">here</a>. It has the following two first moments:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691286058?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691286058?profile=RESIZE_710x" width="550" class="align-center"/></a></span></p>
<p>where <em>ζ</em>(<em>s</em>) = <em>ζ</em>(<em>s</em>, 1) is the Riemann Zeta function. This allows you to compute its variance. Higher moments can also be computed exactly. The cases <em>s</em> = 0, 1 or 2 are limiting cases; the limit as <em>s</em> tends to zero corresponds to the uniform density on [0, 1]. Particular values (<em>s</em> = 1, 2), empirically verified, are:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691307680?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691307680?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Here <span><em>γ</em> = 0.57721... is the Euler-Mascheroni constant, see <a href="https://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant" target="_blank" rel="noopener">here</a>. </span></p>
<p><strong>1.2. The discrete version</strong></p>
<p>These systems also have a discrete distribution attached to them, called the digit distribution, and described in section 2. For the Hurwitz-Riemann case, the probability that a digit is equal to <em>k</em>, is </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691322267?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691322267?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>The expectation is finite only if <em>s</em> > 1. Likewise, the variance is finite only if <em>s</em> > 2. By contrast, the Zipf distribution has <em>P</em>(<em>k</em>) = (1 / <em>ζ</em>(<em>s</em>)) * 1 / <em>k</em>^<em>s</em>.</p>
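<p>A quick numerical sanity check (a sketch: it assumes the digit probabilities take the telescoping form <em>P</em>(<em>k</em>) = 1/<em>k</em>^<em>s</em> - 1/(<em>k</em>+1)^<em>s</em>, i.e. <em>ψ</em>(<em>x</em>) = 1/<em>x</em>^<em>s</em> in the notation of section 2; the exact formula is displayed in the image above). The probabilities sum to 1, and for <em>s</em> = 2 the partial sums of the expectation converge to <em>ζ</em>(2) = π^2 / 6, consistent with finiteness for <em>s</em> &gt; 1:</p>

```python
import math

s = 2.0
N = 200000

def P(k, s):
    # Hypothesized digit probability: psi(k) - psi(k+1), with psi(x) = x**(-s)
    return k ** (-s) - (k + 1) ** (-s)

total = sum(P(k, s) for k in range(1, N + 1))     # telescopes to 1 - 1/(N+1)^s
mean = sum(k * P(k, s) for k in range(1, N + 1))  # partial sum of the expectation
print(total, mean, math.pi ** 2 / 6)
# total -> 1, and mean -> zeta(2) = pi^2/6; the second moment's partial
# sums, by contrast, grow without bound when s = 2 (infinite variance).
```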
<p><span style="font-size: 14pt;"><strong>2. A generic family of distributions, with applications</strong></span></p>
<p><span>We are dealing with a particular type of discrete dynamical system defined by </span><em>x</em><span><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>) - INT(<em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>)), where INT is the integer part function, and <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1] is the initial condition. The function <em>p</em>, defined for real numbers in [0, 1], is strictly decreasing and invertible, with <em>p</em>(1) = 1 and <em>p</em>(0) infinite. The results discussed here are valid for the vast majority of initial conditions; nevertheless, there are infinitely many exceptions, for instance <em>x</em><span style="font-size: 8pt;">0</span> = 0. These systems are discussed in detail in my previous article, <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a>. In this section, only the main results are presented. These systems have the following properties:</span></p>
<ul>
<li><span>The <em>n</em>-th digit of <em>x</em><span style="font-size: 8pt;">0</span> is <em>d<span style="font-size: 8pt;">n</span></em> = INT(<em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>)). These digits are called <a href="https://www.tandfonline.com/doi/abs/10.1080/026811199282100?journalCode=cdss19" target="_blank" rel="noopener">codewords</a> in the context of dynamical systems. The probability that a digit is equal to <em>k</em> (<em>k</em> = 1, 2, 3 and so on) is <em>F</em>(<em>q</em>(<em>k</em>)) - <em>F</em>(<em>q</em>(<em>k</em>+1)) where <em>F</em> and <em>q</em> are defined below. If you know the digits, you can retrieve <em>x</em><span style="font-size: 8pt;">0</span> using the algorithm described in my previous article. </span></li>
<li><span>The invariant distribution <em>F</em>, which is the limit of the empirical distribution of the <em>x<span style="font-size: 8pt;">n</span></em>'s, satisfies the following functional equation: <a href="https://storage.ning.com/topology/rest/1.0/file/get/8691388861?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691388861?profile=RESIZE_710x" width="250" class="align-center"/></a></span></li>
</ul>
<p><span>where <em>q</em> is the inverse of the function <em>p, q</em>' denotes the derivative of <em>q</em>, and <em>f</em> (the invariant density) is the derivative of <em>F</em>. We focus only on the results that are of interest to machine learning professionals. </span></p>
<p><span>Typically numerical methods are needed to solve the above functional equation, however here we are dealing with a large class of dynamical systems for which the theoretical solution is known. The purpose is to test numerical algorithms to check how well and how fast they can approach the exact solution, as discussed in section 2 <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">in my previous article</a>. The invariant distribution <em>F</em> discussed below is far more general than the ones described in my earlier article. </span></p>
<p><strong>2.1. Generalized Hurwitz-Riemann Zeta distribution</strong></p>
<p><span>One way to find a dynamical system with known invariant distribution is to specify that distribution upfront, and then compute the resulting function <em>p</em>(<em>x</em>) that defines the system in question. Based on theory discussed <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">here</a>, one can proceed as follows. Try a monotonically increasing function <em>r</em>(<em>x</em>) with <em>r</em>(2) = 1 + <em>r</em>(1). Let <em>F</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) - <em>r</em>(1), and <em>R</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) - <em>r</em>(<em>x</em>). Then <em>R</em>(<em>x</em>) = <em>F</em>(<em>q</em>(<em>x</em>)), that is, <em>R</em>(<em>p</em>(<em>x</em>)) = <em>F</em>(<em>x</em>) since <em>q</em>(<em>p</em>(<em>x</em>)) = <em>x</em>. You can retrieve <em>p</em>(<em>x</em>) by inverting <em>R</em>(<em>x</em>). </span></p>
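<p>As a sanity check of this recipe, here is a numerical sketch using the Gauss-map instance (where <em>r</em>(<em>x</em>) = log<sub>2</sub>(<em>x</em>), so <em>F</em>(<em>x</em>) = log<sub>2</sub>(1 + <em>x</em>) and <em>p</em>(<em>x</em>) = 1/<em>x</em>); the identity <em>R</em>(<em>p</em>(<em>x</em>)) = <em>F</em>(<em>x</em>) is verified on a grid:</p>

```python
import math

r = lambda x: math.log(x, 2)           # monotonic, with r(2) = 1 + r(1)
F = lambda x: r(x + 1) - r(1)          # invariant distribution: log2(1 + x)
R = lambda x: r(x + 1) - r(x)          # equals psi(x) = log2(1 + 1/x)
p = lambda x: 1 / x                    # Gauss map; its inverse is q(x) = 1/x

assert abs(r(2) - (1 + r(1))) < 1e-12  # the required constraint on r
for i in range(1, 100):
    x = i / 100
    assert abs(R(p(x)) - F(x)) < 1e-12 # R(p(x)) = F(x), as claimed
print("identity verified on the grid")
```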
<p><span>A simple but generic example is </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691691652?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691691652?profile=RESIZE_710x" width="190" class="align-center"/></a></span></p>
<p><span>where <em>ψ</em> is a strictly decreasing function with <em>ψ</em>(∞) = 0, <em>ψ</em>(1) = 1, and <em>ψ</em>(0) = ∞. Then you have</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691705091?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691705091?profile=RESIZE_710x" width="280" class="align-center"/></a></span></p>
<p><span>It is easy to show that <em>R</em>(<em>x</em>) = <em>ψ</em>(<em>x</em>), thanks to a careful choice for the function <em>r</em>(<em>x</em>). This explains why the system has a simple theoretical solution; it was indeed built for that purpose. As a consequence, the probability for a digit to be equal to <em>k</em> (<em>k</em> = 1, 2, and so on) is simply equal to <em>P</em>(<em>k</em>) = <em>ψ</em>(<i>k</i>) - <em>ψ</em>(<i>k</i>+1). For more details, see Example 5 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>, in the section <em>Appendix 1: Exact solution for various 1-D dynamical systems</em>.</span></p>
<p><span>The Hurwitz-Riemann particular case in section 1.1 corresponds to</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691709297?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691709297?profile=RESIZE_710x" width="300" class="align-center"/></a></span></p>
<p>Another particular case corresponds to <span><em>ψ</em>(<em>x</em>) = log<span style="font-size: 8pt;">2</span>(1 + 1/<em>x</em>), where log<span style="font-size: 8pt;">2</span> represents the logarithm in base 2. The associated dynamical system is known as the Gauss map and related to continued fractions. Its digits are the coefficients of continued fractions, and are known to follow a <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution" target="_blank" rel="noopener">Gauss-Kuzmin distribution</a>. Also, <em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1/<em>x</em>. It is discussed <a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-solve-complex-optimization-problems" target="_blank" rel="noopener">in my previous article</a>. See also Example 2 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>, in the section <em>Appendix 1: Exact solution for various 1-D dynamical systems</em>.</span></p>
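<p>The Gauss map digits can be generated exactly with rational arithmetic, avoiding the floating-point sensitivity discussed earlier. A minimal sketch (the starting value below is my choice: a Pell-number convergent of √2 minus 1, whose continued fraction coefficients are all equal to 2):</p>

```python
from fractions import Fraction

def gauss_digits(x, n):
    """First n digits d_k = INT(1/x_k), where x_{k+1} = 1/x_k - INT(1/x_k)."""
    out = []
    for _ in range(n):
        if x == 0:
            break                          # a rational x0 eventually terminates
        d = x.denominator // x.numerator   # INT(1/x), exact for a Fraction
        out.append(d)
        x = 1 / x - d                      # Gauss map step, in exact arithmetic
    return out

# Pell-number convergent: 665857/470832 ~ sqrt(2), so x0 ~ sqrt(2) - 1,
# whose continued fraction coefficients are all equal to 2
x0 = Fraction(195025, 470832)
print(gauss_digits(x0, 10))  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```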
<p><strong>2.2. Application</strong></p>
<p><span>Besides being useful to test optimization algorithms against the exact solution (such as solving the above functional equation), the digits of the system have applications in simulations, encoding, random number generation, and statistical modeling. In particular, below is a picture featuring the typical behavior of the first 2,000 values of <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>), starting with <em>x</em><span style="font-size: 8pt;">0</span> = 0.5. Depending on the choice of the function <em>ψ</em>,<em> </em>these values may or may not be highly autocorrelated, and in some cases expectation and/or variance are infinite, which implies that the autocorrelation does not exist. The picture below features the Hurwitz-Riemann case with <em>s</em> = 2 (expectation for the digits is finite and equal to <em>ζ</em>(2) = π^2 / 6, but variance is infinite).</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8691827873?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8691827873?profile=RESIZE_710x" width="500" class="align-center"/></a></span></p>
<p><span>Other special distributions are discussed in my previous articles:</span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions" target="_blank" rel="noopener">New Family of Generalized Gaussian Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/interesting-application-of-the-poisson-binomial-distribution" target="_blank" rel="noopener">Interesting Application of the Poisson-Binomial Distribution</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">A Strange Family of Statistical Distributions</a></li>
</ul>
<p></p>
An Easy Way to Solve Complex Optimization Problems in Machine Learning
tag:www.datasciencecentral.com,2021-03-08:6448529:BlogPost:1042655
2021-03-08T03:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641667893?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641667893?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: <a href="https://www.wikiwand.com/en/Test_functions_for_optimization" target="_blank" rel="noopener">here</a></em></p>
<p>There are numerous examples in machine learning, statistics, mathematics and deep learning, requiring an algorithm to solve some complicated equations: for instance, maximum likelihood estimation (think about logistic regression or the EM algorithm) or gradient methods (think about stochastic or swarm optimization). Here we are dealing with even more difficult problems, where the solution is not a set of optimal parameters (a finite dimensional object), but a function (an infinite dimensional object).</p>
<p>The context is discrete, chaotic dynamical systems, with applications to weather forecasting, population growth models, complex econometric systems, image encryption, chemistry (mixtures), physics (how matter reaches an equilibrium temperature), astronomy (how celestial man-made or natural bodies end up having stable or unstable orbits), or stock market prices, to name a few. These are referred to as complex systems.</p>
<p>The solutions to the problems discussed here require numerical methods, as usually no exact solution is known. The type of equation to be solved is called a <em>functional equation</em> or <em>stochastic integral</em> equation. We explore a few cases where the exact solution is actually known: this helps assess the efficiency, accuracy and speed of convergence of the numerical methods discussed in this article. These methods are based on the fixed-point algorithm applied to infinite dimensional problems.</p>
<p><span style="font-size: 14pt;"><strong>1. The general problem</strong></span></p>
<p>We are dealing with a discrete dynamical system defined by <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <i>T</i>(<em>x<span style="font-size: 8pt;">n</span></em>), where <i>T</i> is a real-valued function, and <em>x</em><span style="font-size: 8pt;">0</span> is the initial condition. For the sake of simplicity, we restrict ourselves to the case where <em>x<span style="font-size: 8pt;">n</span></em> is in [0, 1]. Generalizations, for instance with <em>x<span style="font-size: 8pt;">n</span></em> being a vector, are described <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">here</a>. The best-known example is the <a href="https://en.wikipedia.org/wiki/Logistic_map" target="_blank" rel="noopener">logistic map</a>, with <i>T</i>(<em>x</em>) = <em>λx</em>(1-<em>x</em>), exhibiting a chaotic behavior or not, depending on the value of the parameter <em><span>λ</span></em>.</p>
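<p>A minimal sketch of the logistic map's parameter dependence (the starting values and perturbation below are illustrative choices): for <em>λ</em> = 2.5 every orbit settles on the attracting fixed point 1 - 1/<em>λ</em> = 0.6, while for <em>λ</em> = 4 the dynamics are chaotic and two orbits started 10<sup>-12</sup> apart end up completely unrelated.</p>

```python
def logistic(lam, x, n):
    # Iterate T(x) = lam * x * (1 - x) a total of n times
    for _ in range(n):
        x = lam * x * (1 - x)
    return x

# Non-chaotic regime: orbits converge to the fixed point 1 - 1/lam = 0.6
x_stable = logistic(2.5, 0.2, 1000)
print(x_stable)

# Chaotic regime (lam = 4): two nearby orbits separate completely
a = logistic(4.0, 0.2, 100)
b = logistic(4.0, 0.2 + 1e-12, 100)
print(abs(a - b))  # the orbits are no longer close
```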
<p>In our case, the function <i>T</i>(<em>x</em>) takes the following form: <i>T</i>(<em>x</em>) = <em>p</em>(<em>x</em>) - INT(<em>p</em>(<em>x</em>)), where INT denotes the integer part function, <em>p</em>(<em>x</em>) is positive, monotonic, continuous and decreasing (thus bijective) with <em>p</em>(1) = 1 and <em>p</em>(0) infinite. For instance <em>p</em>(<em>x</em>) = 1 / <em>x</em> corresponds to the Gauss map associated with continued fractions; it is the most fundamental and basic example, and I discuss it <a href="https://mathoverflow.net/questions/383925/about-generalized-continued-fractions" target="_blank" rel="noopener">here</a> as well as below in this article. Another example is the Hurwitz-Riemann map, discussed <a href="https://www.datasciencecentral.com/profiles/blogs/hurwitz-riemann-zeta-and-other-special-probability-distributions" target="_blank" rel="noopener">here</a>. </p>
<p><strong>1.1. Invariant distribution and ergodicity</strong></p>
<p>The <em>invariant distribution</em> of the system is the one followed by the successive <em>x<span style="font-size: 8pt;">n</span></em>'s, or in other words, the limit of the empirical distribution attached to the <em>x<span style="font-size: 8pt;">n</span></em>'s, given an initial condition <em>x</em><span style="font-size: 8pt;">0</span>. A lot of interesting properties can be derived if the invariant density <em>f</em>(<em>x</em>) (the derivative of the invariant distribution) is known, assuming it exists. This only works with <a href="https://en.wikipedia.org/wiki/Ergodicity" target="_blank" rel="noopener">ergodic systems</a>. All systems under consideration here are <em>ergodic</em>. The invariant distribution applies to almost all initial conditions <em>x</em><span style="font-size: 8pt;">0</span>, though some <em>x</em><span style="font-size: 8pt;">0</span>'s, called exceptions, violate the law. This is a typical feature of all these systems. For some systems (the <a href="https://en.wikipedia.org/wiki/Dyadic_transformation" target="_blank" rel="noopener">Bernoulli map</a> for instance), the <em>x</em><span style="font-size: 8pt;">0</span>'s that are not exceptions are called <a href="https://en.wikipedia.org/wiki/Normal_number" target="_blank" rel="noopener">normal numbers</a>. </p>
<p>By ergodic, I mean that for almost any initial condition <em>x</em><span style="font-size: 8pt;">0</span>, the sequence (<em>x<span style="font-size: 8pt;">n</span></em>) eventually visits all parts of [0, 1], in a uniform and random sense. This implies that the average behavior of the system can be deduced from the trajectory of a "typical" sequence (<em>x<span style="font-size: 8pt;">n</span></em>) attached to an initial condition <em>x</em><span style="font-size: 8pt;">0</span>. Equivalently, a sufficiently large collection of random instances of the process (also called orbits) can represent the average statistical properties of the entire process.</p>
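<p>Ergodicity can be illustrated numerically with the logistic map at <em>λ</em> = 4, whose invariant distribution is known in closed form: <em>F</em>(<em>t</em>) = (2/π) arcsin(√<em>t</em>). A minimal sketch (the initial condition and orbit length are arbitrary choices): the fraction of time a single generic orbit spends below <em>t</em> approaches <em>F</em>(<em>t</em>), so the time average of one orbit recovers the distribution of the whole process.</p>

```python
import math

def orbit_fraction_below(t, x0=0.2345, n=100000):
    """Fraction of the orbit x_{k+1} = 4 x_k (1 - x_k) falling below t."""
    x, hits = x0, 0
    for _ in range(n):
        x = 4 * x * (1 - x)
        if x < t:
            hits += 1
    return hits / n

for t in (0.1, 0.5, 0.9):
    empirical = orbit_fraction_below(t)
    theory = (2 / math.pi) * math.asin(math.sqrt(t))  # invariant CDF at lam = 4
    print(t, round(empirical, 3), round(theory, 3))
```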
<p>Invariant distributions are also called equilibrium or attractor distributions in probability theory.</p>
<p><strong>1.2. The functional equation to be solved</strong></p>
<p>Let us assume that the invariant distribution <em>F</em>(<em>x</em>) can be written as <em>F</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) − <em>r</em>(1) for some function <em>r</em>. The support domain of <em>F</em>(<em>x</em>) is [0, 1], thus <em>F</em>(0) = 0, <em>F</em>(1) = 1, <em>F</em>(<em>x</em>) = 0 if <em>x</em> &lt; 0, and <em>F</em>(<em>x</em>) = 1 if <em>x</em> &gt; 1. Define <em>R</em>(<em>x</em>) = <em>r</em>(<em>x</em>+1) − <em>r</em>(<em>x</em>). Then we can retrieve <em>p</em>(<em>x</em>) (under some conditions) using the formula</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641305083?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641305083?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>Thus <em>r</em>(<em>x</em>) must be increasing on [1, 2] and satisfy <em>r</em>(2) = 1 + <em>r</em>(1). Not every function can be an invariant distribution.</p>
<p>In practice, you know <em>p</em>(<em>x</em>) and you try to find the invariant distribution <em>F</em>(<em>x</em>). So the above formula is not directly useful, except that it helps you create a table of dynamical systems, defined by their function <em>p</em>(<em>x</em>), with known invariant distributions. Such a table is available <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">here</a>: see Appendix 1 in that article, in particular example 5, featuring a Riemann zeta system. Such tables are useful for testing the fixed point algorithm described in section 2 when the exact solution is known. </p>
<p>If you only know <em>p</em>(<em>x</em>), to retrieve <em>F</em>(<em>x</em>) or its derivative <em>f</em>(<em>x</em>), you need to solve the following functional equation, whose unknown is the function <em>f</em>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641363282?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641363282?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>where <em>q</em> is the inverse of the function <em>p</em>. Note that <em>R</em>(<em>x</em>) = <em>F</em>(<em>q</em>(<em>x</em>)) or alternatively, <em>R</em>(<em>p</em>(<em>x</em>)) = <em>F</em>(<em>x</em>), with <em>p</em>(<em>q</em>(<em>x</em>)) = <em>q</em>(<em>p</em>(<em>x</em>)) = <em>x</em>. Also, here <em>x</em> is in [0, 1]. In practice, you get a good approximation if you use the first 1,000 terms in the sum. Typically, the invariant density <em>f</em> is bounded, and the weights |<em>q</em>'(<em>x</em>+<em>k</em>)| are decaying relatively fast as <em>k</em> increases. </p>
<p>The theory behind this is beyond the scope of this article. It is based on the <a href="https://en.wikipedia.org/wiki/Transfer_operator" target="_blank" rel="noopener">transfer operator</a>, and also briefly discussed in one of my previous articles, <a href="https://mathoverflow.net/questions/383925/about-generalized-continued-fractions/383997#383997" target="_blank" rel="noopener">here</a>: see section "Functional equation for <em>f</em>". The invariant density is the eigenfunction of the transfer operator, corresponding to the eigenvalue 1. Also, if <em>x</em> is replaced by a vector (for instance, if working with bivariate dynamical systems), the above formula can be generalized, involving two variables <em>x</em>, <em>y</em>, and the derivative of the (joint) distribution is replaced by a Jacobian. </p>
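<p>As a quick sanity check (my own illustration, not part of the original article): for the continued fraction system, <em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1 / <em>x</em>, so |<em>q</em>'(<em>x</em>+<em>k</em>)| = 1/(<em>x</em>+<em>k</em>)², and the functional equation reads <em>f</em>(<em>x</em>) = sum over <em>k</em> ≥ 1 of <em>f</em>(1/(<em>x</em>+<em>k</em>)) / (<em>x</em>+<em>k</em>)². One can verify numerically that the known invariant density <em>f</em>(<em>x</em>) = 1 / ((1+<em>x</em>) log 2) satisfies it:</p>

```python
import math

def f(x):
    # Known invariant density for p(x) = 1/x (Gauss-Kuzmin related)
    return 1.0 / ((1.0 + x) * math.log(2.0))

def transfer(f, x, terms=10_000):
    # Right-hand side of the functional equation, with q(x) = 1/x:
    # sum over k >= 1 of f(q(x + k)) * |q'(x + k)| = f(1/(x+k)) / (x+k)^2
    return sum(f(1.0 / (x + k)) / (x + k) ** 2 for k in range(1, terms + 1))

for x in (0.1, 0.5, 0.9):
    # truncation error is of order 1e-4 with 10,000 terms
    print(x, f(x), transfer(f, x))
```

<p>The fast decay of the weights |<em>q</em>'(<em>x</em>+<em>k</em>)|, here 1/(<em>x</em>+<em>k</em>)², is what makes truncating the sum after a few thousand terms accurate.</p>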
<p><span style="font-size: 14pt;"><strong>2. Numerical solution via the fixed point algorithm</strong></span></p>
<p>The last formula in section 1.2. suggests a simple iterative algorithm to solve this type of equation. You need to start with an initial function <em>f</em><span style="font-size: 8pt;">0</span>, and in this case, the uniform distribution on [0, 1] is usually a good starting point. That is, <span style="font-size: 12pt;"><em>f</em></span><span style="font-size: 8pt;">0</span>(<em>x</em>) = 1 if <em>x</em> is in [0, 1], and 0 elsewhere. The iterative step is as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641383454?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641383454?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>with <em>x</em> in [0, 1]. Each iteration <em>n</em> generates a whole new function <em>f<span style="font-size: 8pt;">n</span></em> on [0, 1], and the hope is that the algorithm converges as <em>n</em> tends to infinity. If convergence occurs, the limiting function must be the invariant density of the system. This is an example of the <a href="https://en.wikipedia.org/wiki/Fixed-point_iteration" target="_blank" rel="noopener">fixed point algorithm</a>, in infinite dimension.</p>
<p>In practice, you compute <em>f</em>(<em>x</em>) for only (say) 10,000 values of <em>x</em> evenly spaced between 0 and 1. If, for instance, <em>f</em><span style="font-size: 8pt;"><em>n</em>+1</span>(0.5) requires the computation of (say) <em>f<span style="font-size: 8pt;">n</span></em>(0.879237...) and the closest value in your array is <em>f<span style="font-size: 8pt;">n</span></em>(0.8792), you replace <em>f<span style="font-size: 8pt;">n</span></em>(0.879237...) by <em>f<span style="font-size: 8pt;">n</span></em>(0.8792), or you use interpolation techniques. This is more efficient than using a function defined recursively in a programming language. Surprisingly, the convergence is very fast: in the examples tested, the error between the true solution and the one obtained after three iterations is very small; see picture below.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641440290?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641440290?profile=RESIZE_710x" width="400" class="align-center"/></a>In the above picture, <em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1 / <em>x</em>, and the invariant distribution is known: <em>f</em>(<em>x</em>) = 1 / ((1+<em>x</em>)(log 2)). It is pictured in red, and it is related to the <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution" target="_blank" rel="noopener">Gauss-Kuzmin distribution</a>. Note that we started with the uniform distribution <em>f</em><span style="font-size: 8pt;">0</span> pictured in black (the flat line). The first iterate <em>f</em><span style="font-size: 8pt;">1</span> is in green, the second one <em>f</em><span style="font-size: 8pt;">2</span> is in grey, and the third one <em>f</em><span style="font-size: 8pt;">3</span> is in orange, and almost undistinguishable from the exact solution in red (I need magnifying glasses to see it). Source code for these computations is available <a href="http://datashaping.com/solve2b.txt" target="_blank" rel="noopener">here</a>. In the source code, there are two extra parameters <span><em>α</em>, <em>λ</em>. When <em>α</em> = <em>λ</em> = 1, it corresponds to the classic case <em>p</em>(<em>x</em>) = 1 / <em>x</em>.</span></p>
<p><span style="font-size: 14pt;"><strong>3. Applications</strong></span></p>
<p>One interesting concept associated with these dynamical systems is that of <em>digit</em>. The <em>n</em>-th digit <em>d<span style="font-size: 8pt;">n</span></em> is defined as INT(<em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>)) where INT is the integer part function. I call it "digit" because all these systems have a numeration system attached to them, generalizing standard numeration systems, which are just a particular case. If you know the digits attached to an initial condition <em>x</em><span style="font-size: 8pt;">0</span>, you can retrieve <em>x</em><span style="font-size: 8pt;">0</span> with a simple algorithm. Start with <em>n</em> = <em>N</em> large enough and <em>x</em><span style="font-size: 8pt;"><em>N</em>+1</span> = 0 (you will get about <em>N</em> digits of accuracy for <em>x</em><span style="font-size: 8pt;">0</span>), and compute <em>x<span style="font-size: 8pt;">n</span></em> backward from <em>n</em> = <em>N</em> down to <em>n</em> = 0 using the recursion <em>x<span style="font-size: 8pt;">n</span></em> = <em>q</em>(<em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> + <em>d<span style="font-size: 8pt;">n</span></em>) − INT(<em>q</em>(<em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> + <em>d<span style="font-size: 8pt;">n</span></em>)). These digits can be used in encryption systems.</p>
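<p>Here is a short illustration of both passes (my own sketch, assuming the forward step is <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>) − <em>d<span style="font-size: 8pt;">n</span></em>, i.e. the fractional part of <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>)), applied to the continued fraction system <em>p</em>(<em>x</em>) = <em>q</em>(<em>x</em>) = 1 / <em>x</em>:</p>

```python
import math

def digits(x0, p, N):
    # Forward pass: d_n = INT(p(x_n)), x_{n+1} = p(x_n) - d_n
    x, ds = x0, []
    for _ in range(N):
        y = p(x)
        d = int(y)
        ds.append(d)
        x = y - d
    return ds

def reconstruct(ds, q):
    # Backward pass, starting from x_{N+1} = 0:
    # x_n = q(x_{n+1} + d_n) - INT(q(x_{n+1} + d_n))
    x = 0.0
    for d in reversed(ds):
        y = q(x + d)
        x = y - int(y)
    return x

p = q = lambda x: 1.0 / x                 # continued fraction system
ds = digits(math.pi - 3, p, 20)
print(ds[:8])                             # [7, 15, 1, 292, 1, 1, 1, 2]
print(reconstruct(ds, q))                 # recovers pi - 3 to high accuracy
```

<p>Note that in double precision, the forward digits drift away from the exact continued fraction expansion after roughly a dozen terms, because the error in <em>x<span style="font-size: 8pt;">n</span></em> is amplified at each step; the backward pass still recovers the starting value of the computed orbit.</p>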
<p>This will be described in detail in my upcoming book <em>Gentle Introduction to Discrete Dynamical Systems</em>. However, the interesting part discussed here is related to statistical modeling. As a starter, let's look at the digits of <em>x</em><span style="font-size: 8pt;">0</span> = <span>π - 3 in two different dynamical systems:</span></p>
<ul>
<li><span><strong>Continued fractions</strong>. Here <em>p</em>(<em>x</em>) = 1 / <em>x</em>. The first 20 digits are 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, see <a href="https://oeis.org/A001203" target="_blank" rel="noopener">here</a>. </span></li>
<li><strong>A less chaotic dynamical system</strong>. Here <em>p</em>(<em>x</em>) = (-1 + SQRT(5 + 4/<em>x</em>)) / 2. <span>The first 20 digits are </span>2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 26, 1, 3, 1, 10, 1, 1. We also have <em>F</em>(<em>x</em>) = 2<em>x</em> / (<em>x</em>+1).</li>
</ul>
<p>The distribution of the digits is known in both cases. For continued fractions, it is the <a href="https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution" target="_blank" rel="noopener">Gauss-Kuzmin distribution</a>. For the second system, the probability that a digit is equal to <em>k</em> is 4 / (<em>k</em>(<em>k</em>+1)(<em>k</em>+2)), see Example 1 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>. In general, the probability in question is equal to <em>F</em>(<em>q</em>(<em>k</em>)) - <em>F</em>(<em>q</em>(<em>k</em>+1)) for <em>k</em> = 1, 2, and so on. Clearly, the distribution of these digits can be used to quantify the level of chaos in the system. For continued fractions, the expected value of an arbitrary digit is infinite (though it is finite and well known for the logarithm of a digit, see <a href="https://en.wikipedia.org/wiki/Khinchin%27s_constant" target="_blank" rel="noopener">here</a>), while it is finite (equal to 2) for the second system. Yet each system, given enough time, will produce arbitrarily large digits. Another way to quantify chaos in a dynamical system is to look at the auto-correlation structure of the sequence (<em>x<span style="font-size: 8pt;">n</span></em>): auto-correlations very close to zero, decaying very fast, are associated with highly chaotic systems. In the case of continued fractions, the lag-1 auto-correlation, defined as the limit of the empirical auto-correlation on a sequence starting with (say) <em>x</em><span style="font-size: 8pt;">0</span> = <span>π - 3, is </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641579290?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641579290?profile=RESIZE_710x" width="250" class="align-center"/></a></span></p>
<p><span>where <em>γ</em> is the <a href="https://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant" target="_blank" rel="noopener">Euler–Mascheroni constant</a>, see Appendix 2 <a href="https://mathoverflow.net/questions/385156/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin" target="_blank" rel="noopener">in this article</a>. This is probably a new result, never published before.</span></p>
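<p>The digit probabilities for the second system are easy to verify numerically (my own sketch; the inverse <em>q</em>(<em>y</em>) = 1 / (<em>y</em>^2 + <em>y</em> - 1) is worked out by hand from <em>p</em>). The formula <em>F</em>(<em>q</em>(<em>k</em>)) - <em>F</em>(<em>q</em>(<em>k</em>+1)) matches 4 / (<em>k</em>(<em>k</em>+1)(<em>k</em>+2)) exactly, and empirical digit counts along an orbit agree with it, as ergodicity predicts:</p>

```python
import math

def F(x):
    # Invariant distribution of the second system: F(x) = 2x / (x+1)
    return 2.0 * x / (x + 1.0)

def q(y):
    # Inverse of p(x) = (-1 + sqrt(5 + 4/x)) / 2, derived by hand
    return 1.0 / (y * y + y - 1.0)

# Exact check: P(digit = k) = F(q(k)) - F(q(k+1)) = 4 / (k(k+1)(k+2))
for k in range(1, 6):
    print(k, F(q(k)) - F(q(k + 1)), 4.0 / (k * (k + 1) * (k + 2)))

# Empirical check: frequency of digit 1 along the orbit of x0 = pi - 3
x, N, ones = math.pi - 3.0, 100_000, 0
for _ in range(N):
    y = (-1.0 + math.sqrt(5.0 + 4.0 / x)) / 2.0   # p(x)
    d = int(y)
    ones += (d == 1)
    x = y - d
print(ones / N)   # close to P(1) = 2/3
```
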
<p><span>Below is a picture featuring the successive values of <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>) for the smoother dynamical system mentioned above. These values are close to the digits <em>d<span style="font-size: 8pt;">n</span></em>. The initial condition is <em>x</em><span style="font-size: 8pt;">0</span> = π - 3. In my next article, I will further discuss a new way to define and measure chaos in these various systems.</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641636094?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641636094?profile=RESIZE_710x" width="500" class="align-center"/></a></span></p>
<p><span>The first 5,500 values of <em>p</em>(<em>x<span style="font-size: 8pt;">n</span></em>), for <em>n</em> = 0, 1, 2 and so on, are featured in the above picture. Think about what business, natural, or industrial process could be modeled by this kind of time series! The possibilities are endless. For instance, it could represent meteorite hits over a long time period, with a few large values representing massive impacts. Clearly, it can be used in outlier detection, extreme event modeling, and risk modeling. </span></p>
<p>Finally, here is another example, this time based on an unrelated bivariate dynamical system on a grid (the cat map), used for image encryption. It is a<span> mapping applied to a picture of a pair of cherries. The image is 74 pixels wide, and takes 114 iterations to be restored, although it appears upside-down at the halfway point (the 57th iteration). Source: <a href="https://en.wikipedia.org/wiki/Arnold%27s_cat_map" target="_blank" rel="noopener">here</a>. </span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8641638058?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8641638058?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> You can access Vincent's articles and books, <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
A Plethora of Machine Learning Articles: Part 2
tag:www.datasciencecentral.com,2021-03-04:6448529:BlogPost:1041679
2021-03-04T01:44:59.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8629159091?profile=original" rel="noopener" target="_blank"><img class="align-center" src="https://storage.ning.com/topology/rest/1.0/file/get/8629159091?profile=RESIZE_710x" width="400"></img></a></p>
<div class="xg_headline xg_headline-img xg_headline-2l"><div class="tb"><p><a class="xg_sprite xg_sprite-view" href="https://www.datasciencecentral.com/profiles/blog/list?user=3v6n5b6g08kgn"></a></p>
</div>
</div>
<div class="xg_module_body"><div class="postbody"><div class="xg_user_generated"><p style="text-align: center;"><em>Source:…</em></p>
</div>
</div>
</div>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8629159091?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8629159091?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<div class="xg_headline xg_headline-img xg_headline-2l"><div class="tb"><p><a class="xg_sprite xg_sprite-view" href="https://www.datasciencecentral.com/profiles/blog/list?user=3v6n5b6g08kgn"></a></p>
</div>
</div>
<div class="xg_module_body"><div class="postbody"><div class="xg_user_generated"><p style="text-align: center;"><em>Source: see<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener">here</a></em></p>
<p><span>Part 1 of this short series focused on the business analytics / BI / operational research aspects, see <a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-machine-learning-articles-part-1" target="_blank" rel="noopener">here</a>. In this Part 2, you will find the most interesting machine learning and statistics articles that I wrote in the last few years, focusing on core technical aspects. The whole series will feature articles related to the following aspects of machine learning:</span></p>
<ul>
<li><span>Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span>Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><span>Methods, principles, rules of thumb, recipes, tricks</span></li>
<li><span>Business analytics (Part 1)</span></li>
</ul>
<p><span>My articles are always written in simple English and accessible to professionals with typically one year of calculus or statistical training at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, suited to people who do not have a lot of free time. </span></p>
<p><span>Despite these restrictions, state-of-the-art, off-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed. </span></p>
<p><span style="font-size: 14pt;"><strong>1. Core techniques</strong></span></p>
<p><span>These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.</span></p>
<ol>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">Introducing an All-purpose, Robust, Fast, Simple Non-linear Regression</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/chaos-attractors-in-machine-learning-systems" target="_blank" rel="noopener">Variance, Attractors and Behavior of Chaotic Statistical Systems</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions" target="_blank" rel="noopener">New Family of Generalized Gaussian Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-approach-to-linear-algebra-in-machine-learning" target="_blank" rel="noopener">Gentle Approach to Linear Algebra, with Machine Learning Applications</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/confidence-intervals-without-pain" target="_blank" rel="noopener">Confidence Intervals Without Pain</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/modern-re-sampling-and-statistical-recipes" target="_blank" rel="noopener">Re-sampling: Amazing Results and Applications</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-automatically-determine-the-number-of-clusters-in-your-dat" target="_blank" rel="noopener">How to Automatically Determine the Number of Clusters in your Data</a> - and more</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/decomposition-of-statistical-distributions-using-mixture-models-a" target="_blank" rel="noopener">New Perspectives on Statistical Distributions and Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-plethora-of-original-underused-statistical-tests" target="_blank" rel="noopener">A Plethora of Original, Not Well-Known Statistical Tests</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/pattern-recognition-techniques-application-to-new-decimal-systems?xg_source=activity" target="_blank" rel="noopener">New Decimal Systems - Great Sandbox for Data Scientists and Mathematicians</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/are-the-digits-of-pi-truly-random" target="_blank" rel="noopener">Are the Digits of Pi Truly Random?</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/data-science-and-machine-learning-without-mathematics" target="_blank" rel="noopener">Data Science and Machine Learning Without Mathematics</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel" target="_blank" rel="noopener">Advanced Machine Learning with Basic Excel</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/state-of-the-art-machine-learning-automation-with-hdt" target="_blank" rel="noopener">State-of-the-Art Machine Learning Automation with HDT</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/building-outiler-resistant-centroids-in-any-dimension" target="_blank" rel="noopener">Tutorial: Neutralizing Outliers in Any Dimension</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/the-fundamental-statistics-theorem-revisited" target="_blank" rel="noopener">The Fundamental Statistics Theorem Revisited</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/variance-clustering-test-of-hypotheses-and-density-estimation-rev" target="_blank" rel="noopener">Variance, Clustering, and Density Estimation Revisited</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/the-death-of-the-statistical-test-of-hypothesis" target="_blank" rel="noopener">The Death of the Statistical Tests of Hypotheses</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/5-easy-steps-to-structure-highly-unstructured-big-data" target="_blank" rel="noopener">4 Easy Steps to Structure Highly Unstructured Big Data, via Automated Indexation</a> </li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/the-best-kept-secret-about-linear-and-logistic-regression" target="_blank" rel="noopener">The best kept secret about linear and logistic regression</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/black-box-confidence-intervals-excel-and-perl-implementations-det" target="_blank" rel="noopener">Black-box Confidence Intervals: Excel and Perl Implementation</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/comparing-linear-regression-with-the-jackknife-method" target="_blank" rel="noopener">Jackknife and linear regression in Excel: implementation and comparison</a></li>
<li><a href="http://www.datasciencecentral.com/profiles/blogs/jackknife-logistic-and-linear-regression" target="_blank" rel="noopener">Jackknife logistic and linear regression for clustering and predictions</a></li>
</ol>
<p><span style="font-size: 14pt;"><strong>2. Free books</strong></span></p>
<ul>
<li><span><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span>The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span><b>Applied Stochastic Processes</b></span><p><span>Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters). This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In about 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span>It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> You can access Vincent's articles and books, <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
</div>
</div>
</div>
A Plethora of Machine Learning Articles: Part 1
tag:www.datasciencecentral.com,2021-02-21:6448529:BlogPost:1034367
2021-02-21T23:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8582358874?profile=original" rel="noopener" target="_blank"><img class="align-center" src="https://storage.ning.com/topology/rest/1.0/file/get/8582358874?profile=RESIZE_710x" width="400"></img></a></p>
<p><em>Source: see <a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" rel="noopener" target="_blank">here</a></em></p>
<p><span style="font-size: 12pt;">In Part 1 of this short series, I have included the most interesting articles that I wrote in the last few years. This part focuses on the business analytics / BI /…</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8582358874?profile=original" target="_blank" rel="noopener"><img width="400" class="align-center" src="https://storage.ning.com/topology/rest/1.0/file/get/8582358874?profile=RESIZE_710x"/></a></p>
<p><em>Source: see <a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener">here</a></em></p>
<p><span style="font-size: 12pt;">In Part 1 of this short series, I have included the most interesting articles that I wrote in the last few years. This part focuses on the business analytics / BI / operational research aspects. The next parts will focus on</span></p>
<ul>
<li><span style="font-size: 12pt;">Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)</span></li>
<li><span style="font-size: 12pt;">Opinions, for instance about the value of a PhD in our field, or the use of some techniques</span></li>
<li><span style="font-size: 12pt;">Methods, principles, rules of thumb, recipes, tricks</span></li>
</ul>
<p><span style="font-size: 12pt;">My articles are always written in simple English and accessible to professionals with typically one year of calculus or statistical training, at the undergraduate level. They are geared towards people who use data but are interesting in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, geared towards people who do not have a lot of free time. </span></p>
<p style="text-align: center;"><em><a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener"></a></em></p>
<p><span style="font-size: 12pt;">Despite these restrictions, state-of-the-art, of-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) is provided for those who want to dig deeper in the interested topics discussed. </span></p>
<p><span style="font-size: 12pt;">Before starting, let me mention in section 1 two books that I wrote recently, available to all Data Science Central members.</span></p>
<p><span style="font-size: 14pt;"><strong>1. Free books</strong></span></p>
<ul>
<li><span style="font-size: 12pt;"><b>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</b></span><p><span style="font-size: 12pt;">Available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning">here</a>. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.</span></p>
<p><span style="font-size: 12pt;">The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.</span></p>
</li>
<li><span style="font-size: 12pt;"><b>Applied Stochastic Processes</b></span><p><span style="font-size: 12pt;">Available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">here</a>. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters.) This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject.</span></p>
<p><span style="font-size: 12pt;">It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
</li>
</ul>
<p><span style="font-size: 14pt;"><strong>2. Business related articles</strong></span></p>
<p><span style="font-size: 12pt;">These articles focus on business applications and other matters relevant to working as a data scientist in industry. They are accessible to a wide audience, being less technical than many of my 200+ other articles.</span></p>
<ol>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-foundations-for-a-new-stock-market" target="_blank" rel="noopener">New Stock Trading and Lottery Game Rooted in Deep Math</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-wizardry" target="_blank" rel="noopener">Time series, Growth Modeling and Data Science Wizardry</a> </span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-stabilize-data-to-avoid-decay-in-model-performance" target="_blank" rel="noopener">How to Stabilize Data Systems, to Avoid Decay in Model Performance</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/10-differences-between-junior-and-senior-data-scientist" target="_blank" rel="noopener">22 Differences Between Junior and Senior Data Scientists</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/the-first-things-you-should-learn-as-a-data-scientist-not-what-yo" target="_blank" rel="noopener">The First Things you Should Learn as a Data Scientist - Not what you Think</a></span></li>
<li><span style="font-size: 12pt;"><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning" target="_blank" rel="noopener">Difference between Machine Learning, Data Science, AI, Deep Learning, and Statistics</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/20-data-science-systems-used-by-amazon-to-operate-its-business" target="_blank" rel="noopener">21 data science systems used by Amazon to operate its business</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/life-cycle-of-data-science-projects" target="_blank" rel="noopener">Life Cycle of Data Science Projects</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/40-techniques-used-by-data-scientists" target="_blank" rel="noopener">40 Techniques Used by Data Scientists</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/helping-facebook-design-better-machine-learning-algorithms" target="_blank" rel="noopener">Designing better algorithms: 5 case studies</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/the-data-science-zoo" target="_blank" rel="noopener">Architecture of Data Science Projects</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/24-uses-of-statistical-modeling-part-ii" target="_blank" rel="noopener">24 Uses of Statistical Modeling (Part II)</a> | <a href="http://www.datasciencecentral.com/profiles/blogs/top-20-uses-of-statistical-modeling" target="_blank" rel="noopener">(Part I)</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/the-abcd-s-of-business-optimization" target="_blank" rel="noopener">The ABCD's of Business Optimization</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/is-data-science-a-sin-against-the-norms-of-statisticians" target="_blank" rel="noopener">What you won't learn in stats classes</a></span></li>
<li><span style="font-size: 12pt;"><a href="http://www.datasciencecentral.com/profiles/blogs/biased-vs-unbiased-debunking-statistical-myths" target="_blank" rel="noopener">Biased vs Unbiased: Debunking Statistical Myths</a></span></li>
</ol>
<p></p>
<p><span style="font-size: 12pt;"><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></span></p>
<p><span style="font-size: 12pt;"><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></span></p>
<p></p>
Maximum runs in Bernoulli trials: simulations and results
tag:www.datasciencecentral.com,2021-02-16:6448529:BlogPost:1029341
2021-02-16T08:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8561683465?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8561683465?profile=RESIZE_710x" width="720" class="align-full"/></a></p>
<p>Bernoulli trials are random experiments with two possible outcomes: "yes" and "no" (in the case of polls), or "success" and "failure" (in the case of gambling or clinical trials). The trials are independent of each other: for instance, tossing a coin multiple times, or testing the success of a new drug against a specific medical condition on multiple patients, where improvement for a specific patient is viewed as a success, and lack of improvement as a failure.</p>
<p><span>Here we are interested in maximum runs of successes (also called record runs): when they are expected to occur, and their expected length or duration. While the classical application is in games of chance, we will discuss an exciting application in number theory, more specifically, very good approximations of irrational numbers by rational numbers, and numeration systems with a non-integer base. We will also consider the case where the trials are not independent, and where there are more than two outcomes. For instance, when throwing a die rather than a coin, there are six outcomes rather than two.</span></p>
<p><span>The data used here is simulated and allows us to get some good approximations for a number of interesting statistics. It is based on an unusual pseudo-random number generator that is very relevant to the problem being studied. A more theoretical approach can be found <a href="https://www.csun.edu/~hcmth031/tspolr.pdf" target="_blank" rel="noopener">here</a>, with connections to extreme value theory and the Gumbel distribution. See also my previous article <em>Distribution of Arrival Times for Extreme Events</em>, posted <a href="https://www.datasciencecentral.com/profiles/blogs/distribution-of-arrival-times-of-extreme-events" target="_blank" rel="noopener">here</a>. </span></p>
<p><span style="font-size: 14pt;"><strong>1. Simulations and theoretical results</strong></span></p>
<p>Bernoulli trials with <em>b</em> potential outcomes, each with the same probability of occurring, can be simulated using the following system. Start with some irrational number <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1], say <em>x</em><span style="font-size: 8pt;">0</span> = log 2 (called the <em>seed</em>), and use the following iterations:</p>
<p style="text-align: center;"><em>a<span style="font-size: 8pt;">n</span></em> = INT(<em>b x<span style="font-size: 8pt;">n</span></em>)</p>
<p style="text-align: center;"><em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> = <em>b x<span style="font-size: 8pt;">n</span></em> - INT(<em>b x<span style="font-size: 8pt;">n</span></em>).</p>
<p>INT represents the integer part function. The result of the <em>n</em>-th trial is <em>a<span style="font-size: 8pt;">n</span></em>: it is a coding integer between 0 and <em>b</em> - 1 inclusive, representing for instance the result of throwing a die with <em>b</em> sides labeled 0, ..., <em>b</em> - 1. Also, <em>a<span style="font-size: 8pt;">n</span></em> is the <em>n</em>-th digit of <em>x</em><span style="font-size: 8pt;">0</span> in base <em>b</em>. These digits are strongly conjectured to be independent of each other, each with the same probability 1 / <em>b</em> of taking on any of the <em>b</em> potential values. Thus, this scheme can be used to simulate the Bernoulli trials in question. Also, unlike traditional pseudorandom number generators, it does not produce periodic sequences. Such a system can be viewed as a chaotic dynamical system, just like the sine map discussed in my previous article, <a href="https://www.datasciencecentral.com/profiles/blogs/beautiful-mathematical-images" target="_blank" rel="noopener">here</a>. </p>
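<p>A minimal Python sketch of this generator is shown below. Plain floating-point arithmetic loses roughly one digit of accuracy per iteration, so the sketch uses Python's decimal module with a large working precision to keep a couple hundred base-<em>b</em> digits reliable; everything else follows the two iteration formulas above.</p>

```python
from decimal import Decimal, getcontext

def bernoulli_digits(seed, b, n):
    """First n base-b digits of seed: a_k = INT(b x_k), x_{k+1} = b x_k - INT(b x_k)."""
    digits = []
    x = seed
    for _ in range(n):
        y = b * x
        a = int(y)        # a_k = INT(b x_k): the outcome of the k-th trial
        digits.append(a)
        x = y - a         # x_{k+1}: the fractional part of b x_k
    return digits

getcontext().prec = 200   # ~200 decimal digits, enough for ~400 reliable base-3 digits
x0 = Decimal(2).ln()      # the seed x_0 = log 2
d = bernoulli_digits(x0, 3, 20)
print(d)                  # first 20 digits of log 2 in base 3: 2, 0, 0, 2, ...
```

<p>With <em>b</em> = 3 and seed log 2, the first digits come out 2, 0, 0, 2, ..., and each run of the loop is deterministic, yet the digit stream passes the usual randomness tests.</p>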
<p>The Bernoulli trials generated with <em>x</em><span style="font-size: 8pt;">0</span>, that is, the sequence <em>a</em><span style="font-size: 8pt;">0</span>, <em>a</em><span style="font-size: 8pt;">1</span>, and so on, constitute just one instance of a Bernoulli experiment. If you try <em>N</em> different seeds (the number <em>x</em><span style="font-size: 8pt;">0</span>), you end up with <em>N</em> different, independent instances of Bernoulli experiments sharing the same dynamics, and things start to become interesting.</p>
<p><strong>1.1. Simulations</strong></p>
<p>I performed <em>N</em> = 200 simulations, each representing a Bernoulli experiment starting with a different seed <em>x</em><span style="font-size: 8pt;">0</span> each time, each consisting of 1,000,000 trials, with <em>b</em> = 3. Possible outcomes of each trial are 0, 1 or 2. I looked at successive record runs of zeros. For one of these experiments (a typical case), I've found this:</p>
<ul>
<li>One isolated zero (the first occurrence of zero) starts at position <em>n</em> = 3</li>
<li>The first run of 2 zeros starts at position 13 in the digits expansion</li>
<li>The next longer run consists of 3 zeros, starting at position 69</li>
<li>The next longer one (4 zeros) starts at position 132</li>
<li>Then we have 5 zeros starting at position 670, then 6 starting at position 743, 8 starting at position 13411, 10 starting at position 58454, and 12 starting at position 384100.</li>
</ul>
<p>The observations can be summarized by the following bivariate sequence:</p>
<p style="text-align: center;">(3,1), (13,2), (69,3), (132,4), (670,5), (743,6), (13411,8), (58454,10), (384100,12), …</p>
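<p>In Python, this bivariate sequence of record runs can be extracted from any digit stream as follows; positions are 1-based to match the convention above, and the short input in the example is made up purely for illustration.</p>

```python
def record_runs(digits, symbol=0):
    """(1-based position, length) of each record run of `symbol` in the stream."""
    records = []
    best = 0              # length of the longest run seen so far
    i, n = 0, len(digits)
    while i < n:
        if digits[i] == symbol:
            j = i
            while j < n and digits[j] == symbol:
                j += 1
            if j - i > best:               # a new record run
                best = j - i
                records.append((i + 1, j - i))
            i = j
        else:
            i += 1
    return records

runs = record_runs([1, 0, 2, 0, 0, 1, 0, 0, 0, 2])
print(runs)   # [(2, 1), (4, 2), (7, 3)]
```

<p>Note that only runs strictly longer than every previous run are recorded, which is exactly why record runs may skip lengths, as discussed in section 1.2.</p>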
<p>If you blend all the sequences of vectors (<em>X</em>, <em>Y</em>) together, from the 200 experiments, you get the following: </p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558262452?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558262452?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>Record runs of Y zeros vs the position X at which they occur in a Bernoulli experiment</em></p>
<p>Note that in Figure 1, the plot represents <em>Y</em> versus log(<em>X</em>), and <em>b</em> = 3. A record run equal to <em>Y</em> means that starting at position <em>X</em>, we observe the first instance of a (record) run consisting of <em>Y</em> consecutive zeros, in at least one of the <em>N</em> experiments. In Figure 2 featuring aggregated data, you can see the average log(<em>X</em>) computed across the <em>N</em> = 200 experiments, for any record run of length <em>Y</em> = 0, 1, 2, and so on (up to <em>Y</em> = 13). The chart speaks for itself; in the linear fit in Figure 2, the slope approaches log <em>b</em> as <em>N</em> tends to infinity.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558364664?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558364664?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Same as Figure 1, with log(X) averaged across the N = 200 experiments</em></p>
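<p>The slope claim is easy to probe numerically. The sketch below is a simplified stand-in: it uses Python's built-in pseudo-random generator instead of the digit scheme of section 1 (the assumption being that any source of independent uniform digits behaves the same way), pools the (<em>Y</em>, log <em>X</em>) pairs from 50 experiments with <em>b</em> = 3, and fits a least-squares slope, which should come out in the vicinity of log 3 ≈ 1.0986.</p>

```python
import math
import random

def record_runs(digits, symbol=0):
    """(1-based position, length) of each record run of `symbol`."""
    records, best, i, n = [], 0, 0, len(digits)
    while i < n:
        if digits[i] == symbol:
            j = i
            while j < n and digits[j] == symbol:
                j += 1
            if j - i > best:
                best = j - i
                records.append((i + 1, j - i))
            i = j
        else:
            i += 1
    return records

random.seed(1)
pairs = []                            # pooled (Y, log X) points, as in Figure 1
for _ in range(50):                   # 50 experiments of 20,000 trials, b = 3
    digits = [random.randrange(3) for _ in range(20000)]
    pairs += [(length, math.log(pos)) for pos, length in record_runs(digits)]

# least-squares slope of log(X) against the record length Y
my = sum(y for y, _ in pairs) / len(pairs)
ml = sum(l for _, l in pairs) / len(pairs)
slope = sum((y - my) * (l - ml) for y, l in pairs) / sum((y - my) ** 2 for y, _ in pairs)
print(round(slope, 2))                # typically close to log 3 ~ 1.10
```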
<p><strong>1.2. Theory</strong></p>
<p>Many theoretical results are known for maximum runs. We present a few of them here, with additional references. Note that in this article, I focus on record runs, which are different from maximum runs: in any Bernoulli experiment, the maximum runs correspond to the first occurrence of a run of length 2, 3, 4, and so on. Record runs, as in the example outlined at the beginning of section 1, do not necessarily increase by unit increments: in my example, the first run of length 7 (not a record) occurs after the first (record) run of length 8. In short, you see a run of length 8 before you see one of length 7.</p>
<p>The main theoretical results, provided by <a href="https://mathoverflow.net/questions/383353/distribution-of-the-first-occurrence-of-a-maximum-record-run-of-zeros-in-the-d/383388#383388" target="_blank" rel="noopener">Yuval Peres</a>, are:</p>
<ul>
<li>Let <em>R<span style="font-size: 8pt;">n</span></em> be the length of the longest run in the first <em>n</em> digits. Then <em>R<span style="font-size: 8pt;">n</span></em> log(<em>b</em>) / log(<em>n</em>) tends to 1 almost surely as <em>n</em> tends to infinity. This was first proved by Rényi; see the discussion in reference [1].</li>
<li>The waiting times <em>T<span style="font-size: 8pt;">k</span></em> for the occurrence of a run of length <em>k</em> are such that <em>T<span style="font-size: 8pt;">k</span></em> / E(<em>T<span style="font-size: 8pt;">k</span></em>) is asymptotically exponentially distributed with mean 1. See references [2]-[4]. We also have (see references [5] and [7]): <a href="https://storage.ning.com/topology/rest/1.0/file/get/8558795279?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558795279?profile=RESIZE_710x" width="100" class="align-center"/></a></li>
</ul>
<p>All references are in section 3. Note that these theoretical results apply to any run, not just runs of zeros. </p>
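<p>The waiting-time result can be checked by simulation. The sketch below estimates E(<em>T<span style="font-size: 8pt;">k</span></em>) for <em>b</em> = 2 and <em>k</em> = 5, and compares it with the exact value <em>b</em> (<em>b</em> at the power <em>k</em>, minus 1) / (<em>b</em> - 1), a classical formula for the fair case found in Feller [5], which equals 62 here.</p>

```python
import random

def waiting_time(b, k, symbol=0):
    """Number of trials until the first run of k consecutive `symbol`s is completed."""
    run = t = 0
    while True:
        t += 1
        if random.randrange(b) == symbol:
            run += 1
            if run == k:
                return t
        else:
            run = 0

random.seed(3)
b, k = 2, 5
exact = b * (b ** k - 1) // (b - 1)   # classical expected waiting time: 62 here
samples = [waiting_time(b, k) for _ in range(2000)]
mean = sum(samples) / len(samples)
print(exact, round(mean, 1))          # the empirical mean should be close to 62
```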
<p><span style="font-size: 14pt;"><strong>2. Application and generalization</strong></span></p>
<p>If you replace the integer <em>b</em> by a non-integer (strictly larger than 1), then the Bernoulli trials will inherit the properties of that unusual numeration system:</p>
<ul>
<li>The number of potential outcomes, for any trial, is INT(<em>b</em>) + 1, where INT(<em>b</em>) is the integer part of <em>b</em></li>
<li>The trials are no longer independent: the <em>n</em>-th outcome <em>a<span style="font-size: 8pt;">n</span></em> is correlated with <em>a<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span></li>
<li>Outcomes have different probabilities: P(<em>a<span style="font-size: 8pt;">n</span></em> = 0) is not the same as P(<em>a<span style="font-size: 8pt;">n</span></em> = 1)</li>
</ul>
<p>Nevertheless, one can still perform the same simulations to estimate the statistics of interest. If <em>b</em> is a quadratic irrational, the corresponding successive outcomes (the <em>a<span style="font-size: 8pt;">n</span></em>'s) follow a Markov chain model. See <a href="https://www.jstage.jst.go.jp/article/jmath1948/26/1/26_1_33/_pdf" target="_blank" rel="noopener">here</a> for the theoretical details.</p>
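<p>As a quick empirical check, take <em>b</em> equal to the golden ratio (about 1.618): each trial then has two possible outcomes, 0 and 1, a 1 is always followed by a 0 (so successive trials are correlated), and zeros are markedly more frequent than ones. The sketch below uses ordinary floating-point arithmetic, which is accurate enough for the first 50 or so digits of each seed.</p>

```python
import random

PHI = (1 + 5 ** 0.5) / 2   # the golden ratio, used as a non-integer base b

def digits(x0, b, n):
    """First n digits of x0 in (possibly non-integer) base b."""
    out, x = [], x0
    for _ in range(n):
        y = b * x
        a = int(y)
        out.append(a)
        x = y - a
    return out

random.seed(7)
counts = [0, 0]            # occurrences of digit 0 and digit 1
for _ in range(500):       # 500 random seeds, 50 digits each
    for a in digits(random.random(), PHI, 50):
        counts[a] += 1
# in base phi a 1 is always followed by a 0, so zeros dominate
print(counts[0] > counts[1])   # True
```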
<p>Regardless of whether <em>b</em> is an integer or not, the application we are interested in is the approximation of irrational numbers by a specific class of numbers. This is usually done using continued fractions if the class of numbers in question consists of the rational numbers, and there is an abundant literature on this topic, see for instance <a href="https://mathoverflow.net/questions/383142/algebraic-and-rational-parts-of-a-real-number" target="_blank" rel="noopener">here</a>. However, we focus instead on best approximations of an irrational number <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1] by a rational number <em>β<span style="font-size: 8pt;">n</span></em>, where</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558617871?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558617871?profile=RESIZE_710x" width="120" class="align-center"/></a></p>
<p>Note that <em>β<span style="font-size: 8pt;">n</span></em><span> can be expressed as </span><em>p<span style="font-size: 8pt;">n</span></em> / <em>q<span style="font-size: 8pt;">n</span></em>, a quotient of two integers if <em>b</em> is an integer, with <em>q<span style="font-size: 8pt;">n</span></em> equal to <em>b</em> raised to the power <em>n</em>. The best approximation is obtained when the <em>a<span style="font-size: 8pt;">k</span></em>'s are the successive outcomes of the Bernoulli experiment with seed <em>x</em><span style="font-size: 8pt;">0</span>, or in other words, the first <em>n</em> digits of <em>x</em><span style="font-size: 8pt;">0</span> in base <em>b</em>. The approximation is exceptionally good if the last digit <em>a<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">-1</span> is not zero, and it is followed by a record run of digits equal to zero. The length of that run is expected to be asymptotically of the order of (log <em>n</em>) / (log <em>b</em>). It cannot be better than that, for a fixed <em>n</em>. Therefore, I propose the following conjecture, based on the probability distributions associated with extreme (record) runs discussed in section 1.</p>
<p><strong>Conjecture</strong></p>
<p>For most numbers <em>x</em><span style="font-size: 8pt;">0</span> in [0, 1], and for any <span><em>ε</em> > 0,</span> if <em>p</em> / <em>q</em> is an approximation of <em>x</em><span style="font-size: 8pt;">0</span>, with <em>p</em>, <em>q</em> co-prime positive integers, we have</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8558683066?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8558683066?profile=RESIZE_710x" width="150" class="align-center"/></a></p>
<p>The details of how I came to this conjecture are outlined in the section <em>Connection with approximations of irrationals by rational numbers</em>, in <a href="https://mathoverflow.net/questions/383353/distribution-of-the-first-occurrence-of-a-maximum-record-run-of-zeros-in-the-d/" target="_blank" rel="noopener">this article</a>. While it is beyond the scope of this article, a discussion of best approximations by continued fractions leads to a similar conclusion. In particular, if <em>p<span style="font-size: 8pt;">n</span></em> / <em>q<span style="font-size: 8pt;">n</span></em> is the <em>n</em>-th convergent of the number <em>x</em>, we have the following result; see the last theorem in <a href="https://math.colorado.edu/~rohi1040/expository/ergodicthysimplecontfracs.pdf" target="_blank" rel="noopener">this article</a>, pictured below. In short, it says that if <em>ε</em> = 0, then only some proportion of all numbers <em>x</em><span style="font-size: 8pt;">0</span> will satisfy the above inequality, while with <em>ε</em> > 0, almost all <em>x</em><span style="font-size: 8pt;">0</span> will.</p>
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8585977467?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8585977467?profile=RESIZE_710x" width="600" class="align-center"/></a></span></p>
<p></p>
<p>Finally, record runs in Bernoulli trials are a topic of combinatorial analysis, with numerous applications, and thus relevant to machine learning. Also, you can learn more about non-integer bases in <a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">this article</a>. A summary table is available <a href="https://www.datasciencecentral.com/profiles/blogs/number-representation-systems-explained-in-one-picture" target="_blank" rel="noopener">here</a>.</p>
<p><span style="font-size: 14pt;"><strong>3. References</strong></span></p>
<p>[1] Schilling, Mark F. <em>The longest run of heads</em>. The College Mathematics Journal 21, no. 3 (1990): 196-207.</p>
<p>[2] Aldous, David. <em>Probability approximations via the Poisson clumping heuristic</em>. Vol. 77. Springer Science & Business Media, 2013.</p>
<p>[3] Földes, A. <em>The limit distribution of the length of the longest head-run</em>. Periodica Mathematica Hungarica 10 (1979): 301-310.</p>
<p>[4] Godbole, Anant P. <em>Poisson approximations for runs and patterns of rare events</em>. Advances in applied probability (1991): 851-865.</p>
<p>[5] Feller, William. <em>An introduction to probability theory and its applications</em>. 1957.</p>
<p>[6] Gerber, Hans U., and Shuo-Yen Robert Li. <em>The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain</em>. Stochastic Processes and their Applications 11, no. 1 (1981): 101-108.</p>
<p>[7] Li, Shuo-Yen Robert. <em>A martingale approach to the study of occurrence of sequence patterns in repeated experiments</em>. Annals of Probability 8, no. 6 (1980): 1171-1176.</p>
<p></p>
<p><em>To receive a weekly digest of our new articles, subscribe to our newsletter,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></p>
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></p>
More Surprising Math Images
tag:www.datasciencecentral.com,2021-02-08:6448529:BlogPost:1022670
2021-02-08T04:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><em>To zoom in on any picture, click on the image to get a higher resolution.</em></p>
<p>This is a follow-up to my previous article <a href="https://www.datasciencecentral.com/profiles/blogs/beautiful-mathematical-images" target="_blank" rel="noopener">here</a>, where you can find additional, very different images, the theory behind them, and their relevance to machine learning techniques. What is surprising is that all these images were produced with a formula involving a single parameter <em>λ</em>, and they look very different depending on the value of <em>λ</em>. More precisely, they are generated using the following recursion:</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span><span> </span>=<span> </span><em>x<span style="font-size: 8pt;">n</span></em><span> </span>+ <em>λ</em><span> </span>sin(<em>y<span style="font-size: 8pt;">n</span></em>),</p>
<p style="text-align: center;"><em>y</em><span style="font-size: 8pt;"><em>n</em>+1</span><span> </span>=<span> </span><em>x<span style="font-size: 8pt;">n</span></em><span> </span>+ <em>λ</em><span> </span>sin(<em>x<span style="font-size: 8pt;">n</span></em>),</p>
<p>with initial conditions <em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>. </p>
<p>Seven different groups of three images are displayed. In each group, the leftmost image, a scatterplot (in blue), corresponds to the orbit of (<em>x<span style="font-size: 8pt;">n</span></em>, <em>y<span style="font-size: 8pt;">n</span></em>) in two dimensions, given the initial conditions. The central image features <em>x<span style="font-size: 8pt;">n</span></em> and <em>y<span style="font-size: 8pt;">n</span></em> as two time series, with <em>x<span style="font-size: 8pt;">n</span></em> in blue and <em>y<span style="font-size: 8pt;">n</span></em> in red. In both cases, 20,000 iterations are used. The rightmost image is the same as the leftmost one, except that only the first 25 iterations are displayed, and a green curve connects the 25 dots, to show what the orbit looks like at the beginning. The initial vector (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>) is not included in that image.</p>
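<p>A minimal Python sketch of the recursion, exactly as written above, is shown below; plotting the full list of points with any scatterplot tool reproduces the leftmost image of each group, and keeping only the first 25 points gives the rightmost one. The parameters in the example are those of Figure 1.</p>

```python
import math

def orbit(x0, y0, lam, n):
    """Iterate the recursion above, returning the list of (x, y) points."""
    pts = [(x0, y0)]
    x, y = x0, y0
    for _ in range(n):
        # simultaneous update: both right-hand sides use the old x and y
        x, y = x + lam * math.sin(y), x + lam * math.sin(x)
        pts.append((x, y))
    return pts

pts = orbit(1.0, 4.0, 0.04, 20000)   # the parameters of Figure 1
print(len(pts), pts[1])
```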
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530324885?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530324885?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>x<span style="font-size: 8pt;">0</span> = 1, y<span style="font-size: 8pt;">0</span> = 4, λ = 0.04</em></p>
<p style="text-align: center;"></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530326887?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530326887?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>x<span style="font-size: 8pt;">0</span> = 1, y<span style="font-size: 8pt;">0</span> = 4, λ = 0.06</em></p>
<p style="text-align: center;"></p>
<p style="text-align: center;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530323258?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530323258?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>x<span style="font-size: 8pt;">0</span> = 3, y<span style="font-size: 8pt;">0</span> = 4, λ = 1.5</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530331493?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530331493?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong>: <em>x<span style="font-size: 8pt;">0</span> = 56, y<span style="font-size: 8pt;">0</span> = 4, λ = 0.04</em></p>
<p style="text-align: center;"></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530366692?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530366692?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 5</strong>: <em>x<span style="font-size: 8pt;">0</span> = 2, y<span style="font-size: 8pt;">0</span> = 4, λ = 10</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530385678?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530385678?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 6</strong>: <em>x<span style="font-size: 8pt;">0</span> = 1, y<span style="font-size: 8pt;">0</span> = 4, λ = 2.5</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8530386883?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8530386883?profile=RESIZE_710x" width="700" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 7</strong>: <em>x<span style="font-size: 8pt;">0</span> = 3, y<span style="font-size: 8pt;">0</span> = 4, λ = 2</em></p>
<p></p>
<p>As a bonus, here is another picture produced with a different type of chaotic dynamical system. It is discussed <a href="https://mathoverflow.net/questions/352967/is-this-a-new-strange-attractor" target="_blank" rel="noopener">here</a>. </p>
<p></p>
<p><em><a href="https://storage.ning.com/topology/rest/1.0/file/get/8582320259?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8582320259?profile=RESIZE_710x" width="400" class="align-center"/></a></em></p>
<p></p>
<p>Another interesting one can be found <a href="https://arxiv.org/pdf/1508.07814.pdf" target="_blank" rel="noopener">here</a> (page 21):</p>
<p><em><a href="https://storage.ning.com/topology/rest/1.0/file/get/8609092274?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8609092274?profile=RESIZE_710x" width="400" class="align-center"/></a></em></p>
<p></p>
<p><em>To receive a weekly digest of our new articles, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>.</em></p>
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at <a href="http://datashaping.com/" target="_blank" rel="noopener">DataShaping.com</a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="http://datashaping.com/" target="_blank" rel="noopener">here</a>.</em></p>
Beautiful Mathematical Images
tag:www.datasciencecentral.com,2021-02-02:6448529:BlogPost:1018503
2021-02-02T19:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><em>To zoom in on any picture, click on the image to get a higher resolution.</em></p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8505475867?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8505475867?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>The pillow basins (see section 3)</em></p>
<p style="text-align: center;"></p>
<p style="text-align: left;">The topic discussed here is closely related to optimization techniques in machine learning. I will also talk about dynamic systems, especially discrete chaotic ones, in two dimensions. This is a fascinating branch of quantitative science, with numerous applications. This article provides you with an opportunity to gain exposure to this discipline, which is usually overlooked by data scientists but well studied by mathematicians and physicists. The images presented here are selected not just for their beauty, but most importantly for their intrinsic value: the practical insights that can be derived from them, and the implications for machine learning professionals. </p>
<p style="text-align: left;"></p>
<p><span style="font-size: 14pt;"><strong>1. Introduction to dynamical systems</strong></span></p>
<p>A discrete dynamical system is a sequence <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>f</em>(<em>x<span style="font-size: 8pt;">n</span></em>) where <em>n</em> is an integer starting at <em>n</em> = 0, where <em>x</em><span style="font-size: 8pt;">0</span> is the initial condition, and where <em>f</em> is a real-valued function. In the continuous version (not discussed here), the index <em>n</em> (also called time) is a real number. The function <em>f</em> is called the <em>map</em> of the system, and the system itself is also called a <em>mapping</em>: the most studied one is the logistic map defined by <em>f</em>(<em>x</em>) = <span><em>ρ</em></span><em>x</em> (1 - <em>x</em>), with <em>x</em> in [0, 1]. When <span><em>ρ</em> = 4, it is fully chaotic. </span>The sequence (<em>x<span style="font-size: 8pt;">n</span></em>) for a specific initial condition <em>x</em><span style="font-size: 8pt;">0</span> is called the <em>orbit</em>. </p>
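<p><em>As a quick illustration (a sketch of mine, not part of the original derivation; the initial condition 0.3 is arbitrary), one can iterate the logistic map with ρ = 4 and watch the orbit wander chaotically over [0, 1]:</em></p>

```python
# Iterate the fully chaotic logistic map f(x) = rho * x * (1 - x) with rho = 4.
def logistic_orbit(x0, n, rho=4.0):
    orbit = [x0]
    for _ in range(n):
        x0 = rho * x0 * (1.0 - x0)
        orbit.append(x0)
    return orbit

orbit = logistic_orbit(0.3, 1000)
# The orbit never leaves [0, 1], but it fills the interval densely,
# visiting both ends (its invariant density peaks near 0 and 1).
```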
<p>Another example of a chaotic mapping is the sequence of digits in base <em>b</em> of an irrational number <em>z</em> in [0,1]. In this case, <em>x</em><span style="font-size: 8pt;">0</span> = <em>z</em>, <em>f</em>(<em>x</em>) = <em>bx</em> - INT(<em>bx</em>) and the <em>n</em>-th digit of <em>z</em> is INT(<em>bx<span style="font-size: 8pt;">n</span></em>). Here INT is the integer part function. It is studied in detail in my book <em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems</em><span>, </span><span>available for free, <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>. See also the second, large appendix in my free book </span><span><em>Statistics: New Foundations, Toolbox, and Machine Learning Recipes</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a>. Applications include the design of non-periodic pseudo-random number generators, cryptography, and even a new concept of number guessing (gambling or simulated stock market) where the winning numbers can be computed in advance with a public algorithm that requires trillions of years of computing time, while a fast, private algorithm is kept secret. See <a href="https://www.datasciencecentral.com/profiles/blogs/data-science-foundations-for-a-new-stock-market" target="_blank" rel="noopener">here</a>. </span></p>
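<p><em>To make the digit map concrete, here is a small sketch. It uses exact rational arithmetic (my choice, to avoid floating-point drift) and a rational z rather than an irrational one, purely so the output is easy to verify by hand:</em></p>

```python
from fractions import Fraction

def digits(z, b, n):
    """Return the first n base-b digits of z in [0, 1), using the map
    x_{n+1} = b*x_n - INT(b*x_n); the n-th digit is INT(b*x_n)."""
    out = []
    x = z
    for _ in range(n):
        d = int(b * x)   # INT(b x_n): the next digit
        x = b * x - d    # the fractional part drives the next step
        out.append(d)
    return out

# 1/7 = 0.142857142857... in base 10
print(digits(Fraction(1, 7), 10, 12))
```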
<p>The concept easily generalizes to two dimensions. In this case <em>x<span style="font-size: 8pt;">n</span></em> is a vector or a complex number. Mappings in the complex plane are known to produce beautiful fractals; they have been used in fractal compression algorithms to compress images. In one dimension, once in chaotic mode, they produce Brownian-like orbits, with applications in Fintech.</p>
<p><strong>1.1. The sine map</strong></p>
<p>Moving forward, we focus exclusively on a particular case of the <em>sine mapping</em>, both in one and two dimensions. This is one of the simplest nonlinear mappings, yet it is very versatile and produces a large number of varied and intriguing configurations. In one dimension, it is defined as follows:</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = -<em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>).</p>
<p>In two dimensions, it is defined as</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = -<em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>y<span style="font-size: 8pt;">n</span></em>),</p>
<p style="text-align: center;"><em>y</em><span style="font-size: 8pt;"><em>n</em>+1</span> = -<em>ρy<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>).</p>
<p></p>
<p>This system is governed by two real parameters: <span><em>ρ</em> and</span> <span><em>λ</em>. Some of its properties and references are discussed <a href="https://mathoverflow.net/questions/382610/strange-behavior-of-x-n1-x-n-lambda-sin-x-n" target="_blank" rel="noopener">here</a>. </span></p>
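<p><em>A minimal sketch of iterating the two-dimensional system; I use the symmetric form y<sub>n+1</sub> = -ρy<sub>n</sub> + λ sin(x<sub>n</sub>) (the form consistent with the roots given in section 3), with the Figure 1 parameters λ = 2, ρ = 0.75, and an arbitrary starting point:</em></p>

```python
import math

def sine_map_orbit(x0, y0, rho, lam, n):
    """Iterate the 2-D sine map n times and return the orbit."""
    orbit = [(x0, y0)]
    x, y = x0, y0
    for _ in range(n):
        # simultaneous update of both coordinates
        x, y = -rho * x + lam * math.sin(y), -rho * y + lam * math.sin(x)
        orbit.append((x, y))
    return orbit

orbit = sine_map_orbit(1.0, 0.5, 0.75, 2.0, 1000)
# Since |x_{n+1}| <= rho*|x_n| + lam, the orbit is trapped in
# |x|, |y| <= lam / (1 - rho) = 8 for these parameter values.
```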
<p><span style="font-size: 14pt;"><strong>2. Connection to machine learning optimization algorithms</strong></span></p>
<p>I need to introduce two more concepts before getting down to the interesting stuff. The first one is the <em>fixed point</em>. Recall that a root is simply a value <em>x</em>* such that <em>f</em>(<em>x</em>*) = 0. Some systems don't have any root, some have one, some have several, and some have infinitely many, depending on the values of the parameters (in our case, depending on <em>ρ</em> and<em> λ</em>, see section 1.1). The roots of <em>f</em> are precisely the fixed points of the recursion <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> + <em>f</em>(<em>x<span style="font-size: 8pt;">n</span></em>), so some or all of them can be found by iterating it. In our case, this translates to using the following algorithm.</p>
<p><strong>2.1. Fixed point algorithm</strong></p>
<p>For our sine mapping defined in section 1.1, proceed as follows</p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> - <em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>)</p>
<p>in one dimension, or </p>
<p style="text-align: center;"><em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> - <em>ρx<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>y<span style="font-size: 8pt;">n</span></em>),</p>
<p style="text-align: center;"><em>y</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>y<span style="font-size: 8pt;">n</span></em> - <em>ρy<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>),</p>
<p>in two dimensions. If the sequences in question converge to some <em>x</em>* (one dimension) or <em>x</em>*, <em>y</em>* (two dimensions), then the limit in question is a fixed point of the system. To find as many fixed points as possible, you need to try many different initial conditions. Some initial conditions lead to one fixed point, some lead to another fixed point, some lead nowhere. Some fixed points can never be reached no matter what initial conditions you use. This is illustrated later in this article. </p>
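<p><em>A sketch of the one-dimensional fixed point algorithm above, with the Figure 1 parameters ρ = 0.75 and λ = 2; the initial condition is my choice, and other starting points may reach a different root, or none at all:</em></p>

```python
import math

def fixed_point_1d(x0, rho, lam, n=2000):
    """Iterate x_{n+1} = x_n - rho*x_n + lam*sin(x_n). A limit x*
    satisfies rho*x* = lam*sin(x*), i.e. it is a root of the system."""
    x = x0
    for _ in range(n):
        x = x - rho * x + lam * math.sin(x)
    return x

root = fixed_point_1d(2.0, 0.75, 2.0)
# How close the limit is to solving rho*x = lam*sin(x):
residual = abs(0.75 * root - 2.0 * math.sin(root))
```

<p><em>Starting from x<sub>0</sub> = 2, the iterates oscillate around a root near 2.18 and slowly contract toward it; starting near 0 instead would fail, since the root at 0 is repelling for these parameters.</em></p>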
<p><strong>2.2. Connection to optimization algorithms</strong></p>
<p>Optimization techniques are widely used in machine learning and statistical science, for instance in deep neural networks, or if you want to find a maximum likelihood estimator.</p>
<p>When looking for the maxima or minima of a function <em>f</em>, you try to find the roots of the derivative of <em>f</em> (in one dimension), or the points where its gradient vanishes (in higher dimensions). This is typically done using the Newton-Raphson method, which is a particular type of fixed point algorithm, with quadratic convergence.</p>
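<p><em>For comparison, a generic Newton-Raphson sketch (not tied to the sine map; the equation cos(x) = x is just a standard textbook example of mine). It is itself a fixed point iteration, and its quadratic convergence typically reaches machine precision in a handful of steps:</em></p>

```python
import math

def newton(g, dg, x0, n=20):
    """Newton-Raphson: x_{n+1} = x_n - g(x_n)/g'(x_n), a fixed point
    iteration for the roots of g, with quadratic convergence."""
    x = x0
    for _ in range(n):
        x = x - g(x) / dg(x)
    return x

# Example: solve cos(x) = x, i.e. find the root of g(x) = cos(x) - x.
root = newton(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1.0, 1.0)
```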
<p><strong>2.3. Basins of attraction</strong></p>
<p>The second concept I introduce is the <em>basin of attraction</em>. A basin of attraction is the full set of initial conditions such that, when applying the fixed point algorithm of section 2.1, the fixed point iterations always converge to the same root <em>x</em>* of the system.</p>
<p>Let me illustrate this with the one-dimensional sine mapping, with <em>ρ</em> = 0 and <em>λ </em>= 1. The roots of the system are solutions to sin(<em>x</em>) = 0, that is <em>x</em>* = <em>k</em><span><em>π</em>, where <em>k</em> is any positive or negative integer. If the initial condition <em>x</em><span style="font-size: 8pt;">0</span> is anywhere in the open interval ]2<em>kπ</em>, 2(<em>k</em>+1)<em>π</em>[, then the fixed point algorithm always converges to the same <em>x</em>* = (2<em>k</em> + 1)<em>π</em>. So each of these intervals constitutes a distinct basin of attraction, and there are infinitely many of them. However, none of the roots <em>x</em>* = 2<em>kπ</em> can be reached regardless of the initial condition <em>x</em><span style="font-size: 8pt;">0</span>, unless <em>x</em><span style="font-size: 8pt;">0</span> = <em>x</em>* = 2<em>kπ</em> itself. </span></p>
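<p><em>This claim is easy to check numerically; a small sketch with ρ = 0 and λ = 1, trying several initial conditions inside the interval ]0, 2π[ (the specific starting values are arbitrary):</em></p>

```python
import math

def fixed_point_limit(x0, n=200):
    # x_{n+1} = x_n + sin(x_n): the fixed points are the roots of sin.
    x = x0
    for _ in range(n):
        x = x + math.sin(x)
    return x

# Every x0 strictly between 0 and 2*pi lands on the same root, pi;
# the roots 0 and 2*pi are never reached from inside the interval.
limits = [fixed_point_limit(x0) for x0 in (0.1, 1.0, 3.0, 5.0, 6.2)]
```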
<p><span>In two dimensions, the basins of attractions look beautiful when plotted. Some have fractal boundaries. I believe none of their boundaries have an explicit, closed-form equation, except in trivial cases. This is illustrated in section 3, featuring the beautiful images promised at the beginning. </span></p>
<p><strong>2.4. Final note about the one-dimensional sine map</strong></p>
<p><span>The sequence <em>x</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>x<span style="font-size: 8pt;">n</span></em> + <em>λ</em> sin(<em>x<span style="font-size: 8pt;">n</span></em>) behaves as follows. Here we assume <em>λ</em> > 0 and <em>ρ</em> = 0.</span></p>
<ul>
<li><span>If <em>λ </em> < 1, it converges to a root <em>x</em>*</span></li>
<li><span>If <em>λ =</em> 4, it oscillates constantly in a narrow horizontal band, never converging</span></li>
<li><span>If <em>λ </em> > 6, it behaves chaotically like a Brownian motion, unbounded, with the exception noted below</span></li>
</ul>
<p><span>There is a very narrow interval around <em>λ =</em> 8 where the behavior is non-chaotic. In that case, <em>x<span style="font-size: 8pt;">n</span></em> is asymptotically equivalent to +2<em>π n</em> or - 2<em>π n</em>, and the sign depends on the initial condition <em>x</em><span style="font-size: 8pt;">0</span>, and is very sensitive to it. For instance, if <em>x</em><span style="font-size: 8pt;">0</span> = 2 and <em>λ </em>= 8, then <em>x</em><span style="font-size: 8pt;">2<em>n</em></span> - <em>x</em><span style="font-size: 8pt;">2<em>n</em>-1</span> gets closer and closer to <em>α</em> = 7.939712..., and <em>x</em><span style="font-size: 8pt;">2<em>n</em>-1</span> - <em>x</em><span style="font-size: 8pt;">2<em>n</em>-2</span> gets closer and closer to <em>β</em> = -1.65653... as <em>n</em> increases, with <em>α</em> + <em>β</em> = 2<em>π</em>. Furthermore, <em>α</em> satisfies the equation</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8505364456?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8505364456?profile=RESIZE_710x" width="300" class="align-center"/></a></span></p>
<p><span style="font-size: 12pt;">For details, see <a href="https://mathoverflow.net/questions/382610/strange-behavior-of-x-n1-x-n-lambda-sin-x-n" target="_blank" rel="noopener">here</a>. The phenomenon in question is pictured in Figure 2 below. </span></p>
<p><span style="font-size: 10pt;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507389694?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507391075?profile=RESIZE_710x" width="400" class="align-center"/></a></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507394283?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507394283?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><span style="font-size: 12pt;"><strong>Figure 2</strong>: <em>x<span style="font-size: 8pt;">n</span> for n = 0 to 20,000 (X-axis), with x<span style="font-size: 8pt;">0</span> = 2; λ = 8 (top), λ = 7.98 (bottom)</em></span></p>
<p><span style="font-size: 14pt;"><strong>3. Beautiful math images and their implications</strong></span></p>
<p>The first picture (Figure 1, at the top of the article) features part of the four non-degenerate basins of attraction of the 2-dimensional sine map, when <span><em>λ =</em> 2 and <em>ρ </em>= 0.75. This sine map has 49 = 7 x 7 roots (<em>x</em>*, <em>y</em>*), with <em>x</em>* one of the 7 solutions of <em>ρx</em> = <em>λ </em>sin(<em>λ</em> sin(<em>x</em>) / <em>ρ</em>), and <em>y</em>* also one of the 7 solutions of the same equation. Computations were performed using the fixed point algorithm described in section 2.1. Note that the white zone corresponds to initial conditions (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>) that do not lead to convergence of the fixed point algorithm. Each basin is assigned one color (other than white), and is made of pieces of the same color scattered across many pillows. I call these the pillow basins. It would be interesting to see if the basin boundaries can be represented by simple mathematical functions. One degenerate basin (the fifth basin), consisting of the diagonal line <em>x</em> = <em>y</em>, is not displayed in Figure 1.</span></p>
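<p><em>The root count can be double-checked numerically. A sketch that counts the sign changes of g(x) = λ sin(λ sin(x)/ρ) - ρx on a fine grid: roots with |x| &gt; λ/ρ ≈ 2.67 are impossible, since |λ sin(·)| ≤ λ, so scanning [-3, 3] suffices:</em></p>

```python
import numpy as np

lam, rho = 2.0, 0.75
# 60000 points: the grid does not hit x = 0 exactly, so the root at 0
# is counted as a single sign change like the others.
x = np.linspace(-3.0, 3.0, 60000)
g = lam * np.sin(lam * np.sin(x) / rho) - rho * x
n_roots = int(np.sum(np.diff(np.sign(g)) != 0))  # one change per simple root
print(n_roots)
```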
<p>The picture below (Figure 3) shows parts of 5 of the infinitely many basins of attraction corresponding to <span><em>λ</em></span> = 0.5 and <span><em>ρ</em></span> = 0, for the 2-dimensional sine map. As in Figure 1, the X-axis represents <em>x</em><span style="font-size: 8pt;">0</span>, the Y-axis represents <em>y</em><span style="font-size: 8pt;">0</span>. The range is from -4 to 4 both in Figure 1 and Figure 3. Each basin has its own color.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507115889?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507115889?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><span><strong>Figure 3</strong>: <em>The octopus basins</em></span></p>
<p><span>In this case, we have infinitely many roots (with <em>x</em>*, <em>y</em>* being multiples of <em>π</em>) but only one-fourth of them can be reached by the fixed point algorithm. The more roots, the more basins, and as a result, the more interference between basins, making the image look noisy: a very small change in the initial conditions can lead to convergence to a different root, hence the apparent overlap between the basins. </span></p>
<p><span>The takeaway is that when dealing with an optimization problem with many local maxima and minima, the solution you get is very sensitive to the initial conditions. In some cases it matters, and in some cases it does not. If you are looking for a local optimum only, this is not an issue. This is further illustrated in Figure 4 below. It shows the orbits - that is, the locations of (<em>x<span style="font-size: 8pt;">n</span></em>, <em>y<span style="font-size: 8pt;">n</span></em>) - starting with four different initial conditions (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>), for the sine map featured in Figure 1. The blue dots represent the roots (<em>x</em>*, <em>y</em>*). Each orbit except the green one converges to a different root. The green one oscillates back and forth, never converging.</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8507235501?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8507235501?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><span><strong>Figure 4</strong>: <em>Four orbits corresponding to four initial conditions, for the case shown in Figure 1 </em></span></p>
<p><strong>Note:</strong> When the system is very sensitive to initial conditions and highly chaotic, orbits computed numerically may be all wrong, as round-off errors propagate exponentially fast as <em>n</em> increases. In that case, high precision computing is needed to get accurate orbits, see <a href="https://www.datasciencecentral.com/profiles/blogs/high-precision-computing-benchmark-examples-and-tutorial" target="_blank" rel="noopener">here</a>.</p>
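<p><em>A sketch of the round-off issue, using Python's standard decimal module at 50 digits as a reference orbit for the chaotic logistic map (my choice of example; errors roughly double at every step, so double precision loses all accuracy after a few dozen iterations):</em></p>

```python
from decimal import Decimal, getcontext

getcontext().prec = 50        # 50 significant digits for the reference orbit
xf = 0.3                      # double-precision orbit
xd = Decimal("0.3")           # high-precision orbit, same initial condition
diffs = []
for n in range(100):
    xf = 4.0 * xf * (1.0 - xf)
    xd = 4 * xd * (1 - xd)
    diffs.append(abs(xf - float(xd)))
# diffs stay tiny at first, then blow up to order 1 once the accumulated
# round-off error has doubled roughly 50 times.
```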
<p><strong>3.1. Benchmarking clustering algorithms</strong></p>
<p><span>The basins of attraction can be used to benchmark supervised clustering algorithms. For instance, in Figure 1, if you group the red and black basins together, and the yellow and blue basins together, you end up with two well separated groups whose boundaries can be determined to arbitrary precision. One can sample points from the merged basins to create a training set with two groups, and check how well your clustering algorithm (based for instance on nearest neighbors or density estimation) can estimate the true boundaries. Another machine learning problem that you can test on these basins is boundary estimation: the problem consists in finding the boundary of a domain when you know points that are inside and points that are outside the domain. </span></p>
<p><strong>3.2. Interesting probability problem</strong></p>
<p><span>The case pictured in Figure 1 leads to an interesting question. If you randomly pick a vector of initial conditions (<em>x</em><span style="font-size: 8pt;">0</span>, <em>y</em><span style="font-size: 8pt;">0</span>), what is the probability that it falls in (say) the red basin? It turns out that the probabilities are identical regardless of the basin. However, the probability to fall outside any basin (the white area) is different.</span></p>
<p><em>More beautiful images can be found in Part 2 of this article, <a href="https://www.datasciencecentral.com/profiles/blogs/more-beautiful-math-images" target="_blank" rel="noopener">here</a>. To not miss them, subscribe to our newsletter, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. See also <a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">this article</a>, featuring an image entitled "the eye of the Riemann Zeta function". See also the Wikipedia article about "Infinite Compositions of Analytic Functions", <a href="https://en.wikipedia.org/wiki/Infinite_compositions_of_analytic_functions#:~:text=In%20mathematics%2C%20infinite%20compositions%20of,convergence%2Fdivergence%20of%20these%20expansions." target="_blank" rel="noopener">here</a>. The picture below is from that article.</em></p>
<p></p>
<p><em><a href="https://storage.ning.com/topology/rest/1.0/file/get/8572990262?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8572990262?profile=RESIZE_710x" width="400" class="align-center"/></a></em></p>
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). He is also the founder and investor in <a href="https://www.parisrestaurantandbar.com/blog" target="_blank" rel="noopener">Paris Restaurant</a> in Anacortes, WA. You can access Vincent's articles and books <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>. </em></p>
<p></p>
Can a Diploma from a Lower Ranking University Hurt your Data Science Career Prospects?
tag:www.datasciencecentral.com,2021-01-29:6448529:BlogPost:1015350
2021-01-29T04:16:13.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here I specifically discuss the case of a PhD degree from a third-tier university, though to some extent, it also applies to master's degrees. Professionals joining companies such as Facebook, Microsoft, or Google in a role other than programmer typically have a PhD degree, although there are many exceptions. It is still possible to learn data science on the job, especially if you have a quantitative background (say in physics or engineering) and have experience working with serious data: see <a href="https://www.datasciencecentral.com/profiles/blogs/is-it-still-possible-today-to-become-a-self-taught-data-scientist" target="_blank" rel="noopener">here</a>. After all, learning Python is not that hard and can be done via data camps. What is more difficult to acquire is analytical maturity. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8492386293?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8492386293?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p style="text-align: center;"><em>University of Namur</em></p>
<p>In my case, I did my PhD at the University of Namur, a place that nobody has heard of. The topic of my research was computational statistics and image analysis. These were hot topics back then, and thanks to my mentor, I was also lucky to work part-time in the corporate world as part of my PhD program, for a state-of-the-art GIS (Geographic Information System) company, collaborating with engineers on digital satellite images. Much of what I worked on is still very active these days, on a much bigger scale. It was a precursor of automated driving systems, and the math department at my alma mater was young and still very creative back then. This brings me to my first piece of advice when choosing a PhD program.</p>
<p><strong>Advice #1</strong></p>
<ul>
<li>If you come from a poor background, your options might be more limited (this was my case), and you need to leverage everything you can. My parents did not have the money to send me to expensive schools, and I ended up attending the closest one to avoid spending a lot of money on rent. On the plus side, I did not accumulate student loans.</li>
<li>Before deciding on a PhD program, carefully choose your mentor. Mine was not known for his research, but he was well connected to the industry, managed to get money to fund his projects, and was working on exciting, applied projects. </li>
</ul>
<p>A side effect of my last piece of advice is that if your goal is to stay in academia, you may have to rely on yourself to make your research worthy of publication and likely to land you a tenured position. The way I did it is summarized in my next piece of advice. Ideally, you want to leave all doors open, both academia and other options.</p>
<p><strong>Advice #2</strong></p>
<ul>
<li>Be proactive about reaching out to well respected professors in your field. Attend conferences and meet peers from around the world. Accept roles such as reviewer. Start publishing in third-tier journals, move to second-tier, and then get a few papers into first-tier journals before completing your PhD. The one I published in the <em>Journal of the Royal Statistical Society, Series B</em> is what resulted in me being accepted as a postdoc at Cambridge University. Initially, when it was accepted, it only had my name on it. </li>
<li>It helps to be passionate about what you do. My very first paper was in the <em>Journal of Number Theory</em>, during my first year as a PhD student. It happened because I had a passion for number theory, developed during my middle-school and high-school years. I hated high-school math (repetitive, boring, mechanical exercises) but loved the math that I discovered and taught myself during those years, mostly through reading. I was the only student in my school to participate (and be a finalist) in the national Math Olympiads. When you are young, that is a good thing to have on your resume. </li>
</ul>
<p>So to answer the original question - does it hurt coming from a low-ranking school - at this point you know that you can still succeed despite the odds. But it requires patience and perseverance, and you must be very good at what you do. Perhaps the biggest drawback is the lack of great connections that top schools offer; you have to make up for that. Also, great schools have state-of-the-art equipment and labs (so you can learn the most modern material), but somehow my little math department didn't lack these, so I was not penalized there. I also cultivated great relationships with the computer science department. In the end, my research was at the intersection of math, statistics, and computer science.</p>
<p>My last piece of advice is about what happens after completing your PhD. In my case, I started a postdoc at Cambridge, then moved to the corporate world (after failing a job interview for a tenured position), and eventually became an entrepreneur and VC-funded executive, recently selling my last venture to a publicly traded company. I still do independent math research, even more so, and of higher caliber, than during my PhD years. </p>
<p><strong>Advice #3</strong></p>
<ul>
<li>Contact other successful professionals who came from a third-tier university to ask for their advice. In my math department, two other PhD students in my cohort ended up having a stellar career: Michel Bierlaire (postdoc MIT after Namur) is now full professor at EPFL; Didier Burton (also postdoc MIT after Namur) ended up as an executive at Yahoo. </li>
<li>If you can, leverage the fact that you are very applied and don't have student loans: you can ask for a lower salary, be more competitive, and gain broad horizontal experience in many places while developing world-class expertise in a few areas. I eventually realized that working for myself (not as a consultant, but as an entrepreneur) was what I liked best.</li>
</ul>
<p>You may argue that you don't need any diploma to create your own self-funded company, not even elementary school, but in the end I believe I got the best I could out of my PhD. In my case, it also implied relocating several times, from Belgium (due to lack of jobs) to the UK to the United States, and from the East Coast to the Bay Area and finally Seattle. I've been through various bubbles and market crashes; you may use your analytical skills to navigate them the best you can, selling and buying at the right time, understanding the markets, and emerging stronger each time. </p>
<p></p>
<p></p>
Moving Averages: Natural Weights, Iterated Convolutions, and Central Limit Theorem
tag:www.datasciencecentral.com,2021-01-26:6448529:BlogPost:1011806
2021-01-26T02:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Convolution is a concept well known to machine learning and signal processing professionals. In this article, we explain in simple English how a moving average is actually a discrete convolution, and we use this fact to build weighted moving averages with natural weights that, in the limit, have a Gaussian behavior guaranteed by the Central Limit Theorem. Moving averages are nothing more than blurring filters for signal processing experts, with a Gaussian-like kernel in the case discussed here. Inverting a moving average to recover the original signal consists in applying the inverse filter, known as a sharpening or enhancing filter. The inverse filter is used for instance in image analysis, to remove noise or deblur an image, while the original filter (the moving average) does the opposite. This is discussed here for one-dimensional discrete signals, known as time series. Generalizations are also covered, along with an interesting application in number theory related to the famous unsolved Riemann conjecture.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8470334300?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8470334300?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>Bell-shaped distribution for re-scaled coefficients (the weights) discussed in section 1.1</em></p>
<p><span style="font-size: 14pt;"><b style="font-size: 14pt;">1.</b> <span style="font-size: 18.6667px;"><b>Weighted</b></span><b style="font-size: 14pt;"> moving averages as convolutions</b></span></p>
<p>Given a discrete time series with observations <em>X</em>(0), <em>X</em>(1), <em>X</em>(2)<i> </i>and so on, a weighted moving average can be defined by</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8469896293?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8469896293?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p>Here <em>Y</em>(<em>t</em>) is the smoothed signal and <em>h</em> is a discrete density function (thus summing to one), though negative values of <em>h</em>(<em>k</em>) are sometimes used, for instance in Spencer's 15-point moving average used by actuaries, see <a href="https://mathworld.wolfram.com/Spencers15-PointMovingAverage.html" target="_blank" rel="noopener">here</a>. We assume that <em>t</em> can take on negative integer values. Also, unless otherwise specified, we assume the weights to be symmetrical, that is, <em>h</em>(<em>k</em>) = <em>h</em>(-<em>k</em>). The parameter <em>N</em> can be infinite, but typically the values <em>h</em>(<em>k</em>) decay rapidly as <em>k</em> moves away from 0. </p>
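<p>As a minimal illustration (not the article's code; names and data are made up), the weighted moving average above can be computed as a discrete convolution, here with uniform weights and <em>N</em> = 1:</p>

```python
import numpy as np

def weighted_moving_average(x, h):
    """Smooth x with symmetric weights h = [h(-N), ..., h(0), ..., h(N)].

    This is the discrete convolution Y = h * X; mode="same" keeps len(x).
    """
    h = np.asarray(h, dtype=float)
    assert abs(h.sum() - 1.0) < 1e-12   # h is a discrete density
    return np.convolve(x, h, mode="same")

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
h = np.array([1/3, 1/3, 1/3])           # N = 1, uniform weights
y = weighted_moving_average(x, h)       # interior values: (x[t-1]+x[t]+x[t+1])/3
```

Since the weights are symmetrical, the kernel flip performed by convolution has no effect; note that mode="same" zero-pads the signal, so only interior values match the formula exactly.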
<p>The notation used by mathematicians to represent this transformation is as follows: <em>Y</em> = <em>T</em>(<em>X</em>) = <em>h</em> * <em>X</em> where * is the convolution operator. This notation is convenient because it easily allows us to define the iterated moving average as a self-composition of the operator <em>T</em>, acting on the time series <em>X </em>: Start with <em>Y</em><span style="font-size: 8pt;">0</span> = <em>X</em>, <em>Y</em><span style="font-size: 8pt;">1</span> = <em>Y</em>, and let <em>Y</em><span style="font-size: 8pt;"><em>n</em>+1</span> = <em>T</em>(<em>Y<span style="font-size: 8pt;">n</span></em>) = <em>h</em> * <em>Y<span style="font-size: 8pt;">n</span></em>. Likewise, we can define <span style="font-size: 12pt;"><em>h<span style="font-size: 8pt;">n</span></em></span> (with <em>h</em><span style="font-size: 8pt;">1</span> = <em>h</em>) as <em>h</em> * <em>h</em> * ... * <em>h</em>, that is, an <em>n</em>-fold self-convolution of <em>h</em>. Of course, <em>Y<span style="font-size: 8pt;">n</span></em> = <em>h<span style="font-size: 8pt;">n</span></em> * <em>X</em> so that we have</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8469956688?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8469956688?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Note that the sum goes from -<em>N<span style="font-size: 8pt;">n</span></em> to <em>N<span style="font-size: 8pt;">n</span></em> this time, as each additional iteration increases the number of terms in the sum, so <em>N<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> > <em>N<span style="font-size: 8pt;">n</span></em>, with <em>N</em><span style="font-size: 8pt;">1</span> = <em>N</em>. This becomes clear in the following illustration.</p>
<p><strong>1.1 Example</strong></p>
<p>The most basic case corresponds to <em>N</em> = 1, with <em>h</em>(-1) = <em>h</em>(0) = <em>h</em>(1) = 1/3. In this case, <em>N<span style="font-size: 8pt;">n</span></em> = <em>n</em>, and the average value of <em>h<span style="font-size: 8pt;">n</span></em>(<em>k</em>) is equal to 1 / (2<em>N<span style="font-size: 8pt;">n</span></em> +1). We have the following table:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8470110072?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8470110072?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>The above table shows how the weights are automatically determined, without guesswork, rules of thumb, or fine-tuning. Note that the sum of the elements in the <em>n</em>-th row is always equal to 3^<em>n</em> (3 raised to the power <em>n</em>). This is very similar to the binomial coefficients table, and the <em>h<span style="font-size: 8pt;">n</span></em>(<em>k</em>) are known as the trinomial coefficients, see <a href="https://oeis.org/search?q=1%2C6%2C21%2C50%2C90%2C126&language=english&go=Search" target="_blank" rel="noopener">here</a>. The difference is that for binomial coefficients, the sum of the elements in the <em>n</em>-th row is always equal to 2^<em>n</em>, and the <em>n</em>-th row only has <em>n</em> + 1 entries, versus 2<em>n</em> + 1 in our table. The values <em>h<span style="font-size: 8pt;">n</span></em>(<em>k</em>) corresponding to <em>n</em> = 100 are displayed in Figure 1, at the top of this article. They have been scaled by a factor equal to the square root of <em>N<span style="font-size: 8pt;">n</span></em>, since otherwise they would all tend to zero as <em>n</em> tends to infinity. </p>
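<p>The rows of the table can be reproduced in a few lines; this is a sketch (not the article's source code), using un-normalized weights so that the trinomial coefficients appear directly:</p>

```python
import numpy as np

def iterated_kernel(h, n):
    """n-fold self-convolution h_n = h * h * ... * h (n factors)."""
    hn = np.asarray(h, dtype=float)
    for _ in range(n - 1):
        hn = np.convolve(hn, h)
    return hn

h = np.array([1.0, 1.0, 1.0])   # un-normalized weights for N = 1
rows = {n: iterated_kernel(h, n) for n in (1, 2, 3)}
# Row n holds the 2n + 1 trinomial coefficients, and its sum is 3**n;
# dividing row n by 3**n recovers the normalized weights h_n(k).
```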
<p><strong>1.2 Link to the Central Limit Theorem</strong></p>
<p>The methodology developed here can be used to prove the central limit theorem in the most classic way. Indeed, the classic proof uses iterated self-convolutions, together with the fact that the Fourier transform of a convolution is the product of the Fourier transforms of its factors. In probability theory, the Fourier transform is called the characteristic function. Interestingly, this leads to Gaussian approximations for partial sums of coefficients such as those in the <em>n</em>-th row of the above table, when <em>n</em> is large and after proper rescaling. This is already well known for binomial coefficients (see <a href="http://www.ams.org/publicoutreach/feature-column/fcarc-normal" target="_blank" rel="noopener">here</a>), and it easily extends to the coefficients introduced here, as well as to many other types of mathematical coefficients. See also Figure 1.</p>
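<p>A quick numerical check of this Gaussian behavior (a sketch, assuming uniform weights with <em>N</em> = 1, where each step has variance 2/3 since it takes the values -1, 0, 1 with equal probability): for large <em>n</em>, the normalized coefficients h_n(k) are close to a normal density with mean 0 and variance 2<em>n</em>/3.</p>

```python
import numpy as np

n = 100
h = np.array([1/3, 1/3, 1/3])       # uniform weights, N = 1
hn = h.copy()
for _ in range(n - 1):
    hn = np.convolve(hn, h)         # h_n, supported on k = -n, ..., n

k = np.arange(-n, n + 1)
var = 2 * n / 3                     # variance of n iid uniform{-1, 0, 1} steps
gauss = np.exp(-k**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

err = np.max(np.abs(hn - gauss))    # local-limit-theorem discrepancy
```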
<p><span style="font-size: 14pt;"><strong>2. Inverting a moving average, and generalizations</strong></span></p>
<p>Inverting a moving average consists in retrieving the original time series or signal, by applying the inverse filter to the observed data to un-smooth it. Exact inversion is usually not possible, though the true answer is somewhat more nuanced. It is certainly easier when <em>N</em> is small, though usually <em>N</em> is not known, and neither are the weights. However, if the observed data is the result of applying the simple convolution described in section 1.1 with <em>N</em> = 1, you only need to know the values of <em>X</em>(<em>t</em>) at two different times <em>t</em><span style="font-size: 8pt;">0</span> and <em>t</em><span style="font-size: 8pt;">1</span> to retrieve the original signal. This is easiest if you know <em>X</em>(<em>t</em>) at <em>t</em><span style="font-size: 8pt;">0</span> = 0 and at <em>t</em><span style="font-size: 8pt;">1</span> = 1: in this case, there is a simple inversion formula: </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8470438871?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8470438871?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>If you know <em>X</em>(0), <em>X</em>(1), and <em>Y</em>(<em>t</em>) for all <em>t</em>'s, you can iteratively retrieve <em>X</em>(2), <em>X</em>(3), and so on with the above recurrence formula. If you don't know <em>X</em>(0), <em>X</em>(1) but instead you know the variance and other higher moments of <em>X</em>(<em>t</em>), assuming <em>X</em>(<em>t</em>) is stationary, then you may test various <em>X</em>(0), <em>X</em>(1) until you find a pair matching these moments when reconstructing the full sequence <em>X</em>(<em>t</em>) using the above recurrence formula. The solution may not be unique. Other parameters you know about <em>X</em>(<em>t</em>) may be useful too for the reconstruction: the period (if any), the slope of a linear trend (if any), and so on. </p>
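<p>For the uniform case <em>N</em> = 1, the recovery works as follows (a minimal sketch, not the article's code, assuming the smoother Y(t) = (X(t-1) + X(t) + X(t+1))/3, which inverts to X(t+1) = 3Y(t) - X(t) - X(t-1)):</p>

```python
import numpy as np

def smooth(x):
    """Y(t) = (X(t-1) + X(t) + X(t+1)) / 3 for interior times t."""
    return (x[:-2] + x[1:-1] + x[2:]) / 3      # Y(1), ..., Y(len(x) - 2)

def invert(x0, x1, y):
    """Recover X from X(0), X(1) and Y(1), Y(2), ...
    using the recurrence X(t+1) = 3*Y(t) - X(t) - X(t-1)."""
    x = [x0, x1]
    for yt in y:
        x.append(3 * yt - x[-1] - x[-2])
    return np.array(x)

x = np.array([2.0, 1.0, 4.0, 0.0, 3.0, 5.0])   # original signal
y = smooth(x)                                   # observed smoothed data
x_rec = invert(x[0], x[1], y)                   # exact reconstruction
```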
<p><strong>2.1 Generalizations</strong></p>
<p>The moving averages discussed here rely on the classic arithmetic mean as the fundamental convolution operator, corresponding to <em>N</em> = 1. It is possible to use other means, such as the harmonic or geometric means, and even more general means such as those defined <a href="https://www.datasciencecentral.com/profiles/blogs/alternative-to-the-arithmetic-geometric-and-harmonic-means" target="_blank" rel="noopener">in this article</a>. The methodology can also be generalized to two or more dimensions, and to time-continuous signals. For prediction or extrapolation, see <a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">this article</a>. For interpolation, that is, to estimate <em>X</em>(<em>t</em>) when <em>t</em> is not an integer, <a href="https://mathoverflow.net/questions/376081/infinite-partial-fraction-expansions-to-compute-fractional-iterations-and-recurr" target="_blank" rel="noopener">see this article</a>. </p>
<p><span style="font-size: 14pt;"><strong>3. Application and source code</strong></span></p>
<p>We applied the above methodology with <em>n</em> = 60 to the following time series, for integer values of <em>t</em> with 60 < <em>t</em> < 240:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8477710477?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8477710477?profile=RESIZE_710x" width="250" class="align-center"/></a></p>
<p>Figure 2 shows <em>Y<span style="font-size: 8pt;">n</span></em>(<em>t</em>) with <em>n</em> = 60 (the red curve), after shifting it and rescaling (multiplying) it by a factor of order sqrt(<em>n</em>). In this case, <em>X</em>(2<em>t</em>) represents the real part of the <a href="https://en.wikipedia.org/wiki/Dirichlet_eta_function" target="_blank" rel="noopener">Dirichlet Eta function</a> <span><em>η</em> </span>defined in the complex plane. If you replace the cosine by a sine in the definition of <em>X</em>(<em>t</em>), you get similar results for the imaginary part of <em>η</em>. What is spectacular here is that <em>Y<span style="font-size: 8pt;">n</span></em>(<em>t</em>) is very well approximated by a cosine function, see the bottom of Figure 2. The implication is that, thanks to the self-convolution used here, we can approximate the real and imaginary parts of <span><em>η</em> </span>by a simple auto-regressive model. This in turn may have implications for solving the famous <a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">Riemann Hypothesis</a> (RH), which essentially consists in locating the values of <em>t</em> such that <em>X</em>(2<em>t</em>) = 0 simultaneously for the real and imaginary part of <em>η</em>. RH states that there is no such <em>t</em> in our particular case, where a parameter 0.75 is used in the definition of <em>X</em>(<em>t</em>). It is conjectured to also be true if you replace 0.75 by any value strictly between 0.5 and 1. See more <a href="https://www.datasciencecentral.com/profiles/blogs/deep-visualizations-riemann-s-conjecture" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/382043/incredibly-accurate-recursions-for-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8477209286?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8477209286?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>weighted moving average (WMA) with n = 60 (top), model fitting with cosine function (bottom)</em></p>
<p>Note that <em>X</em>(<em>t</em>), the blue curve, is non-periodic, while the red curve is almost perfectly periodic. If you use arbitrary moving averages instead of the one based on a convolution <em>h<span style="font-size: 8pt;">n</span></em> * <em>X</em>, you won't get a perfect fit in the bottom part of figure 2, certainly not a perfect fit with a simple cosine function. <a href="https://storage.ning.com/topology/rest/1.0/file/get/8477213652?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8477213652?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>same as top part of figure 2, but using a different X(t) for the blue curve</em></p>
<p>Also, the perfect fit cannot be achieved if you replace the logarithm in the definition of <em>X</em>(<em>t</em>) with a much faster-growing function. This is illustrated in Figure 3, where the logarithm in <em>X</em>(<em>t</em>) was replaced by a square root.</p>
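<p>The original implementation is in Perl. As a cross-check, here is a minimal Python sketch; the series used for <em>X</em>(<em>t</em>) is a plausible reconstruction based on the text (an alternating series with cosine, logarithm, and exponent 0.75, matching the real part of the Dirichlet Eta function), not necessarily the exact formula shown in the image above:</p>

```python
import numpy as np

def X(t, sigma=0.75, terms=2000):
    """Hypothetical reconstruction of X(t): truncated alternating series
    sum_{k>=1} (-1)^(k+1) cos(t log k) / k^sigma."""
    k = np.arange(1, terms + 1)
    return np.sum((-1.0) ** (k + 1) * np.cos(t * np.log(k)) / k ** sigma)

# Iterated kernel h_n for n = 60 (the trinomial weights of section 1.1)
n = 60
h = np.array([1/3, 1/3, 1/3])
hn = h.copy()
for _ in range(n - 1):
    hn = np.convolve(hn, h)             # hn spans k = -60, ..., 60

t = np.arange(0, 301)                   # integer time grid
x = np.array([X(v) for v in t])         # raw (non-periodic) signal
y = np.convolve(x, hn, mode="valid")    # smoothed Y_n(t) on interior points
```

Since this is a convolution, the smoothing step can be accelerated with FFTs for long signals, as noted at the end of the article.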
<p>The source code can be downloaded <a href="https://storage.ning.com/topology/rest/1.0/file/get/8477763473?profile=original" target="_blank" rel="noopener">here</a> (convol2b.pl.txt). Since it is dealing with convolutions, it can be further optimized using Fast Fourier Transforms (FFT), see <a href="http://www.dspguide.com/ch18/2.htm" target="_blank" rel="noopener">here</a>. Finally, it would be interesting to treat this case assuming the time <em>t</em> is continuous, using continuous rather than discrete convolutions.</p>
<p><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> He is also the founder and investor in <a href="https://www.parisrestaurantandbar.com/blog" target="_blank" rel="noopener">Paris Restaurant</a> in Anacortes, WA. You can access Vincent's articles and books,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
Machine Learning / Stats / BI: Mini Translation Dictionary
tag:www.datasciencecentral.com,2021-01-19:6448529:BlogPost:1008950
2021-01-19T06:12:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here I provide translations for various important terms, to help professionals from related backgrounds better understand each other. In particular, machine learning professionals versus statisticians.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8438181275?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8438181275?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-vs-statistics-in-one-picture" target="_blank" rel="noopener">here</a></em></p>
<p><strong>Feature</strong> (machine learning)</p>
<p>A feature is known as a variable or independent variable in statistics. It is also known as a predictor by predictive analytics professionals. </p>
<p><strong>Response</strong></p>
<p>The response is called dependent variable in statistics. Machine learning professionals sometimes call it the output. </p>
<p><strong>R-square</strong></p>
<p>This is the statistic used by statisticians to measure the performance of a model. There are many better alternatives. Machine learning professionals sometimes call it a goodness-of-fit metric. </p>
<p><strong>Regression</strong></p>
<p>Sometimes called maximum likelihood regression or linear regression by statisticians. Physicists and signal processing / operations research professionals use the term ordinary least squares instead. And yes, it is possible to compute confidence intervals (CI) without an underlying model: such intervals are called data-driven, and rely on simulations and empirical percentile distributions. </p>
<p><strong>Logistic transform</strong></p>
<p>The term used in the context of neural networks is sigmoid. Statisticians are more familiar with the word logistic, as in logistic regression.</p>
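<p>Both communities refer to the same function, which maps any real number to the interval (0, 1); a one-line sketch:</p>

```python
import numpy as np

def sigmoid(x):
    """Logistic transform: 1 / (1 + exp(-x)), mapping the reals to (0, 1)."""
    return 1 / (1 + np.exp(-x))
```

In logistic regression it links a linear predictor to a probability; in neural networks the same function serves as an activation.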
<p><strong>Neural networks</strong></p>
<p>While not exactly the same thing, statisticians have their own multi-layer hierarchical networks: they are called Bayesian hierarchical networks.</p>
<p><strong>Test of hypothesis</strong></p>
<p>Business intelligence professionals call it A/B testing, or multivariate testing.</p>
<p><strong>Boosted models</strong></p>
<p>Boosted models are used by machine learning professionals to blend multiple models and get the best of each model. Statisticians call them ensemble techniques.</p>
<p><strong>Confidence intervals</strong></p>
<p>We are all familiar with this concept invented by statisticians. Alternative terms include prediction intervals, or error (not to be confused with predictive or residual error, which has its own meaning for statisticians).</p>
<p><strong>Grouping</strong></p>
<p>Also known as aggregating, and consisting in grouping values of some feature or independent variable, especially in decision trees to reduce the number of nodes. Machine learning professionals call it feature binning. </p>
<p><strong>Taxonomy</strong></p>
<p>When applied to unstructured text data, the creation of a taxonomy (sometimes called ontology) is referred to as natural language processing. It is basically clustering of text data.</p>
<p><strong>Clustering</strong></p>
<p>Statisticians call it clustering. In machine learning, the concept is referred to as unsupervised classification. By contrast, supervised clustering is a learning technique based on training sets and cross-validation. </p>
<p><strong>Control set</strong></p>
<p>Machine learning professionals use control and test sets. Statisticians use the term cross-validation or bootstrapping, as well as training sets. </p>
<p><strong>Model fitting</strong></p>
<p>The terms favored by machine learning professionals are model selection, model testing, and feature selection. Model performance has its own related statistical term: the <em>p</em>-value, though it is less used nowadays. </p>
<p><strong>False positives</strong></p>
<p>Instead of false positives and false negatives, statisticians favor type I and type II errors.</p>
<p>Another similar dictionary can be found <a href="https://insights.sei.cmu.edu/sei_blog/2018/11/translating-between-statistics-and-machine-learning.html" target="_blank" rel="noopener">here</a>. </p>
Deep visualizations to Help Solve Riemann's Conjecture
tag:www.datasciencecentral.com,2021-01-06:6448529:BlogPost:1007807
2021-01-06T06:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>This is the second part of my article <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">Spectacular Visualization: The Eye of the Riemann Zeta Function</a>, focusing on the most infamous unsolved mathematical conjecture, one that has a $1 million prize attached to it. I used the word <em>deep</em> not in the sense of deep neural networks, but because the implications of these visualizations have deep consequences for how to solve this conjecture, opening a new path of attack and featuring non-standard generalizations leading to new perspectives and new approaches to solve RH (as the conjecture is called in mathematical circles). </p>
<p>This work is mostly based on data science, and the results presented here are experimental in nature and still need to be proved formally. The main visualization featuring 6 scatterplots is published here for the first time: it shows the orbits of 3 Riemann-like functions, their <em>eyes</em>, and their surprising ring-shaped error distribution when only the first few hundred terms are used in the series defining these functions. It deviates from classical pure-math approaches in the sense that what I do looks more like stochastic dynamical systems, attractors, wavelets, and should appeal to data analysts, engineers and physicists.</p>
<p>The problem is so popular that there are YouTube videos about it, some having gathered several million views. One of them is also featured here. My own scatterplots show the behavior of a new class of Riemann-like functions, as well as interesting slices of the orbit that are rarely (if ever) displayed in the literature, revealing peculiar features that could help in solving RH.</p>
<p><span style="font-size: 14pt;"><strong>1. Orbits of Riemann-like Functions</strong></span></p>
<p>The main picture in this article consists of the 6 plots below. Click on the picture to zoom in.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392563253?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392563253?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong><em>: Orbit (top) and residual error (bottom) for cosine (left),</em> <em>triangular (middle) and square wave (right)</em></p>
<p>I explain later in this section what they represent. But first, I need to introduce some material. Let </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392571275?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392571275?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p>be a function of <em>t</em>, with 0.5 < <em>σ</em> < 1 fixed, and <em>α</em>, <em>β</em>, <em>γ</em> three real parameters. This generalizes the function <em>ϕ</em> introduced <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">in my previous article</a>. This time, <em>λ</em>(<em>n</em>) = <em>n</em> and <em>α</em> = 0, <em>β</em> = 1. Also, we are dealing with two sister functions of <em>t</em>, namely <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with<em> γ </em>= 0, and the shifted <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with <em>γ </em>= -π/2. They represent, respectively, the real and imaginary parts of some function defined on the complex plane. The Riemann Hypothesis (RH), corresponding to <em>W</em>(<em>x</em>) = cos <em>x</em>, states that there is no zero of the Riemann zeta function <span><em>ζ</em>(<em>s</em>), with <em>s</em> = <em>σ </em>+ <em>it</em> a complex number, if 0.5 < <em>σ</em> < 1. Here <em>i</em> represents the imaginary unit whose square is -1. In layman's terms, it means that we cannot have <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ<span style="font-size: 8pt;">2</span></em>(<em>σ</em>, <em>t</em>) = 0 if 0.5 < <em>σ</em> < 1. You win $1 million if you prove it, see <a href="https://www.claymath.org/millennium-problems/riemann-hypothesis" target="_blank" rel="noopener">here</a>. </span></p>
<p><span>The novelty in my method is the introduction of a periodic wave function <em>W</em> in the definition of <em>ϕ</em>, thus generalizing RH in a way different from what other mathematicians did, that is, without using complicated <a href="https://en.wikipedia.org/wiki/L-function" target="_blank" rel="noopener">L-functions</a>. </span>This offers more hope of solving Riemann's conjecture (RH): first try to prove it for the easiest <em>W</em>, then understand what those <em>W</em>'s having an RH attached to them (as opposed to those that do not) have in common. </p>
<p>Figure 1 (upper part) displays the spectacular orbits for three different waves (cosine, triangular and alternating-quadratic) in the test case <em>σ</em> = 0.75 and 0 < <em>t</em> < 600, with the hole around the origin (I call it the <em>eye</em>) being the hallmark of RH behavior: that is, no root for that particular value of <em>σ</em>, regardless of <em>t</em>, because of the hole. Though not displayed here, in the case <em>σ</em> = 0.5, the hole is entirely gone and corresponds to the <em>critical line</em> (the name given by mathematicians) where all the zeroes are found.</p>
<p>The orbit consists, for a fixed <em>σ</em>, of the points (<em>X</em>(<em>t</em>),<em>Y</em>(<em>t</em>)) with <em>X</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>Y</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>). The bottom three plots represent the error between the true value (<em>X</em>(<em>t</em>),<em>Y</em>(<em>t</em>)) and its approximation based on using only the first 200 terms in the series that defines <em>ϕ</em>. The error distribution is very surprising; I was expecting the points to be radially but randomly distributed around the origin; instead, they are located on a ring. Note that for <em>t</em> > 600 (and for the triangular wave, for <em>t</em> > 80) you need to use more than 200 terms for the pattern to remain strong.</p>
<p>In Figure 1, the left part of the plot corresponds to the cosine wave (that is, classical RH), the middle part corresponds to the triangular wave, and the right part corresponds to the alternating quadratic wave. Interestingly, when <em>σ</em> = 1/2 the orbit does not have a hole anymore as predicted, yet the error points are still distributed on a similar ring.</p>
<p>The wave <em>W</em> is a continuous periodic function of period 2π, with one minimum equal to −1 and one maximum equal to +1 in the interval [0,2π], and the area below the X-axis equal to the area above the X-axis. It must have some symmetry. The waves used here are defined as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392809497?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392809497?profile=RESIZE_710x" width="500" class="align-full"/></a></p>
<p>For the cosine wave, the Taylor series for <em>ϕ</em> is discussed <a href="https://mathoverflow.net/questions/380308/about-the-coefficients-of-taylor-series-for-the-complex-riemann-zeta-function" target="_blank" rel="noopener">here</a>, while the representation as an infinite product is discussed <a href="https://mathoverflow.net/questions/380327/infinite-products-for-linear-combinations-of-sines-or-cosines" target="_blank" rel="noopener">here</a>.</p>
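<p>The waves and the orbit points can be sketched in a few lines of code. The series for <em>ϕ</em> below is a hypothetical reconstruction (an alternating sum with weights 1/<em>n</em>^<em>σ</em>, consistent with the cosine case matching the eta function), not the exact formula from the images, and only the cosine and triangular waves are implemented:</p>

```python
import numpy as np

def W_cos(x):
    """Cosine wave (the classical RH case)."""
    return np.cos(x)

def W_tri(x):
    """Triangular wave of period 2*pi, with max +1 at x = 0 and min -1 at x = pi."""
    d = np.abs(((x + np.pi) % (2 * np.pi)) - np.pi)  # distance to nearest 2*pi*k
    return 1 - (2 / np.pi) * d

def phi(sigma, t, W, gamma=0.0, terms=200):
    """Hypothetical partial sum: sum_{n>=1} (-1)^(n+1) W(t log n + gamma) / n^sigma."""
    n = np.arange(1, terms + 1)
    return np.sum((-1.0) ** (n + 1) * W(t * np.log(n) + gamma) / n ** sigma)

# One orbit point (phi1, phi2) at sigma = 0.75 for the cosine wave,
# using the first 200 terms as in Figure 1:
t = 10.0
p1 = phi(0.75, t, W_cos, gamma=0.0)
p2 = phi(0.75, t, W_cos, gamma=-np.pi / 2)
```

Sweeping <em>t</em> over (0, 600) and plotting the points (p1, p2) should reproduce an orbit of the kind shown in Figure 1, under the assumptions stated above.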
<p><span style="font-size: 14pt;"><strong>2. Other interesting visualizations</strong></span></p>
<p>The orbit for the standard RH case has been published countless times for <em>σ</em> = 0.5. In that case, there is no eye, as the orbit crosses the origin infinitely many times. Some videos about the orbit trajectory have been posted on YouTube and viewed millions of times. Below is one of them. </p>
<p></p>
<p><iframe width="640" height="360" src="https://www.youtube.com/embed/zlm1aajH6gY?wmode=opaque" frameborder="0" allowfullscreen=""></iframe>
</p>
<p></p>
<p>Other popular visualizations include the time series for <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) when <em>σ</em> = 0.5. Below (Figure 2) is a version of mine, for <em>σ</em> = 0.75 and 0 < <em>t</em> < 600. Not only does it display the time series for the cosine wave (the standard RH case), but also, for the first time ever, for the triangular wave. The blue curve corresponds to <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>), the orange one to <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392886055?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392886055?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong><em>: Time series for ϕ<span style="font-size: 8pt;">1</span>(σ, t) and ϕ<span style="font-size: 8pt;">2</span>(σ, t) when σ = 0.75</em></p>
<p>It is interesting to note that the peaks and valley floors of the triangular and cosine wave frameworks seem to be correlated, occurring at similar times. What's more, for the cosine wave, when a zero of the blue curve is close to a zero of the orange curve (that is, when these curves cross the X-axis at similar times), the zero of the orange curve occurs first. This seems to be true too for the triangular wave, at least when <em>t</em> < 600.</p>
<p><span style="font-size: 14pt;"><strong>3. Generalization and source code</strong></span></p>
<p><span>The Perl source code is available <a href="https://storage.ning.com/topology/rest/1.0/file/get/8393110255?profile=original" target="_blank" rel="noopener">here</a>. Note that convergence is very slow, as discussed <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">in my previous article</a>. A table of the first 100,000 zeros of <em>ζ</em>(<em>s</em>) can be found <a href="http://www.dtc.umn.edu/~odlyzko/zeta_tables/index.html" target="_blank" rel="noopener">here</a>. More general results are available <a href="https://mathoverflow.net/questions/380762/some-properties-of-special-dirichlet-series-connection-to-riemann-hypothesis" target="_blank" rel="noopener">here</a>. In short, if 0.5 < <em>σ </em> < 1, the hole around the origin (pictured in Figure 1) is also present in the following case. Let's define </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8409714885?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8409714885?profile=RESIZE_710x" width="380" class="align-center"/></a></span></p>
<p><span>together with <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = 1 + <em>ϕ</em>(<em>σ</em>, <em>μ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with<em> γ </em>= 0, and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>μ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with <em>γ </em>= -π/2. Then we still have a hole around the origin. That hole persists even if <em>σ</em> = 0.5, unless <em>μ</em> = 0. Here <em>μ</em>, <em>σ</em> are fixed but arbitrary, <em>λ</em>(<em>n</em>) = log <em>n</em>, and <em>α </em>= 0, <em>β </em>= 1; only <em>t</em> varies. It has been tested only for <em>W</em>(<em>x</em>) = cos <em>x</em>, and when 0 < <em>t</em> < 200.</span></p>
<p><strong>Exercise 1</strong></p>
<p>Show (numerically) that the cross-correlation between <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ, t</em>) and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ, t</em>) is apparently zero for the cosine wave <em>W</em>(<em>x</em>) = cos <em>x</em>. However, if you shift the orange curve in Figure 2, replacing <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ, t</em>) by <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ, t</em> +<em> τ</em>), the correlation may no longer be zero. Find, numerically, the value of <span><em>τ</em> that maximizes the cross-correlation in question. </span></p>
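<p>A generic way to attack Exercise 1 numerically: sample both curves on a grid of <em>t</em> values, then scan candidate shifts <em>τ</em> for the one maximizing the Pearson correlation. The sketch below uses cos <em>t</em> and sin <em>t</em> as stand-in signals (computing the actual <em>ϕ</em><span style="font-size: 8pt;">1</span>, <em>ϕ</em><span style="font-size: 8pt;">2</span> requires the series defined above); the lag-scanning machinery carries over unchanged.</p>

```python
import numpy as np

def best_shift(x_fn, y_fn, t, taus):
    """Return the shift tau (and its correlation) maximizing the
    Pearson correlation between x_fn(t) and y_fn(t + tau)."""
    x = x_fn(t)
    best_tau, best_r = taus[0], -2.0
    for tau in taus:
        r = np.corrcoef(x, y_fn(t + tau))[0, 1]
        if r > best_r:
            best_tau, best_r = tau, r
    return best_tau, best_r

# Stand-in signals: cos(t) and sin(t) are essentially uncorrelated over
# many periods, but shifting sin by pi/2 turns it into cos (correlation 1).
t = np.linspace(0, 200, 20001)
taus = np.linspace(0, 2 * np.pi, 629)   # shift grid, step ~ 0.01
tau, r = best_shift(np.cos, np.sin, t, taus)
```

<p>The same scan applies verbatim once <em>ϕ</em><span style="font-size: 8pt;">1</span>, <em>ϕ</em><span style="font-size: 8pt;">2</span> are implemented as functions of <em>t</em>.</p>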
<p><strong>Exercise 2</strong> </p>
<p>Prove that if <em>ζ</em>(<em>s</em>) = 0, with <em>s</em> = <em>σ</em> + <em>it</em> and 0 < <em>σ</em> < 1 then for all real <em>θ</em>, we have</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8400519688?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8400519688?profile=RESIZE_710x" width="250" class="align-center"/></a></p>
<p>See answer <a href="https://mathoverflow.net/questions/380577/on-some-property-of-the-zeros-of-zetas-in-the-complex-plane/" target="_blank" rel="noopener">here</a>. </p>
<p><strong>Exercise 3</strong></p>
<p>Prove that the centroid of the orbits pictured in Figure 1 is always (<em>W</em>(0), <em>W</em>(<span>-π/2)</span>). This is true for the cosine, triangular, and alternate square waves. <strong>Hint</strong>: The integral of <em>W</em>(<em>x</em>) between <em>x</em> = 0 and <em>x</em> = 2<span>π (the period) is always zero. The coordinates of the centroid are </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8409760490?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8409760490?profile=RESIZE_710x" width="450" class="align-center"/></a></span></p>
<p>Since <em>ϕ</em><span style="font-size: 8pt;">1</span>, <em>ϕ</em><span><span style="font-size: 8pt;">2</span> are defined as infinite sums, swap the integral and sum operators, then proceed to the computation. The integral vanishes for all the terms in both series, except for the first one where it is equal to <em>W</em>(0) and <em>W</em>(-π/2), respectively for <em>ϕ</em><span style="font-size: 8pt;">1</span> and <em>ϕ<span style="font-size: 8pt;">2</span></em>.</span></p>
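<p>The hint can be formalized. Writing the centroid's first coordinate as a time-average and swapping the sum and the integral (treating <em>ϕ</em><span style="font-size: 8pt;">1</span> as the alternating series of section 1 with <em>λ</em>(<em>n</em>) = log <em>n</em>; the exact definition lives in the image above, so treat the series below as my reading of it):</p>

```latex
\bar{X} \;=\; \lim_{T\to\infty}\frac{1}{T}\int_0^T \phi_1(\sigma,t)\,dt
\;=\; \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^{\sigma}}
      \lim_{T\to\infty}\frac{1}{T}\int_0^T W(t\log n)\,dt .
```

<p>For <em>n</em> = 1 the integrand is the constant <em>W</em>(0), since log 1 = 0; for <em>n</em> ≥ 2, <em>W</em>(<em>t</em> log <em>n</em>) is periodic in <em>t</em> with zero mean (the hint), so its time-average vanishes. The same argument with the phase shifted by -π/2 gives <em>W</em>(-π/2) for the second coordinate.</p>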
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
Spectacular Visualization: The Eye of the Riemann Zeta Function
tag:www.datasciencecentral.com,2021-01-02:6448529:BlogPost:1006966
2021-01-02T20:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>We discuss here one of the most famous unsolved mathematical conjectures of all time, one of seven with a $1 million award attached to it, see <a href="https://en.wikipedia.org/wiki/Millennium_Prize_Problems" target="_blank" rel="noopener">here</a>. It is known as the <a href="https://en.wikipedia.org/wiki/Riemann_hypothesis" target="_blank" rel="noopener">Riemann Hypothesis</a> and abbreviated as RH. Of course I did not solve it (yet), but the material presented here offers a new path towards making significant progress. As usual, I wrote this article in such a way as to make it understandable by a large audience. You don't need to know more than relatively simple calculus to read it, and you don't even need to know anything about <a href="https://en.wikipedia.org/wiki/Complex_analysis" target="_blank" rel="noopener">complex analysis</a>: I did the heavy lifting for you.</p>
<p>This is a typical illustration of experimental math blended with data science techniques, resulting in visualizations that provide great actionable insights. It is my hope that after reading this article, you will be tempted to further explore RH, create even better visualizations about it, and find new insights. The techniques used here apply to many other problems, including serious business analytics. </p>
<p><span style="font-size: 14pt;"><strong>1. The problem </strong></span></p>
<p>The Riemann hypothesis, dating back to 1859, states that the zeta function <em>ζ</em>(<em>s</em>), with <em>s</em> = <span><em>σ</em> </span>+ <em>it</em> a complex number (the letter <em>i</em> denoting the imaginary unit), has no zero in the critical strip 0 < <em>σ</em> < 1 other than on the critical line <em>σ</em> = 1/2. If proved, it would have a profound impact not just in number theory, but in many other areas of mathematics and beyond. In layman's terms, it can be re-formulated as follows. </p>
<p>Let us introduce a parametric family of real-valued functions, defined as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8375731288?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8375731288?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>with 0 < <em>σ</em> < 1, <em>t</em> a real number, <em>α</em>, <em>β</em>, <em>γ</em> three real parameters, and <em>λ</em>(⋅) a real-valued function with logarithmic growth. Elementary computations show that <em>s</em> = <em>σ</em> + <em>it</em> is a complex root (also called <em>zero</em>) of <em>ζ</em>(<em>s</em>), with 0 < <em>σ</em> < 1, if and only if</p>
<ul>
<li><em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1, 0) = 0,</li>
<li><em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1, −π/2) = 0,</li>
<li><em>λ</em>(<em>n</em>) = log(n).</li>
</ul>
<p>For details about this formulation, see <a href="https://mathoverflow.net/questions/379650/more-mysteries-about-the-zeros-of-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>. Moving forward, we will treat RH as a problem about the zeros (or lack thereof) of a bivariate function in the standard plane: <em>σ</em> is the first variable, attached to the X-axis, and <em>t</em> is the second variable, attached to the Y-axis. A generalized version of RH seems to also be true: it corresponds to arbitrary values for <em>α</em>, <em>β</em>, <em>γ</em>. However, we focus here on the classical RH. For ease of presentation, we use the following notation:</p>
<ul>
<li><em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1, 0)</li>
<li><em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1,−π/2 )</li>
</ul>
<p>Much of the discussion has to do with the orbit of (<em>ϕ</em><span><span style="font-size: 8pt;">1</span>, <em>ϕ</em><span style="font-size: 8pt;">2</span></span>) when <em>σ</em> is fixed but arbitrary, and only <em>t</em> is allowed to vary. The orbit consists of all the points (<em>X</em>(<em>t</em>), <em>Y</em>(<em>t</em>)) with <em>X</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>Y</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>). In short, we are dealing with a bivariate time series in continuous time, with strong cross-correlations between <em>X</em>(<em>t</em>) and <em>Y</em>(<em>t</em>). Without loss of generality, we assume that <em>t</em> is positive. The spectacular plot shown in section 2 is just a scatterplot of the orbit, computed for <em>σ</em> = 0.75<em>.</em> It easily generalizes to other values of <em>σ</em> that are strictly greater than 0.5. </p>
<p><span style="font-size: 14pt;"><strong>2. The visualization</strong></span></p>
<p>I call the plot below the <em>Eye of the Zeta Function</em>. It is the scatterplot described in the last paragraph of section 1, and probably the first time such a plot was created for the Riemann zeta function. It corresponds to <em>σ </em>= 0.75, with <em>t</em> between 0 and 3,000 in increments of 0.01. Thus 300,000 points of the orbit are displayed here. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8375847301?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8375847301?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>The spectacular feature in that plot is the hole around (0, 0). It has deep implications. It suggests that if <em>σ</em> = 0.75, not only can <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) never be simultaneously equal to zero (a particular case of RH, nothing new here) but, more importantly, they never jointly get very close to zero. This is new, and suggests that proving RH might be a little less challenging than initially thought. The plot features a similar "eye" for other values of <em>σ</em>. In particular, the hole gets smaller and smaller as <em>σ</em> gets closer to 0.5. At <em>σ</em> = 0.5, the hole is entirely gone, and infinitely many values of <em>t</em> yield <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = 0. The same is true for the generalized version of RH discussed in section 1. </p>
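<p>For readers who want to reproduce the figure, the sketch below evaluates the orbit by plain truncation of the two series. The formulas are my reading of the definition in section 1 (the exact expression is in the image there): <em>X</em>(<em>t</em>) is the alternating sum of cos(<em>t</em> log <em>n</em>) / <em>n</em>^<em>σ</em>, and <em>Y</em>(<em>t</em>) the same with sin. Plain truncation is crude, and the convergence caveats of the next paragraph apply.</p>

```python
import numpy as np

# My reading of the series in section 1 (classical case, lambda(n) = log n):
#   X(t) = sum_{n>=1} (-1)^(n+1) cos(t log n) / n^sigma
#   Y(t) = sum_{n>=1} (-1)^(n+1) sin(t log n) / n^sigma
sigma, n_terms = 0.75, 5000
n = np.arange(1, n_terms + 1)
weights = np.where(n % 2 == 1, 1.0, -1.0) / n**sigma
log_n = np.log(n)

def orbit_point(t):
    """(X(t), Y(t)) by plain truncation of the assumed series."""
    return (np.sum(weights * np.cos(t * log_n)),
            np.sum(weights * np.sin(t * log_n)))

# Sample a short stretch of the sigma = 0.75 orbit and measure how close
# it comes to (0, 0): the "hole" keeps this distance bounded away from
# zero, up to truncation error.
ts = np.linspace(1.0, 50.0, 4901)
pts = np.array([orbit_point(t) for t in ts])
min_radius = np.hypot(pts[:, 0], pts[:, 1]).min()
```

<p>Scatter-plotting <code>pts</code> reproduces (a short stretch of) the eye; pushing <em>t</em> to 3,000 requires the smaller increments and convergence boosting discussed next.</p>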
<p>Note that it is very tricky to get the scatterplot right. The series for <em>ϕ</em><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span> converge very slowly, and in a chaotic, unpredictable way; see <a href="https://mathoverflow.net/questions/379650/more-mysteries-about-the-zeros-of-the-riemann-zeta-function/380174#380174" target="_blank" rel="noopener">here</a>. This can result in false positives: points very close to zero due to approximation errors, artificially obfuscating the hole. Convergence boosting techniques are required; see <a href="https://www.datasciencecentral.com/profiles/blogs/simple-trick-to-dramatically-improve-speed-of-convergence" target="_blank" rel="noopener">here</a>. In addition, the frequency of oscillations in <em>ϕ</em><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span> increases as <em>t</em> gets larger, so the <em>t</em> increments should be made correspondingly smaller as <em>t</em> grows, in order to get good coverage of the orbit and not miss potential true zeros.</p>
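<p>To see why convergence boosting matters, here is the idea on a plain alternating series (a generic acceleration trick, not necessarily the exact method of the linked article): repeatedly averaging adjacent partial sums (the Euler-van Wijngaarden transformation) gains far more accuracy than simply adding terms. With η(1) = 1 - 1/2 + 1/3 - ... = log 2 and only 20 terms:</p>

```python
import numpy as np

def accelerated_sum(terms):
    """Euler-van Wijngaarden acceleration for an alternating series:
    repeatedly average adjacent partial sums down to a single value."""
    s = np.cumsum(terms)             # plain partial sums
    while len(s) > 1:
        s = 0.5 * (s[:-1] + s[1:])   # one averaging pass
    return s[0]

terms = np.array([(-1.0) ** (n + 1) / n for n in range(1, 21)])
plain = terms.sum()                  # truncation error around 0.024
boosted = accelerated_sum(terms)     # several orders of magnitude closer
```

<p>The same averaging applies term-wise to the partial sums of <em>ϕ</em><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span>, whose terms also alternate in sign on average.</p>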
<p>More plots can be found <a href="https://mathoverflow.net/questions/379650/more-mysteries-about-the-zeros-of-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>. One (not yet published) is even more spectacular, though esthetically speaking it looks just like a boring ring. I computed the approximation error (<em>E</em><span style="font-size: 8pt;">1</span>(<em>t</em>), <em>E</em><span style="font-size: 8pt;">2</span>(<em>t</em>)) made when using only the first 200 terms in the series defining <em>ϕ</em><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span>. If <em>t</em> < 300, these error points are located on a very thin ring very close to 0. Their distribution thus has a strong pattern, making it possibly even less challenging to prove that if <em>σ</em> = 0.75, then the Riemann zeta function has no zero with <em>t</em> in [0, 300]. The pattern quickly disappears for larger <em>t</em>, but you can retrieve it by increasing the number of terms used in the approximation, allowing you to identify an even bigger zero-free zone in the critical strip. Proving that these zones are zero-free would still remain a big challenge, though. </p>
<p></p>
<p></p>
Opening a New Restaurant in Covid Times
tag:www.datasciencecentral.com,2020-12-23:6448529:BlogPost:1005865
2020-12-23T06:44:07.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>I am a data scientist, and decided to open a restaurant last November, 10 days before the governor in my state banned dining-in (who knows for how long) and customers were already rare. Some data scientists in managerial positions dream about exiting the corporate world and envied me, at least before the Covid, when I told them my plan.</p>
<p>Here I explore the options and opportunities available, and this article reflects my optimism. I will also discuss analytics in some detail. The reasons for opening a restaurant are varied; in my case, I saw the opportunity in a wealthy town with many foodies, mostly retired from companies such as Amazon, Boeing or Microsoft, who left the Seattle area to live on a little island where the pace of living is much slower, roads are not clogged with commuters, and the landscape is beautiful: Anacortes, on Fidalgo Island, next to the San Juan Islands in the Pacific Northwest. Despite being next to the ocean, not a single restaurant offers fresh oysters or crab, and there is no great restaurant in town. If anything (after selling my company), I thought I would open a restaurant so that there would at least be a dining venue I really love in Anacortes. I knew from the very beginning that we would fill a void, and that there was no competition.</p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8322989256?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8322989256?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Our outdoor seating</em></p>
<p>After chasing locations throughout the Puget Sound without any luck, I found by chance the perfect spot in the very heart of historical downtown Anacortes. The landlord did not want a franchise, a chain, and even turned down a bank. Rent here is 3 times cheaper than in Seattle, and hourly rates for restaurant workers are much lower too, though it is impossible to find qualified people to serve fine cuisine (you must train them). We were lucky to find a great chef who worked in great restaurants in Seattle and left the city years ago for the same reasons that I did. We are also very close to farmers, and all our food comes from local farmers. Not exactly cheap, but people are willing to pay a bit more for fresh local ingredients - this is a long-lasting trend in this industry.</p>
<p>We agreed on a few statistics: food cost should be 1/3 of revenue, staff another 1/3, and 15-20% of the revenue going towards rent, utilities, insurance, etc. Now with Covid, we are operating at a controlled loss, probably for the next three months, but we are on the path to success. Rather than closing for three months like plenty of restaurants do, we thought we should take advantage of this time to develop our brand and become known -- and stay open despite the extra cost. We also decided to stop expensive construction on the second floor, and instead focus on heated outdoor dining and cheaper solutions that have a direct positive impact. At the end of the construction stage, we even looked at purchasing used appliances, rather than brand new ones.</p>
<p>Despite having no experience in the restaurant industry, I am a foodie with tremendous experience as a customer. In particular, I decided what the prices should be, given the town we are in and the kind of food we serve. The chef focused on dishes where he could meet the goal of 1/3 of revenue spent on food (that is, a dish sold for $18 costs $6 in ingredients on average), with waste optimization also being a goal (for instance, unsold fresh oysters are served as baked oysters the next day). I even purchased some ingredients myself, such as excellent Icelandic caviar 10 times cheaper than Beluga. People coming from the big city 90 miles south consider our restaurant inexpensive, and capable of successfully competing with hip Seattle restaurants if we were located in that town.</p>
<p><strong>Original ideas to succeed</strong></p>
<p>Here are some concepts that we embraced:</p>
<ul>
<li>Having a little retail store within the restaurant, selling home-made preparations made by the chef, and wines</li>
<li>Opening a wine club with paid membership</li>
<li>Using the second floor for storage, for the retail store, rather than for dine-in</li>
<li>Opening the patio in the back, the heated tent on the front street, and some other space outside to maximize occupancy</li>
<li>Discontinuing breakfast except weekends, due to negative ROI</li>
<li>Creating our own home delivery service, more affordable than Doordash</li>
<li>Organizing our menu items in such a way as to optimize revenue (by displaying best sellers at the top, revenue increased 5 times on Doordash)</li>
<li>Being the only European restaurant in the county</li>
<li>Using pictures of our dishes when posting on social networks, as well as on our website</li>
<li>Offering family meals to go, serving 2 or 4 people</li>
<li>Partnering with grocery stores to sell our products</li>
<li>Having weekly specials that we can announce in social networks and via our fast-growing mailing list, to keep customers returning</li>
<li>Serving right-sized portions (smaller than the average restaurant's) along with small dishes, on plates that are not as large as in many restaurants; this reduces waste, and we can lower our prices accordingly</li>
</ul>
<p><strong>Marketing and advertising</strong></p>
<p>We are present and very active on all local Facebook groups, including <a href="https://www.facebook.com/parisrestaurantandbar/" target="_blank" rel="noopener">our Facebook page</a> and the <a href="https://www.facebook.com/groups/424272282275831/" target="_blank" rel="noopener">Skagit Restaurant page</a> that we created for all restaurants in our county. Since our menu has new additions every week, we can post original content all the time. Many people in town use Facebook, thus this is our favorite platform. We also advertise with them. </p>
<p>We created our newsletter, growing to 500 subscribers in a month. Much of our advertising on Google is geared towards growing the newsletter. We are working on a blog (the first article will be <em>10 tips to help your favorite restaurant</em> applicable to any restaurant, we hope it will go viral) and in the long term, we plan on selling recipes from our chef on the website. Finally, as we grow, we plan on using the outdoor tent from our restaurant neighbors, when they are closed. We may even serve Tequila from our neighbor (Mexican restaurant) with revenue on hard liquor going directly to them, if we use their tent. </p>
<p>Advertising on Yelp was a failure, and we quickly noticed and stopped it. Yelp clearly does not help its advertisers with reviews (a good thing), but it eliminates reviews randomly, good or bad, with its supposedly smart machine learning algorithm. Maybe to force us to advertise more? Phone calls coming from Yelp advertising rarely came from a local number (unlike calls originating from Google ads), and lasted 2 seconds. No different from click fraud. We are happy that Yelp represents less than 2% of our traffic, as we tried very hard to build our audience organically and via word of mouth, thanks to the excellent and original food that we serve. </p>
<p>We also invited our partners (local farmers, accountant, etc.) for a free dinner during the short window of time when dining-in was allowed. The meal was free, but not the wine. We also plan on having our brochure distributed in all the local hotels, and maybe advertising our restaurant on the receipts people get when they shop at a grocery store. </p>
<p><strong>The results</strong></p>
<p>The last few days have seen revenue growing fast, to the point that we will probably operate at a loss for much less than three months, beating expectations. Even before Thanksgiving, when dining-in was still allowed, it was clear that we would be successful, being almost profitable while operating at 25% capacity.</p>
<p>You can find us at <a href="https://www.parisrestaurantandbar.com/" target="_blank" rel="noopener">ParisRestaurantAndBar.com</a>. </p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8322990662?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8322990662?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p></p>
<p></p>
Amazing Things You Did Not Know You Could Do in Excel
tag:www.datasciencecentral.com,2020-12-17:6448529:BlogPost:1005404
2020-12-17T05:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>I have included a lot of Excel spreadsheets in the numerous articles and books that I have written in the last 10 years, based either on real-life problems or simulations to test algorithms, and featuring various machine learning techniques. It is time to create a new blog series focusing on these useful techniques that can easily be handled with Excel. Data scientists typically use programming languages and other visual tools for these techniques, mostly because they are unaware that they can be accomplished with Excel alone. This article is the first in the series. The series will appeal to BI analysts, managers presenting insights to decision makers, as well as software engineers or MBA people who do not have a strong data science background. It can also be used as a starting point to learn data science and machine learning: first solve problems in Excel, then, upon discovering Excel's limitations, move to programming languages or AI-based automated coding. </p>
<p>Many of the techniques presented in my spreadsheets are data-driven (as opposed to model-driven), robust, simple yet efficient, sometimes entirely novel, and do not lead to problems such as over-fitting or numerical instability. Even in the absence of statistical models, confidence intervals can still be built - even in Excel - and are more intuitive and easier to understand than traditional ones. See my previous article <a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">here</a> on general regression, as an example. That article also features traditional regression performed with the little-known Excel built-in function LINEST; with a simple transformation, it can be used for logistic regression. Also, my spreadsheets are just basic Excel, without special Excel libraries or add-ins, and are thus accessible to everyone. </p>
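<p>To make the "simple transformation" concrete: if <em>p</em> is an observed proportion strictly between 0 and 1, the logit log(<em>p</em>/(1 - <em>p</em>)) turns the logistic curve into a straight line, which ordinary least squares (what LINEST computes in Excel) can then fit. Here is a sketch of the same idea outside Excel, on made-up grouped data:</p>

```python
import numpy as np

# Hypothetical grouped data: predictor x, observed success proportions p
# (the logit requires 0 < p < 1).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p = np.array([0.12, 0.25, 0.50, 0.75, 0.88])

# Logit transform, then an ordinary least-squares line -- the analogue
# of applying Excel's LINEST to the transformed column.
z = np.log(p / (1 - p))
a, b = np.polyfit(x, z, 1)          # slope, intercept

# Recover fitted probabilities with the inverse (logistic) transform.
p_hat = 1 / (1 + np.exp(-(a * x + b)))
```

<p>This is the classical approximation to logistic regression, good enough for a spreadsheet; maximum-likelihood fitting would refine the coefficients.</p>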
<p>In this first blog, I show you how to simulate clustered data and display it with multi-groups scatterplots, things that I used to do with R in the past.</p>
<p><strong>Excel scatterplots in clustering contexts</strong></p>
<p>The pictures below represent a simulation of clustered data: 177 two-dimensional data points spread across three clusters.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296711259?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296711259?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1:</strong> <em>Well separated clusters</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296711493?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296711493?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2:</strong> <em>Overlapping clusters</em></p>
<p>The spreadsheet used to produce these charts is interactive, and you can play with it to generate more clusters, fine-tune the degree of overlap, and test various clustering algorithms on the simulated data that you create, using cross-validation techniques, to see how they perform. The points, within each of the three groups, are radially distributed around a center. That is, a random point (<em>X</em>, <em>Y</em>) in group #1, assuming the center of that group (also randomly distributed) is (<em>X</em><span style="font-size: 8pt;">1</span>, <em>Y</em><span style="font-size: 8pt;">1</span>), is generated as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296725669?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296725669?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>Here, fresh random deviates <span><em>ρ</em>, <em>θ</em>, uniformly distributed on [0, 1] and generated with Excel's RAND function, are used for each point (<em>X</em>, <em>Y</em>), and the constant <em>α</em><span style="font-size: 8pt;">1</span> is fixed for all points in group #1. In the spreadsheet, the three centers are uniformly distributed on [0, 1] x [0, 1], and <em>α</em><span style="font-size: 8pt;">1</span>, <em>α</em><span style="font-size: 8pt;">2</span>, <em>α</em><span style="font-size: 8pt;">3</span> are set to 1/3. </span></p>
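<p>Outside Excel, the same simulation can be sketched as follows; the polar formula (center plus <em>α</em>·<em>ρ</em> times a unit vector at angle 2π<em>θ</em>) is my reading of the screenshot above, with <em>ρ</em>, <em>θ</em> playing the role of RAND:</p>

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_cluster(n_points, alpha, rng):
    """Radially distributed points around a random center in [0,1]^2:
    X = Xc + alpha * rho * cos(2 pi theta), Y likewise with sin,
    with rho, theta uniform on [0, 1] (Excel's RAND)."""
    xc, yc = rng.uniform(0, 1, size=2)      # cluster center
    rho = rng.uniform(0, 1, n_points)
    theta = rng.uniform(0, 1, n_points)
    x = xc + alpha * rho * np.cos(2 * np.pi * theta)
    y = yc + alpha * rho * np.sin(2 * np.pi * theta)
    return np.column_stack([x, y]), (xc, yc)

# Three clusters of 59 points each (177 points total), alpha = 1/3 as in
# the spreadsheet; shrink alpha for better-separated clusters.
clusters = [simulate_cluster(59, 1 / 3, rng) for _ in range(3)]
points = np.vstack([pts for pts, _ in clusters])
```

<p>Every point lands within distance <em>α</em> of its cluster center, so <em>α</em> directly controls the overlap between the "eyes" of Figures 1 and 2.</p>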
<p><span>The scatterplots are produced using the scatter graph in Excel, applied to data separated in three groups as illustrated in the screenshot below. For group #1, point coordinates (<em>X</em>, <em>Y</em>) are stored in the first and second column respectively. For group #2, it's in the first and third column, and for group #3, it is in the first and fourth column as illustrated below.</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296773488?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296773488?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3:</strong> <em>Organizing the data in Excel to produce the scatterplots</em></p>
<p>The spreadsheet is available for download, <a href="https://storage.ning.com/topology/rest/1.0/file/get/8296781485?profile=original" target="_blank" rel="noopener">here</a> (<strong>scatter-cluster.xlsx</strong>). See also one of my previous spreadsheets to automatically detect the number of clusters, from one of my past articles, <a href="https://www.datasciencecentral.com/profiles/blogs/how-to-automatically-determine-the-number-of-clusters-in-your-dat" target="_blank" rel="noopener">here</a> (<strong>elbow.xlsx</strong>, in the section <em>Elbow Strength with spreadsheet illustration</em>). Finally, many spreadsheets are available for download, from my most recent book <em>Statistics: new foundations, toolkit, and machine learning recipes</em>, <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a>. Some of them even perform NLP algorithms.</p>
<p></p>
<p></p>
All-purpose, Robust, Fast, Simple Non-linear Regression
tag:www.datasciencecentral.com,2020-12-16:6448529:BlogPost:1005166
2020-12-16T18:22:17.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><strong>Announcements</strong></p>
<ul>
<li>Watch APEXX W3: <strong>The Data Science Workstation</strong>, and learn how an NVIDIA-certified BOXX workstation can accelerate your workflow. <a href="https://bit.ly/33Suwni" target="_blank" rel="noopener">Access video here</a>. </li>
<li>Use real-time anomaly detection reference patterns to combat fraud | Google. <a href="http://dsc.news/3gShgUZ" target="_blank" rel="noopener">Read full article</a>.</li>
<li>Merrimack College offers three online master's degrees in data science, business analytics, or healthcare analytics – all designed to accommodate working professionals and developed and taught by industry experts. Gain a deeper understanding of data visualization, statistical analysis, machine learning, and business strategy to deliver data-driven insights that impact real-world decisions. <a href="http://dsc.news/34kc07x" target="_blank" rel="noopener">Learn more here</a>. </li>
</ul>
<p><strong>All-purpose, Robust, Fast, Simple Non-linear Regression</strong></p>
<p><span>The model-free, data-driven technique discussed here is so basic that it can easily be implemented in Excel, and we actually provide an Excel implementation. It is surprising that this technique does not pre-date standard linear regression, and is rarely if ever used by statisticians and data scientists. It is related to kriging and nearest neighbor interpolation, and was apparently first mentioned in 1965 by Harvard scientists working on GIS (geographic information systems). It was referred to back then as Shepard's method or inverse distance weighting, and used for multivariate interpolation on non-regular grids</span><span>. We call this technique </span><em>simple regression</em><span>. Read full article <a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">here</a>. </span></p>
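<p>In Python, Shepard's method fits in a dozen lines. The sketch below is a minimal illustration of inverse distance weighting, not the Excel implementation from the full article; the function name and the choice of power 2 are mine:</p>

```python
import math

def shepard_predict(train, query, power=2.0):
    """Inverse distance weighting (Shepard's method): the prediction at `query`
    is the average of the observed responses, weighted by 1 / distance^power.
    If the query coincides with a training point, its response is returned."""
    num = den = 0.0
    for point, response in train:
        d = math.dist(point, query)
        if d == 0.0:
            return response          # exact match with a training point
        w = d ** (-power)
        num += w * response
        den += w
    return num / den

# Noise-free samples of f(x1, x2) = x1 + x2 at the corners of the unit square.
train = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0),
         ((0.0, 1.0), 1.0), ((1.0, 1.0), 2.0)]
print(shepard_predict(train, (0.5, 0.5)))  # equidistant from all corners -> 1.0
```

Since the query point is equidistant from all four training points, the weights are equal and the prediction reduces to the plain average of the responses.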
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8295194880?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8295194880?profile=RESIZE_710x" width="400" class="align-center"/></a></span></p>
New Tests of Randomness and Independence for Sequences of Observations
tag:www.datasciencecentral.com,2020-12-03:6448529:BlogPost:1004429
2020-12-03T01:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>There is no statistical test that assesses whether a sequence of observations, time series, or residuals in a regression model exhibits independence or not. Typically, what data scientists do is look at auto-correlations and see whether they are close enough to zero. If the data follows a Gaussian distribution, then absence of auto-correlations implies independence. Here, however, we are dealing with non-Gaussian observations. The setting is similar to testing whether a pseudo-random number generator is random enough, or whether the digits of a number such as <span>π </span>behave in a way that looks random, even though the sequence of digits is deterministic. Batteries of statistical tests are available to address this problem, but there is no one-size-fits-all solution.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242402469?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242402469?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>Here we propose a new approach. Like those batteries of tests, it is not a panacea, but rather a set of additional powerful tools to help test for independence and randomness. The data sets under consideration are specific mathematical sequences, some of which are known to exhibit independence / randomness or not. Thus, it constitutes a good setting to benchmark and compare various statistical tests and see how well they perform. This kind of data is also more natural and looks more real than synthetic data obtained via simulations. </p>
<p><span style="font-size: 14pt;"><strong>1. Definition of random-like sequences</strong></span></p>
<p>Since we are dealing with deterministic sequences (<em>x<span style="font-size: 8pt;">n</span></em>) indexed by <em>n</em> = 1, 2, and so on, it is worth defining what we mean by <em>independence</em> and <em>random-like</em>. These two elementary concepts are very intuitive, but a formal definition may help. You may skip this section if an intuitive understanding of these concepts is enough for you. Independence in this context is sometimes called <em>asymptotic independence</em>, see <a href="https://mathoverflow.net/questions/372103/recursive-random-number-generator-based-on-irrational-numbers/" target="_blank" rel="noopener">here</a>. Also, for all the sequences investigated here, <em>x<span style="font-size: 8pt;">n</span></em> ∈ [0,1].</p>
<p><strong>1.1. Definition of random-like and independence</strong></p>
<p>A sequence (<em>x<span style="font-size: 8pt;">n</span></em>) with <em>x<span style="font-size: 8pt;">n</span></em> ∈ [0,1] is <em>random-like</em> if it satisfies the following property. For any finite index family <em>h</em><span style="font-size: 8pt;">1</span>,…, <em>h<span style="font-size: 8pt;">k</span></em> and for any <span style="font-size: 12pt;"><em>t<span style="font-size: 8pt;">1</span></em></span>,…, <em>t<span style="font-size: 8pt;">k</span></em> ∈ [0,1], we have </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8238499286?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8238499286?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>The probabilities are empirical probabilities, that is, based on frequency counts. For instance,</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8238501465?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8238501465?profile=RESIZE_710x" width="450" class="align-center"/></a></p>
<p>where χ(<em>A</em>) is the indicator function (equal to 1 if the event <em>A</em> is true, and equal to 0 otherwise). Random-like implies independence, but the converse is not true. A sequence is <em>independently distributed</em> if it satisfies the weaker property </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8238506260?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8238506260?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Random-like means that the <em>x<span style="font-size: 8pt;">n</span></em>'s all have the same underlying uniform distribution on [0, 1], and are independently distributed. </p>
<p><strong>1.2. Definition of lag-<em>k</em> autocorrelation</strong></p>
<p>Again, this is just the standard definition of auto-correlations, but applied to infinite deterministic sequences. The lag-<em>k</em> auto-correlation ρ<span style="font-size: 8pt;"><em>k</em></span> is defined as follows. First define ρ<span style="font-size: 8pt;"><em>k</em></span>(<em>n</em>) as the empirical correlation between (<em>x</em><span style="font-size: 8pt;">1</span>,…, <em>x<span style="font-size: 8pt;">n</span></em>) and (<em>x<span style="font-size: 8pt;">k</span></em><span style="font-size: 8pt;">+1</span>,… ,<em>x<span style="font-size: 8pt;">k</span></em><span style="font-size: 8pt;">+<em>n</em></span>). Then ρ<span style="font-size: 8pt;"><em>k</em></span> is the limit (if it exists) of ρ<span style="font-size: 8pt;"><em>k</em></span>(<span style="font-size: 12pt;"><em>n</em></span>) as <em>n</em> tends to infinity. </p>
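<p>In code, ρ<span style="font-size: 8pt;"><em>k</em></span>(<em>n</em>) is simply the Pearson correlation between the sequence and its lag-<em>k</em> shift. A minimal pure-Python sketch (illustrative only; function names are mine):</p>

```python
import math

def frac(x):
    """Fractional part { x } of a positive real number."""
    return x - math.floor(x)

def lag_autocorr(seq, k):
    """Empirical lag-k autocorrelation rho_k(n): Pearson correlation between
    (x_1,...,x_{n-k}) and (x_{k+1},...,x_n), on the available pairs."""
    a, b = seq[:-k], seq[k:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)

# x_n = { alpha * n } with alpha = (3 + sqrt(3))/6: per section 3.1, the
# lag-1 autocorrelation of this particular sequence tends to zero.
alpha = (3 + math.sqrt(3)) / 6
xs = [frac(alpha * n) for n in range(1, 100001)]
rho1 = lag_autocorr(xs, 1)
print(rho1)
```

The limit ρ<span style="font-size: 8pt;"><em>k</em></span>, when it exists, is approximated here by the empirical value at a large but finite <em>n</em>.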
<p><strong>1.3. Equidistribution and fractional part denoted as { }</strong></p>
<p>The fractional part of a positive real number <em>x</em> is denoted as { <em>x</em> }. For instance, { 3.141592 } = 0.141592. The sequences investigated here come from number theory. In that context, concepts such as random-like and identically distributed are rarely used. Instead, mathematicians rely on the weaker concept of <em>equidistribution</em>, also called equidistribution modulo 1. Closer to independence is the concept of equidistribution in higher dimensions, for instance if two successive values (<em>x<span style="font-size: 8pt;">n</span></em>, <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>) are equidistributed on [0, 1] x [0, 1].</p>
<p>A sequence can be equidistributed yet exhibit strong auto-correlations. The most famous example is the sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>αn</em> } where <em>α</em> is a positive irrational number. While equidistributed, it has strong lag-<em>k</em> auto-correlations for every strictly positive integer <em>k</em>, and it is anything but random-like. A sequence that looks perfectly random-like is the digits of <span>π</span>: empirically, they cannot be distinguished from a realization of a perfect <a href="https://en.wikipedia.org/wiki/Bernoulli_process" target="_blank" rel="noopener">Bernoulli process</a>. Such random-like sequences are very useful in cryptographic applications.</p>
<p><span style="font-size: 14pt;"><strong>2. Testing well-known sequences</strong></span> </p>
<p>The sequences we are interested in are <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α n</em>^<em>p</em> }<b> </b> where { } is the fractional part function (see section 1.3), <em>p</em> > 1 is a real number and <em>α</em> is a positive irrational number. Other sequences are discussed in section 3. It is well known that these sequences are equidistributed. Also, if <em>p</em> = 1, these sequences are highly auto-correlated and thus the terms <em>x<span style="font-size: 8pt;">n</span></em>'s are not independently distributed, much less random-like; the exact theoretical lag-<em>k</em> auto-correlations are known. The question here is what happens if <em>p</em> > 1. It seems that in that case, there is much more randomness. In this section, we explore three statistical tests (including a new one) to assess how random these sequences can be depending on the parameters <em>p</em> and <em>α</em>. The theoretical answer to that question is known, thus this provides a good case study to check how various statistical tests perform to detect randomness, or lack of it.</p>
<p><strong>2.1. The gap test</strong></p>
<p>The gap test (some may call it the run test) proceeds as follows. Let us define the binary digit <em>d<span style="font-size: 8pt;">n</span></em> as <em>d<span style="font-size: 8pt;">n</span></em> = ⌊2<em>x<span style="font-size: 8pt;">n</span></em>⌋. The brackets represent the integer part function. Say <em>d<span style="font-size: 8pt;">n</span></em> = 0 and <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1 </span>= 1 for a specific <em>n</em>. If <em>d<span style="font-size: 8pt;">n</span></em> is followed by <em>G</em> successive digits <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>,…, <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>G</em></span> all equal to 1, and then <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>G</em>+1</span> = 0, we have one instance of a gap of length <em>G</em>. Compute the empirical distribution of these gaps. Assuming 50% of the digits are 0 (this is the case in all our examples), the empirical gap distribution converges to a geometric distribution of parameter 1/2 if the sequence <em>x<span style="font-size: 8pt;">n</span></em> is random-like.</p>
<p>This is best illustrated in chapter 4 of my book <em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems, </em>available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>. </p>
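<p>The gap test can be sketched in a few lines of Python. Below, the digits come from a seeded pseudo-random source, so the empirical gap frequencies should sit near the geometric values 1/2, 1/4, 1/8, and so on; the same function applies unchanged to digits <em>d<span style="font-size: 8pt;">n</span></em> = ⌊2<em>x<span style="font-size: 8pt;">n</span></em>⌋ computed from any sequence:</p>

```python
import random

def gap_distribution(bits):
    """Empirical distribution of gap lengths: a gap of length G is a maximal
    run of G ones bounded by zeros on both sides. Returns {G: frequency}."""
    gaps = {}
    run = 0
    seen_zero = False          # a gap must start after an observed 0
    for b in bits:
        if b == 1:
            run += 1
        else:
            if seen_zero and run > 0:
                gaps[run] = gaps.get(run, 0) + 1
            seen_zero = True
            run = 0            # the trailing, unterminated run is dropped
    total = sum(gaps.values())
    return {g: c / total for g, c in sorted(gaps.items())}

random.seed(0)
bits = [random.getrandbits(1) for _ in range(200000)]
dist = gap_distribution(bits)
# For a random-like source, dist[g] is near (1/2)^g: about 0.5, 0.25, 0.125, ...
print(dist[1], dist[2], dist[3])
```
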
<p><strong>2.2. The collinearity test</strong></p>
<p>Many sequences pass several tests yet fail the collinearity test. This test checks whether there are <em>k</em> constants <em>a</em><span style="font-size: 8pt;">1</span>, ..., <em>a<span style="font-size: 8pt;">k</span></em> with <em>a<span style="font-size: 8pt;">k</span></em> not equal to zero, such that <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em></span> = <em>a</em><span style="font-size: 8pt;">1</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k-1</em></span> + <em>a</em><span style="font-size: 8pt;">2</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em>-2</span> + ... + <em>a<span style="font-size: 8pt;">k</span></em> <em>x<span style="font-size: 8pt;">n</span></em> takes on only a finite (usually small) number of values. In short, it addresses this question: do <em>k</em> successive values of the sequence <em>x<span style="font-size: 8pt;">n</span></em> always lie (exactly, approximately, or asymptotically) in a finite number of hyperplanes of dimension <em>k</em> - 1? This test has been used to determine that some congruential pseudo-random number generators were of very poor quality, see <a href="https://en.wikipedia.org/wiki/RANDU" target="_blank" rel="noopener">here</a>. It is illustrated in section 3, with <em>k</em> = 2. </p>
<p>Source code and examples for <em>k</em> = 3 can be found <a href="https://mathoverflow.net/questions/372103/recursive-random-number-generator-based-on-irrational-numbers/" target="_blank" rel="noopener">here</a>. </p>
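<p>A crude version of the collinearity test counts the distinct values taken by a candidate linear combination: a handful of values betrays the hyperplane structure. The sketch below (the rounding tolerance is an arbitrary choice of mine) contrasts <em>p</em> = 1, for which <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+2</span> - 2<em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> + <em>x<span style="font-size: 8pt;">n</span></em> is always an integer, with <em>p</em> = SQRT(7):</p>

```python
import math

def frac(x):
    """Fractional part { x } of a positive real number."""
    return x - math.floor(x)

def distinct_values(xs, coeffs, decimals=8):
    """Number of distinct values (rounded to `decimals` places) taken by the
    linear combination sum_i coeffs[i] * x_{n+i} over all windows of xs."""
    k = len(coeffs)
    vals = {round(sum(c * xs[n + i] for i, c in enumerate(coeffs)), decimals)
            for n in range(len(xs) - k + 1)}
    return len(vals)

alpha = math.sqrt(2)
xs_p1 = [frac(alpha * n) for n in range(1, 2001)]                   # p = 1
xs_p7 = [frac(alpha * n ** math.sqrt(7)) for n in range(1, 2001)]   # p = sqrt(7)

# For p = 1, x_n - 2 x_{n+1} + x_{n+2} is an integer in {-1, 0, 1}:
c1 = distinct_values(xs_p1, (1, -2, 1))
c7 = distinct_values(xs_p7, (1, -2, 1))
print(c1)  # 3: the points lie on three parallel hyperplanes
print(c7)  # many distinct values: no such hyperplane structure
```
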
<p><strong>2.3. The independence test</strong></p>
<p>This may be a new test: I could not find any reference to it in the literature. It does not test for full independence, but rather for random-like behavior in small dimensions (<em>k</em> = 2, 3, 4). Beyond <em>k</em> = 4, it becomes somewhat impractical, as it requires a number of observations (that is, the number of computed terms in the sequence) growing exponentially fast with <em>k</em>. However, it is a very intuitive test. It proceeds as follows, for a fixed <em>k</em>:</p>
<ul>
<li>Let <em>N </em> > 100 be an integer</li>
<li>Let <em>T</em> be a <em>k</em>-tuple (<em>t</em><span style="font-size: 8pt;">1</span>,..., <em>t<span style="font-size: 8pt;">k</span></em>) with <i>t<span style="font-size: 8pt;">j</span></i><span style="font-size: 8pt;"> </span>∈ [0,1] for <em>j</em> = 1, ..., <em>k.</em></li>
<li>Compute the following two quantities, with χ being the indicator function as in section 1.2:</li>
</ul>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242040856?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242040856?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<ul>
<li>Repeat this computation for <em>M</em> different <em>k</em>-tuples randomly selected in the <em>k</em>-dimensional unit hypercube</li>
</ul>
<p>Now plot the <em>M</em> vectors (<em>P<span style="font-size: 8pt;">T</span>, Q<span style="font-size: 8pt;">T</span></em>), each corresponding to a different <em>k</em>-tuple, on a scatterplot. Unless the <em>M</em> points lie very close to the main diagonal on the scatterplot, the sequence <em>x<span style="font-size: 8pt;">n</span></em> is not random-like. To see how far away you can be from the main diagonal without violating the random-like assumption, do the same computations for 10 different sequences consisting this time of truly random terms. This will give you a confidence band around the main diagonal, and vectors (<em>P<span style="font-size: 8pt;">T</span>, Q<span style="font-size: 8pt;">T</span></em>) lying outside that band, for the original sequence you are interested in, suggest regions where the randomness assumption is violated. This is illustrated in the picture below, originally posted <a href="https://mathoverflow.net/questions/372103/recursive-random-number-generator-based-on-irrational-numbers/" target="_blank" rel="noopener">here</a>: </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242055058?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242055058?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong></p>
<p>As you can see, there is a strong enough departure from the main diagonal, and the sequence in question (see same reference) is known not to be random-like. The X-axis features <em>P<span style="font-size: 8pt;">T</span></em>, and the Y-axis features <em>Q<span style="font-size: 8pt;">T</span></em>. An example with known random-like behavior, resulting in an almost perfect diagonal, is also featured in the same article. Notice that there are fewer and fewer points as you move towards the upper right corner. The higher <em>k</em>, the more sparse the upper right corner will be. In the above example, <em>k</em> = 3. To address this issue, proceed as follows, stretching the point distribution along the diagonal:</p>
<ul>
<li>Let <em>P*<span style="font-size: 8pt;">T</span></em> = (- 2 log <em>P<span style="font-size: 8pt;">T</span></em>) / <em>k</em> and <em>Q</em>*<span style="font-size: 8pt;"><em>T</em></span> = (- 2 log <em>Q<span style="font-size: 8pt;">T</span></em>) / <em>k</em>. This is a transformation leading to a Gamma(<em>k</em>, 2/<span style="font-size: 10pt;"><em>k</em></span>) distribution. See explanations <a href="https://stats.stackexchange.com/questions/89949/geometric-mean-of-uniform-variables" target="_blank" rel="noopener">here</a>. </li>
<li>Let <em>P</em>**<span style="font-size: 8pt;"><em>T</em></span> = <em>F</em>(<span style="font-size: 12pt;"><em>P</em></span>*<span style="font-size: 8pt;"><em>T</em></span>) and <em>Q</em>**<span style="font-size: 8pt;"><em>T</em></span> = <em>F</em>(<i>Q</i>*<span style="font-size: 8pt;"><em>T</em></span>) where <em>F</em> is the cumulative distribution function of a Gamma(<em>k</em>, 2/<span style="font-size: 10pt;"><em>k</em></span>) random variable.</li>
</ul>
<p>By virtue of the <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling" target="_blank" rel="noopener">inverse transform sampling theorem</a>, the points (<em>P</em>**<span style="font-size: 8pt;"><em>T</em></span>, <em>Q</em>**<span style="font-size: 8pt;"><em>T</em></span>) are now uniformly stretched along the main diagonal. </p>
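<p>The procedure can be sketched as follows in Python. The exact formulas for <em>P<span style="font-size: 8pt;">T</span></em> and <em>Q<span style="font-size: 8pt;">T</span></em> appear in the images above; the code below uses one natural reading (empirical joint probability versus product of empirical marginal probabilities) and omits the Gamma stretching step:</p>

```python
import math, random

def frac(x):
    """Fractional part { x } of a positive real number."""
    return x - math.floor(x)

def pq_pairs(xs, k, M, seed=123):
    """For M random k-tuples T = (t_1,...,t_k), return (P_T, Q_T) where P_T is
    the empirical joint probability of {x_{n+j-1} <= t_j for j = 1..k} and Q_T
    the product of the empirical marginal probabilities. Under the random-like
    hypothesis the pairs hug the main diagonal."""
    rng = random.Random(seed)
    n = len(xs) - k + 1
    pairs = []
    for _ in range(M):
        t = [rng.random() for _ in range(k)]
        joint = sum(all(xs[i + j] <= t[j] for j in range(k))
                    for i in range(n)) / n
        marginals = 1.0
        for j in range(k):
            marginals *= sum(xs[i + j] <= t[j] for i in range(n)) / n
        pairs.append((joint, marginals))
    return pairs

random.seed(7)
xs_iid = [random.random() for _ in range(5000)]            # truly random terms
xs_dep = [frac(math.sqrt(2) * n) for n in range(1, 5001)]  # p = 1: not random-like

dev = lambda pairs: max(abs(p - q) for p, q in pairs)
iid_dev = dev(pq_pairs(xs_iid, k=2, M=50))
dep_dev = dev(pq_pairs(xs_dep, k=2, M=50))
print(iid_dev)  # small: points near the diagonal
print(dep_dev)  # much larger departures from the diagonal
```
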
<p><span style="font-size: 14pt;"><strong>3. Results and generalization</strong></span></p>
<p>Let's get back to our sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α n</em>^<em>p</em> } with <em>p</em> > 1 and <em>α</em> irrational. Before showing and discussing some charts, I want to discuss a few issues. First, if <em>p</em> is large, machine accuracy will quickly result in erroneous computations for <em>x<span style="font-size: 8pt;">n</span></em>. You need to detect when loss of accuracy becomes a critical problem, usually well below <em>n</em> = 1,000 if <em>p</em> = 5. Working with double precision arithmetic will help. Another issue, if <em>p</em> is close to 1, is the fact that randomness does not kick in until <em>n</em> is large enough. You may have to ignore the first few hundred terms of the sequence in that case. If <em>p</em> = 1, randomness never occurs. Also, we have assumed that the marginal distributions are uniform on [0, 1]. From the theoretical point of view, they indeed are, and it will show if you compute the empirical percentile distribution of <em>x<span style="font-size: 8pt;">n</span></em>, even in the presence of strong auto-correlations (the reason is the ergodic nature of the sequences in question, a topic beyond the scope of the present article). So it would be a good exercise to use various statistical tools or libraries to assess whether they can confirm the uniform distribution assumption.</p>
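<p>As a quick sanity check of the uniform-marginal claim, you can compute the one-sample Kolmogorov-Smirnov distance to the uniform distribution on [0, 1]. It is used here purely as a descriptive measure: the usual critical values assume i.i.d. samples, which these deterministic sequences are not:</p>

```python
import math

def frac(x):
    """Fractional part { x } of a positive real number."""
    return x - math.floor(x)

def ks_uniform(xs):
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_t |F_emp(t) - t|
    against the Uniform[0,1] distribution."""
    s = sorted(xs)
    n = len(s)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(s))

n = 10000
xs = [frac(math.sqrt(2) * k) for k in range(1, n + 1)]  # p = 1: autocorrelated
d = ks_uniform(xs)
print(d, 1.36 / math.sqrt(n))  # D_n below the nominal 5% i.i.d. cutoff
```

Even though this <em>p</em> = 1 sequence is heavily auto-correlated, its marginal distribution is uniform, and the statistic stays tiny.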
<p><strong>3.1. Examples</strong></p>
<p>The exact theoretical value of the lag-<em>k</em> auto-correlation is known for all <em>k</em> if <em>p</em> = 1. See section 5.4 in <a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">this article</a>. It is almost never equal to zero, but it turns out that if <em>k</em> = 1, <em>p</em> = 1 and <em>α</em> = (3 + SQRT(3))/6, it is indeed equal to zero. Use a statistical package to see if it can detect this fact, or ask your team to do the test. Also, if <em>p</em> is an integer, show (using statistical techniques) that for some <em>a</em><span style="font-size: 8pt;">1</span>, ..., <em>a</em><span style="font-size: 8pt;">k</span>, we have <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em></span> = <em>a</em><span style="font-size: 8pt;">1</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k-1</em></span> + <em>a</em><span style="font-size: 8pt;">2</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em>-2</span> + ... + <em>a<span style="font-size: 8pt;">k</span></em> <em>x<span style="font-size: 8pt;">n</span></em> takes on only a finite number of values as discussed in section 2.2, and thus, the random-like assumption is always violated. In particular, <em>k</em> = 2 if <em>p</em> = 1. This is also true <em>asymptotically</em> if <em>p</em> is not an integer, see <a href="https://mathoverflow.net/questions/377697/sequences-similar-to-n-alpha-that-are-both-equidistributed-and-truly-rando/377748#377748" target="_blank" rel="noopener">here</a> for details. Yet, if <em>p</em> > 1, the auto-correlations are very close to zero, unlike the case <em>p</em> = 1. But are they truly identical to zero? What about the sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α</em>^<em>n</em> } with say <em>α</em> = log 3? Is it random-like? Nobody knows. Of course, if <em>α</em> = (1 + SQRT(5))/2, that sequence is anything but random, so it depends on <em>α</em>. </p>
<p>Below are three scatterplots showing the distribution of (<em>x<span style="font-size: 8pt;">n</span></em>, <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>) for a few hundred values of <em>n</em>, for various <em>α</em> and <em>p</em>, for the sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α</em> <em>n</em>^<em>p</em> }. The X-axis represents <em>x<span style="font-size: 8pt;">n</span></em>, the Y-axis represents <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242305270?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242305270?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>p = SQRT(7), α = 1</em></p>
<p>Even to the trained naked eye, Figure 2 looks random in two dimensions. Independence may still fail in higher dimensions (<em>k</em> > 2), as the sequence is known not to be random-like. There is no apparent collinearity pattern as discussed in section 2.2, at least for <em>k</em> = 2. Can you run a test to detect lack of randomness in higher dimensions?</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242307701?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242307701?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>p = 1.4, α = log 2</em></p>
<p>To the trained naked eye, Figure 3 shows lack of randomness, as highlighted in the red band. Can you do a test to confirm this? If the test is inconclusive or provides the wrong answer, then the naked eye performs better, in this case, than statistical software.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242319869?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242319869?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong>: <em>p = 1.1, α = log 2</em></p>
<p>Here (Figure 4) any statistical software and any human being, even the layman, can identify lack of randomness in more than one way. As <em>p</em> gets closer and closer to 1, lack of randomness is obvious, and the collinearity issue discussed in section 2.2, even if fuzzy, becomes more apparent even in two dimensions.</p>
<p><strong>3.2. Independence between two sequences</strong></p>
<p>It is known that if <em>α</em> and <em>β</em> are irrational numbers linearly independent over the set of rational numbers, then the sequences { <em>αn</em> } and { <em>βn</em> } are not correlated, even though each one taken separately is heavily auto-correlated. A sketch proof of this result can be found in the Appendix of <a href="https://www.datasciencecentral.com/profiles/blogs/state-of-the-art-statistical-science-to-address-famous-number-the" target="_blank" rel="noopener">this article</a>. But are they really independent? Test, using statistical software, the absence of correlation if <em>α </em>= log 2 and <em>β</em> = log 3. How would you test independence? The methodology presented in section 2.3 can be adapted and used to answer this question empirically (although not theoretically). </p>
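<p>The absence of correlation is easy to probe empirically. The sketch below contrasts the near-zero cross-correlation between { <em>n</em> log 2 } and { <em>n</em> log 3 } with the sizable lag-1 auto-correlation of each sequence taken separately (empirical values at finite <em>n</em>, not a proof):</p>

```python
import math

def frac(x):
    """Fractional part { x } of a positive real number."""
    return x - math.floor(x)

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)

N = 20000
xs = [frac(math.log(2) * n) for n in range(1, N + 1)]
ys = [frac(math.log(3) * n) for n in range(1, N + 1)]

r_cross = pearson(xs, ys)          # cross-correlation: close to zero
r_lag1 = pearson(xs[:-1], xs[1:])  # lag-1 autocorrelation: far from zero
print(r_cross, r_lag1)
```
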
<p></p>
Covid-19: My Predictions for 2021
tag:www.datasciencecentral.com,2020-11-30:6448529:BlogPost:1003991
2020-11-30T07:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here I share my predictions, as well as my personal opinion, about the pandemic. My thoughts are not derived from running sophisticated models on vast amounts of data. Much of the data available has major issues anyway, something I am also about to discuss. There is some bad news and some good news. This article discusses what I believe is the good and bad news, as well as some attempt at explaining people's behavior and reactions, and the resulting consequences. My opinion is very different from what you have read in the news, whatever the political color. Mine has, I think, no political color. It offers a different, possibly refreshing perspective to gauge and interpret what is happening.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8230291873?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8230291873?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>I will start by mentioning Belgium, one of the countries with the highest death rate. Very recently, it went from 10,000 deaths to 15,000 in the last wave, in a matter of days. They are back in some lock-down, and the situation has dramatically improved in the last few days. But 15,000 deaths out of 10,000,000 people would translate to 500,000 deaths in the US. We are far from there yet. Had they not mandated a new lock-down, killing restaurants and other businesses but keeping schools open along the way, they would probably have 20,000 deaths now, probably quickly peaking at 25,000 before things improve. But we are comparing apples and oranges. In Belgium, everyone believed to have died from covid was listed as having actually died from the virus, even if untested. Also, the population density is very high compared to the US, and use of public transportation is widespread. Areas with lower population density initially have fewer deaths per 100,000 inhabitants, until complacency eventually creates the same spike.</p>
<p>The bad news is that I think we will surpass 500,000 deaths in the US by the end of February. But I don't think we will ever reach 1,000,000 by the end of 2021. A vaccine has been announced for months, but won't be available to the public at large in time: only to some specific groups of people (hospital workers) in the next few months. By the time it is widely available, we will all have been contaminated / infected and recovered (99.8% of us) or dead (0.2% of us). The vaccine will therefore be useless to curtail the pandemic, which by then will have died out on its own due to lack of new people to infect. It may still be useful for the future, but not to spare the lives of another 300,000 who will have died between now and the end of February. </p>
<p>You may wonder: why not impose a full lock-down until March? Yes, this would save many lives but kill many others in what I think is a zero-sum sinister game. Economic destruction, suicide, drug abuse, crime, and riots would follow and would be just as bad. And with the surge in unemployment and massive losses in tax revenue, I don't think any local or state government has the financial ability to do it; it is just financially unsustainable. So I think lock-downs can only last so long, probably about a month or so maximum. What is likely to happen is more and more people not following un-enforced regulations anymore, while those who really need to protect themselves stay at home and continue to live in a self-imposed state of lock-down.</p>
<p>Now some good news at least. It is said that for every person who tests positive, 8 go untested because symptoms are too mild or nonexistent to require medical help, and thus are not diagnosed. My whole family, close friends, and I fit in that category: never tested, but fully recovered, with no long-term side effects. Have we been re-infected? Possibly, but it was even milder the second time, and again none of us were tested. One reason for not being tested or treated is that going to a hospital is much riskier than dining in at a restaurant (many hospital workers died from covid; far fewer restaurant workers did). Another reason is to avoid having a potentially worrisome medical record attached to my name. Now you can say we were never infected in the first place, but that is like saying the virus is not contagious at all. Or you can say we will be re-infected again, but that is like saying the vaccine, even two doses six months apart, won't work. Indeed we are very optimistic about our future, as are all the people currently boosting the stock market to incredible highs. What I am saying here is that probably up to half of the population (150 million Americans) are at the end of the tunnel by now: recovered for most, or dead. </p>
<p>Some people like myself who had a worse-than-average (still mild) case realize that wearing a mask causes breathing difficulty worse than the virus itself. I don't have time to wash my mask and hands all the time, or buy new masks and so on, when I believe my family and I are done with it. Unwashed, re-used masks are probably full of germs and worse than no mask, once immune. As more and more people recover every day in very large numbers (though the media never mention it), you are going to see more and more people spontaneously return to a normal life. These people are not anti-science, anti-social, or anti-government - quite the contrary: they are acting rationally, not driven by fear. They don't believe in conspiracy theories, and they come from all political affiliations, or are apolitical. Forcing these people to isolate via mandated lock-downs won't work: some will have big parties in private homes; a hair-dresser may decide to provide her services privately in the homes of her clients, paid under the table. People still want to eat great food with friends and will continue to do so. People still want to date. Even if the city of Los Angeles makes it illegal to meet in your home with members of another household, you can't stop young (or less young) people from dating, any more than you can stop the law of gravity no matter how hard you try.</p>
<p>Of course, if all the people acting this way were immune, it would not be an issue. Unfortunately, many people who behave that way today are just careless (or ignorant, maybe not reading the news anymore). But as time goes by, even many of the careless people are going to get infected and then immune; it's a matter of weeks. So the intensity of this situation may peak in a few weeks and then naturally slow down, as dramatically as it rose.</p>
<p>In conclusion, I believe that by the end of March we will be back to much better times, and covid will be a thing of the past for most of us, like the Spanish flu. It is said that the current yearly flu is just a remnant of the 1918 pandemic. The same may apply to covid, but it will be less lethal moving forward, after having killed those who were most susceptible to it. Already the death rate has plummeted. This of course won't help people who have lost a family member or friend; no one can bring them back. This is the sad part.</p>
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> You can access Vincent's articles and books,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
<p></p>
Introducing an All-purpose, Robust, Fast, Simple Non-linear Regression
tag:www.datasciencecentral.com,2020-11-24:6448529:BlogPost:1003574
2020-11-24T03:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>The model-free, data-driven technique discussed here is so basic that it can easily be implemented in Excel, and we actually provide an Excel implementation. It is surprising that this technique does not pre-date standard linear regression, and is rarely if ever used by statisticians and data scientists. It is related to kriging and nearest neighbor interpolation, and apparently first mentioned in 1965 by Harvard scientists working on GIS (geographic information systems). It was referred…</p>
<p>The model-free, data-driven technique discussed here is so basic that it can easily be implemented in Excel, and we actually provide an Excel implementation. It is surprising that this technique does not pre-date standard linear regression, and is rarely if ever used by statisticians and data scientists. It is related to kriging and nearest neighbor interpolation, and was apparently first mentioned in 1965 by Harvard scientists working on GIS (geographic information systems). It was referred to back then as Shepard's method or inverse distance weighting, and used for multivariate interpolation on non-regular grids (see <a href="https://en.wikipedia.org/wiki/Multivariate_interpolation" target="_blank" rel="noopener">here</a> and <a href="https://en.wikipedia.org/wiki/Inverse_distance_weighting" target="_blank" rel="noopener">here</a>). We call this technique <em>simple regression</em>.</p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209321855?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209321855?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://www.datasciencecentral.com/profiles/blogs/3-types-of-regression-in-one-picture-baba-png" target="_blank" rel="noopener">here</a></em></p>
<p>In this article, we show how simple regression can be generalized and used in regression problems, especially when standard regression fails due to multi-collinearity or other issues. It can safely be used by non-experts without risking misinterpretation of the results or over-fitting. We also show how to build confidence intervals for predicted values, compare it to linear regression on test data sets, and apply it to a non-linear context (regression on a circle) where standard regression fails. Not only does it work for prediction inside the domain (equivalent to interpolation), but it also works, to a lesser extent and with extra care, outside the domain (equivalent to extrapolation). No matrix inversion or gradient descent is needed in the computations, making it a faster alternative to linear or logistic regression.</p>
<p><span style="font-size: 14pt;"><strong>1. Simple regression explained</strong></span></p>
<p>For ease of presentation, we only discuss the two-dimensional case. Generalization to any dimension is straightforward. Let us assume that the data set (also called training set) consists of <em>n</em> points or locations (<em>X</em><span style="font-size: 8pt;">1</span>, <em>Y</em><span style="font-size: 8pt;">1</span>), ..., (<em>X<span style="font-size: 8pt;">n</span></em>, <em>Y<span style="font-size: 8pt;">n</span></em>) together with the response (also called dependent values) <em>Z</em><span style="font-size: 8pt;">1</span>, ..., <em>Z<span style="font-size: 8pt;">n</span></em> attached to each observation. Then the predicted value <em>Z</em> at an arbitrary location (<em>X</em>, <em>Y</em>) is computed as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208229253?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208229253?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Throughout this article, we used </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208207489?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208207489?profile=RESIZE_710x" width="370" class="align-center"/></a></p>
<p>with <em>β</em> = 5.<b> </b>The parameter <em>β</em> controls the smoothness and is actually a hyperparameter. It should be set to at least twice the dimension of the problem. A large value of <em>β </em>decreases the influence of far-away points in the predictions. In a Bayesian framework, a prior could be attached to <em>β</em>. Also note that if (<em>X</em>, <em>Y</em>) is one of the <em>n</em> training set points, say (<em>X</em>, <em>Y</em>) = (<em>X<span style="font-size: 8pt;">j</span></em>, <em>Y<span style="font-size: 8pt;">j</span></em>) for some <em>j</em>, then <em>Z</em> must be set to <em>Z<span style="font-size: 8pt;">j</span></em>. In short, the predicted value is exact for points belonging to the training set. If <span>(<em>X</em>, <em>Y</em>)</span> is very close to, say, (<em>X<span style="font-size: 8pt;">j</span></em>, <em>Y<span style="font-size: 8pt;">j</span></em>) and further away from the other training set points, then the computed <em>Z</em> is very close to <em>Z<span style="font-size: 8pt;">j</span></em>. It is assumed here that there are no duplicate locations in the training set; otherwise, the formula needs adjustments. </p>
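<p>The prediction formula above, a weighted average with weights taken here to be inverse distances raised to the power <em>β</em>, can be sketched in a few lines of Python. This is a minimal illustration, not the author's code; the function name is made up:</p>

```python
import numpy as np

def simple_regression_predict(train_xy, train_z, query_xy, beta=5.0):
    """Inverse-distance-weighted prediction (Shepard's method).

    Assumes weights w_j = 1 / d_j**beta, where d_j is the Euclidean
    distance from the query point to training location j. If the query
    coincides with a training location, that location's response is
    returned, so the predictor is exact on the training set.
    """
    train_xy = np.asarray(train_xy, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    d = np.linalg.norm(train_xy - np.asarray(query_xy, dtype=float), axis=1)
    exact = d == 0.0
    if exact.any():
        return float(train_z[exact][0])
    w = 1.0 / d**beta
    return float(np.sum(w * train_z) / np.sum(w))
```

<p>With <em>β</em> = 5, a query point very close to one training location receives a prediction very close to that location's response, as described above.</p>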
<p><span style="font-size: 14pt;"><strong>2. Case studies and Excel spreadsheet with computations</strong></span></p>
<p>We did some simulations to compare the performance of simple regression versus linear regression. In the first example, the training set consists of <em>n</em> = 100 data points generated as follows. The locations are random points (<em>X<span style="font-size: 8pt;">k</span></em>, <em>Y<span style="font-size: 8pt;">k</span></em>) in the two-dimensional unit square [0, 1] x [0, 1]. The response was set to <em>Z<span style="font-size: 8pt;">k</span></em> = SQRT[(<em>X<span style="font-size: 8pt;">k</span></em>)^2 + (<em>Y<span style="font-size: 8pt;">k</span></em>)^2]. The control set consists of another <em>n</em> = 100 points, also randomly distributed on the same unit square. The predicted values were computed on the control set, and the goal is to check how well they approximate the theoretical (true) value SQRT(<em>X</em>^2 + <em>Y</em>^2). Both the simple and linear regression perform well, though the R-squared is a little better for the simple regression, for most training and control sets of this type. The picture below shows the quality of the fit. A perfect fit would correspond to a perfect diagonal line rather than a cloud, with 0.9886 and 0.0089 (the slope and intercept of the red line) replaced respectively by 1 and 0. Note that the R-squared 0.9897 is very close to 1.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208321887?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208321887?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>data set doing well with both simple and linear regression</em></p>
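<p>The first experiment can be reproduced along the following lines. This is a sketch with its own random draws, so the R-squared values will differ slightly from those reported above:</p>

```python
import numpy as np

rng = np.random.default_rng(42)

def idw_predict(train_xy, train_z, q, beta=5.0):
    # inverse-distance-weighted ("simple regression") prediction
    d = np.linalg.norm(train_xy - q, axis=1)
    if np.any(d == 0.0):
        return train_z[d == 0.0][0]
    w = 1.0 / d**beta
    return np.sum(w * train_z) / np.sum(w)

n = 100
train, control = rng.random((n, 2)), rng.random((n, 2))
z_train = np.sqrt((train**2).sum(axis=1))        # Z = SQRT(X^2 + Y^2)
z_true = np.sqrt((control**2).sum(axis=1))

# simple regression predictions on the control set
z_simple = np.array([idw_predict(train, z_train, q) for q in control])

# linear regression Z ~ a + bX + cY fitted by least squares on the training set
A = np.column_stack([np.ones(n), train])
coef, *_ = np.linalg.lstsq(A, z_train, rcond=None)
z_linear = np.column_stack([np.ones(n), control]) @ coef

def r_squared(y, yhat):
    return 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)

r2_simple = r_squared(z_true, z_simple)
r2_linear = r_squared(z_true, z_linear)
print(r2_simple, r2_linear)    # compare the two fits on the control set
```

<p>On this smooth response both fits do well, matching the behavior shown in Figure 1.</p>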
<p><span><strong>2.1. Regression on the circle</strong></span></p>
<p>In this second example, both the training set and control points are located on the unit circle (on the border of the circle, not inside or outside, so technically this is a one-dimensional case). As expected, the R-squared for the linear regression is terrible, and close to zero, while it is close to one for the simple regression. Note the weird distribution for the linear regression: this is not a glitch, it is expected to look that way.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208423294?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208423294?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Good fit with simple regression (points distributed on a circle)</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208428655?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208428655?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>Bad fit with linear regression (points distributed on the same circle as in Figure 2)</em></p>
<p><strong>2.2. Extrapolation</strong></p>
<p>In the third example, we used the same training set with random locations on the unit circle. The control set consists this time of <em>n</em> = 100 points located in a square away from the circle, with no intersection with the circle. This corresponds to extrapolation. Both the linear and simple regression perform badly this time. The R-squared associated with the linear regression is close to zero, so no amount of re-scaling can fix it. The predicted values appear random.</p>
<p>However, even though the simple regression results are almost as far off, in terms of bias, as those coming from the linear regression, they can easily be improved substantially. The picture below illustrates this fact. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209018659?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209018659?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong>: <em>Testing predictions outside the domain (extrapolation)</em></p>
<p>The slope in Figure 4 is 0.3784. For a perfect fit, it should be equal to one. However, the R-squared for the simple regression is pretty good: 0.842. So if we multiply the predicted values by a constant so that the average predicted value, in the square outside the circle, is no longer heavily biased, we obtain a good fit with the same R-squared. Of course, this assumes that the true average value on the unit square domain is known, at least approximately. It is significantly different from the average value computed on the training set (the circle), hence the bias. This fix won't work for the linear regression: its R-squared stays unchanged and close to zero after rescaling, even if we remove the bias. </p>
<p><strong>2.3. Confidence intervals for predicted values</strong></p>
<p>Here, we are back to using the first data set that worked well both for linear and simple regression, doing interpolation rather than extrapolation, as at the beginning of section 2. The control set is fixed, but we split the training set (consisting this time of 500 points) into 5 subsets. This approach is similar to cross-validation or bootstrapping, and allows us to compute confidence intervals for the predicted values. It works as follows:</p>
<ul>
<li>Repeat the whole procedure 5 times, using each time a different subset of the training set</li>
<li>Estimate <em>Z</em> based on the location (<em>X</em>, <em>Y</em>) for each point in the control set, using the formula in section 1: we will have 5 different estimates for each point, one for each subset of the training set</li>
<li>For each point in the control set, compute the minimum and maximum estimated value, out of the 5 predictions</li>
<li>The confidence interval for each point has the minimum predicted value as lower bound, and the maximum as upper bound. </li>
</ul>
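<p>The steps above can be sketched as follows, assuming the inverse-distance predictor of section 1 and the same test function as in the first example (the data here is illustrative, not the article's):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def idw_predict(train_xy, train_z, q, beta=5.0):
    # inverse-distance-weighted ("simple regression") prediction
    d = np.linalg.norm(train_xy - q, axis=1)
    if np.any(d == 0.0):
        return train_z[d == 0.0][0]
    w = 1.0 / d**beta
    return np.sum(w * train_z) / np.sum(w)

# 500 training points split into 5 subsets of 100, plus a fixed control set
train = rng.random((500, 2))
z_train = np.sqrt((train**2).sum(axis=1))
control = rng.random((100, 2))

# one prediction per control point and per training subset
preds = np.empty((5, len(control)))
for s in range(5):
    part = slice(100 * s, 100 * (s + 1))
    preds[s] = [idw_predict(train[part], z_train[part], q) for q in control]

# CI bounds: minimum and maximum of the 5 predictions at each control point
lower, upper = preds.min(axis=0), preds.max(axis=0)
z_true = np.sqrt((control**2).sum(axis=1))
coverage = np.mean((lower <= z_true) & (z_true <= upper))
print(f"empirical coverage: {coverage:.0%}")
```

<p>The same loop works unchanged for linear regression: only the predictor function needs to be swapped.</p>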
<p>Of course the technique can be further refined, using percentiles rather than minimum and maximum for the bounds of the confidence intervals. The most modern way to do it is described in my book <em>Statistics: New Foundations, Toolkit and Machine Learning Recipes</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a> to DSC members. See chapters 15-16, pages 107-132.</p>
<p>The <strong>striking conclusions</strong> based on this test are as follows:</p>
<ul>
<li>The CI (confidence interval) based on simple regression is about 50% larger on average than the one based on linear regression</li>
<li>The CI based on simple regression contains the true value 92% of the time, versus 24% of the time for the linear regression.</li>
</ul>
<p>What is striking is the 92% achieved by the simple regression. Part of it is because the simple regression CI's are larger, but there is more to it. </p>
<p><strong>2.4. Excel spreadsheet</strong></p>
<p>All the data and tests discussed, including the computations, are available in my spreadsheet, allowing you to replicate the results or use it on your own data. You can download it <a href="https://storage.ning.com/topology/rest/1.0/file/get/8209116672?profile=original" target="_blank" rel="noopener">here</a> (krigi2.xlsx). The main tabs in the spreadsheet are</p>
<ul>
<li>Square</li>
<li>Circle-Interpolation</li>
<li>Circle-Extrapolation</li>
<li>Square-CI-Summary</li>
</ul>
<p>The remaining tabs are used for auxiliary computations and can be ignored.</p>
<p><span style="font-size: 14pt;"><strong>3. Generalization</strong></span></p>
<p>If you look at the main formula in section 1, the predicted <em>Z</em> is the quotient of two arithmetic means. The one in the numerator is a weighted mean, and the one in the denominator is a standard mean. But the formula also works with other types of means, for example the exponential mean discussed in one of my previous articles, <a href="https://www.datasciencecentral.com/profiles/blogs/alternative-to-the-arithmetic-geometric-and-harmonic-means" target="_blank" rel="noopener">here</a>. The advantage of such means, over the arithmetic mean, is that they come with hyperparameters attached, allowing for more granular fine-tuning. </p>
<p>For example, the exponential mean of <em>n</em> numbers <em>A</em><span style="font-size: 8pt;">1</span>, ..., <em>A<span style="font-size: 8pt;">n</span></em> is defined as</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209146656?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209146656?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p>When the hyperparameter <em>p</em> tends to 1, it corresponds to the arithmetic mean. Here, use the exponential mean with</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209189858?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209189858?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>respectively for the numerator and denominator in the first formula in section 1. You can even use a different <em>p</em> for the numerator and denominator.</p>
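<p>Assuming the exponential mean is defined as the base-<em>p</em> logarithm of the arithmetic mean of <em>p</em> raised to each value (consistent with the limit behavior stated above; see the referenced article for the exact definition), it can be coded as follows:</p>

```python
import numpy as np

def exponential_mean(a, p):
    """Exponential mean with hyperparameter p > 0, p != 1 (assumed
    definition: log base p of the arithmetic mean of p**a_k).

    As p -> 1 this tends to the arithmetic mean; a large p pulls the
    result toward max(a), a small p toward min(a).
    """
    a = np.asarray(a, dtype=float)
    return float(np.log(np.mean(p**a)) / np.log(p))

a = [1.0, 2.0, 6.0]
print(exponential_mean(a, 1.0001))   # close to the arithmetic mean, 3
print(exponential_mean(a, 100.0))    # pulled toward max(a) = 6
```

<p>The hyperparameter <em>p</em> thus offers a tuning knob absent from the plain arithmetic mean.</p>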
<p>Other original exact interpolation techniques based on Fourier methods, in one dimension and for points equally spaced, are described <a href="https://mathoverflow.net/questions/376081/infinite-partial-fraction-expansions-to-compute-fractional-iterations-and-recurr" target="_blank" rel="noopener">in this article</a>. Indeed, it was this type of interpolation that led me to investigate the material presented here. Robust, simple linear regression techniques are also described in chapter 1 in my book <em>Statistics: New Foundations, Toolkit and Machine Learning Recipes</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a> to DSC members.</p>
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> You can access Vincent's articles and books,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
Interesting Application of the Poisson-Binomial Distribution
tag:www.datasciencecentral.com,2020-11-11:6448529:BlogPost:1000712
2020-11-11T03:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>While the Bernoulli and binomial distributions are among the first ones taught in any elementary statistical course, the Poisson-Binomial is rarely mentioned. It is however one of the simplest discrete distributions, with applications in survey analysis, see <a href="https://www.researchgate.net/publication/228718793_Statistical_Applications_of_the_Poisson-Binomial_and_conditional_Bernoulli_distributions" rel="noopener" target="_blank">here</a>. In this article, we are dealing with…</p>
<p>While the Bernoulli and binomial distributions are among the first ones taught in any elementary statistical course, the Poisson-Binomial is rarely mentioned. It is however one of the simplest discrete distributions, with applications in survey analysis, see <a href="https://www.researchgate.net/publication/228718793_Statistical_Applications_of_the_Poisson-Binomial_and_conditional_Bernoulli_distributions" target="_blank" rel="noopener">here</a>. In this article, we are dealing with experimental / probabilistic number theory, leading to a more efficient detection of large prime numbers, with applications in cryptography and IT security. </p>
<p>This article is accessible to people with minimal math or statistical knowledge, as we avoid jargon and theory, favoring simplicity. Yet we are able to present original research-level results that will be of interest to professional data scientists, mathematicians, and machine learning experts. The data set explored here is the set of numbers, and thus accessible to anyone. We also explain computational techniques, even mentioning online tools, to deal with very large integers that are beyond what standard programming languages or Excel can handle. </p>
<p><span style="font-size: 14pt;"><strong>1. The Poisson-Binomial Distribution</strong></span></p>
<p>We are all familiar with the most basic of all random variables: the Bernoulli. If <i>Y</i> is such a variable, it is equal to 1 with probability <em>p</em>, and to 0 with probability 1 - <em>p</em>. Here the parameter <em>p</em> is a real number between 0 and 1. If you run <em>n</em> trials, independent of each other, and each with the same potential outcome, then the number of successes, defined as the number of times the outcome is equal to 1, is a Binomial variable of parameters <em>n</em> and <em>p</em>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124664081?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124664081?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://blogs.sas.com/content/iml/2020/10/07/poisson-binomial-hundreds-of-parameters.html" target="_blank" rel="noopener">here</a></em></p>
<p>If the trials are independent but a different <em>p</em> is attached to each of them, then the number of successes has a Poisson-binomial distribution. In short, if we have <em>n</em> independent Bernoulli random variables <i>Y</i><span style="font-size: 8pt;">1</span>, ..., <i>Y</i><em><span style="font-size: 8pt;">n</span></em> with parameters <em>p</em><span style="font-size: 8pt;">1</span>, ..., <em>p<span style="font-size: 8pt;">n</span></em> respectively, then the number of successes <i>X</i> = <i>Y</i><span style="font-size: 8pt;">1</span> + ... + <i>Y</i><em><span style="font-size: 8pt;">n</span></em> has a Poisson-binomial distribution of parameters <em>p</em><span style="font-size: 8pt;">1</span>, ..., <em>p<span style="font-size: 8pt;">n</span></em> and <em>n</em>. The exact probability density function is cumbersome to compute as it is combinatorial in nature, but a Poisson approximation is available and will be used in this article, hence the name <em>Poisson-binomial</em>. </p>
<p>The first two moments (expectation and variance) are as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124556881?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124556881?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>The exact formula for the PDF (probability density function) involves an exponentially growing number of terms as <em>n</em> becomes large. For instance, P(<em>X</em> = <em>n</em> - 2), which is the probability that exactly two out of <em>n</em> trials fail, is given by the following formula:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124558097?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124558097?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>For this reason, whenever possible, approximations are used. </p>
<p><strong>1.1. Poisson approximation</strong></p>
<p>When the parameters <em>p<span style="font-size: 8pt;">k</span></em> are small, say <em>p<span style="font-size: 8pt;">k</span></em> < 0.1, then the following Poisson approximation applies. Let <span><em>λ</em> = <em>p</em><span style="font-size: 8pt;">1</span> + ... + <em>p<span style="font-size: 8pt;">n</span></em>. Then for <em>m</em> = 0, ..., <em>n</em>, we have: </span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124637257?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124637257?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>When <em>n</em> becomes large, we can use the <a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics" target="_blank" rel="noopener">Central Limit Theorem</a> to compute more complicated probabilities such as P(<em>X</em> > <em>m</em>), based on the Poisson approximation. See also the <a href="https://en.wikipedia.org/wiki/Le_Cam%27s_theorem" target="_blank" rel="noopener">Le Cam theorem</a> for more precise approximations. </p>
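<p>Although the exact PMF has exponentially many terms when written out, it can be computed efficiently by dynamic programming, convolving in one Bernoulli at a time. The sketch below, with made-up small parameters, compares it to the Poisson approximation of section 1.1:</p>

```python
import math
import numpy as np

def poisson_binomial_pmf(p):
    """Exact PMF of X = Y1 + ... + Yn via dynamic programming:
    convolve in one Bernoulli(p_k) at a time."""
    pmf = np.zeros(len(p) + 1)
    pmf[0] = 1.0
    for pk in p:
        # right-hand side is evaluated fully before assignment
        pmf[1:] = pmf[1:] * (1 - pk) + pmf[:-1] * pk
        pmf[0] *= 1 - pk
    return pmf

p = np.array([0.02, 0.05, 0.01, 0.03, 0.04])    # illustrative small p_k
exact = poisson_binomial_pmf(p)
lam = p.sum()                                    # lambda = p_1 + ... + p_n
poisson = np.array([math.exp(-lam) * lam**m / math.factorial(m)
                    for m in range(len(p) + 1)])
print(np.abs(exact - poisson).max())             # tiny approximation error
```

<p>Per the Le Cam theorem mentioned above, the total variation distance between the two distributions is bounded by twice the sum of the squared <em>p<span style="font-size: 8pt;">k</span></em>, which is negligible when all <em>p<span style="font-size: 8pt;">k</span></em> are small.</p>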
<p><span style="font-size: 14pt;"><strong>2. Case study: Odds to observe many primes in a random sequence</strong></span></p>
<p>The 12 integers below were produced with a special sequence described in the second example in <a href="https://mathoverflow.net/questions/374305/sequences-with-high-densities-of-primes-how-to-boost-them-to-get-even-more-and" target="_blank" rel="noopener">this article</a>. It quickly produces a large volume of numbers with no small divisors. How likely is it to produce such a sequence of numbers just by chance? The numbers <span style="font-size: 12pt;">q[5], q[6], q[7], q[12]</span> have divisors smaller than 1,000 and the remaining eight numbers have no divisor smaller than <em>N</em> = 15,485,863. Note that <em>N</em> (the one-millionth prime) is the largest divisor that I tried in that test. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124676862?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124676862?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>Here is the answer. The probability for a large number <em>x</em> to be prime is about 1 / log <em>x</em>, by virtue of the <a href="https://www.datasciencecentral.com/profiles/blogs/simple-proof-of-prime-number-theorem" target="_blank" rel="noopener">Prime Number Theorem</a>. The probability for a large number <em>x</em> to have no divisor smaller than <em>N</em> is</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124736673?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124736673?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>where the product is over all primes <em>p</em> < <em>N</em> and <em>γ</em> = 0.577215… is the Euler–Mascheroni constant. Here <em>ρ<span style="font-size: 8pt;">N</span></em> ≈ 0.033913. See <a href="https://www.datasciencecentral.com/profiles/blogs/88-per-cent-of-all-integers-have-a-factor-under-100" target="_blank" rel="noopener">here</a> for an explanation of the equality on the left side. The right-hand formula is known as the <a href="https://en.wikipedia.org/wiki/Mertens%27_theorems" target="_blank" rel="noopener">Mertens theorem</a>. See also <a href="https://mathoverflow.net/questions/374824/asymptotics-for-prod1-frac1p-over-all-primes-p-leq-x-with-p-equiv" target="_blank" rel="noopener">here</a>. The symbol ~ represents <a href="https://en.wikipedia.org/wiki/Asymptotic_analysis" target="_blank" rel="noopener">asymptotic equivalence</a>. Thus the probability to observe 4 large numbers out of 12 having no divisor smaller than <em>N</em> is</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124740889?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124740889?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
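<p>Both quantities can be checked numerically. The sketch below verifies the Mertens approximation with a smaller cutoff than the article's <em>N</em> (for speed), then evaluates a binomial probability; reading the formula above as C(12, 8) <em>ρ</em>^8 (1 - <em>ρ</em>)^4, for exactly 8 of the 12 numbers having no divisor below <em>N</em>, is my assumption:</p>

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(sieve[i*i::i]))
    return [i for i, flag in enumerate(sieve) if flag]

# Mertens theorem: prod(1 - 1/p) over primes p < N  ~  exp(-gamma) / log N
N = 10**6                      # the article uses N = 15,485,863
gamma = 0.5772156649015329     # Euler-Mascheroni constant
rho = 1.0
for p in primes_up_to(N):
    rho *= 1 - 1 / p
print(rho, math.exp(-gamma) / math.log(N))   # the two values agree closely

# Binomial probability that exactly 8 of 12 numbers have no divisor < N,
# using the article's rho_N ~ 0.033913 (assumed interpretation of the formula)
rho_N = 0.033913
prob = math.comb(12, 8) * rho_N**8 * (1 - rho_N)**4
print(prob)
```

<p>With the article's <em>N</em> = 15,485,863, the asymptotic value exp(-<em>γ</em>) / log <em>N</em> gives back <em>ρ<span style="font-size: 8pt;">N</span></em> ≈ 0.0339, matching the figure quoted above.</p>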
<p>Note that we used a binomial distribution here to answer the question. Also, the probability for <em>x</em> to be prime if it has no divisor smaller than <em>N</em> is equal to<a href="https://storage.ning.com/topology/rest/1.0/file/get/8148564499?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8148564499?profile=RESIZE_710x" width="550" class="align-center"/></a></p>
<p>For the above numbers q[1],⋯,q[12], the probability in question is not small. For instance, it is equal to 0.47, 0.36 and 0.23 respectively for q[1], q[2] and q[11]. Other sequences producing a high density of prime numbers are discussed <a href="https://mathoverflow.net/questions/374305/sequences-with-high-densities-of-primes-how-to-boost-them-to-get-even-more-and" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/375133/quadratic-progressions-with-very-high-prime-density" target="_blank" rel="noopener">here</a>. </p>
<p><strong>2.1. Computations based on the Poisson-Binomial distribution</strong></p>
<p>Let us denote as <em>p<span style="font-size: 8pt;">k</span></em> the probability that q[<em>k</em>] is prime, for <em>k</em> =1, ...,12. As discussed earlier in section 2, <em>p<span style="font-size: 8pt;">k</span></em> = 1 / log q[<em>k</em>] is small, and the Poisson approximation can be used when dealing with the Poisson-binomial distribution. So we can use the formula in section 1.1. with <span><em>λ</em> </span>= <em>p</em><span style="font-size: 8pt;">1</span> + ... + <em>p<span style="font-size: 8pt;">n</span></em> and <em>n</em> = 12. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8148672493?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8148672493?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p><span>Thus <em>λ</em> ≈ 0.11920. Now we can compute <em>P</em>(<em>X</em> = <em>m</em>) for <em>m</em> = 8, 9, 10, 11, 12:</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8148678090?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8148678090?profile=RESIZE_710x" width="120" class="align-center"/></a></p>
<p>The chance that 8 or more of the large numbers q[1],⋯,q[12] are prime is the sum of the 5 probabilities in the above table. It is equal to 9.1068 / 10^13, that is, less than one in a trillion. </p>
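<p>A minimal sketch of this computation in Python, using the Poisson approximation with the value <em>λ</em> = 0.11920 obtained above (in general, <em>λ</em> is the sum of the probabilities 1 / log q[<em>k</em>]):</p>

```python
from math import exp, factorial

# Poisson approximation to the Poisson-binomial distribution,
# with lambda = p_1 + ... + p_n taken from the article (0.11920).
lam = 0.11920
n = 12

def poisson_pmf(m, lam):
    """P(X = m) for a Poisson(lam) random variable."""
    return exp(-lam) * lam**m / factorial(m)

# Chance that 8 or more of the 12 numbers are prime:
tail = sum(poisson_pmf(m, lam) for m in range(8, n + 1))
print(tail)  # about 9.1e-13, i.e. less than one in a trillion
```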
<p><strong>2.2. Technical note: handling very large numbers</strong></p>
<p>Numbers investigated in this research have dozens and even hundreds of digits. The author has routinely worked with numbers with millions of digits. Below are some useful tools to deal with such large numbers.</p>
<ul>
<li>If you use a programming language, check whether it has a BigNum or BigInt library. Here I used the Perl programming language with its BigNum library; a similar library is available in Python. See code examples <a href="https://www.datasciencecentral.com/forum/topics/question-how-precision-computing-in-python" target="_blank" rel="noopener">here</a>. </li>
<li>A list of all prime numbers up to one trillion is available <a href="http://compoasso.free.fr/primelistweb/page/prime/liste_online_en.php" target="_blank" rel="noopener">here</a>. </li>
<li>To check if a large number <em>p</em> is prime or not, use the command PrimeQ[<em>p</em>] in Mathematica, also available online <a href="https://www.wolframalpha.com/input/?i=PrimeQ%5B29*%2880%21%29+%2B+1%5D" target="_blank" rel="noopener">here</a>. Another online tool, allowing you to test many numbers in batch to find which ones are prime, is available <a href="https://www.alpertron.com.ar/ECM.HTM" target="_blank" rel="noopener">here</a>.</li>
<li>The online Sagemath symbolic calculator is also useful. I used it e.g. to compute millions of binary digits of numbers such as SQRT(2), see <a href="https://sagecell.sagemath.org/?z=eJzz0yguLCrRMNLUKShKTbY1NAACTb3ikiKNpMTiVFsjTQCp3gnT&lang=sage" target="_blank" rel="noopener">here</a>. </li>
<li>For those interested in experimental number theory, the <a href="https://oeis.org/" target="_blank" rel="noopener">OEIS online tool</a> is also very valuable. If you discover a sequence of integers and wonder whether it has been found before, you can do a reverse lookup to find references to it. You can also do a reverse lookup on math constants, entering the first 15 digits to see if they match a known constant.</li>
</ul>
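<p>To illustrate the first bullet point: in Python, no special library is needed, since built-in integers have arbitrary precision. The 27-digit Mersenne prime 2^89 − 1 is used below purely as an example:</p>

```python
# Python integers grow as needed, so numbers with hundreds
# (or millions) of digits require no BigNum-style library.
p = 2**89 - 1            # the Mersenne prime M89, 27 digits

print(len(str(p)))       # number of decimal digits: 27
print(p % 10**6)         # last six digits, computed exactly
print(pow(3, p - 1, p))  # modular exponentiation: 1, by Fermat's little theorem
```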
<p><span style="font-size: 14pt;"><strong>3. Cryptography application </strong></span></p>
<p>Many cryptography systems rely on public and private keys built from the product of two large primes, typically with hundreds or thousands of binary digits. Producing such large primes was not an easy task until efficient algorithms were created to check whether a number is prime. These algorithms are known as <a href="https://en.wikipedia.org/wiki/Primality_test" target="_blank" rel="noopener">primality tests</a>. Some are very fast but only provide a probabilistic answer: the probability that the number in question is prime, which is either zero or extremely close to one. In practice, one samples a large number of candidate integers, screens them with such a fast probabilistic test, and then confirms the status (prime or not prime) of the survivors with an exact but more costly test. </p>
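<p>As an illustration, here is a minimal Python sketch of one standard probabilistic primality test, Miller-Rabin (the article does not specify which tests are used in practice; this is one common choice):</p>

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin test: False means n is certainly composite; True means
    n is prime with probability at least 1 - 4**(-rounds)."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

print(is_probable_prime(2**127 - 1))  # True: a known Mersenne prime
```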
<p>Remember that the probability for a random, large integer <em>p</em> to be prime is about 1 / log <em>p</em>. So if you test 100,000 numbers close to 10^300, you'd expect to find about 145 primes. Not a very efficient strategy. One way to improve these odds by an order of magnitude is to pick integers belonging to prime-rich sequences: such sequences can contain 10 times more primes than random sequences. This is where the methodology discussed here comes in handy. Such sequences are discussed in two of my articles: <a href="https://mathoverflow.net/questions/375133/quadratic-progressions-with-very-high-prime-density" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/374305/sequences-with-high-densities-of-primes-how-to-boost-them-to-get-even-more-and" target="_blank" rel="noopener">here</a>. </p>
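<p>The figure of about 145 expected primes follows directly from the 1 / log <em>p</em> density:</p>

```python
from math import log

# Each random integer near 10^300 is prime with probability ~ 1/log(10^300).
trials = 100_000
p_prime = 1 / log(10**300)

print(round(trials * p_prime))  # 145
```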
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, and eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
Thursday News, October 29
tag:www.datasciencecentral.com,2020-10-29:6448529:BlogPost:999995
2020-10-29T18:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and technical resources posted since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/31QaZCZ">Fully online MS in Data Science at CUNY</a></li>
</ul>
<p><strong>DSC Articles</strong></p>
<ul>
<li><div class="ib"><span><a href="https://www.datasciencecentral.com/profiles/blogs/how-kids-channel-their-internal-data-scientist-to-become-candy">How Kids Channel Their Internal Data Scientist to Become Candy Optimization…</a></span></div>
</li>
</ul>
<p>Here is our selection of featured articles and technical resources posted since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/31QaZCZ">Fully online MS in Data Science at CUNY</a></li>
</ul>
<p><strong>DSC Articles</strong></p>
<ul>
<li><div class="ib"><span><a href="https://www.datasciencecentral.com/profiles/blogs/how-kids-channel-their-internal-data-scientist-to-become-candy">How Kids Channel Their Internal Data Scientist to Become Candy Optimization Machines</a>...</span></div>
</li>
<li><div class="ib"><a href="https://www.datasciencecentral.com/profiles/blogs/fintech-trends-ai-smart-contracts-neobanks-open-banking-and" target="_blank" rel="noopener">FinTech Trends: AI, Smart Contracts, Neobanks, Open Banking, and Blockchain</a></div>
</li>
<li><div class="ib"><a href="https://www.datasciencecentral.com/profiles/blogs/digital-twin-virtual-manufacturing-and-the-coming-diamond-age">Digital Twins, Virtual Manufacturing, and the Coming Diamond Age</a></div>
</li>
<li><div class="ib"><a href="https://www.datasciencecentral.com/profiles/blogs/conjunction-vs-disjunction">Conjunction vs Disjunction: Bad Apples and Other Analogies</a></div>
</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-connection-between-transparency-auditability-and-ai">The Connection Between Transparency, Auditability, and AI</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-most-essential-skills-you-need-to-know-to-start-doing-machine-1">Essential Skills Needed to Start Doing Machine Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/world-s-top-5-data-analytics-companies-in-2020" target="_blank" rel="noopener">World's Top 5 Data Analytics Companies in 2020</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-most-essential-skills-you-need-to-know-to-start-doing-machine">Job opportunities in Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/insights-from-the-free-state-of-ai-repost"><span>Insights from the free state of AI repost</span></a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-steps-to-collect-high-quality-data">5 Steps to Collect High-quality Data</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-are-ensemble-techniques">What are Ensemble Techniques?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-mlops-stack">The MLOps Stack</a></li>
</ul>
<p><b>Published On Tech Target</b></p>
<ul>
<li><a href="https://searchenterpriseai.techtarget.com/news/252491270/Wordtune-AI-tool-from-AI21-Labs-rewrites-sentences-using-NLG" target="_blank" rel="noopener">Wordtune AI tool from AI21 Labs rewrites sentences using NLG</a></li>
<li><a href="https://searchhrsoftware.techtarget.com/news/252491265/Firms-dive-into-data-for-diversity-and-inclusion-strategies" target="_blank" rel="noopener">Firms dive into data for diversity and inclusion strategies</a></li>
<li><a href="https://searchenterpriseai.techtarget.com/feature/AI-fraud-detection-tools-can-help-rising-e-commerce-fraud" target="_blank" rel="noopener">AI fraud detection tools can help fight rising e-commerce fraud</a></li>
<li><a href="https://searchcontentmanagement.techtarget.com/feature/Baseball-team-digitizes-media-uses-AI-to-uncover-metadata" target="_blank" rel="noopener">Baseball team digitizes media, uses AI to uncover metadata</a></li>
<li><a href="https://searchbusinessanalytics.techtarget.com/feature/How-DataOps-architecture-benefits-your-analytics-strategy" target="_blank" rel="noopener">How DataOps architecture benefits your analytics strategy</a></li>
</ul>
<p></p>
<p><strong>Technical Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-cloud-native-containers-and-next-gen-apps">Free book - Cloud Native, Containers and Next-Gen Apps</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/8-best-big-data-hadoop-analytics-tools-in-2021">Best Big Data Hadoop Analytics Tools in 2021</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/rpa-guide-for-fintech-industry">RPA Guide For Fintech Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applied-data-science-with-python">Applied Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/so-you-want-to-write-for-dsc-1">So You Want to Write for Data Science Central</a></li>
</ul>
<p></p>
<hr/><p>For more news, information, and commentary in the AI, analytics, and enterprise data realms, subscribe to the <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">Data Science Central Newsletter</a>.</p>
Weekly Digest, October 26
tag:www.datasciencecentral.com,2020-10-25:6448529:BlogPost:999591
2020-10-25T21:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" rel="noopener" target="_blank">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" rel="noopener" target="_blank">follow this link</a>. </span></p>
<p><span><strong>Featured Resources and Technical…</strong></span></p>
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/so-you-want-to-write-for-dsc-1">So You Want to Write for Data Science Central</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/statistical-machine-learning-in-python">Statistical Machine Learning in Python</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/approaches-to-time-series-data-with-weak-seasonality">Approaches to Time Series Data with Weak Seasonality</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/odds-vs-probability-vs-likelihood">Odds vs Probability vs Chance</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-cloud-native-containers-and-next-gen-apps">Free book - Cloud Native, Containers and Next-Gen Apps</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/8-best-big-data-hadoop-analytics-tools-in-2021">Best Big Data Hadoop Analytics Tools in 2021</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/rpa-guide-for-fintech-industry">RPA Guide For Fintech Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applied-data-science-with-python">Applied Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-techniques-to-eliminate-false-negatives">Question: Techniques to eliminate False Negatives</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-implications-of-huang-s-law-for-the-artificial-intelligence">The implications of Huang’s law for the AI stack</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/digital-dreams-analog-processes">Digital Dreams – Analog Processes</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/waiting-for-godot-developing-competitive-differentiation">Waiting for Godot: Developing Competitive Differentiation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/next-generation-chip-wars-as-amd-eyes-xilink-acquisition">Next Generation Chip Wars Heat Up </a>as AMD Eyes Xilinx acquisition</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/technology-in-education-trends-edtech-the-future-of-e-learning/">Edtech - the Future of E-learning Software</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-preparation-need-not-be-cumbersome-or-time-consuming">Data Preparation Need Not Be Cumbersome Or Time Consuming</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-artificial-intelligence-is-reshaping-small-businesses">How Artificial Intelligence Is Reshaping Small Businesses</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-tried-and-tested-saas-marketing-strategies-to-generate-leads">5 Tried and Tested SaaS Marketing Strategies to Generate Leads</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-cognitive-chatbots-provide-supreme-customer-experience-and">How cognitive chatbots transform service desk interactions</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-as-a-service-industry-overview-and-growth-outlook">Data Science as a Service Industry: </a>Overview and Growth Outlook</li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8073414678?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8073414678?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, October 22
tag:www.datasciencecentral.com,2020-10-22:6448529:BlogPost:999318
2020-10-22T17:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and technical resources posted since Monday:</p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/31s5M46">Learn why 63% of firms will be advancing their adoption of AI<span> </span></a><span>by 2023.</span></li>
</ul>
<p><strong>Technical Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-cloud-native-containers-and-next-gen-apps">Free book - Cloud Native, Containers and…</a></li>
</ul>
<p>Here is our selection of featured articles and technical resources posted since Monday:</p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/31s5M46">Learn why 63% of firms will be advancing their adoption of AI<span> </span></a><span>by 2023.</span></li>
</ul>
<p><strong>Technical Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-cloud-native-containers-and-next-gen-apps">Free book - Cloud Native, Containers and Next-Gen Apps</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/8-best-big-data-hadoop-analytics-tools-in-2021">Best Big Data Hadoop Analytics Tools in 2021</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/rpa-guide-for-fintech-industry">RPA Guide For Fintech Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applied-data-science-with-python">Applied Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-techniques-to-eliminate-false-negatives">Question: Techniques to eliminate False Negatives</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/waiting-for-godot-developing-competitive-differentiation">Waiting for Godot: Developing Competitive Differentiation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/next-generation-chip-wars-as-amd-eyes-xilink-acquisition">Next Generation Chip Wars Heat Up<span> </span></a>as AMD Eyes Xilinx acquisition</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-artificial-intelligence-is-reshaping-small-businesses">How Artificial Intelligence Is Reshaping Small Businesses</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-tried-and-tested-saas-marketing-strategies-to-generate-leads">5 Tried and Tested SaaS Marketing Strategies to Generate Leads</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-cognitive-chatbots-provide-supreme-customer-experience-and">How cognitive chatbots transform service desk interactions</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-as-a-service-industry-overview-and-growth-outlook">Data Science as a Service Industry:<span> </span></a>Overview and Growth Outlook</li>
</ul>
<p>Enjoy the reading!</p>
Weekly Digest, October 19
tag:www.datasciencecentral.com,2020-10-18:6448529:BlogPost:997575
2020-10-18T23:49:59.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" rel="noopener" target="_blank">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" rel="noopener" target="_blank">follow this link</a>. </span></p>
<p><span><strong>Featured Resources and Technical…</strong></span></p>
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/genius-tool-to-compare-best-time-series-models-for-multi-step">Best Models For Multi-step Time Series Modeling</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/types-of-variables-in-data-science-in-one-picture">Types of Variables in Data Science in One Picture</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-quick-demonstration-of-polling-confidence-interval-calculations">A quick demonstration of polling confidence interval calculations </a>using simulation</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-never-run-a-logistic-regression-unless-you-have-to">Why you should NEVER run a Logistic Regression </a>(unless you have to)</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/model-evaluation-model-selection-and-algorithm-selection-in">Cross-validation and hyperparameter tuning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-best-data-science-courses-2020">5 Great Data Science Courses</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/complete-hands-off-automated-machine-learning">Complete Hands-Off Automated Machine Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-learn-sitecore-cms-in-2021">Why You Should Learn Sitecore CMS?</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-is-driving-software-2-0-with-minimal-human-intervention">AI is Driving Software 2.0… with Minimal Human Intervention</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-observability-how-to-fix-your-broken-data-pipelines">Data Observability: How to Fix Your Broken Data Pipelines</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applications-of-machine-learning-in-fintech">Applications of Machine Learning in FinTech</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/where-synthetic-data-brings-value">Where synthetic data brings value</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-fintech-is-the-future-of-banking">Why Fintech is the Future of Banking?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/real-estate-how-it-is-impacted-by-business-intelligence">Real Estate: How it is Impacted by Business Intelligence</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/determining-how-cloud-computing-benefits-data-science">Determining How Cloud Computing Benefits Data Science</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-is-mobile-banking-advantages-and-disadvantages-of-mobile-1">Advantages And Disadvantages Of Mobile Banking</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8048875067?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8048875067?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, October 15
tag:www.datasciencecentral.com,2020-10-15:6448529:BlogPost:995450
2020-10-15T17:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of articles and technical contributions featured on DSC since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/3lK98qI">Penn State Master’s in Data Analytics<span> </span></a>– 100% Online</li>
<li><a href="https://dsc.news/310p28s">eBook: Data Preparation for Dummies</a></li>
</ul>
<p><strong>Technical Contributions…</strong></p>
<p>Here is our selection of articles and technical contributions featured on DSC since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/3lK98qI">Penn State Master’s in Data Analytics<span> </span></a>– 100% Online</li>
<li><a href="https://dsc.news/310p28s">eBook: Data Preparation for Dummies</a></li>
</ul>
<p><strong>Technical Contributions</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-quick-demonstration-of-polling-confidence-interval-calculations">A quick demonstration of polling confidence interval calculations<span> </span></a>using simulation</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-never-run-a-logistic-regression-unless-you-have-to">Why you should NEVER run a Logistic Regression<span> </span></a>(unless you have to)</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/model-evaluation-model-selection-and-algorithm-selection-in">Cross-validation and hyperparameter tuning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-learn-sitecore-cms-in-2021">Why You Should Learn Sitecore CMS?</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-is-driving-software-2-0-with-minimal-human-intervention">AI is Driving Software 2.0… with Minimal Human Intervention</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applications-of-machine-learning-in-fintech">Applications of Machine Learning in FinTech</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-fintech-is-the-future-of-banking">Why Fintech is the Future of Banking?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/real-estate-how-it-is-impacted-by-business-intelligence">Real Estate: How it is Impacted by Business Intelligence</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/determining-how-cloud-computing-benefits-data-science">Determining How Cloud Computing Benefits Data Science</a></li>
</ul>
<p>Enjoy the reading!</p>
Weekly Digest, October 12
tag:www.datasciencecentral.com,2020-10-11:6448529:BlogPost:992588
2020-10-11T22:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" rel="noopener" target="_blank">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" rel="noopener" target="_blank">follow this link</a>. </span></p>
<p><strong>Announcement…</strong></p>
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/36FbDXk">Customized data science workstations equipped with NVIDIA Rapids</a></li>
</ul>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/importance-of-service-mesh-networks-for-scaling-enterprise-ai-1">Importance of Service Mesh Networks for Scaling Enterprise AI Solutions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-rules-of-probability-in-one-picture">5 Rules of Probability in One Picture (Cat and Dog Edition) </a>+</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/no-causation-without-representation">Free book: No Causation without representation!</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-neural-network-zoo">The Neural Network Zoo</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/using-ai-to-super-compress-images">Using AI to Super Compress Images</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/explainable-artificial-intelligence-xai-1">Explainable Artificial Intelligence (XAI)</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/7-reasons-why-flutter-is-development-trend-of-2020">7 Reasons why Flutter is Development Trend of 2020</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-simple-guide-to-ai-machine-learning-and-deep-learning-or-as">A simple guide to AI, Machine Learning and Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/kotlin-vs-flutter-find-your-perfect-fit-for-cross-platform-app-3">Kotlin vs Flutter </a>- Find Your Perfect Fit For Cross-platform App Development</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/natural-language-processing-how-this-innovative-technology-is">NLP in Chatbots</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/example-of-traffic-camera-maintenance-dashboard">Question: Traffic Camera Maintenance Dashboard</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-in-one-picture">Machine Learning with Applications in One Picture</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-simply-deep-yet-convoluted-world-of-supervised-vs">The Convoluted World of Supervised vs Unsupervised Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-middle-east-to-become-the-world-s-leading-ai-hub">The Middle East to Become the World’s Leading AI Hub</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-and-machine-learning-top-priority-with-corporate-executives">AI and Machine Learning: Top Priority with Corporate Executives</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/voice-payment-in-banking-the-new-revolution-in-fintech">Voice Payment in Banking: The New Revolution in Fintech</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-finance-and-banking-mobile-app-is-vital-in-this-digital-era">Why Finance and Banking Mobile Apps Are Vital in This Digital Era</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-iot-is-better-for-monitoring-gas-concentration-levels">Why IoT is Better for Monitoring Gas Concentration Levels</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-is-shaping-the-future-of-appointment-scheduling">AI is Shaping the Future of Appointment Scheduling</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/6-ways-through-which-data-science-in-finance-is-reinventing-the">Data Science in Finance is Reinventing the Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/big-data-how-it-is-reshaping-retail">Big Data: How it is Reshaping Retail</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8024569282?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8024569282?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, October 8
tag:www.datasciencecentral.com,2020-10-08:6448529:BlogPost:990578
2020-10-08T20:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and resources posted since Monday:</p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/36FbDXk">Customized data science workstations equipped with NVIDIA Rapids</a></li>
</ul>
<p><strong>Technical</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/using-ai-to-super-compress-images">Using AI to Super Compress Images</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-simple-guide-to-ai-machine-learning-and-deep-learning-or-as">A simple guide to AI, Machine Learning and Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/kotlin-vs-flutter-find-your-perfect-fit-for-cross-platform-app-3">Kotlin vs Flutter - Find Your Perfect Fit For Cross-platform App Development</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/natural-language-processing-how-this-innovative-technology-is">NLP in Chatbots</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/example-of-traffic-camera-maintenance-dashboard">Question: Traffic Camera Maintenance Dashboard</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-middle-east-to-become-the-world-s-leading-ai-hub">The Middle East to Become the World’s Leading AI Hub</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-and-machine-learning-top-priority-with-corporate-executives">AI and Machine Learning: Top Priority with Corporate Executives</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/voice-payment-in-banking-the-new-revolution-in-fintech">Voice Payment in Banking: The New Revolution in Fintech</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-iot-is-better-for-monitoring-gas-concentration-levels">Why IoT is Better for Monitoring Gas Concentration Levels</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/6-ways-through-which-data-science-in-finance-is-reinventing-the">Data Science in Finance is Reinventing the Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/big-data-how-it-is-reshaping-retail">Big Data: How it is Reshaping Retail</a></li>
</ul>
<p>Enjoy the reading!</p>
Weekly Digest, October 5
tag:www.datasciencecentral.com,2020-10-04:6448529:BlogPost:987889
2020-10-04T17:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/36xR5j5">See DataRobot in Action </a>- Webinar, October 7</li>
<li><a href="https://dsc.news/2Gjhgzu">Databricks' virtual hands-on lab </a>- October 14</li>
<li><a href="https://dsc.news/2Gh9eHz">Create powerful dashboards that answer questions quickly </a>- Tableau Whitepaper</li>
</ul>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-artificial-intelligence-foundations-of-computational">Free book - AI: Foundations of Computational Agents</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-need-to-know-those-probability-distributions">Why You Need to Know Those Probability Distributions</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/time-series-forecasting-knn-vs-arima">Time Series Forecasting: KNN vs. ARIMA</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/post-scripting-to-deal-with-complex-sql-queries">Post-scripting to Deal with Complex SQL Queries</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/10-node-js-advantages">10 Node JS Advantages</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-scale-out-milvus-vector-similarity-search-engine/">Vector Similarity Search Engine</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advantages-and-disadvantages-of-python-for-your-business">Advantages And Disadvantages Of Python For Your Business</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/intersystems-iris-the-all-purpose-universal-platform-for-real">All-Purpose Universal Platform for Real-Time AI/ML</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/graduate-programs-in-healthcare-data-science">Question: Graduate Programs in Healthcare Data Science</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/on-the-nature-of-data-flights-of-birds-and-new-beginnings" target="_blank" rel="noopener">On the Nature of Data, Flights of Birds, and New Beginnings</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-detectives">Data Detectives</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/introducing-analytics-to-a-product">Introducing Analytics To A Product</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/personal-updates-and-dsc">Personal updates and DSC</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-secret-weapons-of-fake-news">The Secret Weapons of Fake News</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/here-s-how-to-fix-a-haphazard-data-driven-approach-to-education">How to Fix a Haphazard Data-Driven Approach to Education</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/hardware-appliances-vs-software-defined-storage">Hardware Appliances vs. Software Defined Storage</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/which-data-protection-techniques-do-you-need-to-guarantee-privacy">Which data protection techniques do you need to guarantee privacy?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-an-apple-is-changing-the-qsr-industry">How An Apple Is Changing the QSR Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-is-augmented-data-preparation-and-why-is-it-important">What is Augmented Data Preparation and Why is it Important?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-is-the-banking-industry-coping-with-the-digital">How is the banking industry coping with digital transformation?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/know-your-support-options-for-dynamics-365">ERP: Know Your Support Options for Dynamics 365</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/7999338085?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/7999338085?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>