Vincent Granville's Posts - Data Science Central
2021-01-25T09:15:26Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
https://storage.ning.com/topology/rest/1.0/file/get/2800211702?profile=RESIZE_48X48&width=48&height=48&crop=1%3A1
https://www.datasciencecentral.com/profiles/blog/feed?user=3v6n5b6g08kgn&xn_auth=no
Machine Learning / Stats / BI: Mini Translation Dictionary
tag:www.datasciencecentral.com,2021-01-19:6448529:BlogPost:1008950
2021-01-19T06:12:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here I provide translations for various important terms, to help professionals from related backgrounds better understand each other. In particular, machine learning professionals versus statisticians.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8438181275?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8438181275?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-vs-statistics-in-one-picture" target="_blank" rel="noopener">here</a></em></p>
<p><strong>Feature</strong> (machine learning)</p>
<p>A feature is known as a variable or independent variable in statistics. It is also known as a predictor by predictive analytics professionals. </p>
<p><strong>Response</strong></p>
<p>The response is called the dependent variable in statistics. Machine learning professionals sometimes call it the output. </p>
<p><strong>R-square</strong></p>
<p>This is the statistic used by statisticians to measure the performance of a model; many better alternatives exist. Machine learning professionals sometimes call it a goodness-of-fit metric. </p>
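<p>As a small illustration (a minimal sketch, not part of the original glossary), R-square is one minus the ratio of the residual to the total sum of squares:</p>

```python
import numpy as np

# Minimal sketch: R-squared as 1 - SS_res / SS_tot.
def r_squared(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
slope, intercept = np.polyfit(x, y, 1)       # ordinary least squares fit
print(r_squared(y, slope * x + intercept))   # close to 1 for near-linear data
```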
<p><strong>Regression</strong></p>
<p>Sometimes called maximum likelihood regression or linear regression by statisticians. Physicists and signal processing / operations research professionals use the term ordinary least squares instead. And yes, it is possible to compute confidence intervals (CI) without underlying models. They are called data-driven, and rely on simulations and empirical percentile distributions. </p>
<p><strong>Logistic transform</strong></p>
<p>The term used in the context of neural networks is sigmoid. Statisticians are more familiar with the word logistic, as in logistic regression.</p>
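<p>The transform itself is a one-liner (standard definition, not tied to any particular framework):</p>

```python
import math

def sigmoid(x):
    # Logistic transform: maps any real x into (0, 1); the neural-net "sigmoid".
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5: the curve's inflection point
```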
<p><strong>Neural networks</strong></p>
<p>While not exactly the same thing, statisticians have their own multi-layer hierarchical models: they are called Bayesian hierarchical networks.</p>
<p><strong>Test of hypothesis</strong></p>
<p>Business intelligence professionals call it A/B testing, or multivariate testing.</p>
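<p>A minimal sketch of the A/B-testing view, as a two-proportion z-test (the conversion counts are illustrative):</p>

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    # z-statistic for H0: the two conversion rates are equal (pooled variance).
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)               # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# 100/1000 conversions for version A vs 130/1000 for version B
z = two_proportion_z(100, 1000, 130, 1000)
print(round(z, 2))   # about 2.1
```

<p>A |z| above 1.96 rejects the null hypothesis of equal conversion rates at the usual 5% level.</p>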
<p><strong>Boosted models</strong></p>
<p>Boosted models are used by machine learning professionals to blend multiple models and get the best of each model. Statisticians call them ensemble techniques.</p>
<p><strong>Confidence intervals</strong></p>
<p>We are all familiar with this concept invented by statisticians. Alternative terms include prediction intervals, or error margins (not to be confused with prediction or residual error, which have their own meanings for statisticians).</p>
<p><strong>Grouping</strong></p>
<p>Also known as aggregating; it consists of grouping values of some feature or independent variable, especially in decision trees to reduce the number of nodes. Machine learning professionals call it feature binning. </p>
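<p>Binning is a one-liner in most environments; a sketch with NumPy (the bin edges are illustrative):</p>

```python
import numpy as np

ages = np.array([3, 17, 25, 42, 67, 81])
edges = np.array([0, 18, 40, 65])    # illustrative bin boundaries
bins = np.digitize(ages, edges)      # index of the bin each value falls in
print(bins)                          # [1 1 2 3 4 4]
```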
<p><strong>Taxonomy</strong></p>
<p>When applied to unstructured text data, the creation of a taxonomy (sometimes called ontology) is referred to as natural language processing. It is basically clustering of text data.</p>
<p><strong>Clustering</strong></p>
<p>Statisticians call it clustering. In machine learning, the concept is referred to as unsupervised classification. By contrast, supervised classification is a learning technique based on training sets and cross-validation. </p>
<p><strong>Control set</strong></p>
<p>Machine learning professionals work with control and test sets. Statisticians use the terms cross-validation or bootstrapping, as well as training sets. </p>
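<p>A bare-bones sketch of the random holdout split underlying these terms (the fraction and seed are illustrative):</p>

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    # Shuffle a copy, then slice: the tail becomes the held-out test set.
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(100))
print(len(train), len(test))   # 80 20
```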
<p><strong>Model fitting</strong></p>
<p>The terms favored by machine learning professionals are model selection, testing, and feature selection. Model performance has its own related statistical term, the <em>p</em>-value, though it is less used these days. </p>
<p><strong>False positives</strong></p>
<p>Instead of false positives and false negatives, statisticians favor type I and type II errors.</p>
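<p>In code, the mapping between the two vocabularies is direct (the confusion-matrix counts are illustrative):</p>

```python
def error_rates(tp, fp, fn, tn):
    # Type I error rate = false positive rate; Type II = false negative rate.
    type_i = fp / (fp + tn)    # P(reject H0 | H0 true)
    type_ii = fn / (fn + tp)   # P(accept H0 | H0 false)
    return type_i, type_ii

print(error_rates(tp=80, fp=10, fn=20, tn=90))   # (0.1, 0.2)
```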
<p>Another similar dictionary can be found <a href="https://insights.sei.cmu.edu/sei_blog/2018/11/translating-between-statistics-and-machine-learning.html" target="_blank" rel="noopener">here</a>. </p>
<p></p>
<p><br/> <em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
Deep visualizations to Help Solve Riemann's Conjecture
tag:www.datasciencecentral.com,2021-01-06:6448529:BlogPost:1007807
2021-01-06T06:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>This is the second part of my article <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">Spectacular Visualization: The Eye of the Riemann Zeta Function</a>, focusing on the most famous unsolved mathematical conjecture, one that has a $1 million prize attached to it. I used the word <em>deep</em> not in the sense of deep neural networks, but because the implications of these visualizations have deep consequences on how to solve this conjecture, opening a new path of attack and featuring non-standard generalizations leading to new perspectives and new approaches to solve RH (as the conjecture is called in mathematical circles). </p>
<p>This work is mostly based on data science, and the results presented here are experimental in nature and still need to be proved formally. The main visualization featuring 6 scatterplots is published here for the first time: it shows the orbits of 3 Riemann-like functions, their <em>eyes</em>, and their surprising ring-shaped error distribution when only the first few hundred terms are used in the series defining these functions. It deviates from classical pure-math approaches in the sense that what I do looks more like stochastic dynamical systems, attractors, wavelets, and should appeal to data analysts, engineers and physicists.</p>
<p>The problem is so popular that there are YouTube videos about it, some having gathered several million views. One of them is also featured here. My own scatterplots show the behavior of a new class of Riemann-like functions, as well as interesting slices of the orbit that are rarely (if ever) displayed in the literature, revealing peculiar features that could help in solving RH.</p>
<p><span style="font-size: 14pt;"><strong>1. Orbits of Riemann-like Functions</strong></span></p>
<p>The main picture in this article consists of the 6 plots below. Click on the picture to zoom in.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392563253?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392563253?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong><em>: Orbit (top) and residual error (bottom) for cosine (left),</em> <em>triangular (middle) and square wave (right)</em></p>
<p>I explain later in this section what they represent. But first, I need to introduce some material. Let </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392571275?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392571275?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p>be a function of <em>t</em>, with 0.5 < <em>σ</em> < 1 fixed, and <em>α</em>, <em>β</em>, <em>γ</em> three real parameters. This generalizes the function <em>ϕ</em> introduced <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">in my previous article</a>. This time, <em>λ</em>(<em>n</em>) = <em>n</em> and <em>α</em> = 0, <em>β</em> = 1. Also, we are dealing with two sister functions of <em>t</em>, namely <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with<em> γ </em>= 0, and the shifted <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with <em>γ </em>= -π/2. They represent respectively the real and imaginary part of some function defined on the complex plane. The Riemann Hypothesis (RH), corresponding to <em>W</em>(<em>x</em>) = cos <em>x</em>, states that there is no zero of the Riemann zeta function <span><em>ζ</em>(<em>s</em>), with <em>s</em> = <em>σ </em>+ <em>it</em> a complex number, if 0.5 < <em>σ</em> < 1. Here <em>i</em> represents the imaginary unit whose square is -1. In layman's terms, it means that we cannot have <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ<span style="font-size: 8pt;">2</span></em>(<em>σ</em>, <em>t</em>) = 0 if 0.5 < <em>σ</em> < 1. You win $1 million if you prove it, see <a href="https://www.claymath.org/millennium-problems/riemann-hypothesis" target="_blank" rel="noopener">here</a>. </span></p>
<p><span>The novelty in my method is the introduction of a periodic wave function <em>W</em> in the definition of <em>ϕ</em>, thus generalizing RH in a way different from what other mathematicians did, that is, without using complicated <a href="https://en.wikipedia.org/wiki/L-function" target="_blank" rel="noopener">L-functions</a>. </span>This offers more hope of solving Riemann's conjecture (RH): first prove it for the easiest <em>W</em>, then understand what those <em>W</em>'s that have an RH attached to them (as opposed to those that do not) have in common. </p>
<p>Figure 1 (upper part) displays the spectacular orbits for three different waves (cosine, triangular and alternating-quadratic) in the test case <em>σ</em> = 0.75 and 0 < <em>t</em> < 600, with the hole around the origin (I call it the <em>eye</em>) being the hallmark of RH behavior: that is, no root for that particular value of <em>σ</em>, regardless of <em>t</em>, because of the hole. Though not displayed here, in the case <em>σ</em> = 0.5, the hole is entirely gone and corresponds to the <em>critical line</em> (the name given by mathematicians) where all the zeroes are found.</p>
<p>The orbit consists, for a fixed <em>σ</em>, of the points (<em>X</em>(<em>t</em>),<em>Y</em>(<em>t</em>)) with <em>X</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>Y</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>). The bottom three plots represent the error between the true value (<em>X</em>(<em>t</em>),<em>Y</em>(<em>t</em>)) and its approximation based on using only the first 200 terms in the series that defines <em>ϕ</em>. The error distribution is very surprising; I was expecting the points to be radially but randomly distributed around the origin; instead, they are located on a ring. Note that for <em>t</em> > 600 (and for the triangular wave, for <em>t</em> > 80) you need to use more than 200 terms for the pattern to remain strong.</p>
<p>In Figure 1, the left part of the plot corresponds to the cosine wave (that is, classical RH), the middle part corresponds to the triangular wave, and the right part corresponds to the alternating quadratic wave. Interestingly, when <em>σ</em> = 1/2 the orbit does not have a hole anymore as predicted, yet the error points are still distributed on a similar ring.</p>
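<p>For readers who want to reproduce these plots without the Perl code (linked in section 3), here is a minimal Python sketch. It rests on an assumption of mine: that the classical-RH (cosine) case behaves like the real and imaginary parts of the alternating series with terms (-1)^(<em>n</em>+1) cos(<em>t</em> log <em>n</em>)/<em>n</em>^<em>σ</em> and (-1)^(<em>n</em>+1) sin(<em>t</em> log <em>n</em>)/<em>n</em>^<em>σ</em>; a longer partial sum stands in for the true value when computing the error ring.</p>

```python
import math

# Sketch of the classical-RH (cosine wave) case. Assumption: phi1 and phi2
# behave like the real and imaginary parts of the alternating series
#   sum_{n>=1} (-1)^(n+1) * (cos, sin)(t * log n) / n^sigma.
# Partial sums only; convergence is slow and chaotic.
def phi(sigma, t, n_terms):
    x = y = 0.0
    for n in range(1, n_terms + 1):
        w = (1.0 if n % 2 else -1.0) / n ** sigma
        u = t * math.log(n)
        x += w * math.cos(u)   # phi1: gamma = 0
        y += w * math.sin(u)   # phi2: shifted by gamma = -pi/2
    return x, y

sigma = 0.75
orbit, errors = [], []
for k in range(1, 501):
    t = 1.2 * k                         # sample 0 < t <= 600
    x200, y200 = phi(sigma, t, 200)     # 200-term approximation
    xr, yr = phi(sigma, t, 2000)        # longer sum, stand-in for the truth
    orbit.append((xr, yr))
    errors.append((x200 - xr, y200 - yr))
```

<p>Plotting <em>orbit</em> gives a top-row panel of Figure 1 and <em>errors</em> a ring-shaped bottom-row panel; with only 200 terms, expect the ring pattern to degrade beyond <em>t</em> = 600, as noted above.</p>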
<p>The wave <em>W</em> is a continuous periodic function of period 2π, with one minimum equal to −1 and one maximum equal to +1 in the interval [0,2π], and the area below the X-axis equal to the area above the X-axis. It must have some symmetry. The waves used here are defined as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392809497?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392809497?profile=RESIZE_710x" width="500" class="align-full"/></a></p>
<p>For the cosine wave, the Taylor series for <em>ϕ</em> is discussed <a href="https://mathoverflow.net/questions/380308/about-the-coefficients-of-taylor-series-for-the-complex-riemann-zeta-function" target="_blank" rel="noopener">here</a>, while the representation as an infinite product is discussed <a href="https://mathoverflow.net/questions/380327/infinite-products-for-linear-combinations-of-sines-or-cosines" target="_blank" rel="noopener">here</a>.</p>
<p><span style="font-size: 14pt;"><strong>2. Other interesting visualizations</strong></span></p>
<p>The orbit for the standard RH case has been published countless times for <em>σ</em> = 0.5. In that case, there is no eye, as the orbit crosses the origin infinitely many times. Some videos about the orbit trajectory have been posted on YouTube and viewed millions of times. Below is one of them. </p>
<p></p>
<p><iframe width="640" height="360" src="https://www.youtube.com/embed/zlm1aajH6gY?wmode=opaque" frameborder="0" allowfullscreen=""></iframe>
</p>
<p></p>
<p>Other popular visualizations include the time series for <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) when <em>σ</em> = 0.5. Below (Figure 2) is a version of mine, for <em>σ</em> = 0.75 and 0 < <em>t</em> < 600. Not only does it display the time series for the cosine wave (standard RH case), but also for the triangular wave, for the first time ever. The blue curve corresponds to <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>), the orange one to <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8392886055?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8392886055?profile=RESIZE_710x" width="600" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong><em>: Time series for ϕ<span style="font-size: 8pt;">1</span>(σ, t) and ϕ<span style="font-size: 8pt;">2</span>(σ, t) when σ = 0.75</em></p>
<p>It is interesting to note that the peaks and valley floors of the triangular and cosine wave frameworks seem to be correlated, occurring at similar times. What's more, for the cosine wave, when a zero of the blue curve is close to a zero of the orange curve (that is, when these curves cross the X-axis at similar times), the zero of the orange curve occurs first. This seems to be true for the triangular wave too, at least when <em>t</em> < 600.</p>
<p><span style="font-size: 14pt;"><strong>3. Generalization and source code</strong></span></p>
<p><span>The Perl source code is available <a href="https://storage.ning.com/topology/rest/1.0/file/get/8393110255?profile=original" target="_blank" rel="noopener">here</a>. Note that convergence is very slow, as discussed <a href="https://www.datasciencecentral.com/profiles/blogs/spectacular-visualization-the-eye-of-the-riemann-zeta-function" target="_blank" rel="noopener">in my previous article</a>. A table of the first 100,000 zeros of <em>ζ</em>(<em>s</em>) can be found <a href="http://www.dtc.umn.edu/~odlyzko/zeta_tables/index.html" target="_blank" rel="noopener">here</a>. More general results are available <a href="https://mathoverflow.net/questions/380762/some-properties-of-special-dirichlet-series-connection-to-riemann-hypothesis" target="_blank" rel="noopener">here</a>. In short, if 0.5 < <em>σ </em> < 1, the hole around the origin (pictured in Figure 1) is also present in the following case. Let's define </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8409714885?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8409714885?profile=RESIZE_710x" width="380" class="align-center"/></a></span></p>
<p><span>together with <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = 1 + <em>ϕ</em>(<em>σ</em>, <em>μ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with<em> γ </em>= 0, and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>μ</em>, <em>t</em>; <em>α</em>, <em>β</em>, <em>γ</em>) with <em>γ </em>= -π/2. Then we still have a hole around the origin. That hole persists even if <em>σ</em> = 0.5, unless <em>μ</em> = 0. Here <em>μ</em>, <em>σ</em> are fixed but arbitrary, <em>λ</em>(<em>n</em>) = log <em>n</em>, and <em>α </em>= 0, <em>β </em>= 1; only <em>t</em> varies. It has been tested only for <em>W</em>(<em>x</em>) = cos <em>x</em>, and when 0 < <em>t</em> < 200.</span></p>
<p><strong>Exercise 1</strong></p>
<p>Show (numerically) that the cross-correlation between <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ, t</em>) and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ, t</em>) is apparently zero, for the cosine wave <em>W</em>(<em>x</em>) = cos <em>x</em>. However, if you shift the orange curve in Figure 2, replacing <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ, t</em>) by <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ, t</em> +<em> τ</em>), the correlation may no longer be zero. Find <span><em>τ</em> (numerically) that maximizes the cross-correlation in question. </span></p>
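<p>A possible numerical starting point for this exercise. Note that sampling <em>ϕ</em><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span> via the alternating series with cos/sin terms below is an assumption of mine, and the grid and lag range are illustrative:</p>

```python
import math

def phi_pair(sigma, t, n_terms=200):
    # Assumed form: alternating series with cosine / sine terms.
    x = y = 0.0
    for n in range(1, n_terms + 1):
        w = (1.0 if n % 2 else -1.0) / n ** sigma
        x += w * math.cos(t * math.log(n))
        y += w * math.sin(t * math.log(n))
    return x, y

def corr(a, b):
    # Pearson correlation of two equal-length samples.
    m_a, m_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - m_a) * (v - m_b) for u, v in zip(a, b))
    var_a = sum((u - m_a) ** 2 for u in a)
    var_b = sum((v - m_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

sigma, dt = 0.75, 0.1
ts = [dt * k for k in range(1, 3001)]    # 0 < t <= 300
xs, ys = zip(*(phi_pair(sigma, t) for t in ts))

# Scan integer-step shifts tau = s * dt of phi2 and keep the best one.
best = max(range(0, 200), key=lambda s: corr(xs[:2800], ys[s:s + 2800]))
print("tau =", best * dt, "corr =", round(corr(xs[:2800], ys[best:best + 2800]), 3))
```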
<p><strong>Exercise 2</strong> </p>
<p>Prove that if <em>ζ</em>(<em>s</em>) = 0, with <em>s</em> = <em>σ</em> + <em>it</em> and 0 < <em>σ</em> < 1 then for all real <em>θ</em>, we have</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8400519688?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8400519688?profile=RESIZE_710x" width="250" class="align-center"/></a></p>
<p>See answer <a href="https://mathoverflow.net/questions/380577/on-some-property-of-the-zeros-of-zetas-in-the-complex-plane/" target="_blank" rel="noopener">here</a>. </p>
<p><strong>Exercise 3</strong></p>
<p>Prove that the centroid of the orbits pictured in Figure 1 is always (<em>W</em>(0), <em>W</em>(<span>-π/2)</span>). This is true for the cosine, triangular and alternating square waves. <strong>Hint</strong>: The integral of <em>W</em>(<em>x</em>) between <em>x</em> = 0 and <em>x</em> = 2<span>π (the period) is always zero. The coordinates of the centroid are </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8409760490?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8409760490?profile=RESIZE_710x" width="450" class="align-center"/></a></span></p>
<p>Since <em>ϕ</em><span style="font-size: 8pt;">1</span>, <em>ϕ</em><span><span style="font-size: 8pt;">2</span> are defined as infinite sums, swap the integral and sum operators, then proceed to the computation. The integral vanishes for all the terms in both series, except for the first one where it is equal to <em>W</em>(0) and <em>W</em>(-π/2), respectively for <em>ϕ</em><span style="font-size: 8pt;">1</span> and <em>ϕ<span style="font-size: 8pt;">2</span></em>.</span></p>
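<p>The claim is easy to check numerically for the cosine wave, where the centroid should be (<em>W</em>(0), <em>W</em>(-π/2)) = (1, 0). In the sketch below (the alternating-series form is my assumption for the classical case), only the <em>n</em> = 1 term of <em>ϕ</em><span style="font-size: 8pt;">1</span> is constant; every other term averages out over <em>t</em>:</p>

```python
import math

# Empirical centroid of the cosine-wave orbit at sigma = 0.75. The n = 1 term
# of phi1 is constant (cos 0 = 1); all other terms average out over t, so the
# centroid should approach (W(0), W(-pi/2)) = (1, 0).
def phi_pair(sigma, t, n_terms=200):
    x = y = 0.0
    for n in range(1, n_terms + 1):
        w = (1.0 if n % 2 else -1.0) / n ** sigma
        x += w * math.cos(t * math.log(n))
        y += w * math.sin(t * math.log(n))
    return x, y

pts = [phi_pair(0.75, 0.3 * k) for k in range(1, 2001)]   # 0 < t <= 600
cx = sum(p[0] for p in pts) / len(pts)
cy = sum(p[1] for p in pts) / len(pts)
print(round(cx, 2), round(cy, 2))   # close to 1 and 0
```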
<p></p>
Spectacular Visualization: The Eye of the Riemann Zeta Function
tag:www.datasciencecentral.com,2021-01-02:6448529:BlogPost:1006966
2021-01-02T20:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>We discuss here one of the most famous unsolved mathematical conjectures of all time, one among seven that have a $1 million award attached to them, see <a href="https://en.wikipedia.org/wiki/Millennium_Prize_Problems" target="_blank" rel="noopener">here</a>. It is known as the <a href="https://en.wikipedia.org/wiki/Riemann_hypothesis" target="_blank" rel="noopener">Riemann Hypothesis</a> and abbreviated as RH. Of course I did not solve it (yet), but the material presented here offers a new path towards making significant progress. As usual, I wrote this article in such a way as to make it understandable by a large audience. You don't need to know more than relatively simple calculus to read it, and you don't even need to know anything about <a href="https://en.wikipedia.org/wiki/Complex_analysis" target="_blank" rel="noopener">complex analysis</a>: I did the heavy lifting for you.</p>
<p>This is a typical illustration of experimental math blended with data science techniques, resulting in visualizations that provide great actionable insights. It is my hope that after reading this article, you will be tempted to further explore RH, create even better visualizations about it, and find new insights. The techniques used here apply to many other problems, including serious business analytics. </p>
<p><span style="font-size: 14pt;"><strong>1. The problem </strong></span></p>
<p>The Riemann hypothesis, dating back to 1859, states that the zeta function <em>ζ</em>(<em>s</em>), with <em>s</em> = <span><em>σ</em> </span>+ <em>it</em> a complex number (the letter <em>i</em> denoting the imaginary complex unit), has no zero with 0.5 < <em>σ</em> < 1 in the critical strip 0 < <em>σ</em> < 1. If proved, it would have a profound impact not just in number theory, but in many other areas of mathematics and beyond. In layman's terms, it can be re-formulated as follows. </p>
<p>Let us introduce a parametric family of real-valued functions, defined as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8375731288?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8375731288?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>with 0 < <em>σ</em> < 1, <em>t</em> a real number, <em>α</em>, <em>β</em>, <em>γ</em> three real parameters, and <em>λ</em>(⋅) a real-valued function with logarithmic growth. Elementary computations show that <em>s</em> = <em>σ</em> + <em>it</em> is a complex root (also called <em>zero</em>) of <em>ζ</em>(<em>s</em>), with 0 < <em>σ</em> < 1, if and only if</p>
<ul>
<li><em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1, 0) = 0,</li>
<li><em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1, −π/2) = 0,</li>
<li><em>λ</em>(<em>n</em>) = log(n).</li>
</ul>
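<p>Reading these three conditions as the vanishing of the real and imaginary parts of the alternating (Dirichlet eta type) series, they can be verified numerically at the first known zero, <em>t</em> ≈ 14.134725 with <em>σ</em> = 0.5. The tail-averaging step below is my own fix for the slow, oscillating convergence; this is a sketch, not the exact method used here:</p>

```python
import math

# Check the two conditions at the first nontrivial zero of zeta,
# s = 0.5 + 14.134725i, reading phi1, phi2 as Re/Im of
# sum (-1)^(n+1) n^(-s). Averaging the last two partial sums tames
# the slow, oscillating convergence of the raw series.
def phi_pair_smoothed(sigma, t, n_terms):
    x = y = sx = sy = 0.0
    for n in range(1, n_terms + 2):
        w = (1.0 if n % 2 else -1.0) / n ** sigma
        x += w * math.cos(t * math.log(n))
        y += w * math.sin(t * math.log(n))
        if n >= n_terms:          # average partial sums S_N and S_(N+1)
            sx += x / 2.0
            sy += y / 2.0
    return sx, sy

x, y = phi_pair_smoothed(0.5, 14.134725, 100000)
print(round(math.hypot(x, y), 4))   # near 0: both conditions hold at once
```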
<p>For details about this formulation, see <a href="https://mathoverflow.net/questions/379650/more-mysteries-about-the-zeros-of-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>. Moving forward, we will focus on RH as being a problem of finding the zeroes (or lack thereof) of a bivariate function in the standard plane: <em>σ</em> is the first variable, attached to the X-axis, and <em>t</em> is the second variable, attached to the Y-axis. A generalized version of RH seems to also be true: it corresponds to arbitrary values for <em>α</em>, <em>β</em>, <em>γ</em>. However, we focus here on the classical RH. For ease of presentation, we use the following notation:</p>
<ul>
<li><em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1, 0)</li>
<li><em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em>(<em>σ</em>, <em>t</em>; 0, 1,−π/2 )</li>
</ul>
<p>Much of the discussion has to do with the orbit of (<em>ϕ</em><span><span style="font-size: 8pt;">1</span>, <em>ϕ</em><span style="font-size: 8pt;">2</span></span>) when <em>σ</em> is fixed but arbitrary, and only <em>t</em> is allowed to vary. The orbit consists of all the points (<em>X</em>(<em>t</em>), <em>Y</em>(<em>t</em>)) with <em>X</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>Y</em>(<em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>). In short, we are dealing with a bivariate time series in continuous time, with strong cross-correlations between <em>X</em>(<em>t</em>) and <em>Y</em>(<em>t</em>). Without loss of generality, we assume that <em>t</em> is positive. The spectacular plot shown in section 2 is just a scatterplot of the orbit, computed for <em>σ</em> = 0.75<em>.</em> It easily generalizes to other values of <em>σ</em> that are strictly greater than 0.5. </p>
<p><span style="font-size: 14pt;"><strong>2. The visualization</strong></span></p>
<p>I call the plot below the <em>Eye of the Zeta Function</em>. It is the scatter plot described in the last paragraph in section 1, and probably the first time that such a plot was created for the Riemann zeta function. It corresponds to <em>σ </em>= 0.75, with <em>t</em> between 0 and 3,000, with <em>t</em> increments equal to 0.01. Thus 300,000 points of the orbit are displayed here. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8375847301?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8375847301?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>The spectacular feature in that plot is the hole around (0, 0). It has deep implications. It suggests that if <em>σ</em> = 0.75, not only can <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) and <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) not be simultaneously equal to zero (this is a particular case of RH, nothing new here) but, most importantly, the pair never jointly gets very close to zero. This is new, and suggests that proving RH might be a little less challenging than initially thought. The same plot features a similar "eye" if you try various values of <em>σ</em>. In particular, the hole gets smaller and smaller as <em>σ</em> gets closer to 0.5. At <em>σ</em> = 0.5, the hole is entirely gone, and infinitely many values of <em>t</em> yield <em>ϕ</em><span style="font-size: 8pt;">1</span>(<em>σ</em>, <em>t</em>) = <em>ϕ</em><span style="font-size: 8pt;">2</span>(<em>σ</em>, <em>t</em>) = 0. The same is true for a generalized version of RH discussed in section 1. </p>
<p>Note that it is very tricky to get the scatterplot right. The series for <em>ϕ</em><span><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span> converge very slowly, and in a chaotic, unpredictable way, </span>see <a href="https://mathoverflow.net/questions/379650/more-mysteries-about-the-zeros-of-the-riemann-zeta-function/380174#380174" target="_blank" rel="noopener">here</a>. This can result in false positives: points very close to zero due to approximation errors, artificially obfuscating the hole. Convergence boosting techniques are required, see <a href="https://www.datasciencecentral.com/profiles/blogs/simple-trick-to-dramatically-improve-speed-of-convergence" target="_blank" rel="noopener">here</a>. In addition, the frequency of oscillations in <em>ϕ</em><span><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span> increases as <em>t</em> gets larger, and thus the <em>t</em> increments should be made smaller and smaller accordingly, in order to get good coverage of the orbit and not miss potential true zeroes.</span></p>
<p>More plots can be found <a href="https://mathoverflow.net/questions/379650/more-mysteries-about-the-zeros-of-the-riemann-zeta-function" target="_blank" rel="noopener">here</a>. One (unpublished yet) is even more spectacular, though esthetically speaking, it looks just like a boring ring. I computed the approximation error (<em>E</em><span style="font-size: 8pt;">1</span>(<span style="font-size: 12pt;"><em>t</em></span>), <em>E</em><span style="font-size: 8pt;">2</span>(<span style="font-size: 12pt;"><em>t</em></span>)) when you use only the first 200 terms in the series defining <em>ϕ</em><span><span style="font-size: 8pt;">1</span> and <em>ϕ</em><span style="font-size: 8pt;">2</span>. If <span style="font-size: 12pt;"><em>t</em></span> < 300, these points are located on a very thin ring very close to 0. Their distribution thus has a strong pattern, making it possibly even less challenging to prove that if <em>σ</em> = 0.75, then the Riemann Zeta function has no zero with <em>t</em> in [0, 300]. The pattern quickly disappears if <em>t</em> is larger, but you can still retrieve it by increasing the number of terms that you use in your approximation, allowing you to identify an even bigger zero-free zone in the critical strip. Proving it is zero-free even narrowed down to these zones, would still remain a big challenge though. </span></p>
<p></p>
<p></p>
Opening a New Restaurant in Covid Times
tag:www.datasciencecentral.com,2020-12-23:6448529:BlogPost:1005865
2020-12-23T06:44:07.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>I am a data scientist, and I decided to open a restaurant last November, 10 days before the governor in my state banned dining-in (who knows for how long) and at a time when customers were already rare. Some data scientists in managerial positions dream about exiting the corporate world, and they envied me when I told them my plan, at least before Covid.</p>
<p>Here I explore the options and opportunities available, and this article reflects my optimism. I will also discuss analytics in some detail. The reasons for opening a restaurant are varied; in my case, I saw an opportunity in a wealthy town with many foodies, mostly retired from companies such as Amazon, Boeing or Microsoft, who left the Seattle area to live on a little island where the pace of life is much slower, roads are not clogged with commuters, and the landscape is beautiful: Anacortes on Fidalgo Island, next to the San Juan Islands, in the Pacific Northwest. Despite being next to the ocean, not a single restaurant offers fresh oysters or crab, and there is no great restaurant in town. If anything, after selling my company, I thought I would open a restaurant so that there would be at least one dining venue I really love in Anacortes. I knew from the very beginning that we would fill a void, and that there was no competition.</p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8322989256?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8322989256?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Our outdoor seating</em></p>
<p>After chasing locations throughout the Puget Sound without any luck, I found by chance the perfect spot in the very heart of historical downtown Anacortes. The landlord did not want a franchise or a chain, and even turned down a bank. Rent here is 3 times cheaper than in Seattle, and hourly rates for restaurant workers are much lower too, though it is impossible to find qualified people to serve fine cuisine (you must train them). We were lucky to find a great chef who worked in great restaurants in Seattle and left the city years ago for the same reasons that I did. We are also very close to farmers, and all our food comes from local farms. Not exactly cheap, but people are willing to pay a bit more for fresh local ingredients - this is a long-lasting trend in this industry.</p>
<p>We agreed on a few statistics: food cost should be 1/3 of revenue, staff another 1/3, and 15-20% of revenue should go towards rent, utilities, insurance, and so on. Now with Covid, we are operating at a controlled loss, probably for the next three months, but we are on the path to success. Rather than closing for three months like plenty of restaurants do, we thought we should take advantage of this period to develop our brand and become known -- and stay open despite the extra cost. We also decided to stop expensive construction on the second floor, and instead focus on heated outdoor dining and cheaper solutions that have a direct positive impact. At the end of the construction stage, we even looked at purchasing used appliances rather than brand new ones.</p>
<p>Despite having no experience in the restaurant industry, I am a foodie with tremendous experience as a customer. In particular, I determined what the prices should be, given the town we are in and the kind of food we serve. The chef focused on dishes where he could meet the goal of 1/3 of revenue spent on food (that is, a dish sold for $18 costs $6 in ingredients on average), with waste optimization also being a goal (for instance, unsold fresh oysters are served as baked oysters the next day). I even purchased some ingredients myself, such as excellent Icelandic caviar 10 times cheaper than Beluga. People coming from the big city 90 miles south consider our restaurant inexpensive, and capable of successfully competing with hip Seattle restaurants if we were located in that city.</p>
<p><strong>Original ideas to succeed</strong></p>
<p>Here are some concepts that we embraced:</p>
<ul>
<li>Having a little retail store within the restaurant, selling home-made preparations made by the chef, and wines</li>
<li>Opening a wine club with paid membership</li>
<li>Using the second floor for storage and the retail store, rather than for dining-in</li>
<li>Opening the patio in the back, the heated tent on the front street, and some other space outside to maximize occupancy</li>
<li>Discontinuing breakfast except weekends, due to negative ROI</li>
<li>Creating our own home delivery service, more affordable than Doordash</li>
<li>Organizing our menu items in such a way as to optimize revenue (by displaying best sellers at the top, revenue increased 5 times on Doordash)</li>
<li>Being the only European restaurant in the county</li>
<li>Using pictures of our dishes when posting on social networks, as well as on our website</li>
<li>Offering family meals to go, serving 2 or 4 people</li>
<li>Partnering with grocery stores to sell our products</li>
<li>Having weekly specials that we can announce on social networks and via our fast-growing mailing list, to keep customers returning</li>
<li>Serving right-sized portions - smaller than the average restaurant - along with small dishes, on plates that are not as large as in many restaurants (this reduces waste and we can lower our prices accordingly)</li>
</ul>
<p><strong>Marketing and advertising</strong></p>
<p>We are present and very active on all local Facebook groups, including <a href="https://www.facebook.com/parisrestaurantandbar/" target="_blank" rel="noopener">our Facebook page</a> and the <a href="https://www.facebook.com/groups/424272282275831/" target="_blank" rel="noopener">Skagit Restaurant page</a> that we created for all restaurants in our county. Since our menu has new additions every week, we can post original content all the time. Many people in town use Facebook, so it is our favorite platform. We also advertise with them. </p>
<p>We created our newsletter, which grew to 500 subscribers in a month. Much of our advertising on Google is geared towards growing the newsletter. We are working on a blog (the first article will be <em>10 tips to help your favorite restaurant</em>, applicable to any restaurant; we hope it will go viral) and in the long term, we plan on selling recipes from our chef on the website. Finally, as we grow, we plan on using the outdoor tent of our restaurant neighbors when they are closed. We may even serve tequila from our neighbor (a Mexican restaurant), with revenue on hard liquor going directly to them, if we use their tent. </p>
<p>Advertising on Yelp was a failure, and once we noticed, we stopped it. Yelp clearly does not help its advertisers with reviews (a good thing), but it eliminates reviews seemingly at random, good or bad, with its supposedly smart machine learning algorithm. Maybe to force us to advertise more? Phone calls coming from Yelp advertising rarely came from a local number (unlike calls originating from Google ads), and lasted 2 seconds: not different from click fraud. We are happy that Yelp represents less than 2% of our traffic, as we tried very hard to build our audience organically and via word of mouth, thanks to the excellent and original food that we serve. </p>
<p>We also invited our partners (local farmers, accountant, etc.) for a free dinner during the short window of time when dining-in was allowed. The meal was free, but not the wine. We also plan on having our brochure distributed in all the local hotels, and maybe advertising our restaurant on the receipts people get when they shop at a grocery store. </p>
<p><strong>The results</strong></p>
<p>The last few days have seen revenue growing fast, to the point that we will probably operate at a loss for much less than 3 months, beating expectations. Even before Thanksgiving, when dining-in was still allowed, it was clear that we would be successful, as we were almost profitable while operating at 25% capacity.</p>
<p>You can find us at <a href="https://www.parisrestaurantandbar.com/" target="_blank" rel="noopener">ParisRestaurantAndBar.com</a>. </p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8322990662?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8322990662?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p></p>
<p></p>
Amazing Things You Did Not Know You Could Do in Excel
tag:www.datasciencecentral.com,2020-12-17:6448529:BlogPost:1005404
2020-12-17T05:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>I have included a lot of Excel spreadsheets in the numerous articles and books that I have written in the last 10 years, based either on real-life problems or on simulations to test algorithms, and featuring various machine learning techniques. It is time to create a new blog series focusing on these useful techniques that can easily be handled with Excel. Data scientists typically use programming languages and other visual tools for these techniques, mostly because they are unaware that the same can be accomplished with Excel alone. This article is the first one in this new series. The series will appeal to BI analysts, managers presenting insights to decision makers, as well as software engineers or MBA graduates who do not have a strong data science background. It can also be used as a starting point to learn data science and machine learning: first solve problems in Excel, then, after discovering Excel's limitations, move to programming languages or AI-based automated coding. </p>
<p>Many of the techniques presented in my spreadsheets are data-driven (as opposed to model-driven), robust, simple yet efficient, sometimes entirely novel, and do not lead to problems such as over-fitting or numerical instability. Even in the absence of statistical models, confidence intervals can still be built - even in Excel - and they are more intuitive and easier to understand than traditional ones. See my previous article <a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">here</a> on general regression, as an example. That article also features traditional regression performed with the little-known Excel built-in function LINEST; with a simple transformation, it could be used for logistic regression. Also, my spreadsheets are just basic Excel, without special Excel libraries or add-ins, and are thus accessible to everyone. </p>
<p>In this first blog, I show you how to simulate clustered data and display it with multi-groups scatterplots, things that I used to do with R in the past.</p>
<p><strong>Excel scatterplots in clustering contexts</strong></p>
<p>The pictures below represent a simulation of clustered data: 177 two-dimensional data points spread across three clusters.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296711259?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296711259?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1:</strong> <em>Well separated clusters</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296711493?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296711493?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2:</strong> <em>Overlapping clusters</em></p>
<p>The spreadsheet used to produce these charts is interactive, and you can play with it to generate more clusters, fine-tune the level of overlap, and test various clustering algorithms on the simulated data that you create, using cross-validation techniques, to see how they perform. The points within each of the three groups are radially distributed around a center. That is, a random point (<em>X</em>, <em>Y</em>) in group #1, assuming the center of that group - also randomly distributed - is (<em>X</em><span style="font-size: 8pt;">1</span>, <em>Y</em><span style="font-size: 8pt;">1</span>), is generated as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296725669?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296725669?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>Here, fresh random deviates <em>ρ</em>, <em>θ</em>, uniformly distributed on [0, 1] and generated with the RAND function in Excel, are used for each point (<em>X</em>, <em>Y</em>), while the constant <em>α</em><span style="font-size: 8pt;">1</span> is fixed for all points in group #1. In the spreadsheet, the three centers are uniformly distributed on [0, 1] x [0, 1], and <em>α</em><span style="font-size: 8pt;">1</span>, <em>α</em><span style="font-size: 8pt;">2</span>, <em>α</em><span style="font-size: 8pt;">3</span> are set to 1/3.</p>
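<p>For readers who prefer code to spreadsheets, the same simulation can be sketched in a few lines. The exact generation rule is in the formula picture above; the sketch assumes the natural polar form <em>X</em> = <em>X</em><span style="font-size: 8pt;">1</span> + <em>α</em><span style="font-size: 8pt;">1</span><em>ρ</em> cos(2<em>πθ</em>), <em>Y</em> = <em>Y</em><span style="font-size: 8pt;">1</span> + <em>α</em><span style="font-size: 8pt;">1</span><em>ρ</em> sin(2<em>πθ</em>), which matches the description (radial distribution, with <em>ρ</em> and <em>θ</em> uniform on [0, 1]).</p>

```python
import math
import random

def simulate_clusters(n_points=177, n_groups=3, alpha=1/3, seed=42):
    """Simulate radially distributed clusters, mimicking the spreadsheet.
    Assumed rule (the exact formula is in the article's figure):
        X = Xc + alpha * rho * cos(2*pi*theta)
        Y = Yc + alpha * rho * sin(2*pi*theta)
    with rho, theta uniform on [0, 1] and centers uniform on [0, 1] x [0, 1]."""
    rng = random.Random(seed)
    centers = [(rng.random(), rng.random()) for _ in range(n_groups)]
    points = []
    for i in range(n_points):
        g = i % n_groups                 # assign points to groups in turn
        xc, yc = centers[g]
        rho, theta = rng.random(), rng.random()
        x = xc + alpha * rho * math.cos(2 * math.pi * theta)
        y = yc + alpha * rho * math.sin(2 * math.pi * theta)
        points.append((g, x, y))
    return centers, points
```

<p>Each point lands within distance <em>α</em> of its group center, so smaller values of <em>α</em> give well-separated clusters (Figure 1) and larger values give overlapping ones (Figure 2).</p>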
<p><span>The scatterplots are produced using the scatter graph in Excel, applied to data separated in three groups as illustrated in the screenshot below. For group #1, point coordinates (<em>X</em>, <em>Y</em>) are stored in the first and second column respectively. For group #2, it's in the first and third column, and for group #3, it is in the first and fourth column as illustrated below.</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8296773488?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8296773488?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3:</strong> <em>Organizing the data in Excel to produce the scatterplots</em></p>
<p>The spreadsheet is available for download, <a href="https://storage.ning.com/topology/rest/1.0/file/get/8296781485?profile=original" target="_blank" rel="noopener">here</a> (<strong>scatter-cluster.xlsx</strong>). See also one of my previous spreadsheets to automatically detect the number of clusters, from one of my past articles, <a href="https://www.datasciencecentral.com/profiles/blogs/how-to-automatically-determine-the-number-of-clusters-in-your-dat" target="_blank" rel="noopener">here</a> (<strong>elbow.xlsx</strong>, in the section <em>Elbow Strength with spreadsheet illustration</em>). Finally, many spreadsheets are available for download, from my most recent book <em>Statistics: new foundations, toolkit, and machine learning recipes</em>, <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a>. Some of them even implement NLP algorithms.</p>
<p></p>
<p></p>
All-purpose, Robust, Fast, Simple Non-linear Regression
tag:www.datasciencecentral.com,2020-12-16:6448529:BlogPost:1005166
2020-12-16T18:22:17.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><strong>Announcements</strong></p>
<ul>
<li>Watch APEXX W3: <strong>The Data Science Workstation</strong>, and learn how an NVIDIA-certified BOXX workstation can accelerate your workflow. <a href="https://bit.ly/33Suwni" target="_blank" rel="noopener">Access video here</a>. </li>
<li>Use real-time anomaly detection reference patterns to combat fraud | Google. <a href="http://dsc.news/3gShgUZ" target="_blank" rel="noopener">Read full article</a>.</li>
<li>Merrimack College offers three online master's degrees in data science, business analytics, or healthcare analytics – all designed to accommodate working professionals and developed and taught by industry experts. Gain a deeper understanding of data visualization, statistical analysis, machine learning, and business strategy to deliver data-driven insights that impact real-world decisions. <a href="http://dsc.news/34kc07x" target="_blank" rel="noopener">Learn more here</a>. </li>
</ul>
<p><strong>All-purpose, Robust, Fast, Simple Non-linear Regression</strong></p>
<p>The model-free, data-driven technique discussed here is so basic that it can easily be implemented in Excel, and we actually provide an Excel implementation. It is surprising that this technique does not pre-date standard linear regression, and that it is rarely if ever used by statisticians and data scientists. It is related to kriging and nearest neighbor interpolation, and was apparently first mentioned in 1965 by Harvard scientists working on GIS (geographic information systems). It was referred to back then as Shepard's method or inverse distance weighting, and used for multivariate interpolation on non-regular grids. We call this technique <em>simple regression</em>. Read the full article <a href="https://www.datasciencecentral.com/profiles/blogs/introducing-an-all-purpose-robust-fast-simple-non-linear-r22" target="_blank" rel="noopener">here</a>. </p>
<p></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/8295194880?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8295194880?profile=RESIZE_710x" width="400" class="align-center"/></a></span></p>
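<p>As a companion to the Excel implementation, here is a minimal sketch of textbook inverse distance weighting (Shepard's method); the variant in the linked article may differ in its details (weight function, distance power), so treat this as the classical baseline rather than the exact method described there.</p>

```python
import math

def shepard(query, points, values, power=2.0):
    """Inverse distance weighting (Shepard's method): predict the value at
    `query` as a weighted average of observed values, with weights equal to
    1 / distance^power. The prediction interpolates the observed points."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(query, p)
        if d == 0.0:
            return v  # exact match with a training point
        w = d ** -power
        num += w * v
        den += w
    return num / den

# Tiny example on a non-regular 2-D grid
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vals = [0.0, 10.0, 20.0]
```

<p>No model is fitted and nothing can diverge, which is why the technique is robust; the <em>power</em> parameter controls how local the prediction is.</p>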
<p><span style="font-size: 8pt;">This email, and all related content, is published by Data Science Central, a division of <a href="https://www.techtarget.com/" target="_blank" rel="noopener noreferrer">TechTarget, Inc</a>. 275 Grove Street, Newton, Massachusetts, 02466 US</span></p>
New Tests of Randomness and Independence for Sequences of Observations
tag:www.datasciencecentral.com,2020-12-03:6448529:BlogPost:1004429
2020-12-03T01:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>There is no single statistical test that assesses whether a sequence of observations, a time series, or the residuals in a regression model exhibit independence or not. Typically, what data scientists do is look at auto-correlations and check whether they are close enough to zero. If the data follows a Gaussian distribution, then absence of auto-correlations implies independence. Here however, we are dealing with non-Gaussian observations. The setting is similar to testing whether a pseudo-random number generator is random enough, or whether the digits of a number such as <span>π </span>behave in a way that looks random, even though the sequence of digits is deterministic. Batteries of statistical tests are available to address this problem, but there is no one-size-fits-all solution.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242402469?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242402469?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>Here we propose a new approach. It is not a panacea either, but rather a set of additional powerful tools to help test for independence and randomness. The data sets under consideration are specific mathematical sequences, some of which are known to exhibit independence / randomness, and some not. This constitutes a good setting to benchmark and compare various statistical tests and see how well they perform. This kind of data is also more natural and looks more real than synthetic data obtained via simulations. </p>
<p><span style="font-size: 14pt;"><strong>1. Definition of random-like sequences</strong></span></p>
<p>Since we are dealing with deterministic sequences (<em>x<span style="font-size: 8pt;">n</span></em>) indexed by <em>n</em> = 1, 2, and so on, it is worth defining what we mean by <em>independence</em> and <em>random-like</em>. These two elementary concepts are very intuitive, but a formal definition may help. You may skip this section if you have an intuitive understanding of the concepts in question, as the layman does. Independence in this context is sometimes called <em>asymptotic independence</em>, see <a href="https://mathoverflow.net/questions/372103/recursive-random-number-generator-based-on-irrational-numbers/" target="_blank" rel="noopener">here</a>. Also, for all the sequences investigated here, <em>x<span style="font-size: 8pt;">n</span></em> ∈ [0,1].</p>
<p><strong>1.1. Definition of random-like and independence</strong></p>
<p>A sequence (<em>x<span style="font-size: 8pt;">n</span></em>) with <em>x<span style="font-size: 8pt;">n</span></em> ∈ [0,1] is <em>random-like</em> if it satisfies the following property. For any finite index family <em>h</em><span style="font-size: 8pt;">1</span>,…, <em>h<span style="font-size: 8pt;">k</span></em> and for any <span style="font-size: 12pt;"><em>t<span style="font-size: 8pt;">1</span></em></span>,…, <em>t<span style="font-size: 8pt;">k</span></em> ∈ [0,1], we have </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8238499286?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8238499286?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>The probabilities are empirical probabilities, that is, based on frequency counts. For instance,</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8238501465?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8238501465?profile=RESIZE_710x" width="450" class="align-center"/></a></p>
<p>where χ(<em>A</em>) is the indicator function (equal to 1 if the event <em>A</em> is true, and equal to 0 otherwise). Random-like implies independence, but the converse is not true. A sequence is <em>independently distributed</em> if it satisfies the weaker property </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8238506260?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8238506260?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Random-like means that the <em>x<span style="font-size: 8pt;">n</span></em>'s all have the same underlying uniform distribution on [0, 1], and are independently distributed. </p>
<p><strong>1.2. Definition of lag-<em>k</em> autocorrelation</strong></p>
<p>Again, this is just the standard definition of auto-correlations, but applied to infinite deterministic sequences. The lag-<em>k</em> auto-correlation ρ<span style="font-size: 8pt;"><em>k</em></span> is defined as follows. First define ρ<span style="font-size: 8pt;"><em>k</em></span>(<em>n</em>) as the empirical correlation between (<em>x</em><span style="font-size: 8pt;">1</span>,…, <em>x<span style="font-size: 8pt;">n</span></em>) and (<em>x<span style="font-size: 8pt;">k</span></em><span style="font-size: 8pt;">+1</span>,… ,<em>x<span style="font-size: 8pt;">k</span></em><span style="font-size: 8pt;">+<em>n</em></span>). Then ρ<span style="font-size: 8pt;"><em>k</em></span> is the limit (if it exists) of ρ<span style="font-size: 8pt;"><em>k</em></span>(<span style="font-size: 12pt;"><em>n</em></span>) as <em>n</em> tends to infinity. </p>
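<p>This definition translates directly into code. The sketch below computes the empirical lag-<em>k</em> autocorrelation ρ<span style="font-size: 8pt;"><em>k</em></span>(<em>n</em>) for a finite <em>n</em>, and applies it to <em>x<span style="font-size: 8pt;">n</span></em> = { <em>n</em>√2 }, an instance of the sequence { <em>αn</em> } discussed below, which is equidistributed yet strongly autocorrelated; the choice <em>α</em> = √2 is mine.</p>

```python
import math

def frac(x):
    """Fractional part { x } of a positive real number."""
    return x - math.floor(x)

def lag_autocorr(xs, k):
    """Empirical lag-k autocorrelation rho_k(n): the correlation between
    (x_1, ..., x_n) and (x_{k+1}, ..., x_{k+n}), with n = len(xs) - k."""
    a, b = xs[:-k], xs[k:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    sa = math.sqrt(sum((u - ma) ** 2 for u in a) / n)
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / n)
    return cov / (sa * sb)

# x_n = { n * sqrt(2) }: equidistributed, yet strongly autocorrelated
xs = [frac(n * math.sqrt(2)) for n in range(1, 100001)]
```

<p>On this sequence, the empirical lag-1 autocorrelation stabilizes around a clearly non-zero value as <em>n</em> grows, while the empirical mean stays near 1/2: equidistribution does not imply independence.</p>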
<p><strong>1.3. Equidistribution and fractional part denoted as { }</strong></p>
<p>The fractional part of a positive real number <em>x</em> is denoted as { <em>x</em> }. For instance, { 3.141592 } = 0.141592. The sequences investigated here come from number theory. In that context, concepts such as random-like and identically distributed are rarely used. Instead, mathematicians rely on the weaker concept of <em>equidistribution</em>, also called equidistribution modulo 1. Closer to independence is the concept of equidistribution in higher dimensions, for instance if two successive values (<em>x<span style="font-size: 8pt;">n</span></em>, <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>) are equidistributed on [0, 1] x [0, 1].</p>
<p>A sequence can be equidistributed yet exhibit strong auto-correlations. The most famous example is the sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>αn</em> } where <em>α</em> is a positive irrational number. While equidistributed, it has strong lag-<em>k</em> auto-correlations for every strictly positive integer <em>k</em>, and it is anything but random-like. A sequence that looks perfectly random-like is the digits of <span>π</span>: they cannot be distinguished from a realization of a perfect <a href="https://en.wikipedia.org/wiki/Bernoulli_process" target="_blank" rel="noopener">Bernoulli process</a>. Such random-like sequences are very useful in cryptographic applications.</p>
<p><span style="font-size: 14pt;"><strong>2. Testing well-known sequences</strong></span> </p>
<p>The sequences we are interested in are <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α n</em>^<em>p</em> } where { } is the fractional part function (see section 1.3), <em>p</em> > 1 is a real number and <em>α</em> is a positive irrational number. Other sequences are discussed in section 3. It is well known that these sequences are equidistributed. If <em>p</em> = 1, these sequences are highly auto-correlated, and thus the terms <em>x<span style="font-size: 8pt;">n</span></em> are not independently distributed, much less random-like; the exact theoretical lag-<em>k</em> auto-correlations are known. The question here is what happens if <em>p</em> > 1. It seems that in that case, there is much more randomness. In this section, we explore three statistical tests (including a new one) to assess how random these sequences can be, depending on the parameters <em>p</em> and <em>α</em>. The theoretical answer to that question is known, so this provides a good case study to check how various statistical tests perform at detecting randomness, or the lack of it.</p>
<p><strong>2.1. The gap test</strong></p>
<p>The gap test (sometimes called the run test) proceeds as follows. Define the binary digit <em>d<span style="font-size: 8pt;">n</span></em> as <em>d<span style="font-size: 8pt;">n</span></em> = ⌊2<em>x<span style="font-size: 8pt;">n</span></em>⌋, where the brackets represent the integer part function. Say <em>d<span style="font-size: 8pt;">n</span></em> = 0 and <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> = 1 for a specific <em>n</em>. If <em>d<span style="font-size: 8pt;">n</span></em> is followed by <em>G</em> successive digits <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>,…, <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>G</em></span> all equal to 1 and then <em>d<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>G</em>+1</span> = 0, we have one instance of a gap of length <em>G</em>. Compute the empirical distribution of these gaps. Assuming 50% of the digits are 0 (this is the case in all our examples), the empirical gap distribution converges to a geometric distribution of parameter 1/2 if the sequence (<em>x<span style="font-size: 8pt;">n</span></em>) is random-like.</p>
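<p>The gap test is a few lines of code: extract the digits <em>d<span style="font-size: 8pt;">n</span></em> = ⌊2<em>x<span style="font-size: 8pt;">n</span></em>⌋, collect the lengths of maximal runs of 1's bounded by 0's, and compare the empirical frequencies with the geometric distribution P(<em>G</em> = <em>g</em>) = 1/2^<em>g</em>.</p>

```python
from collections import Counter

def gap_distribution(xs):
    """Gap test: empirical distribution of the length G of maximal runs of 1's
    bounded by 0's on both sides, in the digit sequence d_n = floor(2 * x_n).
    For a random-like sequence, P(G = g) should approach (1/2)^g."""
    ds = [int(2 * x) for x in xs]
    gaps = Counter()
    run, started = 0, False   # only count runs preceded by a 0
    for d in ds:
        if d == 1:
            run += 1
        else:
            if started and run > 0:
                gaps[run] += 1
            run, started = 0, True
    total = sum(gaps.values())
    return {g: c / total for g, c in sorted(gaps.items())}
```

<p>Applied to uniform pseudo-random deviates, the observed frequencies for <em>G</em> = 1, 2, 3 come out close to 1/2, 1/4, 1/8; a sequence that deviates markedly from this pattern fails the test.</p>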
<p>This is best illustrated in chapter 4 of my book <em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems, </em>available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>. </p>
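<p>The gap test can be sketched in a few lines of Python. The parameters below (<em>α</em> = SQRT(2), <em>p</em> = SQRT(7), 10,000 terms) are illustrative choices on our part; the helper simply collects the lengths of runs of 1s enclosed between two 0s:</p>

```python
import numpy as np

def gap_lengths(bits):
    """Lengths of runs of 1s enclosed between two 0s."""
    gaps, run, started = [], 0, False
    for b in bits:
        if b == 1 and started:
            run += 1
        elif b == 0:
            if started and run > 0:
                gaps.append(run)
            run, started = 0, True
    return gaps

# x_n = { alpha * n^p } and binary digits d_n = floor(2 x_n)
alpha, p, N = np.sqrt(2), np.sqrt(7), 10_000
n = np.arange(1, N + 1)
x = (alpha * n**p) % 1.0
d = (2 * x).astype(int)

gaps = np.array(gap_lengths(d))
for g in range(1, 5):
    # empirical P(G = g) vs the geometric(1/2) value (1/2)^g
    print(g, round(float(np.mean(gaps == g)), 3), 0.5**g)
```

<p>For a random-like sequence the empirical column should roughly match the geometric column; for <em>p</em> = 1 it does not.</p>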
<p><strong>2.2. The collinearity test</strong></p>
<p>Many sequences pass several tests yet fail the collinearity test. This test checks whether there are <em>k</em> constants <em>a</em><span style="font-size: 8pt;">1</span>, ..., <em>a<span style="font-size: 8pt;">k</span></em> with <em>a<span style="font-size: 8pt;">k</span></em> not equal to zero, such that the combination <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em></span> - (<em>a</em><span style="font-size: 8pt;">1</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em>-1</span> + <em>a</em><span style="font-size: 8pt;">2</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em>-2</span> + ... + <em>a<span style="font-size: 8pt;">k</span></em> <em>x<span style="font-size: 8pt;">n</span></em>) takes on only a finite (usually small) number of values. In short, it addresses this question: do <em>k</em> successive values of the sequence <em>x<span style="font-size: 8pt;">n</span></em> always lie (exactly, approximately, or asymptotically) in a finite number of hyperplanes of dimension <em>k</em> - 1? This test has been used to determine that some congruential pseudo-random number generators were of very poor quality, see <a href="https://en.wikipedia.org/wiki/RANDU" target="_blank" rel="noopener">here</a>. It is illustrated in section 3, with <em>k</em> = 2. </p>
<p>Source code and examples for <em>k</em> = 3 can be found <a href="https://mathoverflow.net/questions/372103/recursive-random-number-generator-based-on-irrational-numbers/" target="_blank" rel="noopener">here</a>. </p>
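<p>For <em>p</em> = 1 and <em>k</em> = 2, the collinearity is easy to exhibit without any statistical software: <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> - <em>x<span style="font-size: 8pt;">n</span></em> equals {<em>α</em>} or {<em>α</em>} - 1, so the second difference <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+2</span> - 2<em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span> + <em>x<span style="font-size: 8pt;">n</span></em> can only take the values -1, 0 or 1. A quick numerical check (<em>α</em> = log 2 is an illustrative choice):</p>

```python
import numpy as np

# p = 1 with an illustrative irrational alpha: x_n = { alpha * n }
alpha, N = np.log(2), 10_000
n = np.arange(1, N + 1)
x = (alpha * n) % 1.0

# the second difference x_{n+2} - 2 x_{n+1} + x_n, rounded to absorb
# floating-point noise, takes only finitely many values
d2 = np.round(x[2:] - 2 * x[1:-1] + x[:-2], 8)
print(sorted(set(d2)))  # exactly {-1.0, 0.0, 1.0}
```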
<p><strong>2.3. The independence test</strong></p>
<p>This may be a new test: I could not find any reference to it in the literature. It does not test for full independence, but rather for random-like behavior in small dimensions (<em>k</em> = 2, 3, 4). Beyond <em>k</em> = 4, it becomes somewhat impractical as it requires a number of observations (that is, the number of computed terms in the sequence) growing exponentially fast with <em>k</em>. However, it is a very intuitive test. It proceeds as follows, for a fixed <em>k</em>:</p>
<ul>
<li>Let <em>N </em> > 100 be an integer</li>
<li>Let <em>T</em> be a <em>k</em>-tuple (<em>t</em><span style="font-size: 8pt;">1</span>,..., <em>t<span style="font-size: 8pt;">k</span></em>) with <i>t<span style="font-size: 8pt;">j</span></i><span style="font-size: 8pt;"> </span>∈ [0,1] for <em>j</em> = 1, ..., <em>k.</em></li>
<li>Compute the following two quantities, with χ being the indicator function as in section 1.2:</li>
</ul>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242040856?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242040856?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<ul>
<li>Repeat this computation for <em>M</em> different <em>k</em>-tuples randomly selected in the <em>k</em>-dimensional unit hypercube</li>
</ul>
<p>Now plot the <em>M</em> vectors (<em>P<span style="font-size: 8pt;">T</span>, Q<span style="font-size: 8pt;">T</span></em>), each corresponding to a different <em>k</em>-tuple, on a scatterplot. Unless the <em>M</em> points lie very close to the main diagonal of the scatterplot, the sequence <em>x<span style="font-size: 8pt;">n</span></em> is not random-like. To see how far away you can be from the main diagonal without violating the random-like assumption, do the same computations for 10 different sequences consisting this time of truly random terms. This will give you a confidence band around the main diagonal, and vectors (<em>P<span style="font-size: 8pt;">T</span>, Q<span style="font-size: 8pt;">T</span></em>) lying outside that band, for the original sequence you are interested in, suggest areas where the randomness assumption is violated. This is illustrated in the picture below, originally posted <a href="https://mathoverflow.net/questions/372103/recursive-random-number-generator-based-on-irrational-numbers/" target="_blank" rel="noopener">here</a>: </p>
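<p>A minimal sketch of the test in Python. The exact definitions of <em>P<span style="font-size: 8pt;">T</span></em> and <em>Q<span style="font-size: 8pt;">T</span></em> are those of the formula above; the reading assumed here (an assumption on our part, consistent with the Gamma transformation discussed further below) is that <em>Q<span style="font-size: 8pt;">T</span></em> = <em>t</em><span style="font-size: 8pt;">1</span> ⋯ <em>t<span style="font-size: 8pt;">k</span></em> is the joint probability expected under uniformity and independence, while <em>P<span style="font-size: 8pt;">T</span></em> is the empirical frequency of <em>k</em>-tuples of successive terms falling componentwise below <em>T</em>. The baseline below uses truly random terms, for which the points hug the diagonal:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def pq_scatter(x, k=2, M=1000):
    """For M random k-tuples T, pair the empirical joint frequency P_T
    with the product Q_T = t_1 * ... * t_k expected under uniformity
    and independence (hypothetical reading of the test)."""
    N = len(x) - k
    # row n contains (x_{n+1}, ..., x_{n+k})
    W = np.column_stack([x[j:N + j] for j in range(k)])
    P, Q = np.empty(M), np.empty(M)
    for m in range(M):
        t = rng.random(k)
        P[m] = np.mean((W < t).all(axis=1))  # empirical joint probability
        Q[m] = t.prod()                      # theoretical joint probability
    return P, Q

# baseline: truly random terms, points should hug the main diagonal
x = rng.random(20_000)
P, Q = pq_scatter(x, k=2)
print(np.max(np.abs(P - Q)))  # small departure from the diagonal
```

<p>Replacing the random baseline by a sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α n</em>^<em>p</em> } and comparing the two scatterplots reproduces the procedure described above.</p>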
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242055058?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242055058?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong></p>
<p>As you can see, there is a strong departure from the main diagonal, and the sequence in question (see same reference) is known not to be random-like. The X-axis features <em>P<span style="font-size: 8pt;">T</span></em>, and the Y-axis features <em>Q<span style="font-size: 8pt;">T</span></em>. An example with known random-like behavior, resulting in an almost perfect diagonal, is also featured in the same article. Notice that there are fewer and fewer points as you move towards the upper right corner. The higher <em>k</em>, the sparser the upper right corner will be. In the above example, <em>k</em> = 3. To address this issue, stretch the point distribution along the diagonal as follows:</p>
<ul>
<li>Let <em>P*<span style="font-size: 8pt;">T</span></em> = (- 2 log <em>P<span style="font-size: 8pt;">T</span></em>) / <em>k</em> and <em>Q</em>*<span style="font-size: 8pt;"><em>T</em></span> = (- 2 log <em>Q<span style="font-size: 8pt;">T</span></em>) / <em>k</em>. This is a transformation leading to a Gamma(<em>k</em>, 2/<span style="font-size: 10pt;"><em>k</em></span>) distribution. See explanations <a href="https://stats.stackexchange.com/questions/89949/geometric-mean-of-uniform-variables" target="_blank" rel="noopener">here</a>. </li>
<li>Let <em>P</em>**<span style="font-size: 8pt;"><em>T</em></span> = <em>F</em>(<span style="font-size: 12pt;"><em>P</em></span>*<span style="font-size: 8pt;"><em>T</em></span>) and <em>Q</em>**<span style="font-size: 8pt;"><em>T</em></span> = <em>F</em>(<i>Q</i>*<span style="font-size: 8pt;"><em>T</em></span>) where <em>F</em> is the cumulative distribution function of a Gamma(<em>k</em>, 2/<span style="font-size: 10pt;"><em>k</em></span>) random variable.</li>
</ul>
<p>By virtue of the <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling" target="_blank" rel="noopener">inverse transform sampling theorem</a>, the points (<em>P</em>**<span style="font-size: 8pt;"><em>T</em></span>, <em>Q</em>**<span style="font-size: 8pt;"><em>T</em></span>) are now uniformly stretched along the main diagonal. </p>
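<p>This stretching is easy to verify numerically using SciPy's Gamma distribution with shape <em>k</em> and scale 2/<em>k</em>; below, <em>Q<span style="font-size: 8pt;">T</span></em> is simulated as a product of <em>k</em> independent uniform deviates:</p>

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
k, M = 3, 10_000

# simulate Q_T = t_1 * ... * t_k for M random k-tuples: crowded near 0
Q = rng.random((M, k)).prod(axis=1)

# stretch: (-2 log Q)/k follows a Gamma(shape=k, scale=2/k) distribution,
# and applying its CDF maps the values uniformly onto [0, 1]
Q_star = -2 * np.log(Q) / k
Q_2star = gamma.cdf(Q_star, a=k, scale=2 / k)

print(Q_2star.mean())  # close to 0.5, as expected for a uniform sample
```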
<p><span style="font-size: 14pt;"><strong>3. Results and generalization</strong></span></p>
<p>Let's get back to our sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α n</em>^<em>p</em> } with <em>p</em> > 1 and <em>α</em> irrational. Before showing and discussing some charts, I want to discuss a few issues. First, if <em>p</em> is large, machine accuracy will quickly result in erroneous computations of <em>x<span style="font-size: 8pt;">n</span></em>. You need to detect when loss of accuracy becomes a critical problem, usually well below <em>n</em> = 1,000 if <em>p</em> = 5. Working with double precision arithmetic will help. Another issue, if <em>p</em> is close to 1, is that randomness does not kick in until <em>n</em> is large enough. You may have to ignore the first few hundred terms of the sequence in that case. If <em>p</em> = 1, randomness never occurs. Also, we have assumed that the marginal distributions are uniform on [0, 1]. From the theoretical point of view, they indeed are, and it will show if you compute the empirical percentile distribution of <em>x<span style="font-size: 8pt;">n</span></em>, even in the presence of strong auto-correlations (this is due to the ergodic nature of the sequences in question, a topic beyond the scope of the present article). So it would be a good exercise to use various statistical tools or libraries to assess whether they can confirm the uniform distribution assumption.</p>
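<p>As a quick sanity check on the uniform-marginal claim, one can compute the Kolmogorov-Smirnov distance between the empirical distribution of <em>x<span style="font-size: 8pt;">n</span></em> and the uniform distribution on [0, 1] (the parameters below are illustrative):</p>

```python
import numpy as np
from scipy.stats import kstest

# illustrative parameters: x_n = { alpha * n^p }
alpha, p, N = np.sqrt(2), np.sqrt(7), 5_000
n = np.arange(1, N + 1)
x = (alpha * n**p) % 1.0

# Kolmogorov-Smirnov distance to the uniform distribution on [0, 1]
stat, pvalue = kstest(x, "uniform")
print(stat)  # small: the empirical marginal looks uniform
```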
<p><strong>3.1. Examples</strong></p>
<p>The exact theoretical value of the lag-<em>k</em> auto-correlation is known for all <em>k</em> if <em>p</em> = 1. See section 5.4 in <a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">this article</a>. It is almost never equal to zero, but it turns out that if <em>k</em> = 1, <em>p</em> = 1 and <em>α</em> = (3 + SQRT(3))/6, it is indeed equal to zero. Use a statistical package to see if it can detect this fact, or ask your team to do the test. Also, if <em>p</em> is an integer, show (using statistical techniques) that for some <em>a</em><span style="font-size: 8pt;">1</span>, ..., <em>a</em><span style="font-size: 8pt;">k</span>, the combination <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em></span> - (<em>a</em><span style="font-size: 8pt;">1</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em>-1</span> + <em>a</em><span style="font-size: 8pt;">2</span> <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+<em>k</em>-2</span> + ... + <em>a<span style="font-size: 8pt;">k</span></em> <em>x<span style="font-size: 8pt;">n</span></em>) takes on only a finite number of values, as discussed in section 2.2, and thus the random-like assumption is always violated. In particular, <em>k</em> = 2 if <em>p</em> = 1. This is also true <em>asymptotically</em> if <em>p</em> is not an integer, see <a href="https://mathoverflow.net/questions/377697/sequences-similar-to-n-alpha-that-are-both-equidistributed-and-truly-rando/377748#377748" target="_blank" rel="noopener">here</a> for details. Yet, if <em>p</em> > 1, the auto-correlations are very close to zero, unlike in the case <em>p</em> = 1. But are they truly identical to zero? What about the sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α</em>^<em>n</em> } with say <em>α</em> = log 3? Is it random-like? Nobody knows. Of course, if <em>α</em> = (1 + SQRT(5))/2, that sequence is anything but random, so it depends on <em>α</em>. </p>
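<p>The vanishing lag-1 auto-correlation for <em>α</em> = (3 + SQRT(3))/6 is easy to verify numerically; a generic quadratic irrational such as <em>α</em> = SQRT(2) is shown for contrast:</p>

```python
import numpy as np

N = 200_000
n = np.arange(1, N + 1)

def lag1_autocorr(alpha):
    # x_n = { alpha * n }, i.e. the case p = 1
    x = (alpha * n) % 1.0
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# generic quadratic irrational: strong lag-1 auto-correlation
print(lag1_autocorr(np.sqrt(2)))            # markedly nonzero
# alpha = (3 + SQRT(3))/6: the lag-1 auto-correlation vanishes
print(lag1_autocorr((3 + np.sqrt(3)) / 6))  # essentially zero
```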
<p>Below are three scatterplots showing the distribution of (<em>x<span style="font-size: 8pt;">n</span></em>, <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>) for a few hundred values of <em>n</em>, for various <em>α</em> and <em>p</em>, for the sequence <em>x<span style="font-size: 8pt;">n</span></em> = { <em>α</em> <em>n</em>^<em>p</em> }. The X-axis represents <em>x<span style="font-size: 8pt;">n</span></em>, the Y-axis represents <em>x<span style="font-size: 8pt;">n</span></em><span style="font-size: 8pt;">+1</span>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242305270?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242305270?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>p = SQRT(7), α = 1</em></p>
<p>Even to the trained naked eye, Figure 2 shows randomness in 2 dimensions. Independence may fail in higher dimensions (<em>k</em> > 2) as the sequence is known not to be random-like. There is no apparent collinearity pattern as discussed in section 2.2, at least for <em>k</em> = 2. Can you run a test to detect lack of randomness in higher dimensions?</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242307701?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242307701?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>p = 1.4, α = log 2</em></p>
<p>To the trained naked eye, Figure 3 shows lack of randomness, as highlighted in the red band. Can you run a test to confirm this? If the test is inconclusive or provides the wrong answer, then the naked eye performs better, in this case, than statistical software.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8242319869?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8242319869?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong>: <em>p = 1.1, α = log 2</em></p>
<p>Here (Figure 4) any statistical software and any human being, even the layman, can identify lack of randomness in more than one way. As <em>p</em> gets closer and closer to 1, lack of randomness is obvious, and the collinearity issue discussed in section 2.2, even if fuzzy, becomes more apparent even in two dimensions.</p>
<p><strong>3.2. Independence between two sequences</strong></p>
<p>It is known that if <em>α</em> and <em>β</em> are irrational numbers linearly independent over the set of rational numbers, then the sequences { <em>αn</em> } and { <em>βn</em> } are not correlated, even though each one taken separately is heavily auto-correlated. A sketch proof of this result can be found in the Appendix of <a href="https://www.datasciencecentral.com/profiles/blogs/state-of-the-art-statistical-science-to-address-famous-number-the" target="_blank" rel="noopener">this article</a>. But are they really independent? Test, using statistical software, the absence of correlation if <em>α</em> = log 2 and <em>β</em> = log 3. How would you test independence? The methodology presented in section 2.3 can be adapted and used to answer this question empirically (although not theoretically). </p>
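<p>A quick empirical check of the absence of correlation between the two sequences, each of which is, taken separately, strongly auto-correlated:</p>

```python
import numpy as np

N = 100_000
n = np.arange(1, N + 1)
x = (np.log(2) * n) % 1.0   # { n log 2 }
y = (np.log(3) * n) % 1.0   # { n log 3 }

# each sequence, taken separately, is strongly auto-correlated...
print(np.corrcoef(x[:-1], x[1:])[0, 1])
# ...yet the two sequences show no correlation with each other
print(np.corrcoef(x, y)[0, 1])
```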
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a d<span class="lt-line-clamp__raw-line">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).</span> You can access Vincent's articles and books,<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
Covid-19: My Predictions for 2021
tag:www.datasciencecentral.com,2020-11-30:6448529:BlogPost:1003991
2020-11-30T07:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here I share my predictions as well as my personal opinion about the pandemic. My thoughts are not derived from running sophisticated models on vast amounts of data. Much of the data available has major issues anyway, something I am also about to discuss. There is some bad news and some good news. This article discusses what I believe are the good news and the bad news, as well as some attempt at explaining people's behavior and reactions, and the resulting consequences. My opinion is very different from what you have read in the news, whatever the political color. Mine has, I think, no political color. It offers a different, possibly refreshing perspective to gauge and interpret what is happening.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8230291873?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8230291873?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>I will start by mentioning Belgium, one of the countries with the highest death rate. Very recently, in the last wave, it went from 10,000 deaths to 15,000 in a matter of days. They are back in some form of lock-down, and the situation has dramatically improved in the last few days. But 15,000 deaths out of 10,000,000 people would translate to 500,000 deaths in the US. We are far from there yet. Had they not mandated a new lock-down, killing restaurants and other businesses but keeping schools open along the way, they would probably have 20,000 deaths by now, quickly peaking at 25,000 before things improve. But we are comparing apples and oranges. In Belgium, everyone believed to have died from covid was listed as having actually died from the virus, even if untested. Also, the population density is very high compared to the US, and use of public transportation is widespread. Areas with lower population density initially have fewer deaths per 100,000 inhabitants, until complacency eventually creates the same spike.</p>
<p>The bad news is that I think we will surpass 500,000 deaths in the US by the end of February. But I don't think we will ever reach 1,000,000 by the end of 2021. A vaccine has been announced for months, but won't be available to the public at large in time: only to some specific groups of people (hospital workers) in the next few months. By the time it is widely available, we will all have been contaminated / infected and recovered (99.8% of us) or dead (0.2% of us). The vaccine will therefore be useless to curtail the pandemic, which by then will have died out on its own for lack of new people to infect. It may still be useful for the future, but not to spare the lives of another 300,000 who will have died between now and the end of February. </p>
<p>You may wonder: why not impose a full lock-down until March? Yes, this would save many lives but kill many others in what I think is a zero-sum sinister game. Economic destruction, suicide, drug abuse, crime, and riots would follow and would be just as bad. And with the surge in unemployment and massive losses in tax revenue, I don't think any local or state government has the financial ability to do it; it is just financially unsustainable. So I think lock-downs can only last so long, probably about a month or so at most. What is likely to happen is more and more people no longer following un-enforced regulations, while those who really need to protect themselves will stay at home and continue to live in a self-imposed state of lock-down.</p>
<p>Now some good news, at least. It is said that for anyone who tests positive, 8 go untested because symptoms are too mild or nonexistent to require medical help, and thus are not diagnosed. My whole family, close friends, and I fit in that category: never tested, but fully recovered, with no long-term side effects. Have we been re-infected? Possibly, but it was even milder the second time, and again none of us were tested. One reason for not being tested / treated is that going to a hospital is much more risky than dining in at a restaurant (many hospital workers died from covid, far fewer restaurant workers did). Another reason is to not have a potentially worrisome medical record attached to my name. Now you can say we were never infected in the first place, but that is like saying the virus is not contagious at all. Or you can say we will be re-infected again, but that is like saying the vaccine, even two doses six months apart, won't work. Indeed we are very optimistic about our future, as are all the people currently boosting the stock market to incredible highs. What I am saying here is that probably up to half of the population (150 million Americans) is at the end of the tunnel by now: recovered, for most of us, or dead. </p>
<p>Some people like myself who had a worse-than-average (still mild) case realize that wearing a mask causes breathing difficulty worse than the virus itself. I don't have time to wash my mask and hands all the time, or buy new masks and so on, when I believe my family and I are done with it. Unwashed, re-used masks are probably full of germs and worse than no mask, once immune. As more and more people recover every day in very large numbers these days (but the media never mention it), you are going to see more and more people who spontaneously return to a normal life. These people are not anti-science, anti-social, or anti-government - quite the contrary, they are acting rationally, not driven by fear. They don't believe in conspiracy theories, and they come from all political affiliations or are apolitical. Forcing these people to isolate via mandated lock-downs won't work: some will have big parties in private homes, a hair-dresser may decide to provide her services privately in the homes of her clients, and be paid under the table. People still want to eat great food with friends and will continue to do so. People still want to date. Even if the city of Los Angeles makes it illegal to meet in your home with members from another household, you can't stop young (or less young) people from dating, any more than you can stop the law of gravity, no matter how hard you try.</p>
<p>Of course, if all the people acting this way were immune, it would not be an issue. Unfortunately, many people who behave that way today are just careless (or ignorant, maybe not reading the news anymore). But as time goes by, even many of the careless people are going to get infected and then immune; it's a matter of weeks. So the intensity of this situation may peak in a few weeks and then naturally slow down, as dramatically as it rose.</p>
<p>In conclusion, I believe that by the end of March we will be back to much better times, and covid will be a thing of the past for most of us. Like the Spanish flu. Though it is said that the current yearly flu is just a remnant of the 1918 pandemic. The same may apply to covid, but it will be less lethal moving forward, after having killed those who were most susceptible to it. Already the death rate has plummeted. This of course won't help people who have lost a family member or friend; you can't bring them back. This is the sad part.</p>
<p></p>
<p></p>
Introducing an All-purpose, Robust, Fast, Simple Non-linear Regression
tag:www.datasciencecentral.com,2020-11-24:6448529:BlogPost:1003574
2020-11-24T03:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>The model-free, data-driven technique discussed here is so basic that it can easily be implemented in Excel, and we actually provide an Excel implementation. It is surprising that this technique does not pre-date standard linear regression, and is rarely if ever used by statisticians and data scientists. It is related to kriging and nearest neighbor interpolation, and was apparently first mentioned in 1965 by Harvard scientists working on GIS (geographic information systems). It was referred to back then as Shepard's method or inverse distance weighting, and used for multivariate interpolation on non-regular grids (see <a href="https://en.wikipedia.org/wiki/Multivariate_interpolation" target="_blank" rel="noopener">here</a> and <a href="https://en.wikipedia.org/wiki/Inverse_distance_weighting" target="_blank" rel="noopener">here</a>). We call this technique <em>simple regression</em>.</p>
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209321855?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209321855?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://www.datasciencecentral.com/profiles/blogs/3-types-of-regression-in-one-picture-baba-png" target="_blank" rel="noopener">here</a></em></p>
<p>In this article, we show how simple regression can be generalized and used in regression problems, especially when standard regression fails due to multi-collinearity or other issues. It can safely be used by non-experts without risking misinterpretation of the results or over-fitting. We also show how to build confidence intervals for predicted values, compare it to linear regression on test data sets, and apply it to a non-linear context (regression on a circle) where standard regression fails. Not only does it work for prediction inside the domain (equivalent to interpolation), but also, to a lesser extent and with extra care, outside the domain (equivalent to extrapolation). No matrix inversion or gradient descent is needed in the computations, making it a faster alternative to linear or logistic regression.</p>
<p><span style="font-size: 14pt;"><strong>1. Simple regression explained</strong></span></p>
<p>For ease of presentation, we only discuss the two-dimensional case. Generalization to any dimension is straightforward. Let us assume that the data set (also called training set) consists of <em>n</em> points or locations (<em>X</em><span style="font-size: 8pt;">1</span>, <em>Y</em><span style="font-size: 8pt;">1</span>), ..., (<em>X<span style="font-size: 8pt;">n</span></em>, <em>Y<span style="font-size: 8pt;">n</span></em>) together with the response (also called dependent values) <em>Z</em><span style="font-size: 8pt;">1</span>, ..., <em>Z<span style="font-size: 8pt;">n</span></em> attached to each observation. Then the predicted value <em>Z</em> at an arbitrary location (<em>X</em>, <em>Y</em>) is computed as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208229253?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208229253?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>Throughout this article, we used </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208207489?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208207489?profile=RESIZE_710x" width="370" class="align-center"/></a></p>
<p>with <em>β</em> = 5. The parameter <em>β</em> controls the smoothness and is actually a hyper-parameter. It should be set to at least twice the dimension of the problem. A large value of <em>β</em> decreases the influence of far-away points in the predictions. In a Bayesian framework, a prior could be attached to <em>β</em>. Also note that if (<em>X</em>, <em>Y</em>) is one of the <em>n</em> training set points, say (<em>X</em>, <em>Y</em>) = (<em>X<span style="font-size: 8pt;">j</span></em>, <em>Y<span style="font-size: 8pt;">j</span></em>) for some <em>j</em>, then <em>Z</em> must be set to <em>Z<span style="font-size: 8pt;">j</span></em>. In short, the predicted value is exact for points belonging to the training set. If <span>(<em>X</em>, <em>Y</em>)</span> is very close to, say, (<em>X<span style="font-size: 8pt;">j</span></em>, <em>Y<span style="font-size: 8pt;">j</span></em>) and further away from the other training set points, then the computed <em>Z</em> is very close to <em>Z<span style="font-size: 8pt;">j</span></em>. It is assumed here that there are no duplicate locations in the training set; otherwise, the formula needs adjustments. </p>
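<p>The prediction formula can be sketched as follows (inverse distance weighting with <em>β</em> = 5; the function name and test point are ours, and the training data mimics the first case study of section 2):</p>

```python
import numpy as np

def simple_regression(train_xy, train_z, xy, beta=5.0):
    """Predict Z at location xy as sum(w_j * Z_j) / sum(w_j),
    with weights w_j = 1 / distance_j^beta (Shepard's method)."""
    d = np.linalg.norm(train_xy - xy, axis=1)
    hit = d == 0.0
    if hit.any():               # exact at training locations
        return train_z[hit][0]
    w = 1.0 / d**beta
    return np.sum(w * train_z) / np.sum(w)

rng = np.random.default_rng(2)
train_xy = rng.random((100, 2))                 # unit square locations
train_z = np.sqrt((train_xy**2).sum(axis=1))    # response SQRT(X^2 + Y^2)

pt = np.array([0.3, 0.7])
# predicted vs true value at a new location
print(simple_regression(train_xy, train_z, pt), np.sqrt(0.3**2 + 0.7**2))
```

<p>Note the early return, which implements the exactness-at-training-points property mentioned above.</p>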
<p><span style="font-size: 14pt;"><strong>2. Case studies and Excel spreadsheet with computations</strong></span></p>
<p>We did some simulations to compare the performance of simple regression versus linear regression. In the first example, the training set consists of <em>n</em> = 100 data points generated as follows. The locations are random points (<em>X<span style="font-size: 8pt;">k</span></em>, <em>Y<span style="font-size: 8pt;">k</span></em>) in the two-dimensional unit square [0, 1] x [0, 1]. The response was set to <em>Z<span style="font-size: 8pt;">k</span></em> = SQRT[(<em>X<span style="font-size: 8pt;">k</span></em>)^2 + (<em>Y<span style="font-size: 8pt;">k</span></em>)^2]. The control set consists of another <em>n</em> = 100 points, also randomly distributed on the same unit square. The predicted values were computed on the control set, and the goal is to check how well they approximate the theoretical (true) value SQRT(<em>X</em>^2 + <em>Y</em>^2). Both the simple and linear regression perform well, though the R-squared is a little better for the simple regression, for most training and control sets of this type. The picture below shows the quality of the fit. A perfect fit would correspond to a perfect diagonal line rather than a cloud, with 0.9886 and 0.0089 (the slope and intercept of the red line) replaced respectively by 1 and 0. Note that the R-squared 0.9897 is very close to 1.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208321887?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208321887?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>data set doing well with both simple and linear regression</em></p>
<p><span><strong>2.1. Regression on the circle</strong></span></p>
<p>In this second example, both the training set and control points are located on the unit circle (on the border of the circle, not inside or outside, so technically this is a one-dimensional case). As expected, the R-squared for the linear regression is terrible, and close to zero, while it is close to one for the simple regression. Note the weird distribution for the linear regression: this is not a glitch, it is expected to be that way.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208423294?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208423294?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Good fit with simple regression (points distributed on a circle)</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8208428655?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8208428655?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 3</strong>: <em>Bad fit with linear regression (points distributed on the same circle as in Figure 2)</em></p>
<p><strong>2.2. Extrapolation</strong></p>
<p>In the third example, we used the same training set, with random locations on the unit circle. The control set this time consists of <em>n</em> = 100 points located in a square away from the circle, with no intersection with the circle. This corresponds to extrapolation. Both the linear and the simple regression perform badly this time. The R-squared associated with the linear regression is close to zero, so no amount of re-scaling can fix it. The predicted values appear random.</p>
<p>However, even though the simple regression results are almost as far off as those coming from the linear regression in terms of bias, they can easily be substantially improved. The picture below illustrates this fact. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209018659?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209018659?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong>: <em>Testing predictions outside the domain (extrapolation)</em></p>
<p>The slope in Figure 4 is 0.3784; for a perfect fit, it should be equal to one. However, the R-squared for the simple regression is pretty good: 0.842. So if we multiply the predicted values by a constant chosen so that the average predicted value, in the square outside the circle, is no longer heavily biased, we obtain a good fit with the same R-squared. Of course, this assumes that the true average value on the unit square domain is known, at least approximately. It is significantly different from the average value computed on the training set (the circle), hence the bias. This fix won't work for the linear regression: its R-squared stays unchanged and close to zero after rescaling, even if we remove the bias. </p>
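<p>The rescaling fix is easy to sketch in code. The arrays below are hypothetical stand-ins for the control-set values (not the article's spreadsheet data); the point is that multiplying the predictions by a constant removes the bias while leaving the R-squared, taken as the squared correlation, unchanged.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: true values on the square, and biased but
# correlated predictions (slope well below one, as in Figure 4).
z_true = rng.uniform(1.0, 2.0, 100)
z_pred = 0.3784 * z_true + 0.05 * rng.normal(size=100)

def r_squared(y, yhat):
    # Squared correlation; invariant under rescaling yhat by a positive constant.
    c = np.corrcoef(y, yhat)[0, 1]
    return c * c

# Rescale so the average prediction matches the (assumed known) true average.
scale = z_true.mean() / z_pred.mean()
z_fixed = scale * z_pred
```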
<p><strong>2.3. Confidence intervals for predicted values</strong></p>
<p>Here, we are back to using the first data set that worked well both for linear and simple regression, doing interpolation rather than extrapolation, as at the beginning of section 2. The control set is fixed, but we split the training set (consisting this time of 500 points) into 5 subsets. This approach is similar to cross-validation or bootstrapping, and allows us to compute confidence intervals for the predicted values. It works as follows:</p>
<ul>
<li>Repeat the whole procedure 5 times, using each time a different subset of the training set</li>
<li>Estimate <em>Z</em> based on the location (<em>X</em>, <em>Y</em>) for each point in the control set, using the formula in section 1: we will have 5 different estimates for each point, one for each subset of the training set</li>
<li>For each point in the control set, compute the minimum and maximum estimated value, out of the 5 predictions</li>
<li>The confidence interval for each point has the minimum predicted value as lower bound, and the maximum as upper bound. </li>
</ul>
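<p>Here is a minimal sketch of the procedure on synthetic data, using an inverse-distance-weighted average as a stand-in for the prediction formula of section 1 (the formula itself, and the article's actual data, live in the spreadsheet):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

def idw_predict(train_xy, train_z, query_xy, eps=1e-9):
    # Inverse-distance-weighted estimate: a quotient of a weighted sum of
    # observed z values over the sum of the weights (stand-in for section 1).
    d = np.linalg.norm(train_xy - query_xy, axis=1)
    w = 1.0 / (d + eps)
    return (w * train_z).sum() / w.sum()

# Synthetic training set of 500 points on the unit square, smooth z.
train_xy = rng.uniform(0, 1, (500, 2))
train_z = np.sin(3 * train_xy[:, 0]) + np.cos(2 * train_xy[:, 1])
control_xy = rng.uniform(0, 1, (50, 2))

# Split the training set into 5 subsets; predict each control point 5 times.
idx = rng.permutation(500).reshape(5, 100)
preds = np.array([[idw_predict(train_xy[s], train_z[s], q) for q in control_xy]
                  for s in idx])

# CI per control point: [min, max] over the 5 predictions.
lower, upper = preds.min(axis=0), preds.max(axis=0)
truth = np.sin(3 * control_xy[:, 0]) + np.cos(2 * control_xy[:, 1])
coverage = np.mean((lower <= truth) & (truth <= upper))
```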
<p>Of course the technique can be further refined, using percentiles rather than minimum and maximum for the bounds of the confidence intervals. The most modern way to do it is described in my book <em>Statistics: New Foundations, Toolkit and Machine Learning Recipes</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a> to DSC members. See chapters 15-16, pages 107-132.</p>
<p>The <strong>striking conclusions</strong> based on this test are as follows:</p>
<ul>
<li>The CI (confidence interval) based on simple regression is about 50% larger on average than the one based on linear regression</li>
<li>The CI based on simple regression contains the true value 92% of the time, versus 24% of the time for the linear regression.</li>
</ul>
<p>What is striking is the 92% achieved by the simple regression. Part of it is because the simple regression CIs are larger, but there is more to it. </p>
<p><strong>2.4. Excel spreadsheet</strong></p>
<p>All the data and tests discussed, including the computations, are available in my spreadsheet, allowing you to replicate the results or apply the method to your own data. You can download it <a href="https://storage.ning.com/topology/rest/1.0/file/get/8209116672?profile=original" target="_blank" rel="noopener">here</a> (krigi2.xlsx). The main tabs in the spreadsheet are:</p>
<ul>
<li>Square</li>
<li>Circle-Interpolation</li>
<li>Circle-Extrapolation</li>
<li>Square-CI-Summary</li>
</ul>
<p>The remaining tabs are used for auxiliary computations and can be ignored.</p>
<p><span style="font-size: 14pt;"><strong>3. Generalization</strong></span></p>
<p>If you look at the main formula in section 1, the predicted <em>Z</em> is the quotient of two arithmetic means: the one in the numerator is a weighted mean, and the one in the denominator is a standard mean. But the formula also works with other types of means, for example the exponential mean discussed in one of my previous articles, <a href="https://www.datasciencecentral.com/profiles/blogs/alternative-to-the-arithmetic-geometric-and-harmonic-means" target="_blank" rel="noopener">here</a>. The advantage of such means over the arithmetic mean is that they come with hyperparameters attached, thus allowing for more granular fine-tuning. </p>
<p>For example, the exponential mean of <em>n</em> numbers <em>A</em><span style="font-size: 8pt;">1</span>, ..., <em>A<span style="font-size: 8pt;">n</span></em> is defined as</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209146656?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209146656?profile=RESIZE_710x" width="350" class="align-center"/></a></p>
<p>When the hyperparameter <em>p</em> tends to 1, it corresponds to the arithmetic mean. Here, we use the exponential mean with</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8209189858?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8209189858?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p>respectively for the numerator and denominator in the first formula in section 1. You can even use a different <em>p</em> for the numerator and denominator.</p>
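<p>Assuming the standard definition of the exponential mean as the generalized f-mean with f(x) = p^x, which is consistent with the p → 1 limit stated above, a quick numerical check of that limit:</p>

```python
import math

def exp_mean(values, p):
    # Exponential mean with hyperparameter p (p > 0, p != 1):
    # generalized f-mean with f(x) = p**x; tends to the arithmetic
    # mean as p -> 1, and is pulled toward max/min for large/small p.
    n = len(values)
    return math.log(sum(p ** a for a in values) / n) / math.log(p)

a = [1.0, 2.0, 4.0]
arith = sum(a) / len(a)           # arithmetic mean, 7/3
near_one = exp_mean(a, 1.000001)  # should be very close to arith
large_p = exp_mean(a, 100.0)      # pulled toward max(a)
small_p = exp_mean(a, 0.01)       # pulled toward min(a)
```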
<p>Other original exact interpolation techniques based on Fourier methods, in one dimension and for points equally spaced, are described <a href="https://mathoverflow.net/questions/376081/infinite-partial-fraction-expansions-to-compute-fractional-iterations-and-recurr" target="_blank" rel="noopener">in this article</a>. Indeed, it was this type of interpolation that led me to investigate the material presented here. Robust, simple linear regression techniques are also described in chapter 1 in my book <em>Statistics: New Foundations, Toolkit and Machine Learning Recipes</em>, available <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a> to DSC members.</p>
<p></p>
<p><em><strong>About the author</strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent's articles and books <a href="https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles" target="_blank" rel="noopener">here</a>.</em></p>
Interesting Application of the Poisson-Binomial Distribution
tag:www.datasciencecentral.com,2020-11-11:6448529:BlogPost:1000712
2020-11-11T03:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>While the Bernoulli and binomial distributions are among the first ones taught in any elementary statistical course, the Poisson-Binomial is rarely mentioned. It is however one of the simplest discrete distributions, with applications in survey analysis, see <a href="https://www.researchgate.net/publication/228718793_Statistical_Applications_of_the_Poisson-Binomial_and_conditional_Bernoulli_distributions" target="_blank" rel="noopener">here</a>. In this article, we are dealing with experimental / probabilistic number theory, leading to a more efficient detection of large prime numbers, with applications in cryptography and IT security. </p>
<p>This article is accessible to people with minimal math or statistical knowledge, as we avoid jargon and theory, favoring simplicity. Yet we are able to present original research-level results that will be of interest to professional data scientists, mathematicians, and machine learning experts. The data set explored here is the set of numbers, and thus accessible to anyone. We also explain computational techniques, even mentioning online tools, to deal with very large integers that are beyond what standard programming languages or Excel can handle. </p>
<p><span style="font-size: 14pt;"><strong>1. The Poisson-Binomial Distribution</strong></span></p>
<p>We are all familiar with the most basic of all random variables: the Bernoulli. If <i>Y</i> is such a variable, it is equal to 1 with probability <em>p</em>, and to 0 with probability 1 - <em>p</em>. Here the parameter <em>p</em> is a real number between 0 and 1. If you run <em>n</em> independent trials, each with the same success probability <em>p</em>, then the number of successes, defined as the number of times the outcome is equal to 1, is a binomial variable with parameters <em>n</em> and <em>p</em>. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124664081?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124664081?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source for picture: <a href="https://blogs.sas.com/content/iml/2020/10/07/poisson-binomial-hundreds-of-parameters.html" target="_blank" rel="noopener">here</a></em></p>
<p>If the trials are independent but a different <em>p</em> is attached to each of them, then the number of successes has a Poisson-binomial distribution. In short, if we have <em>n</em> independent Bernoulli random variables <i>Y</i><span style="font-size: 8pt;">1</span>, ..., <i>Y</i><em><span style="font-size: 8pt;">n</span></em> with respective parameters <em>p</em><span style="font-size: 8pt;">1</span>, ..., <em>p<span style="font-size: 8pt;">n</span></em>, then the number of successes <i>X</i> = <i>Y</i><span style="font-size: 8pt;">1</span> + ... + <i>Y</i><em><span style="font-size: 8pt;">n</span></em> has a Poisson-binomial distribution with parameters <em>p</em><span style="font-size: 8pt;">1</span>, ..., <em>p<span style="font-size: 8pt;">n</span></em> and <em>n</em>. The exact probability density function is cumbersome to compute, as it is combinatorial in nature, but a Poisson approximation is available and will be used in this article, thus the name <em>Poisson-binomial</em>. </p>
<p>The first two moments (expectation and variance) are as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124556881?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124556881?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
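<p>Both moment formulas, E[<em>X</em>] as the sum of the <em>p</em>'s and Var[<em>X</em>] as the sum of the <em>p</em>(1 - <em>p</em>)'s, are easy to verify by simulation; the Bernoulli parameters below are arbitrary illustrative values:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.1, 0.3, 0.5, 0.7, 0.2])  # one Bernoulli parameter per trial

# Simulate X = Y1 + ... + Yn with Yk ~ Bernoulli(pk), independent.
samples = (rng.uniform(size=(200_000, p.size)) < p).sum(axis=1)

mean_theory = p.sum()              # E[X]   = p1 + ... + pn
var_theory = (p * (1 - p)).sum()   # Var[X] = sum of pk (1 - pk)
```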
<p>The exact formula for the PDF (probability density function) involves an exponentially growing number of terms as <em>n</em> becomes large. For instance, P(<em>X</em> = <em>n</em> - 2), the probability that exactly two out of <em>n</em> trials fail, is given by the following formula:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124558097?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124558097?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>For this reason, whenever possible, approximations are used. </p>
<p><strong>1.1. Poisson approximation</strong></p>
<p>When the parameters <em>p<span style="font-size: 8pt;">k</span></em> are small, say <em>p<span style="font-size: 8pt;">k</span></em> < 0.1, then the following Poisson approximation applies. Let <span><em>λ</em> = <em>p</em><span style="font-size: 8pt;">1</span> + ... + <em>p<span style="font-size: 8pt;">n</span></em>. Then for <em>m</em> = 0, ..., <em>n</em>, we have: </span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124637257?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124637257?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>When <em>n</em> becomes large, we can use the <a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics" target="_blank" rel="noopener">Central Limit Theorem</a> to compute more complicated probabilities such as P(<em>X</em> > <em>m</em>), based on the Poisson approximation. See also the <a href="https://en.wikipedia.org/wiki/Le_Cam%27s_theorem" target="_blank" rel="noopener">Le Cam theorem</a> for more precise approximations. </p>
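<p>For moderate <em>n</em>, the exact Poisson-binomial pmf can also be computed by dynamic programming, adding one trial at a time, which avoids the exponential enumeration and makes it easy to gauge the Poisson approximation; the parameters below are arbitrary small values:</p>

```python
import math

def pb_pmf(ps):
    # Exact Poisson-binomial pmf via dynamic programming (iterated
    # convolution): add one Bernoulli trial at a time, O(n^2) total.
    pmf = [1.0]
    for p in ps:
        pmf = [(pmf[m] if m < len(pmf) else 0.0) * (1 - p)
               + (pmf[m - 1] * p if m >= 1 else 0.0)
               for m in range(len(pmf) + 1)]
    return pmf

ps = [0.02, 0.05, 0.01, 0.04, 0.03]  # small pk: the Poisson regime
lam = sum(ps)
exact = pb_pmf(ps)
poisson = [math.exp(-lam) * lam ** m / math.factorial(m)
           for m in range(len(ps) + 1)]
```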
<p><span style="font-size: 14pt;"><strong>2. Case study: Odds to observe many primes in a random sequence</strong></span></p>
<p>The 12 integers below were produced with a special sequence described in the second example in <a href="https://mathoverflow.net/questions/374305/sequences-with-high-densities-of-primes-how-to-boost-them-to-get-even-more-and" target="_blank" rel="noopener">this article</a>. It quickly produces a large volume of numbers with no small divisors. How likely is it to produce such a sequence of numbers just by chance? The numbers <span style="font-size: 12pt;">q[5], q[6], q[7], q[12]</span> have divisors smaller than 1,000, and the remaining eight numbers have no divisor smaller than <em>N</em> = 15,485,863. Note that <em>N</em> (the one-millionth prime) is the largest divisor that I tried in that test. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124676862?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124676862?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>Here is the answer. The probability for a large number <em>x</em> to be prime is about 1 / log <em>x</em>, by virtue of the <a href="https://www.datasciencecentral.com/profiles/blogs/simple-proof-of-prime-number-theorem" target="_blank" rel="noopener">Prime Number Theorem</a>. The probability for a large number <em>x</em> to have no divisor smaller than <em>N</em> is</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124736673?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124736673?profile=RESIZE_710x" width="200" class="align-center"/></a></p>
<p>where the product is over all primes <em>p</em> < <em>N</em> and <em>γ</em> = 0.577215… is the Euler–Mascheroni constant. Here <em>ρ<span style="font-size: 8pt;">N</span></em> ≈ 0.033913. See <a href="https://www.datasciencecentral.com/profiles/blogs/88-per-cent-of-all-integers-have-a-factor-under-100" target="_blank" rel="noopener">here</a> for an explanation of the equality on the left side. The right-hand formula is known as <a href="https://en.wikipedia.org/wiki/Mertens%27_theorems" target="_blank" rel="noopener">Mertens' theorem</a>. See also <a href="https://mathoverflow.net/questions/374824/asymptotics-for-prod1-frac1p-over-all-primes-p-leq-x-with-p-equiv" target="_blank" rel="noopener">here</a>. The symbol ~ represents <a href="https://en.wikipedia.org/wiki/Asymptotic_analysis" target="_blank" rel="noopener">asymptotic equivalence</a>. Thus the probability of observing 8 numbers out of 12 having no divisor smaller than <em>N</em> is</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8124740889?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8124740889?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>Note that we used a binomial distribution here to answer the question. Also, the probability for <em>x</em> to be prime if it has no divisor smaller than <em>N</em> is equal to<a href="https://storage.ning.com/topology/rest/1.0/file/get/8148564499?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8148564499?profile=RESIZE_710x" width="550" class="align-center"/></a></p>
<p>For the above numbers q[1],⋯,q[12], the probability in question is not small. For instance, it is equal to 0.47, 0.36 and 0.23 respectively for q[1], q[2] and q[11]. Other sequences producing a high density of prime numbers are discussed <a href="https://mathoverflow.net/questions/374305/sequences-with-high-densities-of-primes-how-to-boost-them-to-get-even-more-and" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/375133/quadratic-progressions-with-very-high-prime-density" target="_blank" rel="noopener">here</a>. </p>
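<p>As a numerical sanity check, Mertens' approximation for <em>ρ<span style="font-size: 8pt;">N</span></em> and the resulting binomial probability can be computed in a few lines (the numeric value of the binomial probability is my own computation, not stated above):</p>

```python
import math

GAMMA = 0.5772156649015329   # Euler-Mascheroni constant
N = 15_485_863               # the one-millionth prime

# Mertens' theorem: product over primes p < N of (1 - 1/p) ~ exp(-gamma) / log N
rho = math.exp(-GAMMA) / math.log(N)   # should be close to 0.033913

# Binomial probability that exactly 8 of the 12 numbers
# have no divisor smaller than N.
prob = math.comb(12, 8) * rho ** 8 * (1 - rho) ** 4
```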
<p><strong>2.1. Computations based on the Poisson-Binomial distribution</strong></p>
<p>Let us denote as <em>p<span style="font-size: 8pt;">k</span></em> the probability that q[<em>k</em>] is prime, for <em>k</em> = 1, ..., 12. As discussed earlier in section 2, <em>p<span style="font-size: 8pt;">k</span></em> = 1 / log q[<em>k</em>] is small, and the Poisson approximation can be used when dealing with the Poisson-binomial distribution. So we can use the formula in section 1.1 with <span><em>λ</em> </span>= <em>p</em><span style="font-size: 8pt;">1</span> + ... + <em>p<span style="font-size: 8pt;">n</span></em> and <em>n</em> = 12. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8148672493?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8148672493?profile=RESIZE_710x" width="300" class="align-center"/></a></p>
<p>Thus, <span><em>λ</em> = 0.11920 (approx.) Now we can compute <em>P</em>(<em>X</em> = <em>m</em>) for <em>m</em> = 8, 9, 10, 11,12:</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8148678090?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8148678090?profile=RESIZE_710x" width="120" class="align-center"/></a></p>
<p>The chance that 8 or more of the large numbers q[1],⋯,q[12] are prime is the sum of the 5 probabilities in the above table. It is equal to 9.1068 × 10^-13, that is, less than one in a trillion. </p>
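<p>This tail probability is straightforward to reproduce from the Poisson approximation of section 1.1, using the value of <em>λ</em> given above:</p>

```python
import math

lam = 0.11920   # lambda from the table above
n = 12

# P(X >= 8) under the Poisson approximation: sum of P(X = m) for m = 8..12.
tail = sum(math.exp(-lam) * lam ** m / math.factorial(m)
           for m in range(8, n + 1))
```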
<p><strong>2.2. Technical note: handling very large numbers</strong></p>
<p>Numbers investigated in this research have dozens and even hundreds of digits. The author has routinely worked with numbers with millions of digits. Below are some useful tools to deal with such large numbers.</p>
<ul>
<li>If you use a programming language, check if it has a BigNum or BigInt library. Here I used the Perl programming language, with the BigNum library. A similar library is available in Python. See examples of code, <a href="https://www.datasciencecentral.com/forum/topics/question-how-precision-computing-in-python" target="_blank" rel="noopener">here</a>. </li>
<li>A list of all prime numbers up to one trillion is available <a href="http://compoasso.free.fr/primelistweb/page/prime/liste_online_en.php" target="_blank" rel="noopener">here</a>. </li>
<li>To check if a large number <em>p</em> is prime or not, use the command PrimeQ[<em>p</em>] in Mathematica, also available online <a href="https://www.wolframalpha.com/input/?i=PrimeQ%5B29*%2880%21%29+%2B+1%5D" target="_blank" rel="noopener">here</a>. Another online tool, allowing you to test many numbers in batch to find which ones are prime, is available <a href="https://www.alpertron.com.ar/ECM.HTM" target="_blank" rel="noopener">here</a>.</li>
<li>The online Sagemath symbolic calculator is also useful. I used it e.g. to compute millions of binary digits of numbers such as SQRT(2), see <a href="https://sagecell.sagemath.org/?z=eJzz0yguLCrRMNLUKShKTbY1NAACTb3ikiKNpMTiVFsjTQCp3gnT&lang=sage" target="_blank" rel="noopener">here</a>. </li>
<li>For those interested in experimental number theory, the <a href="https://oeis.org/" target="_blank" rel="noopener">OEIS online tool</a> is also very valuable. If you discover a sequence of integers, and you are wondering if it has been discovered before, you can do a reverse lookup to find references to the sequence in question. You can also do a reverse lookup on math constants, entering the first 15 digits to see if it matches a known math constant.</li>
</ul>
<p><span style="font-size: 14pt;"><strong>3. Cryptography application </strong></span></p>
<p>Many cryptography systems rely on public and private keys featuring the product of two large primes, typically with hundreds or thousands of binary digits. Producing such large primes was not an easy task until efficient algorithms were created to check whether a number is prime. These algorithms are known as <a href="https://en.wikipedia.org/wiki/Primality_test" target="_blank" rel="noopener">primality tests</a>. Some are very fast but only provide a probabilistic answer: the probability that the number in question is prime, which is either zero or extremely close to one. Such tests are used to screen a large number of sampled integers and identify prime candidates, whose status (prime or not prime) is then determined with an exact but more costly test. </p>
<p>Remember that the probability for a random, large integer <em>p</em> to be prime is about 1 / log <em>p</em>. So if you test 100,000 numbers close to 10^300, you'd expect to find about 145 primes. Not a very efficient strategy. One way to improve these odds by an order of magnitude is to pick integers belonging to prime-rich sequences: such sequences can contain 10 times more primes than random sequences. This is where the methodology discussed here comes in handy. Such sequences are discussed in two of my articles: <a href="https://mathoverflow.net/questions/375133/quadratic-progressions-with-very-high-prime-density" target="_blank" rel="noopener">here</a> and <a href="https://mathoverflow.net/questions/374305/sequences-with-high-densities-of-primes-how-to-boost-them-to-get-even-more-and" target="_blank" rel="noopener">here</a>. </p>
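<p>For illustration, here is a compact Miller-Rabin probabilistic primality test (a generic sketch, not the algorithm of any particular cryptosystem), together with the expected prime count mentioned above:</p>

```python
import math
import random

def is_probable_prime(n, rounds=20):
    # Miller-Rabin test: composites are always rejected eventually;
    # a "True" answer is wrong with probability at most 4**(-rounds).
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)          # modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a is a witness: n is composite
    return True

# Expected number of primes among 100,000 random integers near 10^300.
expected = 100_000 / math.log(10 ** 300)   # about 145
```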
<p></p>
Thursday News, October 29
tag:www.datasciencecentral.com,2020-10-29:6448529:BlogPost:999995
2020-10-29T18:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and technical resources posted since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/31QaZCZ">Fully online MS in Data Science at CUNY</a></li>
</ul>
<p><strong>DSC Articles</strong></p>
<ul>
<li><div class="ib"><span><a href="https://www.datasciencecentral.com/profiles/blogs/how-kids-channel-their-internal-data-scientist-to-become-candy">How Kids Channel Their Internal Data Scientist to Become Candy Optimization Machines</a>...</span></div>
</li>
<li><div class="ib"><a href="https://www.datasciencecentral.com/profiles/blogs/fintech-trends-ai-smart-contracts-neobanks-open-banking-and" target="_blank" rel="noopener">FinTech Trends: AI, Smart Contracts, Neobanks, Open Banking, and Blockchain</a></div>
</li>
<li><div class="ib"><a href="https://www.datasciencecentral.com/profiles/blogs/digital-twin-virtual-manufacturing-and-the-coming-diamond-age">Digital Twins, Virtual Manufacturing, and the Coming Diamond Age</a></div>
</li>
<li><div class="ib"><a href="https://www.datasciencecentral.com/profiles/blogs/conjunction-vs-disjunction">Conjunction vs Disjunction: Bad Apples and Other Analogies</a></div>
</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-connection-between-transparency-auditability-and-ai">The Connection Between Transparency, Auditability, and AI</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-most-essential-skills-you-need-to-know-to-start-doing-machine-1">Essential Skills Needed to Start Doing Machine Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/world-s-top-5-data-analytics-companies-in-2020" target="_blank" rel="noopener">World's Top 5 Data Analytics Companies in 2020</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-most-essential-skills-you-need-to-know-to-start-doing-machine">Job opportunities in Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/insights-from-the-free-state-of-ai-repost"><span>Insights from the free state of AI repost</span></a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-steps-to-collect-high-quality-data">5 Steps to Collect High-quality Data</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-are-ensemble-techniques">What are Ensemble Techniques?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-mlops-stack">The MLOps Stack</a></li>
</ul>
<p><b>Published On Tech Target</b></p>
<ul>
<li><a href="https://searchenterpriseai.techtarget.com/news/252491270/Wordtune-AI-tool-from-AI21-Labs-rewrites-sentences-using-NLG" target="_blank" rel="noopener">Wordtune AI tool from AI21 Labs rewrites sentences using NLG</a></li>
<li><a href="https://searchhrsoftware.techtarget.com/news/252491265/Firms-dive-into-data-for-diversity-and-inclusion-strategies" target="_blank" rel="noopener">Firms dive into data for diversity and inclusion strategies</a></li>
<li><a href="https://searchenterpriseai.techtarget.com/feature/AI-fraud-detection-tools-can-help-rising-e-commerce-fraud" target="_blank" rel="noopener">AI fraud detection tools can help fight rising e-commerce fraud</a></li>
<li><a href="https://searchcontentmanagement.techtarget.com/feature/Baseball-team-digitizes-media-uses-AI-to-uncover-metadata" target="_blank" rel="noopener">Baseball team digitizes media, uses AI to uncover metadata</a></li>
<li><a href="https://searchbusinessanalytics.techtarget.com/feature/How-DataOps-architecture-benefits-your-analytics-strategy" target="_blank" rel="noopener">How DataOps architecture benefits your analytics strategy</a></li>
</ul>
<p></p>
<p><strong>Technical Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-cloud-native-containers-and-next-gen-apps">Free book - Cloud Native, Containers and Next-Gen Apps</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/8-best-big-data-hadoop-analytics-tools-in-2021">Best Big Data Hadoop Analytics Tools in 2021</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/rpa-guide-for-fintech-industry">RPA Guide For Fintech Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applied-data-science-with-python">Applied Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/so-you-want-to-write-for-dsc-1">So You Want to Write for Data Science Central</a></li>
</ul>
<p></p>
<hr/><p>For more news, information, and commentary in the AI, analytics, and enterprise data realms, subscribe to the <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">Data Science Central Newsletter</a>.</p>
Weekly Digest, October 26
tag:www.datasciencecentral.com,2020-10-25:6448529:BlogPost:999591
2020-10-25T21:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/so-you-want-to-write-for-dsc-1">So You Want to Write for Data Science Central</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/statistical-machine-learning-in-python">Statistical Machine Learning in Python</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/approaches-to-time-series-data-with-weak-seasonality">Approaches to Time Series Data with Weak Seasonality</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/odds-vs-probability-vs-likelihood">Odds vs Probability vs Chance</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-cloud-native-containers-and-next-gen-apps">Free book - Cloud Native, Containers and Next-Gen Apps</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/8-best-big-data-hadoop-analytics-tools-in-2021">Best Big Data Hadoop Analytics Tools in 2021</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/rpa-guide-for-fintech-industry">RPA Guide For Fintech Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applied-data-science-with-python">Applied Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-techniques-to-eliminate-false-negatives">Question: Techniques to eliminate False Negatives</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-implications-of-huang-s-law-for-the-artificial-intelligence">The implications of Huang’s law for the AI stack</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/digital-dreams-analog-processes">Digital Dreams – Analog Processes</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/waiting-for-godot-developing-competitive-differentiation">Waiting for Godot: Developing Competitive Differentiation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/next-generation-chip-wars-as-amd-eyes-xilink-acquisition">Next Generation Chip Wars Heat Up </a>as AMD Eyes Xilinx Acquisition</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/technology-in-education-trends-edtech-the-future-of-e-learning/">Edtech - the Future of E-learning Software</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-preparation-need-not-be-cumbersome-or-time-consuming">Data Preparation Need Not Be Cumbersome Or Time Consuming</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-artificial-intelligence-is-reshaping-small-businesses">How Artificial Intelligence Is Reshaping Small Businesses</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-tried-and-tested-saas-marketing-strategies-to-generate-leads">5 Tried and Tested SaaS Marketing Strategies to Generate Leads</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-cognitive-chatbots-provide-supreme-customer-experience-and">How cognitive chatbots transform service desk interactions</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-as-a-service-industry-overview-and-growth-outlook">Data Science as a Service Industry: </a>Overview and Growth Outlook</li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8073414678?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8073414678?profile=RESIZE_710x" width="400" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, October 22
tag:www.datasciencecentral.com,2020-10-22:6448529:BlogPost:999318
2020-10-22T17:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and technical resources posted since Monday:</p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/31s5M46">Learn why 63% of firms will be advancing their adoption of AI<span> </span></a><span>by 2023.</span></li>
</ul>
<p><strong>Technical Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-cloud-native-containers-and-next-gen-apps">Free book - Cloud Native, Containers and Next-Gen Apps</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/8-best-big-data-hadoop-analytics-tools-in-2021">Best Big Data Hadoop Analytics Tools in 2021</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/rpa-guide-for-fintech-industry">RPA Guide For Fintech Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applied-data-science-with-python">Applied Data Science with Python</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-techniques-to-eliminate-false-negatives">Question: Techniques to eliminate False Negatives</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/waiting-for-godot-developing-competitive-differentiation">Waiting for Godot: Developing Competitive Differentiation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/next-generation-chip-wars-as-amd-eyes-xilink-acquisition">Next Generation Chip Wars Heat Up<span> </span></a>as AMD Eyes Xilinx Acquisition</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-artificial-intelligence-is-reshaping-small-businesses">How Artificial Intelligence Is Reshaping Small Businesses</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-tried-and-tested-saas-marketing-strategies-to-generate-leads">5 Tried and Tested SaaS Marketing Strategies to Generate Leads</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-cognitive-chatbots-provide-supreme-customer-experience-and">How cognitive chatbots transform service desk interactions</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/data-science-as-a-service-industry-overview-and-growth-outlook">Data Science as a Service Industry:<span> </span></a>Overview and Growth Outlook</li>
</ul>
<p>Enjoy the read!</p>
Weekly Digest, October 19
tag:www.datasciencecentral.com,2020-10-18:6448529:BlogPost:997575
2020-10-18T23:49:59.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/genius-tool-to-compare-best-time-series-models-for-multi-step">Best Models For Multi-step Time Series Modeling</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/types-of-variables-in-data-science-in-one-picture">Types of Variables in Data Science in One Picture</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-quick-demonstration-of-polling-confidence-interval-calculations">A quick demonstration of polling confidence interval calculations </a>using simulation</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-never-run-a-logistic-regression-unless-you-have-to">Why you should NEVER run a Logistic Regression </a>(unless you have to)</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/model-evaluation-model-selection-and-algorithm-selection-in">Cross-validation and hyperparameter tuning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-best-data-science-courses-2020">5 Great Data Science Courses</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/complete-hands-off-automated-machine-learning">Complete Hands-Off Automated Machine Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-learn-sitecore-cms-in-2021">Why You Should Learn Sitecore CMS?</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-is-driving-software-2-0-with-minimal-human-intervention">AI is Driving Software 2.0… with Minimal Human Intervention</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-observability-how-to-fix-your-broken-data-pipelines">Data Observability: How to Fix Your Broken Data Pipelines</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applications-of-machine-learning-in-fintech">Applications of Machine Learning in FinTech</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/where-synthetic-data-brings-value">Where synthetic data brings value</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-fintech-is-the-future-of-banking">Why Fintech is the Future of Banking?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/real-estate-how-it-is-impacted-by-business-intelligence">Real Estate: How it is Impacted by Business Intelligence</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/determining-how-cloud-computing-benefits-data-science">Determining How Cloud Computing Benefits Data Science</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-is-mobile-banking-advantages-and-disadvantages-of-mobile-1">Advantages And Disadvantages Of Mobile Banking</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8048875067?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8048875067?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, October 15
tag:www.datasciencecentral.com,2020-10-15:6448529:BlogPost:995450
2020-10-15T17:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of articles and technical contributions featured on DSC since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/3lK98qI">Penn State Master’s in Data Analytics<span> </span></a>– 100% Online</li>
<li><a href="https://dsc.news/310p28s">eBook: Data Preparation for Dummies</a></li>
</ul>
<p><strong>Technical Contributions</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-quick-demonstration-of-polling-confidence-interval-calculations">A quick demonstration of polling confidence interval calculations<span> </span></a>using simulation</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-never-run-a-logistic-regression-unless-you-have-to">Why you should NEVER run a Logistic Regression<span> </span></a>(unless you have to)</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/model-evaluation-model-selection-and-algorithm-selection-in">Cross-validation and hyperparameter tuning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-should-learn-sitecore-cms-in-2021">Why You Should Learn Sitecore CMS?</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-is-driving-software-2-0-with-minimal-human-intervention">AI is Driving Software 2.0… with Minimal Human Intervention</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/applications-of-machine-learning-in-fintech">Applications of Machine Learning in FinTech</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-fintech-is-the-future-of-banking">Why Fintech is the Future of Banking?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/real-estate-how-it-is-impacted-by-business-intelligence">Real Estate: How it is Impacted by Business Intelligence</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/determining-how-cloud-computing-benefits-data-science">Determining How Cloud Computing Benefits Data Science</a></li>
</ul>
<p>Enjoy the read!</p>
Weekly Digest, October 12
tag:www.datasciencecentral.com,2020-10-11:6448529:BlogPost:992588
2020-10-11T22:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/36FbDXk">Customized data science workstations equipped with NVIDIA Rapids</a></li>
</ul>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/importance-of-service-mesh-networks-for-scaling-enterprise-ai-1">Importance of Service Mesh Networks for Scaling Enterprise AI Solutions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-rules-of-probability-in-one-picture">5 Rules of Probability in One Picture (Cat and Dog Edition) </a>+</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/no-causation-without-representation">Free book: No Causation without representation!</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-neural-network-zoo">The Neural Network Zoo</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/using-ai-to-super-compress-images">Using AI to Super Compress Images</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/explainable-artificial-intelligence-xai-1">Explainable Artificial Intelligence (XAI)</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/7-reasons-why-flutter-is-development-trend-of-2020">7 Reasons why Flutter is Development Trend of 2020</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-simple-guide-to-ai-machine-learning-and-deep-learning-or-as">A simple guide to AI, Machine Learning and Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/kotlin-vs-flutter-find-your-perfect-fit-for-cross-platform-app-3">Kotlin vs Flutter </a>- Find Your Perfect Fit For Cross-platform App Development</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/natural-language-processing-how-this-innovative-technology-is">NLP in Chatbots</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/example-of-traffic-camera-maintenance-dashboard">Question: Traffic Camera Maintenance Dashboard</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-in-one-picture">Machine Learning with Applications in One Picture</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-simply-deep-yet-convoluted-world-of-supervised-vs">The Convoluted World of Supervised vs Unsupervised Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-middle-east-to-become-the-world-s-leading-ai-hub">The Middle East to Become the World’s Leading AI Hub</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-and-machine-learning-top-priority-with-corporate-executives">AI and Machine Learning: Top Priority with Corporate Executives</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/voice-payment-in-banking-the-new-revolution-in-fintech">Voice Payment in Banking: The New Revolution in Fintech</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-finance-and-banking-mobile-app-is-vital-in-this-digital-era">Why finance and banking mobile app is vital in this digital era?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-iot-is-better-for-monitoring-gas-concentration-levels">Why IoT is Better for Monitoring Gas Concentration Levels?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-is-shaping-the-future-of-appointment-scheduling">AI is Shaping the Future of Appointment Scheduling</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/6-ways-through-which-data-science-in-finance-is-reinventing-the">Data Science in Finance is Reinventing the Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/big-data-how-it-is-reshaping-retail">Big Data: How it is Reshaping Retail</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/8024569282?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/8024569282?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, October 8
tag:www.datasciencecentral.com,2020-10-08:6448529:BlogPost:990578
2020-10-08T20:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and resources posted since Monday:</p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/36FbDXk">Customized data science workstations equipped with NVIDIA Rapids</a></li>
</ul>
<p><strong>Technical</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/using-ai-to-super-compress-images">Using AI to Super Compress Images</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-simple-guide-to-ai-machine-learning-and-deep-learning-or-as">A simple guide to AI, Machine Learning and Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/kotlin-vs-flutter-find-your-perfect-fit-for-cross-platform-app-3">Kotlin vs Flutter<span> </span></a>- Find Your Perfect Fit For Cross-platform App Development</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/natural-language-processing-how-this-innovative-technology-is">NLP in Chatbots</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/example-of-traffic-camera-maintenance-dashboard">Question: Traffic Camera Maintenance Dashboard</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-middle-east-to-become-the-world-s-leading-ai-hub">The Middle East to Become the World’s Leading AI Hub</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-and-machine-learning-top-priority-with-corporate-executives">AI and Machine Learning: Top Priority with Corporate Executives</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/voice-payment-in-banking-the-new-revolution-in-fintech">Voice Payment in Banking: The New Revolution in Fintech</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-iot-is-better-for-monitoring-gas-concentration-levels">Why IoT is Better for Monitoring Gas Concentration Levels?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/6-ways-through-which-data-science-in-finance-is-reinventing-the">Data Science in Finance is Reinventing the Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/big-data-how-it-is-reshaping-retail">Big Data: How it is Reshaping Retail</a></li>
</ul>
<p>Enjoy the read!</p>
Weekly Digest, October 5
tag:www.datasciencecentral.com,2020-10-04:6448529:BlogPost:987889
2020-10-04T17:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/36xR5j5">See DataRobot in Action </a>- Webinar, October 7</li>
<li><a href="https://dsc.news/2Gjhgzu">Databricks' virtual hands-on lab </a>- October 14</li>
<li><a href="https://dsc.news/2Gh9eHz">Create powerful dashboards that answer questions quickly </a>- Tableau Whitepaper</li>
</ul>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-book-artificial-intelligence-foundations-of-computational">Free book - AI: Foundations of Computational Agents</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-need-to-know-those-probability-distributions">Why You Need to Know Those Probability Distributions</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/time-series-forecasting-knn-vs-arima">Time Series Forecasting: KNN vs. ARIMA</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/post-scripting-to-deal-with-complex-sql-queries">Post-scripting to Deal with Complex SQL Queries</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/10-node-js-advantages">10 Node JS Advantages</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-scale-out-milvus-vector-similarity-search-engine/">Vector Similarity Search Engine</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advantages-and-disadvantages-of-python-for-your-business">Advantages And Disadvantages Of Python For Your Business</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/intersystems-iris-the-all-purpose-universal-platform-for-real">All-Purpose Universal Platform for Real-Time AI/ML</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/graduate-programs-in-healthcare-data-science">Question: Graduate Programs in Healthcare Data Science</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/on-the-nature-of-data-flights-of-birds-and-new-beginnings" target="_blank" rel="noopener">On the Nature of Data, Flights of Birds, and New Beginnings</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-detectives">Data Detectives</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/introducing-analytics-to-a-product">Introducing Analytics To A Product</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/personal-updates-and-dsc">Personal updates and DSC</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-secret-weapons-of-fake-news">The Secret Weapons of Fake News</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/here-s-how-to-fix-a-haphazard-data-driven-approach-to-education">How to Fix a Haphazard Data-Driven Approach to Education</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/hardware-appliances-vs-software-defined-storage">Hardware Appliances vs. Software Defined Storage</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/which-data-protection-techniques-do-you-need-to-guarantee-privacy">Which data protection techniques do you need to guarantee privacy?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-an-apple-is-changing-the-qsr-industry">How An Apple Is Changing the QSR Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-is-augmented-data-preparation-and-why-is-it-important">What is Augmented Data Preparation and Why is it Important?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-is-the-banking-industry-coping-with-the-digital">How is the banking industry coping digital transformation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/know-your-support-options-for-dynamics-365">ERP: Know Your Support Options for Dynamics 365</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/7999338085?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/7999338085?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, October 1
tag:www.datasciencecentral.com,2020-10-01:6448529:BlogPost:985963
2020-10-01T18:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured resources and articles posted since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/36lexju">Humility in AI: Building Trustworthy and Ethical AI Systems</a></li>
<li><a href="https://dsc.news/2ET3UcA">Databricks' Virtual Hands-on Lab<span> </span></a> - October 14</li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/why-you-need-to-know-those-probability-distributions">Why You Need to Know Those Probability Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/time-series-forecasting-knn-vs-arima">Time Series Forecasting: KNN vs. ARIMA</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/post-scripting-to-deal-with-complex-sql-queries">Post-scripting to Deal with Complex SQL Queries</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/advantages-and-disadvantages-of-python-for-your-business">Advantages And Disadvantages Of Python For Your Business</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/intersystems-iris-the-all-purpose-universal-platform-for-real">All-Purpose Universal Platform for Real-Time AI/ML</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/graduate-programs-in-healthcare-data-science">Question: Graduate Programs in Healthcare Data Science</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/personal-updates-and-dsc">Personal updates and DSC</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-secret-weapons-of-fake-news">The Secret Weapons of Fake News</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/which-data-protection-techniques-do-you-need-to-guarantee-privacy">Which data protection techniques do you need to guarantee privacy?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-an-apple-is-changing-the-qsr-industry">How An Apple Is Changing the QSR Industry</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-is-augmented-data-preparation-and-why-is-it-important">What is Augmented Data Preparation and Why is it Important?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/know-your-support-options-for-dynamics-365">ERP: Know Your Support Options for Dynamics 365</a></li>
</ul>
<p>Enjoy the reading!</p>
Personal updates and DSC
tag:www.datasciencecentral.com,2020-09-28:6448529:BlogPost:983886
2020-09-28T18:19:17.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>After more than 10 years being involved with Data Science Central, initially as the founder, and most recently being acquired by <a href="https://www.techtarget.com/" target="_blank" rel="noopener">TechTarget</a>, I have decided to pursue new interests. TechTarget now has a great team taking care of DSC, and I will still be involved as a consultant to make sure that everything continues to run smoothly and that the quality standards are maintained and even enhanced. In my new role, I will become a contributor and write articles for DSC, so I will still be visible, even more than before as I find more time to write more articles.</p>
<p>I am happy to announce that TechTarget has just brought on Kurt Cagle as the new Community Manager for DSC. Kurt is a former DSC blogger who lives in WA, two miles from my place where DSC was created (so it will be easy to meet in person and pass along all the tricks). He will take over many of my responsibilities, including interactions with the community. As for me, I will be spending much-needed time at my new restaurant, <a href="https://www.parisrestaurantandbar.com/" target="_blank" rel="noopener">Paris Restaurant</a> in Anacortes, WA, opening next month.</p>
<p>My upcoming articles will be original, in the same style as before: explaining advanced, new, and original ML concepts and recipes in layman's terms for a broad range of analytics professionals, including data scientists, executives, decision makers, and consumers of analytic products.</p>
<p>I wish all the best to Kurt and TechTarget. I am very happy to have worked with the stellar TechTarget team after the acquisition, as well as to continue to work with former DSC colleagues who were hired by TechTarget and are still there and happy today. This relationship will continue to grow in the foreseeable future, with TechTarget’s strong commitment to the DSC community.</p>
<p></p>
<p>Best,</p>
<p>Vincent</p>
Weekly Digest, September 28
tag:www.datasciencecentral.com,2020-09-27:6448529:BlogPost:983419
2020-09-27T22:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/36bvrkF" target="_self">Reduce time to insight and lower the cost of data science</a></li>
</ul>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/correlation-coefficients-in-data-science-in-one-picture">Correlation Coefficients in Data Science and Machine Learning </a>(in One Picture) +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/gpt3-and-agi-beyond-the-dichotomy-part-two">GPT3 and AGI: Beyond the dichotomy – part two</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/most-useful-c-c-ml-libraries-every-data-scientist-should-know">Most Useful C/C++ ML Libraries Every Data Scientist Should Know</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/common-technical-characteristics-of-some-major-stock-market">Common Technical Characteristics of Major Stock Market Corrections</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-rise-of-gpu-databases">The rise of GPU databases</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/gpt3-and-agi-beyond-the-dichotomy-part-one">GPT3 and AGI: Beyond the dichotomy - part one</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-best-practices-for-putting-machine-learning-models-into">5 Best Practices For Putting Machine Learning Models Into Production</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/is-bert-always-the-better-cheaper-faster-answer-in-nlp-apparently">Is BERT Always the Better Cheaper Faster Answer in NLP? </a>Apparently Not.</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-and-artificial-intelligence-for-business-1">Machine Learning and AI for Business Recovery after COVID 19</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-has-become-so-human-that-you-can-t-tell-the-difference">AI Has Become So Human, That You Can’t Tell the Difference</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/6-best-ai-based-apps-in-2020-a-brief-introduction">6 Best AI-Based Apps in 2020: A Brief Introduction</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-future-of-fake-news">The Future of Fake News</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/4-essential-ai-human-touch-points-for-successful-ai-initiatives">4 Essential AI/Human Touch Points for Successful AI Initiatives</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/driving-business-results-with-artificial-intelligence-services">Driving Business Results with Artificial Intelligence Services</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ain-t-no-such-a-thing-as-a-citizen-data-scientist">Ain’t No Such a Thing as a ‘Citizen Data Scientist’</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-business-intelligence-is-transforming-manufacturing">How BI is transforming manufacturing operations</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-will-blockchain-industry-fare-in-the-immediate-future">How Will Blockchain Industry Fare in the Immediate Future</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/big-data-in-manufacturing-how-it-benefits-the-sector">Big Data in Manufacturing: How it Benefits the Sector</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-many-ways-in-which-healthcare-analytics-enables-better">How Healthcare Analytics Enables Better Patient Care</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/7975636081?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/7975636081?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
DSC Thursday News - September 24
tag:www.datasciencecentral.com,2020-09-24:6448529:BlogPost:981537
2020-09-24T20:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and resources posted since Monday:</p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/2RIWIT4">Accelerating AI and ML Projects with Graph Algorithms<span> </span></a><span>- Virtual conference</span></li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/most-useful-c-c-ml-libraries-every-data-scientist-should-know">Most Useful C/C++ ML Libraries Every Data Scientist Should Know</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/gpt3-and-agi-beyond-the-dichotomy-part-one" target="_blank" rel="noopener">GPT3 and AGI: Beyond the dichotomy - part one</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-best-practices-for-putting-machine-learning-models-into">5 Best Practices For Putting Machine Learning Models Into Production</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/is-bert-always-the-better-cheaper-faster-answer-in-nlp-apparently">Is BERT Always the Better Cheaper Faster Answer in NLP?<span> </span></a>Apparently Not.</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-and-artificial-intelligence-for-business-1">Machine Learning and AI for Business Recovery after COVID 19</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-has-become-so-human-that-you-can-t-tell-the-difference" target="_blank" rel="noopener">AI Has Become So Human, That You Can’t Tell the Difference</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/driving-business-results-with-artificial-intelligence-services">Driving Business Results with Artificial Intelligence Services</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ain-t-no-such-a-thing-as-a-citizen-data-scientist">Ain’t No Such a Thing as a ‘Citizen Data Scientist’</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-business-intelligence-is-transforming-manufacturing">How BI is transforming manufacturing operations</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-many-ways-in-which-healthcare-analytics-enables-better">How Healthcare Analytics Enables Better Patient Care</a></li>
</ul>
<p>Enjoy the reading! </p>
Weekly Digest, September 21
tag:www.datasciencecentral.com,2020-09-21:6448529:BlogPost:981086
2020-09-21T00:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/2ZS2UN3">How to Automate your Cloud Data Warehouse </a>- Upcoming DSC Webinar</li>
</ul>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/stumped-bayes-theorem">Stumped by Bayes' Theorem? Try This Simple Workaround</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/data-driven-innovation-in-healthcare-synthetical-clinical-data">Data-driven innovation in healthcare: synthetical clinical data </a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/intersystems-iris-the-all-purpose-universal-platform-for-real">All-Purpose Universal Platform for Real-Time AI/ML</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/8-smart-ways-to-become-a-data-scientist">8 Smart Ways To Become A Data Scientist</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-programming-guide-with-probability-and-statistics">A Programming Guide with Probability and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/call-symput-in-sas-explained-with-examples">CALL SYMPUT in SAS Explained with Examples</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-coursera-series-machine-learning-for-everyone">New Coursera Series: Machine Learning for Everyone</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/namara-dataspec-monitor-the-health-of-any-data">Monitoring the Health of Any Data</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/risk-avoidance-spectrum-and-character-types">Risk Avoidance Spectrum and Character Types</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/algorithms-of-social-manipulation">Algorithms of Social Manipulation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/3-big-reasons-every-business-must-adopt-devops-and-cloud">3 Big Reasons Every Business Must Adopt DevOps and Cloud Computing</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice">The illusion of choice</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/iot-s-purview-on-current-environmental-conditions">IoT’s Purview on Current Environmental Conditions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-deep-learning-and-machine-learning">Difference Between Deep Learning and Machine Learning </a>in one Infographics</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-correlation-and-regression-in-statistics">Difference Between Correlation and Regression in Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/deep-learning-explained-in-4-simple-facts-1">Deep Learning Explained in 4 Simple Facts</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/7949771291?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/7949771291?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
New Coursera Series: Machine Learning for Everyone
tag:www.datasciencecentral.com,2020-09-17:6448529:BlogPost:980787
2020-09-17T17:31:01.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>After three courses, you will be able to:</span></p>
<ul>
<li><strong>Lead ML:</strong><span> </span>Manage or participate in the end-to-end implementation of machine learning</li>
<li><strong>Apply ML:</strong><span> </span>Identify the opportunities where machine learning can improve marketing, sales, financial credit scoring, insurance, fraud detection, and much more</li>
<li><strong>Greenlight ML:</strong><span> </span>Forecast the effectiveness of and scope the requirements for a machine learning project and then internally sell it to gain buy-in</li>
<li><strong>Regulate ML:</strong><span> </span>Manage ethical pitfalls, the risks to social justice that stem from machine learning</li>
</ul>
<p>While there are so many how-to courses for hands-on techies, there are practically none that also serve business leaders – a striking omission, since success with machine learning relies on a very particular business leadership practice just as much as it relies on adept number crunching.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/7938365663?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/7938365663?profile=RESIZE_710x" class="align-center"/></a></p>
<p><strong>NO HANDS-ON AND NO HEAVY MATH.</strong><span> </span>Rather than a hands-on training, this specialization serves both business leaders and burgeoning data scientists alike with expansive, holistic coverage of the state-of-the-art techniques and business-level best practices. There are no exercises involving coding or the use of machine learning software.</p>
<p><strong>IN-DEPTH YET ACCESSIBLE.</strong><span> </span>Brought to you by industry leader Eric Siegel – a winner of teaching awards when he was a professor at Columbia University – this specialization stands out as one of the most thorough, engaging, and surprisingly accessible on the subject of machine learning.</p>
<div class="su-expand su-expand-link-style-default underlinenone"><div class="su-expand-content su-u-trim"><br clear="all"/><strong>Here’s what you will learn:</strong><ul>
<li>How machine learning – aka predictive analytics – works</li>
<li>How it actively improves major business operations to boost business, accumulate clicks, fight fraud, and deny deadbeats</li>
<li>How to report on the increase in profit, ROI, and predictive performance it achieves</li>
<li>What the data needs to look like</li>
<li>Leadership: gold standard practices for managing a machine learning project</li>
<li>The technical tips and tricks – and how to avoid the most prevalent pitfalls</li>
<li>Whether true artificial intelligence is coming or is just a myth</li>
<li>The risks to social justice that stem from machine learning</li>
</ul>
<p><strong>DYNAMIC CONTENT.</strong><span> </span>Across this range of topics, this specialization keeps things action-packed with case study examples, software demos, stories of poignant mistakes, and stimulating assessments.</p>
<p><strong>WHO IT’S FOR.</strong><span> </span>This concentrated entry-level program is totally accessible to business-level learners – and yet also vital to data scientists who want to secure their business relevance. It’s for anyone who wishes to participate in the commercial deployment of machine learning, no matter whether you’ll do so in the role of enterprise leader or quant. This includes business professionals and decision makers of all kinds, such as executives, directors, line of business managers, and consultants – as well as data scientists.</p>
<p><em>Available <a href="https://www.predictiveanalyticsworld.com/machine-learning-courses/" target="_blank" rel="noopener">here</a></em></p>
</div>
</div>
Thursday News, September 17
tag:www.datasciencecentral.com,2020-09-17:6448529:BlogPost:980794
2020-09-17T17:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>This is our list of featured articles and resources posted since Monday:</p>
<p><strong>Announcement</strong></p>
<ul>
<li><a href="https://dsc.news/2ZJ5ULG">New eBook: How prescriptive analytics provides a roadmap<span> </span></a><span>to your revenue target</span></li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/stumped-bayes-theorem">Stumped by Bayes' Theorem? Try This Simple Workaround</a></li>
<li><a href="https://www.analyticbridge.datasciencecentral.com/profiles/blogs/data-driven-innovation-in-healthcare-synthetical-clinical-data">Data-driven innovation in healthcare: synthetical clinical data</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-programming-guide-with-probability-and-statistics">A Programming Guide with Probability and Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-coursera-series-machine-learning-for-everyone">New Coursera Series: Machine Learning for Everyone</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/iot-s-purview-on-current-environmental-conditions">IoT’s Purview on Current Environmental Conditions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-deep-learning-and-machine-learning">Difference Between Deep Learning and Machine Learning<span> </span></a>in one Infographics</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-correlation-and-regression-in-statistics">Difference Between Correlation and Regression in Statistics</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/deep-learning-explained-in-4-simple-facts-1">Deep Learning Explained in 4 Simple Facts</a></li>
</ul>
<p>Enjoy the reading!</p>
K-Nearest Neighbors (KNN): Solving Classification Problems
tag:www.datasciencecentral.com,2020-09-13:6448529:BlogPost:980505
2020-09-13T23:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p></p>
<p><em class="hm">Originally posted by Michael Grogan. </em></p>
<p><strong>In this tutorial, we are going to use the K-Nearest Neighbors (KNN) algorithm to solve a classification problem.</strong><span> </span>Firstly, what exactly do we mean by classification?</p>
<p>Classification means that results are categorised into particular groups, e.g. classifying a fruit as either an apple or an orange.</p>
<p>The KNN algorithm is one of the most basic, yet most commonly used, algorithms for solving classification problems. KNN classifies a test observation by finding the training observations closest to it, which tends to yield high classification accuracy when similar observations share the same class.</p>
<p><a href="https://mobilemonitoringsolutions.com/wp-content/uploads/2018/09/knn1-1.png"><img src="https://mobilemonitoringsolutions.com/wp-content/uploads/2018/09/knn1-1.png" alt="K-Nearest Neighbors 1" width="640" height="480" class="aligncenter size-full wp-image-8309"/></a></p>
<p>As we dive deeper into our case study, you will see exactly how this works. First of all, let’s take a look at the specific case study that we will analyse using KNN.</p>
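<p>To make the distance idea concrete, here is a minimal 1-nearest-neighbor classifier written from scratch with NumPy. This is a toy sketch with made-up points, not part of the original tutorial; scikit-learn does all of this (and more) for us below.</p>

```python
import numpy as np

def predict_1nn(x_train, y_train, x_new):
    # Euclidean distance from the new point to every training observation
    dists = np.linalg.norm(x_train - x_new, axis=1)
    # The label of the single closest training observation is the prediction
    return y_train[np.argmin(dists)]

# Toy data: two small clusters labeled 0 and 1
x_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y_train = np.array([0, 0, 1, 1])

print(predict_1nn(x_train, y_train, np.array([0.15, 0.15])))  # → 0
print(predict_1nn(x_train, y_train, np.array([0.85, 0.85])))  # → 1
```

With more neighbors (k &gt; 1), the prediction becomes a majority vote among the k closest training points, which is exactly what scikit-learn's KNeighborsClassifier does in the case study below.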
<h1>Our case study</h1>
<p>In this particular instance, KNN is used to classify consumers according to their internet usage. Certain consumers will use more data (in megabytes) than others, and certain factors will influence the level of usage. For simplicity, let’s set this up as a classification problem.</p>
<p>Our dependent variable (usage per week in megabytes) is expressed as a<span> </span><strong>1</strong><span> </span>if the person’s usage exceeds 15,000 MB per week, and<span> </span><strong>0</strong><span> </span>if it does not. Therefore, we are splitting consumers into two separate groups based on their usage (1 = heavy users, 0 = light users).</p>
<p>The independent variables (or the variables that are hypothesised to directly influence usage – the dependent variable) are as follows:</p>
<ul>
<li>Income per month</li>
<li>Hours of video per week</li>
<li>Webpages accessed per week</li>
<li>Gender (0 = Female, 1 = Male)</li>
<li>Age</li>
</ul>
<p>To clarify:</p>
<ul>
<li>Dependent variable: A variable that is influenced by other variables. In this case, data usage is being influenced by other factors.</li>
<li>Independent variable: A variable that influences another variable. For instance, the more hours of video a person watches per week, the more this will increase the amount of data consumed.</li>
</ul>
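<p>As a sketch of how such a 0/1 dependent variable could be derived from raw weekly usage, here is the thresholding step in pandas. The column names and values here are hypothetical, for illustration only; the actual internetlogit.csv already contains the coded variable.</p>

```python
import pandas as pd

# Hypothetical raw data: weekly usage in megabytes
raw = pd.DataFrame({"usage_mb": [22000, 9000, 15500, 4000]})

# 1 = heavy user (over 15,000 MB per week), 0 = light user
raw["usage"] = (raw["usage_mb"] > 15000).astype(int)
print(raw["usage"].tolist())  # → [1, 0, 1, 0]
```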
<h2>Load libraries</h2>
<p>Firstly, let’s open up a Python environment and load the following libraries:</p>
<pre><br/>import numpy as np<br/>
import statsmodels.api as sm<br/>
import pandas as pd<br/>
from sklearn.model_selection import train_test_split<br/>
from sklearn.preprocessing import MinMaxScaler<br/>
from sklearn.neighbors import KNeighborsClassifier<br/>
import matplotlib.pyplot as plt<br/>
import mglearn<br/>
import os;</pre>
<p>As we go through the tutorial, the uses for the above libraries will become evident.</p>
<p>Note that I used Python 3.6.5 at the time of writing this tutorial. As an example, the<span> </span><strong>mglearn</strong><span> </span>library can be installed with the<span> </span><strong>pip</strong><span> </span>command as follows:</p>
<pre><br/>pip3 install mglearn</pre>
<h2>Load data and define variables</h2>
<p>Before we dive into the analysis itself, we will first:</p>
<p>1. Load the CSV file into the Python environment using the<span> </span><strong>os</strong><span> </span>and<span> </span><strong>pandas</strong><span> </span>libraries</p>
<p>2. Stack the independent variables with<span> </span><strong>numpy</strong><span> </span>and<span> </span><strong>statsmodels</strong></p>
<p>Firstly, the file path where the CSV is located is set. The dataset itself can be found here, titled<span> </span><a href="http://www.michaeljgrogan.com/wp-content/uploads/2018/08/internetlogit.csv" rel="noopener" target="_blank">internetlogit.csv</a>.</p>
<pre><br/>path="/home/michaeljgrogan/Documents/a_documents/computing/data science/datasets"<br/>
os.chdir(path)<br/>
os.getcwd()</pre>
<p>Then, we load the CSV file using pandas (imported under the short alias <strong>pd</strong>):</p>
<pre><br/>variables=pd.read_csv('internetlogit.csv')<br/>
usage=variables['usage']<br/>
income=variables['income']<br/>
videohours=variables['videohours']<br/>
webpages=variables['webpages']<br/>
gender=variables['gender']<br/>
age=variables['age']</pre>
<p>Finally, we are defining our dependent variable (usage) as<span> </span><strong>y</strong>, and our independent variables as<span> </span><strong>x</strong>.</p>
<pre><br/>y=usage<br/>
x=np.column_stack((income,videohours,webpages,gender,age))<br/>
x=sm.add_constant(x,prepend=True)</pre>
<h2>MinMaxScaler and Train-Test Split</h2>
<p>To further prepare the data for meaningful analysis with KNN, it is necessary to:</p>
<p>1. Scale the data between<span> </span><strong>0</strong><span> </span>and<span> </span><strong>1</strong><span> </span>using a min-max scaler so that the KNN algorithm can interpret it properly. KNN is distance-based, so without scaling, variables measured on large ranges (such as income) would dominate the distance calculation and distort the results. In other words, since our dependent variable takes values of 0 and 1, our independent variables<span> </span><strong>also</strong><span> </span>need to be brought onto a 0-to-1 scale.</p>
<p>2. Partition the data into<span> </span><strong>training</strong><span> </span>and<span> </span><strong>test</strong><span> </span>data. In this instance, 80% of the data is apportioned to the training segment, while 20% is apportioned to the test segment. Specifically, the KNN model will be built with the training data, and the results will then be validated against the test data to gauge classification accuracy.</p>
<pre><br/>x_scaled = MinMaxScaler().fit_transform(x)<br/>
x_train, x_test, y_train, y_test = train_test_split(x_scaled, y, test_size=0.2)</pre>
<p>Now, our data has been split and the independent variables have been scaled appropriately.</p>
<p>To get a closer look at our scaled variables, let’s view the<span> </span><strong>x_scaled</strong><span> </span>variable as a pandas dataframe.</p>
<pre><br/>pd.DataFrame(x_scaled)</pre>
<p>You can see that all of our variables are now on a scale between<span> </span><strong>0</strong><span> </span>and<span> </span><strong>1</strong>, allowing for a meaningful comparison with the dependent variable.</p>
<pre><br/>0 1 2 3 4 5<br/>
0 0.0 0.501750 0.001364 0.023404 0.0 0.414634<br/>
1 0.0 0.853250 0.189259 0.041489 0.0 0.341463<br/>
2 0.0 0.114500 0.000000 0.012766 1.0 0.658537<br/>
.. ... ... ... ... ... ...<br/>
963 0.0 0.106500 0.061265 0.014894 0.0 0.073171<br/>
964 0.0 0.926167 0.033951 0.018085 1.0 0.926829<br/>
965 0.0 0.975917 0.222488 0.010638 1.0 0.634146</pre>
<h1>Classification with KNN</h1>
<p>Now that we have loaded and prepared our data, we are ready to run the KNN itself! Specifically, we will see how the accuracy rate varies as we manipulate the number of nearest neighbors.</p>
<h2>n_neighbors = 1</h2>
<p>Firstly, we will run with 1 nearest neighbor (where n_neighbors = 1) and obtain a training and test set score:</p>
<pre><br/>print(x_train.shape, y_train.shape)<br/>
print(x_test.shape, y_test.shape)<br/>
knn = KNeighborsClassifier(n_neighbors=1)<br/>
model = knn.fit(x_train, y_train)<br/>
print("Training set score: {:.2f}".format(knn.score(x_train, y_train)))<br/>
print("Test set score: {:.2f}".format(knn.score(x_test, y_test)))</pre>
<p>We obtain the following output:</p>
<pre><br/>Training set score: 1.00<br/>
Test set score: 0.91</pre>
<p>A training set score of 1.00 means the KNN model classifies the training data with 100% accuracy. This is expected with n_neighbors = 1, since every training point is its own nearest neighbor, and it is a warning sign of overfitting. Accuracy drops to 91% when the model's predictions are validated against the unseen test set.</p>
<p>Moreover, we can now visualise this using<span> </span><strong>mglearn</strong>:</p>
<pre>mglearn.plots.plot_knn_classification(n_neighbors=1)<br/>plt.show()</pre>
<p><a href="https://mobilemonitoringsolutions.com/wp-content/uploads/2018/09/knn1-1.png"><img src="https://mobilemonitoringsolutions.com/wp-content/uploads/2018/09/knn1-1.png" alt="knn 1" width="640" height="480" class="aligncenter size-full wp-image-8309"/></a></p>
<h2>n_neighbors = 5</h2>
<p>Now, what happens if we decide to use 5 nearest neighbors? Let’s find out!</p>
<pre><br/>knn = KNeighborsClassifier(n_neighbors=5)<br/>
knn.fit(x_train, y_train)<br/>
print("Training set score: {:.2f}".format(knn.score(x_train, y_train)))<br/>
print("Test set score: {:.2f}".format(knn.score(x_test, y_test)))</pre>
<p>We now obtain a higher test set score of 0.94, with a slightly lower training set score of 0.95:</p>
<pre><br/>Training set score: 0.95<br/>
Test set score: 0.94</pre>
<p>When we analyze this visually, we see that we now have 5 nearest neighbors for each test prediction instead of 1:</p>
<p><a href="https://mobilemonitoringsolutions.com/wp-content/uploads/2018/09/knn5.png"><img src="https://mobilemonitoringsolutions.com/wp-content/uploads/2018/09/knn5.png" alt="knn 5" width="640" height="480" class="aligncenter size-full wp-image-8310"/></a></p>
<p>In this instance, we see that increasing the number of nearest neighbors increased the accuracy rate against our test data.</p>
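<p>The effect of the neighbor count can also be explored systematically. The sketch below sweeps several values of n_neighbors; note that it uses a synthetic dataset from <strong>make_classification</strong> in place of our actual scaled data, so the exact scores will differ from those above:</p>

```python
# Hedged sketch: sweep the neighbor count and compare train/test accuracy.
# make_classification stands in for the article's scaled data, so the
# exact scores will differ from those shown above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = make_classification(n_samples=500, n_features=6, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    print("k={}: train={:.2f}, test={:.2f}".format(
        k, knn.score(x_train, y_train), knn.score(x_test, y_test)))
```

<p>The typical pattern is the one we observed: k = 1 memorizes the training set perfectly, while a moderately larger k trades a little training accuracy for better test accuracy.</p>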
<h1>Cross Validation</h1>
<p>One important caveat to note.</p>
<p>Given that we have used a single train-test split, there is always the danger that the split is not representative; the test data may happen to be overly similar to the training data. In that case, even though the KNN model demonstrates a high degree of accuracy on the test set, this would not necessarily carry over to genuinely new data.</p>
<p>In our case, given that the test set score is not that much lower than the training set score, this does not appear to be an issue here.</p>
<p>However, what method could we use to guard against this issue? The most popular one is a method called<span> </span><strong>cross validation</strong>.</p>
<h2>How Does Cross Validation Work?</h2>
<p>Essentially, cross validation works by creating multiple train-test splits (called folds) from the data. With k folds, the algorithm is trained on k-1 of them while the remaining fold, the &ldquo;holdout fold&rdquo;, is used as the test set. This process is repeated k times so that each fold serves exactly once as the holdout, and the resulting scores are typically averaged.</p>
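<p>The fold rotation can be sketched with scikit-learn's <strong>KFold</strong> on a toy array of nine sample indices:</p>

```python
# Minimal sketch of the fold rotation: with 3 folds, each fold serves
# exactly once as the holdout set while the other two are used for training.
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(9)  # stand-in for nine sample indices
for train_idx, holdout_idx in KFold(n_splits=3).split(indices):
    print("train:", train_idx, "holdout:", holdout_idx)
# → train: [3 4 5 6 7 8] holdout: [0 1 2]  (and so on for the other two folds)
```

<p>Every sample lands in the holdout set exactly once, which is what makes the averaged score a more reliable estimate than a single split.</p>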
<p>Let’s see how this works. In our particular case, cross validation is unlikely to change the conclusion, since both the training and test set scores were already quite high on our original train-test split.</p>
<p>However, there are many instances where this will not be the case, and cross validation therefore becomes an important tool in splitting and testing our data more effectively.</p>
<p>For this purpose, suppose that we wish to generate<span> </span><strong>7</strong><span> </span>separate cross validation scores. We will first import our cross validation parameters from sklearn:</p>
<pre><br/>from sklearn.model_selection import cross_val_score, cross_val_predict</pre>
<p>Then, we generate 7 separate cross validation scores based on our prior KNN model:</p>
<pre><br/>scores = cross_val_score(model, x_scaled, y, cv=7)<br/>
print ("Cross-validated scores:", scores)</pre>
<p>Here, we can see that the seven cross-validated scores all sit reasonably close to our original test set score, with no fold revealing a dramatic drop in accuracy.</p>
<pre><br/>Cross-validated scores: [0.96402878 0.85611511 0.89855072 0.93478261 0.94202899 0.89051095 0.91240876]</pre>
<p>This is expected since we still got quite a high test set score on our original train-test split.</p>
<p>With this being said, cross validation is most valuable when there is a large disparity between the training and test set scores. In other words, if we had a high training score and a low test score, cross validation would give a far more reliable estimate of generalization performance than a single split, with the spread of the fold scores exposing how sensitive the model is to the particular split.</p>
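<p>One common way to summarize the fold scores (a minimal sketch using the values printed above) is to report their mean and standard deviation:</p>

```python
# Sketch: summarize the seven fold scores printed above with their mean and
# standard deviation; a large gap between the training accuracy and the CV
# mean would suggest overfitting.
import numpy as np

scores = np.array([0.96402878, 0.85611511, 0.89855072, 0.93478261,
                   0.94202899, 0.89051095, 0.91240876])
print("mean={:.3f} +/- {:.3f}".format(scores.mean(), scores.std()))
# → mean=0.914 +/- 0.034
```

<p>The mean of 0.914 agrees well with our original test set score of 0.94, and the small standard deviation confirms that no single fold behaves very differently from the others.</p>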
<h1>Summary</h1>
<p>In this tutorial, you have learned:</p>
<ul>
<li>What a classification problem is</li>
<li>How KNN can be used to solve classification problems</li>
<li>How to configure data for effective analysis with KNN</li>
<li>How to use cross validation to conduct more extensive accuracy testing</li>
</ul>
<p>Many thanks for reading, and feel free to leave any questions in the comments below!</p>
<p><em>Also posted <a href="https://mobilemonitoringsolutions.com/k-nearest-neighbors-knn-solving-classification-problems/" target="_blank" rel="noopener">here</a></em></p>
Weekly Digest, September 14
tag:www.datasciencecentral.com,2020-09-13:6448529:BlogPost:980474
2020-09-13T22:30:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/32c7Mhu">30-Day Trial: Pentaho Data Integration Business Analytics</a></li>
<li><a href="https://dsc.news/3mgK102" target="_blank" rel="noopener">TIBCO Connected Experience 2020</a> - September 22-24 (Now Online)</li>
</ul>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-is-the-connection-between-ai-cloud-native-and-edge-devices">What is the connection between AI, Cloud-Native and Edge devices?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/micromasters-the-fast-way-to-get-into-data-science">MicroMasters: The Fast Way to Get Into Data Science</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/studying-risk-based-algomorphology-using-thunderbird-charts">Studying Risk-based Algomorphology Using Thunderbird Charts</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/6-most-important-data-science-skills">6 Most Important Data Science Skills</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/4-steps-to-building-a-video-search-system">4 Steps to Building a Video Search System</a></li>
<li><a href="https://www.education.datasciencecentral.com/">Top Data Science Programs</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-big-data-analytics-certifications-aid-data-analysts">How Big Data Analytics Certifications Aid Data Analysts</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/becoming-a-10x-data-scientist">Becoming a 10x Data Scientist</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">Free Book: Applied Stochastic Processes</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-simple-easy-solution-for-eliminating-bias-from-models">The Simple Easy Solution for Eliminating Bias from Models</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/remarketing-strategically-targeting-your-customers-1">Remarketing: Strategically Targeting Your Customers</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/nearly-33-years-later-nobody-knows-what-triggered-the-crash-of">33 Years Later, Nobody Knows What Triggered the Crash of 1987</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/opinion-is-a-phd-helpful-for-a-data-science-career">Opinion: Is a PhD helpful for a data science career?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/5-challenges-to-be-prepared-for-before-scaling-machine-learning">5 Challenges When Scaling Machine Learning Models</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/hospitality-sector-amp-iot-a-beneficial-duo-for-everyday-life">Hospitality Sector & IoT: A Beneficial Duo for Everyday Life</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/business-intelligence-how-it-can-transform-manufacturing">Business Intelligence: How It Can Transform Manufacturing</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/chros-commitments-towards-people-analytics-an-outlook">CHROs’ Commitments Towards People Analytics: An Outlook</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/leveraging-digital-engagement-to-modernize-the-customer">Leveraging Digital Engagement to Modernize the Customer Experience</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/7919701865?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/7919701865?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, September 10
tag:www.datasciencecentral.com,2020-09-10:6448529:BlogPost:980212
2020-09-10T18:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and resources posted since Monday:</p>
<p><strong>Announcements</strong></p>
<ul>
<li><a href="https://dsc.news/32c7Mhu">30-Day Trial: Pentaho Data Integration Business Analytics</a></li>
<li><a href="https://dsc.news/33a5VsM">Visit the SAS Data Science Experience Page</a></li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/micromasters-the-fast-way-to-get-into-data-science">MicroMasters: The Fast Way to Get Into Data Science</a></li>
<li><a href="https://www.education.datasciencecentral.com/">Top Data Science Programs</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-big-data-analytics-certifications-aid-data-analysts">How Big Data Analytics Certifications Aid Data Analysts</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/becoming-a-10x-data-scientist">Becoming a 10x Data Scientist</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes">Free Book: Applied Stochastic Processes</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/remarketing-strategically-targeting-your-customers-1">Remarketing: Strategically Targeting Your Customers</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/nearly-33-years-later-nobody-knows-what-triggered-the-crash-of">33 Years Later, Nobody Knows What Triggered the Crash of 1987</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/opinion-is-a-phd-helpful-for-a-data-science-career">Opinion: Is a PhD helpful for a data science career?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/chros-commitments-towards-people-analytics-an-outlook">CHROs’ Commitments Towards People Analytics: An Outlook</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/leveraging-digital-engagement-to-modernize-the-customer">Leveraging Digital Engagement to Modernize the Customer Experience</a></li>
</ul>
<p>Enjoy the reading! </p>
Weekly Digest, September 7
tag:www.datasciencecentral.com,2020-09-06:6448529:BlogPost:979778
2020-09-06T22:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span></p>
<p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/obscure-tests-for-model-fitting">Model Fitting Tests You've Probably Never Heard Of</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/which-machine-learning-deep-learning-algorithm-to-use-by-problem">Which ML / deep learning algorithm to use by problem type</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/free-online-book-machine-learning-from-scratch">Free online book - Machine Learning from Scratch</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-design-a-biased-algorithm-insights-from-the-uk">How to design a biased algorithm .. insights from the UK</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-complicated-projects-like-building-a-skyscraper-or-a-rocket">How complicated projects like building a rocket manage so many details?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/cheat-sheets-for-ai-neural-networks-machine-learning-deep">Cheat Sheets for AI, Neural Networks, ML, Deep Learning...</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/smarter-housing-search-and-recommendation-powered-by-milvus">Smarter Housing Search and Recommendation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/belief-propagation-message-passing-for-classical-and-quantum">Belief Propagation (Message Passing) </a>for Classical and Quantum Bayesian Networks</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/restored-wikipedia-articles-on-computing">Restored Wikipedia articles on computing</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-opportunities-in-the-age-of-covid">Data Science Opportunities in the Age of COVID</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/explaining-deep-learning-results-artificial-intelligence-outputs">Explaining Deep Learning Results</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/smart-iot-concepts-to-practice-intelligent-farming">Smart IoT Concepts to Practice Intelligent Farming</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/clear-the-confusion-artificial-intelligence-vs-machine-learning">Clear The Confusion: AI vs ML vs Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-three-ai-applications-in-2020-that-are-the-first-choices-of">3 AI Applications In 2020 That Are The First Choices Of Every Business</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-how-it-improves-performance-engineering">Machine Learning: How it Improves Performance Engineering</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/ai-capability-can-reduce-60-efforts-of-hr-teams">AI capability can reduce 60% efforts of HR Teams</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-accelerate-data-platform-integration-using-iot-and-ai">How to accelerate data platform integration using IoT and AI</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/education-software-development-behind-curtains-what-to-know-in-1">Education software development behind curtains</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/impact-of-covid-19-on-artificial-intelligence-in-manufacturing">Impact of COVID-19 on AI In Manufacturing Market</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><span><strong><a href="https://storage.ning.com/topology/rest/1.0/file/get/7851119253?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/7851119253?profile=RESIZE_710x" class="align-center"/></a></strong></span></p>
<p style="text-align: center;"><em>Source: article flagged with a + </em></p>
<p>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</p>
Thursday News, September 3
tag:www.datasciencecentral.com,2020-09-03:6448529:BlogPost:979547
2020-09-03T19:00:00.000Z
Vincent Granville
https://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and technical resources posted since Monday:</p>
<p><strong>Technical Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/obscure-tests-for-model-fitting">Model Fitting Tests You've Probably Never Heard Of</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/which-machine-learning-deep-learning-algorithm-to-use-by-problem">Which ML / deep learning algorithm to use by problem type</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-design-a-biased-algorithm-insights-from-the-uk">How to design a biased algorithm .. insights from the UK</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/belief-propagation-message-passing-for-classical-and-quantum">Belief Propagation (Message Passing)<span> </span></a>for Classical and Quantum Bayesian Networks</li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/data-science-opportunities-in-the-age-of-covid">Data Science Opportunities in the Age of COVID</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/smart-iot-concepts-to-practice-intelligent-farming">Smart IoT Concepts to Practice Intelligent Farming</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/clear-the-confusion-artificial-intelligence-vs-machine-learning">Clear The Confusion: AI vs ML vs Deep Learning</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-three-ai-applications-in-2020-that-are-the-first-choices-of">3 AI Applications In 2020 That Are The First Choices Of Every Business</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-how-it-improves-performance-engineering">Machine Learning: How it Improves Performance Engineering</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/impact-of-covid-19-on-artificial-intelligence-in-manufacturing">Impact of COVID-19 on AI In Manufacturing Market</a></li>
</ul>
<p>Enjoy the reading!</p>