<p>Comments - Update about our data science competition - Data Science Central. Comment by Vincent Granville, 2013-10-23.</p>
<p>Jean-Francois Puget, PhD, Distinguished Engineer, Industry Solutions Analytics and Optimization at IBM, found the exact formula and came up with a proof. He was awarded a $1,000 prize for his solution. Here’s the result:</p>
<p><b>Theorem</b></p>
<p><i>Let m be the quotient and let r be the remainder of the Euclidean division of n by 4: n = 4m + r, 0 &le; r &lt; 4.</i></p>
<p><i>Let p(n) = 6m<sup>2</sup> + 3mr + r(r-1)/2.</i></p>
<p><i>Then:</i></p>
<ul>
<li><i>q(n) = p(n), if p(n) is even</i></li>
<li><i>q(n) = p(n) - 1, if p(n) is odd</i></li>
</ul>
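<p>As a rough illustration, the theorem can be evaluated directly. The sketch below assumes n is a non-negative integer and reads the first term of p(n) as 6 times m squared; the function names <code>p</code> and <code>q</code> simply mirror the notation above and are not taken from the proof itself.</p>

```python
def p(n: int) -> int:
    """Evaluate p(n) = 6*m^2 + 3*m*r + r*(r-1)/2, where n = 4*m + r and 0 <= r < 4."""
    if n < 0:
        # Assumption: the theorem is only meant for non-negative n.
        raise ValueError("n must be a non-negative integer")
    m, r = divmod(n, 4)  # Euclidean division of n by 4
    # r*(r-1) is a product of consecutive integers, so integer division is exact.
    return 6 * m * m + 3 * m * r + r * (r - 1) // 2


def q(n: int) -> int:
    """Return p(n) when it is even, and p(n) - 1 when it is odd."""
    value = p(n)
    return value if value % 2 == 0 else value - 1


if __name__ == "__main__":
    for n in (4, 5, 10):
        print(n, p(n), q(n))
```

<p>For example, n = 10 gives m = 2 and r = 2, so p(10) = 24 + 12 + 1 = 37, which is odd, and therefore q(10) = 36.</p>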
<p>The rather lengthy and complicated proof can be found at <a>datashaping.com/Puget-Proof.pdf</a>.</p>
<p>The two winners of this competition will be announced this month, on Data Science Central.</p>
<p><strong>Note</strong></p>
<p>Finding an explicit formula for q(n) can be done using algorithms (simulations, smart permutation sampling) that involve processing huge amounts of data. Jean-Francois found an exact mathematical solution. This shows that mathematical modeling can sometimes beat even the most powerful cluster of computers at finding a solution, though usually the two work hand in hand.</p>
<p>This function q(n) is at the core of a new type of statistical metric developed for big data: a non-parametric, robust metric that measures a new type of correlation or goodness of fit. This metric generalizes traditional metrics that have been used for centuries, and it is most useful when working with large ordinal data series, such as rank data. Although it is based on rank statistics, it is much less sensitive to outliers than current rank-based metrics such as Spearman’s rank correlation, which was designed for rather small n, where it is indeed very robust.</p>