All Blog Posts Tagged 'spatial' - Data Science Central2019-12-05T18:15:14Zhttps://www.datasciencecentral.com/profiles/blog/feed?tag=spatial&xn_auth=noVisually Explained: How Can Executives Grasp What Programming Is All About?tag:www.datasciencecentral.com,2019-12-05:6448529:BlogPost:9128612019-12-05T15:30:00.000ZRafael Knuthhttps://www.datasciencecentral.com/profile/RafaelKnuth
<p><span>Quite often, non-technical executives have difficulty understanding what programming, on a very fundamental level, is all about. Because of that knowledge gap, they tend to hire experienced data professionals and then overburden them with tasks they are hopelessly overqualified for, such as running ad-hoc SQL queries on CRM data: "You're the go-to guy for all things data, and we need the results for the board meeting tomorrow." That's quite a humbling and frustrating experience for anyone who calls themselves a Data Scientist, newcomer or veteran.</span></p>
<p><span>On the other hand, non-technical staff are often outright scared of programming. Writing code is viewed as an esoteric occupation that only a chosen few are qualified for. Because of that common misconception, companies miss out on opportunities to upskill their existing employees to solve data-related tasks that the aforementioned Data Scientists should not be bothered with.</span></p>
<p><span><a href="https://media.giphy.com/media/L2xCYxAC7ZS0g1dgxW/giphy.gif" target="_blank" rel="noopener"><img src="https://media.giphy.com/media/L2xCYxAC7ZS0g1dgxW/giphy.gif?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p><strong>Programming = Automation</strong></p>
<p><span>On a very fundamental level, programming is about automating repeatable tasks. Let's say you want to load boxes onto trucks depending on characteristics such as their color. How would you go about this?</span></p>
<p><span>You can write a program that loops through a list of boxes. If a box is red, it is loaded onto the red truck. Else, if it's blue, like in this case, it goes onto the blue truck. Obviously, computer programs are capable of solving far more sophisticated tasks than this one. However, even if you look at a complex machine learning model, you can understand a decent portion of what it does based on the many if-then rules applied to the underlying data.</span></p>
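<p><span>The loading loop described above can be sketched in a few lines of Python. The particular boxes and the dictionary of trucks are illustrative placeholders, not part of the original example:</span></p>

```python
# A minimal sketch of the box-sorting loop: one if-then rule per color.
boxes = ["red", "blue", "blue", "red", "blue"]

trucks = {"red": [], "blue": []}
for box in boxes:
    if box == "red":
        trucks["red"].append(box)   # red boxes go onto the red truck
    elif box == "blue":
        trucks["blue"].append(box)  # blue boxes go onto the blue truck

print(trucks)  # each truck now holds only boxes of its own color
```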
<p><strong>Programming = Automation + Democratization</strong></p>
<p><span>Let's take the blue and red box example one step further. What if you have to sort your boxes by many more characteristics, such as their contents, weight, value, destination, planned delivery date, and so on? Do you need to write that code from scratch? What if someone else has already solved the exact same problem?</span></p>
<p><span>Open-source programming languages such as Python are developed, maintained, and constantly enhanced by a large community of very thoughtful individuals. "Why reinvent the wheel each and every time?" many of them would reason. That's why solutions to numerous complex programming problems are bundled into so-called libraries.</span></p>
<p><span>What if you want to analyze data using Python? Use the pandas library. Want to visualize data? Take Matplotlib. The amount of know-how bundled into libraries is just mind-blowing. Take TensorFlow, for example, a popular deep learning library for Python. If you want to train your very first object detection deep learning model, you can get the job done with as few as six lines of code. Yes, six!</span></p>
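<p><span>To make the division of labor concrete, here is a minimal sketch, using a made-up shipment table, of how pandas condenses an analysis into a single line; one further line would hand the result to Matplotlib:</span></p>

```python
import pandas as pd

# Hypothetical shipment data, invented for illustration.
df = pd.DataFrame({
    "truck": ["red", "blue", "red", "blue", "blue"],
    "weight_kg": [12.0, 7.5, 9.1, 11.2, 6.3],
})

# One line of pandas replaces a hand-written aggregation loop:
# total weight loaded onto each truck.
summary = df.groupby("truck")["weight_kg"].sum()
print(summary)

# A single extra line would hand the result to Matplotlib for plotting:
# summary.plot(kind="bar")
```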
<p><span>Programming is not only about automation, but also about democratization. Why is this so important? Only a few tech companies can afford to hire large data science teams and solve complex machine learning and deep learning problems. Luckily, many big tech companies package a huge portion of their cumulative intellectual effort into libraries for popular programming languages such as Python. Virtually anyone can tap into the brainpower of, for example, Google, the company behind TensorFlow.</span></p>
<p><strong>Getting started</strong></p>
<p><span>If you want to popularize programming languages within your company, how do you get started? Let me draw another analogy: a language like Python is a lot like chess. You can learn the rules very quickly. How do the queen, rooks, bishops and knights move across the board? That's not rocket science, and you can start playing immediately. However, if you aim to become really good at the game, it will take you years of continuous practice. Nonetheless, you can draw on the many strategies developed by the grandmasters of chess and apply them to your own game.</span></p>
<p><span>Programming, like chess, is very simple at the fundamental level and very complex at the top level of the game. But it is not impenetrable, and it is not reserved for a few highly sophisticated players. It's a game open to everyone willing to join.</span></p>
<p><em><span>I work in the field of Data & Technology Literacy. Please leave a comment, shoot me an email at rafael@knuthconcepts.com, or reach out to me on</span></em> <span><a href="https://www.linkedin.com/in/rafaelknuth/"><em>LinkedIn</em></a><em>.</em></span></p>
No Matter What You Call It, It’s all the Same Thingtag:www.datasciencecentral.com,2019-12-05:6448529:BlogPost:9128222019-12-05T00:12:11.000ZWilliam Vorhieshttps://www.datasciencecentral.com/profile/WilliamVorhies
<p><strong><em>Summary:</em></strong><em> A little history lesson about all the different names by which the field of data science has been called, and why, whatever you call it, it’s all the same thing.</em></p>
<p> </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3755457727?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3755457727?profile=RESIZE_710x" width="450" class="align-right"/></a>A little reminiscence, or for those of you who are only recently data scientists, a little history lesson. </p>
<p>Our profession of finding the signal in the data, be that supervised or unsupervised, got underway in the 90s. In the 20-plus years since, we’ve been called by a variety of names. It’s not at all clear that any clarity was added as those names changed. </p>
<p>In fact, for a profession as concerned with accuracy as ours, we’ve done a pretty poor job of naming things. Take ‘Big Data’ for instance. It’s not really about ‘big’ at all; it’s just as much about velocity and variety as it is about volume. Or NoSQL, which has pretty much lost its meaning since all those NoSQL DBs now run SQL just fine. Or ‘artificial intelligence’, a term that has been hijacked by the press and developers to put the sheen of AI on pretty much everything we do.</p>
<p>So just for fun, here’s a brief recap of all the names that have been used to describe what we do.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>KDD (Knowledge Discovery in Databases):</strong> </span> This is the oldest title I can personally remember to describe what we do. Coined by Gregory Piatetsky-Shapiro in 1989. We still had both feet firmly planted in BI and its retrospective point of view but already there were the inklings that the same data could also tell us many things about the future.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Data Mining:</strong></span> DM was originally intended to describe what went on in KDD, but like ‘AI’ the term was widely adopted in the business press and became the more popular descriptor up through the mid-2000s.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Predictive Modeling:</strong></span> When I first got involved in data science in 2001, Predictive Modeling was the preferred term. It more accurately described what we were doing and the tools we were using to predict future behavior and future values. </p>
<p>It caught me by surprise that Gartner and the other review agencies changed that name almost immediately to ‘<strong>Predictive Analytics’</strong>. This was at the time when data viz began to play a more important role and if you read the reports from that period it looks like Gartner and the others generalized the name to ‘Analytics’ to allow the data viz platforms a place at the table.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Prescriptive Analytics:</strong></span> In 2014 Gartner once again got involved in changing definitions by introducing Prescriptive Analytics as separate from Predictive Analytics. Gartner’s definition says we should differentiate what ‘could happen’ (predictive) versus what ‘should happen’ (prescriptive). I admit that I still see this as <em><u><a href="https://www.datasciencecentral.com/profiles/blogs/prescriptive-versus-predictive-analytics-a-distinction-without-a">a distinction without a difference</a></u></em> since ‘prescriptive’ is merely ‘predictive’ with some optimization math applied. That’s something we had been doing all along.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Machine Learning:</strong> </span> The term ML actually predates KDD, but it only came into common use in the second half of the 2000s. As our techniques for supervised and unsupervised learning became more diverse with the adoption of SVMs, ensemble methods, and the rebirth of ANNs, there was renewed focus on the fact that many new techniques belonged in the tent, provided they met the original criterion of discovering patterns in the data without being explicitly programmed to do so.</p>
<p>2006 marks the beginning of the NoSQL age, when open-source Hadoop allowed us to begin applying ML techniques to unstructured and semi-structured text and image data. ML still describes the most commonly adopted business applications of data science, through scoring models and forecasting, and remains the source of most of the value currently created by data science.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Deep Learning:</strong></span> To data scientists, the term Deep Learning, or its more explicit title <strong>Deep Neural Nets (DNNs)</strong>, was an outgrowth of the introduction of NoSQL DBs and the rapidly increasing compute capacity of advanced chips and the cloud. DL/DNN was, and is, the tool set that enabled what we came to see as artificial intelligence.</p>
<p>It took from the advent of open source Hadoop in 2006 until about 2016 or 2017 to reach human-capable levels of speech, text, and image recognition, the cornerstones of AI.</p>
<p>It’s also worth noting that the original field of ML continued to innovate better and better algorithms through about 2016. Very little in the way of major breakthroughs has occurred in ML since then, and for the last several years both ML and AI have been in mature implementation and value-harvesting phases.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Artificial Intelligence (AI)</strong>: </span> By 2017 the term “AI” had been fully appropriated by the press, the public, and even by developers. It evolved into a generic phrase literally defined as:</p>
<p><em>Anything that makes a decision or takes an action that a human used to take, or helps a human make a decision or take an action.</em></p>
<p>So as you have conversations with potential users today, you need to have that up-front qualifying conversation: ‘What do you really mean when you say you want an AI solution?’ The great majority of implementations continue to be Machine Learning and recently, at least within the data science profession, there’s been a return to more accurately labeling this as <strong>“ML/AI”</strong>.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Is This Really All the Same Thing?</strong></span></p>
<p>There is a wonderful set of just five questions that describes everything we do with our models in data science.</p>
<ol>
<li>Is this A or B?</li>
<li>Is this weird?</li>
<li>How much – or – How many?</li>
<li>How is this organized?</li>
<li>What should I do next?</li>
</ol>
<p>I apologize that I’m unable to find the author’s name to credit. This simple summary drives home the fact that no matter what techniques you’re using, whether you’re deep in DNNs or in equally sophisticated ML techniques like XGBoost, what we do has a very common and easy-to-understand purpose, regardless of what you call it.</p>
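<p>For readers who want to connect those five questions to standard tooling, each corresponds roughly to a family of machine learning tasks. The mapping below is a gloss added for illustration, not part of the original list:</p>

```python
# Rough correspondence between the five questions and ML task families.
question_to_task = {
    "Is this A or B?": "classification",
    "Is this weird?": "anomaly detection",
    "How much -- or -- How many?": "regression",
    "How is this organized?": "clustering",
    "What should I do next?": "reinforcement learning",
}

for question, task in question_to_task.items():
    print(f"{question:30s} -> {task}")
```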
<p> </p>
<p> </p>
<p><a href="https://www.datasciencecentral.com/profiles/blog/list?user=0h5qapp2gbuf8"><em><u>Other articles by Bill Vorhies</u></em></a></p>
<p> </p>
<p>About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2 million times.</p>
<p>He can be reached at:</p>
<p><a href="mailto:Bill@DataScienceCentral.com">Bill@DataScienceCentral.com</a> <span>or</span> <a href="mailto:Bill@Data-Magnum.com">Bill@Data-Magnum.com</a></p>
<p><span> </span></p>Free open access book on Industry 4.0, factory automation and Edgetag:www.datasciencecentral.com,2019-12-03:6448529:BlogPost:9126142019-12-03T21:20:19.000Zajit jaokarhttps://www.datasciencecentral.com/profile/ajitjaokar
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3753699287?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3753699287?profile=RESIZE_710x" class="align-full"/></a></p>
<p></p>
<p><strong>The Digital Shopfloor: Industrial Automation in the Industry 4.0 Era</strong> looks like a great free open-access book by John Soldatos, Oscar Lazaro and Franco Antonio Cavadini.</p>
<p>The book deals with the transformation of the shop floor and the wider supply chain through the deployment of Industrial IoT.</p>
<p> </p>
<p>Some of the themes covered by the book include:</p>
<p> </p>
<ul>
<li>Introduction to Industry 4.0 and the Digital Shopfloor Vision</li>
<li>Open Automation Framework for Cognitive Manufacturing</li>
<li>Reference Architecture for Factory Automation using Edge Computing and Blockchain Technologies</li>
<li>Communication and Data Management in Industry 4.0</li>
<li>A Framework for Flexible and Programmable Data Analytics in Industrial Environments</li>
<li>Model Predictive Control in Discrete Manufacturing Shopfloors</li>
<li>Modular Human–Robot Applications in the Digital Shopfloor Based on IEC-61499</li>
<li>Digital Models for Industrial Automation Platforms</li>
<li>A Centralized Support Infrastructure (CSI) to Manage CPS Digital Twin, towards the Synchronization between CPS Deployed on the Shopfloor and Their Digital Representation</li>
<li>Ecosystems for Digital Automation Solutions: an Overview and the Edge4Industry Approach</li>
</ul>
<p> </p>
<p>The book link is <a href="https://www.riverpublishers.com/research_details.php?book_id=676">The Digital Shopfloor: Industrial Automation in the Industry 4.0 Era</a></p>
<p> </p>Weekly Digest, December 2tag:www.datasciencecentral.com,2019-12-02:6448529:BlogPost:9120282019-12-02T00:00:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span><span><a href="https://www.datasciencecentral.com/profiles/blogs/weekly-digest-september-16-1"></a></span></p>
<p><strong>Announcements</strong></p>
<ul>
<li><span>Data-informed decisions can lead to greater business success. Take the lead in your organization and your career with an online Certificate in Business Analytics from Michigan State University. With two certificate track options, you can pick the path that’s right for you. <a href="https://ad.doubleclick.net/ddm/trackclk/N510001.2006708DATASCIENCECENTRA/B20336635.261028304;dc_trk_aid=456953459;dc_trk_cid=125128041;dc_lat=;dc_rdid=;tag_for_child_directed_treatment=;tfua=" target="_blank" rel="noopener">Start your future today!</a></span></li>
<li><span>Would you like to learn the mathematics behind machine learning to enter the exciting fields of data science and artificial intelligence? This book will get you started in machine learning in a smooth and natural way, preparing you for more advanced topics and dispelling the belief that machine learning is complicated, difficult, and intimidating. <a href="https://gum.co/VVZsI" target="_blank" rel="noopener">Get eBook</a>.</span></li>
</ul>
<div><p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/chaos-attractors-in-machine-learning-systems">Variance, Attractors and Behavior of Chaotic Statistical Systems</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions">New Family of Generalized Gaussian or Cauchy Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/list-of-quantum-clouds">List of Quantum Clouds</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-standard-deviation-and-standard-error-in-one-p" target="_self">Standard Deviation versus Standard Error in One Picture</a> +</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/r-package-install-troubleshooting">R Package Install Troubleshooting</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/visually-explained-three-excel-core-features-even-excel-pros-don">Three Excel Core-Features Even Excel-Pros Don't Know</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/introduction-to-markov-chains-what-are-markov-chains-when-to-use">Introduction to Markov Chains</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-evaluate-probability-of-a-commercial-opportunity">Evaluating the probability of winning a commercial opportunity</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/critical-tools-used-in-the-data-science-domain">Critical tools used in the Data Science Domain</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/pos-tagger-for-french-langage-python">Question: NLP with POS Tagger, for French language</a></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/on-being-a-50-year-old-data-scientist">On Being a 50 Year Old Data Scientist</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/industry-averages-benchmarks-and-the-death-of-innovation">Industry Averages, Benchmarks and the Death of Innovation</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-logjam-in-ai-ml-platforms-is-about-to-complicate-your-life">The Logjam in AI/ML Platforms is About to Complicate Your Life</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-hires-make-the-best-data-scientists">What hires make the best data scientists?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/top-7-data-science-use-cases-in-administration">Top 7 Data Science Use Cases in Administration</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/best-artificial-intelligence-technologies-to-know-in-2019">Best Artificial Intelligence Technologies to know in 2019</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-likelihood-principle-the-mvue-ghosts-cakes-and-elves">The Likelihood Principle, the MVUE, Ghosts, Cakes and Elves</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/microsoft-dynamics-365-how-it-changes-a-manufacturing-business">Microsoft Dynamics 365: How it Changes Manufacturing Business</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-can-learning-analytics-do-for-accessibility-and-for-disabled">Analytics to improve learning experience for disabled learners</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-spotify-know-a-lot-about-you-using-machine-learning-and-ai">How Spotify know a lot about you using machine learning and AI</a></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><span><strong><a href="https://storage.ning.com/topology/rest/1.0/file/get/3750661416?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3750661416?profile=RESIZE_710x" class="align-center"/></a></strong></span></p>
<p style="text-align: center;"><span><em>Source: article flagged with a + </em></span></p>
<p style="text-align: center;"></p>
<p><strong>From our Sponsors</strong></p>
<ul>
<li><span><a href="https://dsc.news/33n52Mp" target="_blank" rel="noopener">Tune Your Machine Learning Algorithm</a> - Dec 5</span></li>
<li><span><a href="https://dsc.news/33L5bcL" target="_blank" rel="noopener">Automating Regulatory Compliance with Data Wrangling</a> - Dec 10</span></li>
<li><span><a href="https://dsc.news/2pw1kl2" target="_blank" rel="noopener">Real-Time Analytics at Scale with High Velocity Data</a> - Dec 12</span></li>
<li><span><a href="https://dsc.news/2QBhyEC" target="_blank" rel="noopener">From Degas to Dashboards: Lessons of the Great Masters</a> - Dec 17</span></li>
<li><a href="https://dsc.news/2J5nrFB" target="_blank" rel="noopener"></a><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/the-state-of-ai-bias-in-2019" target="_blank" rel="noopener">The State of AI Bias in 2019</a></li>
<li><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/driving-digital-transformation-using-ai-ml" target="_blank" rel="noopener">Driving Digital Transformation Using AI & ML</a></li>
</ul>
<p><strong>New Books and Resources for DSC Members</strong><span> </span>- [<a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members">See Full List</a>]</p>
<ul>
<li><span><a href="https://dsc.news/2IyZgPk" rel="noopener" target="_blank">Getting Started with TensorFlow 2.0</a></span></li>
<li><a href="https://dsc.news/2pZ2aXt" rel="noopener" target="_blank">Online Encyclopedia of Statistical Science</a></li>
<li><a href="https://dsc.news/2IByRkm" rel="noopener" target="_blank">Statistics -- New Foundations, Toolbox, and Machine Learning Recipes</a></li>
<li><span><a href="https://dsc.news/2EbQCo4" rel="noopener" target="_blank">Classification and Regression In a Weekend</a></span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" rel="noopener" target="_blank">Applied Stochastic Processes</a></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/new-book-enterprise-ai-an-applications-perspective" rel="noopener" target="_blank">Enterprise AI - An Applications Perspective</a></span></li>
</ul>
<p style="text-align: center;"></p>
<p><span>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</span></p>
</div>R Package Install Troubleshootingtag:www.datasciencecentral.com,2019-12-01:6448529:BlogPost:9119232019-12-01T15:30:00.000ZAndrea Manero-Bastinhttps://www.datasciencecentral.com/profile/AndreaManeroBastin
<p><em><span>This article was written by Laura Ellis.</span></em></p>
<p><em><span> </span></em></p>
<p><span>One of the reasons why I love R is that I feel like I’m constantly finding out about cool new packages through an ever-growing community of users and teachers. </span></p>
<p><span>To understand the current state of R packages on CRAN, I ran some code provided by Gergely Daróczi on GitHub. As of today, almost 14,000 R packages have been published on CRAN, and the rate of publishing appears to be growing along a near-exponential trend. Additionally, there are even more packages available from sources like GitHub, Bioconductor, Bitbucket and more. </span></p>
<p><span>Last week, I was heading out on a trip and had excitedly planned my air time around some fun new R tutorials. The night before the flight, I attempted to install all the required packages and files so that I wouldn't be struggling with slow plane Wi-Fi. Unfortunately, the following packages kept giving me install issues: quanteda, magrittr and emo. As usual, I executed the install.packages() command to install them.</span></p>
<p></p>
<p><span><a href="https://images.squarespace-cdn.com/content/v1/58eef8846a4963e429687a4d/1519187666291-QYH7MKR1WIZD7FYZPTA5/ke17ZwdGBToddI8pDm48kO0NvAgWDGv6hkCtpvOO4BF7gQa3H78H3Y0txjaiv_0fDoOvxcdMmMKkDsyUqMSsMWxHk725yiiHCCLfrh8O1z5QPOohDIaIeljMHgDF5CVlOqpeNLcJ80NK65_fV7S1UQhFfjmMD0Hq5z6oNKw_tn5KEFNGRZBMFT12JGNetn4WpC969RuPXvt2ZwyzUXQf7Q/Package+Install+Troubleshooting+Flow+Chart.png?format=1000w" target="_blank" rel="noopener"><img src="https://images.squarespace-cdn.com/content/v1/58eef8846a4963e429687a4d/1519187666291-QYH7MKR1WIZD7FYZPTA5/ke17ZwdGBToddI8pDm48kO0NvAgWDGv6hkCtpvOO4BF7gQa3H78H3Y0txjaiv_0fDoOvxcdMmMKkDsyUqMSsMWxHk725yiiHCCLfrh8O1z5QPOohDIaIeljMHgDF5CVlOqpeNLcJ80NK65_fV7S1UQhFfjmMD0Hq5z6oNKw_tn5KEFNGRZBMFT12JGNetn4WpC969RuPXvt2ZwyzUXQf7Q/Package+Install+Troubleshooting+Flow+Chart.png?format=1000w&profile=RESIZE_710x" class="align-full"/></a></span></p>
<p><span> </span></p>
<p></p>
<p><em>To read the whole article, with examples, click <a href="https://www.littlemissdata.com/blog/r-package-install" target="_blank" rel="noopener">here</a>.</em></p>
<p></p>
<p><span> </span></p>Industry Averages, Benchmarks and the Death of Innovationtag:www.datasciencecentral.com,2019-11-30:6448529:BlogPost:9117772019-11-30T21:00:00.000ZBill Schmarzohttps://www.datasciencecentral.com/profile/BillSchmarzo
<p>Sustaining industry averages and benchmarks is the antithesis of innovation and a great way to ensure average performance. Doing whatever everyone else is doing is a “paving the cow path” management mentality, lacking the aspirational goals that are critical for organizations to fuel innovation and create customer and market differentiation. Which brings me to why I teach.</p>
<p>As my students and I work through their “Thinking Like a Data Scientist” exercises together, I always learn something new from them. And not one of these students – tomorrow’s business and society leaders – is interested in being merely average. They hold high aspirations for themselves and their future.</p>
<p>So, based upon what I am seeing in the classroom (and from several clients who are also not content with just being average), here are Schmarzo’s “7 Keys to Fuel Innovation for 2020.”</p>
<p>Let’s drill into each one!</p>
<h2><strong>Key #1: Have an Aspirational Vision</strong></h2>
<p>Every organization needs a “North Star” – an unwavering definition of an organization’s aspirational vision – that guides their business, operational, technology and cultural decisions.</p>
<p>The “3 Horizons of Digital Transformation” defines your organization’s “North Star,” guiding its aspirational journey to create new sources of customer, product and operational value (see Figure 1).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3748971303?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3748971303?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure</strong> <strong><span>1</span></strong><b>: 3 Horizons of Digital Transformation </b></p>
<p>The 3 Horizons concept not only provides an aspirational “Horizon 3” vision, but it also provides a roadmap for how organizations can leverage digital technologies, data and analytics to drive Operational Excellence today (Horizon 1) while building out the digital capabilities to transition to “smart” environments (Horizon 2).</p>
<h2><strong>Key #2: Frame (and Reframe) the Problem</strong></h2>
<p>Invest the time upfront to frame, and then reframe, the problem your organization is trying to solve. Create a common mission statement, vision and language, and foster the organizational improvisation that can unleash and scale innovation across the organization (see Figure 2).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3748971559?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3748971559?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure 2:</strong> <a href="https://www.datasciencecentral.com/profiles/blogs/scaling-innovation-whiteboards-versus-maps"><strong>Scaling Innovation</strong></a></p>
<p><a href="https://www.datasciencecentral.com/profiles/blogs/scaling-innovation-whiteboards-versus-maps">Scaling innovation</a> means establishing a common language, vocabulary and approach so that there is no confusion about what is being said, and a standard engagement “framework” that frames the value identification, capture and operationalization process.</p>
<p>Scaling innovation means empowering organizational improvisation by creating smaller, autonomous teams who have the charter to achieve their team’s objectives via the ability to move team members in and out of teams, all while maintaining operational integrity and a laser focus on the organization’s aspirational vision.</p>
<h2><strong>Key #3: Blend Two Seemingly Unrelated Concepts</strong></h2>
<p>Sometimes the best innovative ideas come from blending two or more loosely coupled but influential approaches in order to seek new synergies and drive new perspectives. Think about blending data science with economics, or data science with design thinking (see Figure 3).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3748971698?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3748971698?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure 3: Blending Data Science and Design Thinking</strong></p>
<p>We find that blending Design Thinking with Data Science yields higher quality and more relevant analytic results. Both Data Science and Design Thinking leverage highly nonlinear thinking processes to discover the criteria for success. Data Science discovers the criteria for success buried in the data (codifying trends, patterns, and relationships), while Design Thinking discovers the criteria for success buried in human interactions (using personas, journey maps, and storyboards).</p>
<h2><strong>Key #4: Simplify and Streamline</strong></h2>
<p>There is probably no more striking example of simplifying than when Steve Jobs, the innovative founder of Apple and Pixar, created the iPod. Sony and many others had created all sorts of clumsy mp3 playing devices that required a user’s guide the size of “War and Peace” to use. Jobs created a playlist on one of those devices, then changed the market by understanding the customer journey in detail and working hard to simplify.</p>
<p>Simplifying is not easy, and it is NOT “dumbing down”. Simplifying requires intimate knowledge of your customers’ mindsets, and then working to eliminate the areas that reduce enjoyment and highlight the areas that increase it. A customer journey map is a great tool for gaining this level of customer insight (see Figure 4).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3748971743?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3748971743?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure 4: Customer Journey Map</strong></p>
<h2><strong>Key #5: Exploit Diversity</strong></h2>
<p>Finding innovative solutions to customers’ most important problems with data and analytics requires organizations to transform their business stakeholders into “Citizens of Data Science” who drive the collaborative data science engagement process. Business stakeholders should not only understand where and how to apply data science to power the business but also champion an analytics-based approach toward <strong>value creation</strong> across the entire organization. That’s what my new workbook, “The Art of Thinking Like A Data Scientist”, seeks to accomplish (see Figure 5).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3712871487?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3712871487?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure 5: The Art of Thinking Like a Data Scientist </strong></p>
<h2><strong>Key #6: Ensure Relevance</strong></h2>
<p>The business, economic and social good that can be delivered courtesy of data science is almost unbounded; it has the potential to improve healthcare, public safety, transportation, education, the environment, manufacturing, communities and the overall quality of life. If your organization seeks to exploit the potential of data science to power its business models, then your next question is “How do I achieve that?” That’s the role and potential of Value Engineering (see Figure 6).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3748972382?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3748972382?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure 6: Value Engineering Framework</strong></p>
<p>If your organization seeks to exploit the innovative potential of data and analytics to power its business and operational initiatives, then the Data Science Value Engineering Framework provides the “how.”</p>
<h2><strong>Key #7: Embrace (Monetize) Impediments</strong></h2>
<p>To create that customer-focused mindset, organizations need to invest the time and effort to really understand and experience their customers’ solution journey; to create an “outside-in” approach that identifies, validates, values and prioritizes the sources of customer and market value creation regardless of artificially-defined industry boundaries. And surprisingly, many times it is a solid understanding of the customers' pain points on that journey that provides the impetus for an innovative solution (see Figure 7).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3748972292?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3748972292?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure 7: Seek an Innovative Solution by Monetizing the Pain</strong></p>
<p>The clear opportunity to monetize the pain points was really the “aha” moment for me. While there are certainly opportunities to monetize the high points of a journey, the real monetization opportunities come from mitigating or eliminating the pain points on the customer journey.</p>
<h2><strong>Summary</strong></h2>
<p>Industry averages and industry benchmarks are the antithesis of innovation. Doing whatever everyone else is already doing is a “paving the cow path” management mentality. If you want to move beyond just being average, I’d suggest embracing Schmarzo’s “7 Keys to Fuel Innovation for 2020:”</p>
<ol>
<li>Have an Aspirational Vision</li>
<li>Frame (and Reframe) the Problem</li>
<li>Blend Two Seemingly Unrelated Concepts</li>
<li>Simplify and Streamline</li>
<li>Exploit Diversity</li>
<li>Ensure Relevance</li>
<li>Embrace (Monetize) Impediments</li>
</ol>
<p>Enjoy the ride!</p>
<p> </p>On Being a 50 Year Old Data Scientisttag:www.datasciencecentral.com,2019-11-30:6448529:BlogPost:9116722019-11-30T17:30:00.000ZStephanie Glenhttps://www.datasciencecentral.com/profile/StephanieGlen
<p>At the time of writing, I'm a 52 year-old working in the fields of mathematics and data science. In mathematics, that makes me well-seasoned (and probably well-tenured, if I had chosen to continue in academia). In data science, some would consider me a dinosaur. In fact, many older people considering a career in data science might be put off by the thought that data science is tough to break into at a later age. But is that statement true? Should the over 50 crowd put down their textbooks and pick up their gardening tools?</p>
<h2>Is Math a Young Person's Game? Maybe</h2>
<p>As far as the mathematics portion of my career, I didn't become a mathematician until I was in my mid-thirties. Before that I dabbled with whatever venture brought in a few bob to feed the kids: computer operator, Ebay entrepreneur, aviation electrician. I was 36 when I decided to go back to school to get my master's. If <b><a href="https://www.newyorker.com/magazine/1972/02/19/reflections-mathematics-and-creativity" target="_blank" rel="noopener">Alfred Adler</a></b> is to be believed, my "mathematical life" had already long passed by the time I graduated.</p>
<blockquote>Work rarely improves after the age of twenty-five or thirty. If little has been accomplished by then, little will ever be accomplished.</blockquote>
<p><span>That belief--that mathematics is a young person's game--is often bandied about, although there are many examples of mathematicians making extraordinary contributions at older ages. For example, famous mathematician Sir Michael Atiyah, born in 1929, is working on the Riemann Hypothesis and came <a href="https://www.sciencemag.org/news/2018/09/skepticism-surrounds-renowned-mathematician-s-attempted-proof-160-year-old-hypothesis" target="_blank" rel="noopener">very close to solving it last year</a> as a spry octogenarian. I could throw in many more examples, such as <a href="https://math.unm.edu/~rhersh/menopause.pdf" target="_blank" rel="noopener">this article</a>, which has a plethora of statements on how women mathematicians are at their prime in their 30s, 40s, and 50s.</span></p>
<p><span>The belief that math is a young person's game may or may not be true, but let's assume for a moment that it <em>is</em> true--that if you're an older student in math, you're probably not going to rise to the top of your field. Does that belief also extend to Data Science?</span></p>
<h2>When is a Data Scientist "Past their Prime"?</h2>
<p>Seeing as data science algorithms weren't developed until the late 80s, and assuming you got in on the ground floor, the most experience anyone could possibly have as a data scientist is about 30 years. That gives us very few data points on which to base an argument either way, so let's do what a statistician does when few data points are available, and revert to expert opinion. </p>
<p>David A. Vogan Jr., the chairman of M.I.T.'s math department (as cited in <a href="https://www.westmont.edu/~howell/courses/ma-108/illustrations/past-prime.html" target="_blank" rel="noopener">Lila Guterman's</a> article), says <strong>experience matters in all sciences</strong> (other than mathematics, where he believed that experience tends <em>not</em> to be a good thing). "<em>In a lot of the sciences, there's a tremendous value that comes from experience and building up familiarity with thousands and thousands of complicated special cases."</em></p>
<p>Or, there's <a href="https://www.jstor.org/stable/225610?seq=1" target="_blank" rel="noopener">this 1946 article</a> which reports the median age when scientists (of any kind) do their best work is 43. That is, <strong>half of people on the list did their best work after the age of 43.</strong> The list was made up of over 4,000 scientists, some famous (some not so famous). Yes, it's an older article, but when you take into consideration that the <a href="https://www.nytimes.com/2017/04/17/science/ranks-of-scientists-aging-faster-than-other-workers.html" target="_blank" rel="noopener">average age of a scientist is rising</a>, it's still very relevant.</p>
<p>Here's <a href="https://www.huffpost.com/entry/science-success-age_n_5824a19ee4b07751c390d9b2" target="_blank" rel="noopener">HuffPost's</a> Formula for Scientific Excellence: <em>"Scientists are likely to do their best work during the time that they’re most productive, and young people generally tend to be more productive. But if a scientist is more productive in the later years of her career, then she’s most likely to have her best work then."</em></p>
<h2>How Old is Too Old to Be a Data Scientist?</h2>
<p>How old is "too old" to be a data scientist? Assuming you have the skill set, there isn't an age limit—even if you're starting from scratch with a degree.</p>
<p>As an example, the Berkeley School of Information reports that the age range of students in their <a href="https://datascience.berkeley.edu/experience/class-profile/" target="_blank" rel="noopener">online data science program</a> is 21 to 67. The average age is 35, which means there are a <em>lot</em> of students in the upper age group. For <a href="https://analytics.ncsu.edu/?page_id=2807" target="_blank" rel="noopener">NCSU's Master of Science in Analytics</a>, the oldest student is 50.</p>
<h2>The Reality Check</h2>
<p>That said, it's time for a <strong>reality check.</strong> There are several questions you have to ask yourself before you go for that graduate degree. Probably the first is <em>what exactly do you hope to accomplish?</em> If you enter the regular workforce, in a regular job, at age 50, that leaves you little time to rise through the ranks. So if you're hoping to work your way from intern to CEO in that time frame, it's probably not going to be possible. On the other hand, if you want to branch out as a freelancer, develop some new algorithms or crunch some data of your own, then the possibilities are definitely there.</p>
<p>Secondly, your financial return on investment is going to be lower than if you were in your 20s. An <a href="https://sps.northwestern.edu/masters/data-science/tuition-costs.php" target="_blank" rel="noopener">MS in DS at Northwestern</a> will cost you $54k. Stanford will set you back $15k per quarter (as an undergraduate), and the (relatively cheap) <a href="https://finaid.gatech.edu/current-cost-overview" target="_blank" rel="noopener">Georgia Tech</a> degree will cost $10k per undergraduate semester--if you're in state. These costs don't include room and board, books, or those extra babysitting fees (or if you're older, lost time with the grandkids!). Are you going to be able to recoup your investment in your remaining working life? You be the judge.</p>
<h2>References</h2>
<p><a href="https://datascience.berkeley.edu/experience/class-profile/" target="_blank" rel="noopener">Berkeley School of Information</a></p>
<p><a href="https://www.westmont.edu/~howell/courses/ma-108/illustrations/past-prime.html" target="_blank" rel="noopener">Are Mathematicians Past Their Prime at 35?</a></p>
<p><span>Alfred Adler. "Mathematics and Creativity." </span><i>The New Yorker Magazine</i><span>, February 19, 1972.</span></p>Thursday News, November 29 - Special Thanksgiving Editiontag:www.datasciencecentral.com,2019-11-29:6448529:BlogPost:9117312019-11-29T18:00:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p>Here is our selection of featured articles and technical resources posted since Monday. There is a lot of very interesting material in this edition.</p>
<p><strong>Technical Resources</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/chaos-attractors-in-machine-learning-systems">Variance, Attractors and Behavior of Chaotic Statistical Systems</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions">New Family of Generalized Gaussian or Cauchy Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/list-of-quantum-clouds">List of Quantum Clouds</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/difference-between-standard-deviation-and-standard-error-in-one-p">Difference Between Standard Deviation and Standard Error in One Picture</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/visually-explained-three-excel-core-features-even-excel-pros-don">Three Excel Core-Features Even Excel-Pros Don't Know</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/an-easy-way-to-evaluate-probability-of-a-commercial-opportunity">Evaluating the probability of winning a commercial opportunity</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/critical-tools-used-in-the-data-science-domain">Critical tools used in the Data Science Domain</a></li>
<li><a href="https://www.datasciencecentral.com/forum/topics/pos-tagger-for-french-langage-python">Question: NLP with POS Tagger, for French language</a></li>
</ul>
<p><strong>Articles</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-logjam-in-ai-ml-platforms-is-about-to-complicate-your-life">The Logjam in AI/ML Platforms is About to Complicate Your Life</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-hires-make-the-best-data-scientists">What hires make the best data scientists?</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/top-7-data-science-use-cases-in-administration">Top 7 Data Science Use Cases in Administration</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/best-artificial-intelligence-technologies-to-know-in-2019">Best Artificial Intelligence Technologies to know in 2019</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/the-likelihood-principle-the-mvue-ghosts-cakes-and-elves">The Likelihood Principle, the MVUE, Ghosts, Cakes and Elves</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/microsoft-dynamics-365-how-it-changes-a-manufacturing-business">Microsoft Dynamics 365: How it Changes Manufacturing Business</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/what-can-learning-analytics-do-for-accessibility-and-for-disabled">Analytics to improve learning experience for disabled learners</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/how-spotify-know-a-lot-about-you-using-machine-learning-and-ai">How Spotify know a lot about you using machine learning and AI</a></li>
</ul>
<p><strong>Upcoming Webinars</strong></p>
<ul>
<li><span><a href="https://dsc.news/33n52Mp" target="_blank" rel="noopener">Tune Your Machine Learning Algorithm</a> - Dec 5</span></li>
<li><span><a href="https://dsc.news/33L5bcL" target="_blank" rel="noopener">Automating Regulatory Compliance with Data Wrangling</a> - Dec 10</span></li>
<li><span><a href="https://dsc.news/2pw1kl2" target="_blank" rel="noopener">Edge Computing with Real-time Analytics at Scale</a> - Dec 12</span></li>
<li><span><a href="https://dsc.news/2QBhyEC" target="_blank" rel="noopener">From Degas to Dashboards: Lessons of the Great Masters</a> - Dec 17</span></li>
</ul>
<p>Happy holidays!</p>
<p></p>Best Artificial Intelligence Technologies to know in 2019tag:www.datasciencecentral.com,2019-11-29:6448529:BlogPost:9115562019-11-29T13:35:43.000ZAllen Adamshttps://www.datasciencecentral.com/profile/AllenAdams
<p><span>Technology decision-makers are seeking (and should keep seeking) ways to successfully bring artificial intelligence innovations into their businesses and thereby drive value. And though all AI innovations certainly have their merits, not all of them are worth investing in; with each passing day we come across a number of new <a href="https://www.leewayhertz.com/artificial-intelligence-application-development-company/" target="_self">AI development</a> techniques.</span></p>
<p><span>If one and only one thing happens after you read this article, we hope it is that you are inspired to join the 62% of businesses that boosted their enterprises in 2018 by embracing artificial intelligence in their workflows.</span></p>
<p><strong><span>1. Natural Language Generation</span></strong></p>
<p><span>Natural language generation is an AI subdiscipline that converts data into text, enabling computer systems to communicate ideas with precision.</span></p>
<p><span>It is used in customer service to generate reports and market summaries. It is offered by companies such as Attivio, Automated Insights, Cambridge Semantics, Digital Reasoning, Lucidworks, Narrative Science, SAS, and Yseop.</span></p>
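The data-to-text idea behind NLG can be illustrated with a toy, purely hypothetical template-based sketch (the function and field names are invented for this example; commercial NLG products are far more sophisticated):

```python
# A toy template-based NLG sketch: structured data in, readable text out.
# Function and field names are invented for illustration only.
def generate_summary(metrics):
    direction = "rose" if metrics["change"] > 0 else "fell"
    return (f"{metrics['product']} sales {direction} "
            f"{abs(metrics['change']):.1f}% to ${metrics['total']:,} "
            f"in {metrics['period']}.")

print(generate_summary(
    {"product": "Widget", "change": 4.2, "total": 125000, "period": "Q3"}))
# prints: Widget sales rose 4.2% to $125,000 in Q3.
```

Real systems layer linguistic knowledge, planning, and machine learning on top of this core, but the contract is the same: structured data in, readable prose out.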
<p><strong>2. Speech Recognition</strong></p>
<p><span>Siri is just one of the systems that can understand you.</span></p>
<p><span>Each day, more and more systems are created that can transcribe human language, reaching hundreds of thousands of users through voice-response interactive systems and mobile apps.</span></p>
<p><span>Companies offering speech recognition services include NICE, Nuance Communications, OpenText, and Verint Systems.</span></p>
<p><strong>3. Virtual Agents</strong></p>
<p><span>A virtual agent is nothing more than a computer agent or program capable of interacting with humans.</span></p>
<p><span>The most common examples of this type of technology are chatbots.</span></p>
<p><span>Virtual agents are currently being used for customer service and support, and as smart home managers.</span></p>
<p><span>Some of the firms that offer virtual agents include Amazon, Apple, Artificial Solutions, Assist AI, Creative Virtual, Google, IBM, IPsoft, Microsoft, and Satisfy.</span></p>
<p><strong>4. Machine Learning Platforms</strong></p>
<p><span>These days, computers can also learn quickly, and they can be incredibly intelligent!</span></p>
<p><span>Machine learning (ML) is a subdiscipline of computer science and a branch of AI. Its goal is to develop techniques that allow computers to learn.</span></p>
<p><span>By providing algorithms, APIs (application programming interfaces), development and training tools, big data, applications, and other resources, ML platforms are gaining more and more traction every day.</span></p>
<p><span>They are currently widely used for prediction and classification.</span></p>
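As a toy illustration of what "prediction and classification" means here, the sketch below classifies a hypothetical customer as churn-risk or not using a 1-nearest-neighbor rule in plain Python; the data and labels are invented for the example, and real ML platforms automate far richer versions of this workflow:

```python
# Toy 1-nearest-neighbor classifier; customer data and labels are
# invented for illustration only.
import math

# training data: (monthly_spend, support_tickets) -> churned? (1 = yes)
train = [((10, 5), 1), ((80, 1), 0), ((60, 0), 0), ((5, 7), 1)]

def predict(point):
    # return the label of the closest training example (Euclidean distance)
    return min(train, key=lambda t: math.dist(t[0], point))[1]

print(predict((12, 6)))  # near the churned customers, prints 1
print(predict((75, 1)))  # near the retained customers, prints 0
```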
<p><span>Some of the companies selling ML platforms include Amazon, Fractal Analytics, Google, H2O.ai, Microsoft, SAS, Skytree, and Adext.</span></p>
<p><span>The last one is the first and only audience management tool in the world that applies real AI and machine learning to digital advertising to find the most profitable audience or demographic group for any ad. You can find more about it below.</span></p>
<p><strong>5. AI-Optimized Hardware</strong></p>
<p><span>AI technology makes hardware much friendlier.</span></p>
<p><span>How?</span></p>
<p><span>Through new graphics and central processing units and processing devices specifically designed and structured to execute AI-oriented tasks.</span></p>
<p><span>And if you haven't seen them already, expect the inevitable appearance and widespread adoption of AI-optimized silicon chips that can be embedded right into your portable devices and elsewhere.</span></p>
<p><span>You can access this technology through Alluviate, Cray, Google, IBM, Intel, and Nvidia.</span></p>
<p><strong>6. Decision Management</strong></p>
<p><span>Intelligent machines can introduce rules and logic into AI systems, so you can use them for initial setup/training, ongoing maintenance, and tuning.</span></p>
<p><span>Decision management has already been incorporated into a variety of corporate applications to assist in and execute automated decision-making, making your business as profitable as possible.</span></p>
<p><span>Check out Advanced Systems Concepts, Informatica, Maana, Pegasystems, and UiPath for additional options.</span></p>
<p><strong>7. Deep Learning Platforms</strong></p>
<p><span>Deep learning platforms use a unique form of ML that involves artificial neural networks with multiple abstraction layers that can mimic the human brain, processing data and creating patterns for decision making.</span></p>
<p><span>They are currently mainly used for pattern recognition and classification applications that are only feasible with very large data sets.</span></p>
<p><span>Deep Instinct, Ersatz Labs, Fluid AI, MathWorks, Peltarion, Saffron Technology, and Sentient Technologies all have deep learning options worth checking out.</span></p>
<p><strong>8. Biometrics</strong></p>
<p><span>This technology can identify, measure, and analyze human behavior and the physical aspects of the body's structure and form.</span></p>
<p><span>It allows for more natural interactions between humans and machines, including interactions related to touch, image, speech, and body language recognition, and is prominent in the market research field.</span></p>
<p><span>3VR, Affectiva, Agnitio, FaceFirst, Sensory, Synqera, and Tahzoo are all biometrics companies striving to develop this field.</span></p>
<p><strong>9. Robotic Process Automation</strong></p>
<p><span>Robotic process automation uses scripts and methods that mimic and automate human tasks to support corporate processes.</span></p>
<p><span>It is especially useful in scenarios where hiring humans for a specific job or task is too expensive or inefficient.</span></p>
<p><span>Again, a good example of this is Adext AI, a platform that automates digital marketing processes using AI, saving businesses from devoting hours to mechanical and repetitive tasks.</span></p>
<p><span>It's a solution that lets you make the most of your human talent and move employees into more strategic and creative roles, where their actions can influence the business's growth.</span></p>
<p><span>Advanced Systems Concepts, Automation Anywhere, Blue Prism, UiPath, and WorkFusion are other examples of robotic process automation companies.</span></p>
<p><strong>10. Text Analytics & NLP (Natural Language Processing)</strong></p>
<p><span>This technology uses text analytics to understand the structure of sentences, as well as their meaning and intent, through statistical methods and ML.</span></p>
<p><span>Text analytics and NLP are currently used in security systems and fraud detection.</span></p>
<p><span>They are also being used by a wide array of automated assistants and applications to extract unstructured data.</span></p>
<p><span>Some of the providers and vendors of these technologies include Basis Technology, Coveo, Expert System, Indico, Knime, Lexalytics, Linguamatics, Mindbreeze, Sinequa, Stratifyd, and Synapsify.</span></p>
<p><strong>11. Digital Twin/AI Modeling</strong></p>
<p><span>A digital twin is a software construct that bridges the gap between physical systems and the digital world.</span></p>
<p><span>General Electric (GE), for example, is building an AI workforce to monitor its aircraft engines and gas turbines, and to forecast failures with cloud-hosted software models of GE's machines. Their digital twins are lines of software code, but the most elaborate versions look like 3-D computer-aided design drawings full of interactive charts, diagrams, and data points.</span></p>
<p><span>Companies using digital twin and AI modeling technologies include VEERUM, which works to protect critical infrastructure, and Supply Dynamics, a SaaS solution for managing critical material sourcing in complex, highly distributed manufacturing environments.</span></p>
<p><strong>12. Cyber Defense</strong></p>
<p><span>Cyber defense is a computer network defense mechanism that focuses on preventing, detecting, and providing timely responses to attacks or threats against infrastructure and information.</span></p>
<p><span>AI and ML are now being used to move cyber defense into a new evolutionary phase in response to an increasingly hostile environment: the Breach Level Index detected a total of over 2 billion breached records during 2017. Seventy-six percent of the records in the study were lost accidentally, and 69% were identity theft breaches.</span></p>
<p><span>Recurrent neural networks, which can process sequences of inputs, can be used in combination with ML techniques to create supervised learning technologies that identify suspicious user activity and detect up to 85% of all cyber attacks.</span></p>
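The underlying idea, learning what normal activity looks like and flagging deviations from it, can be illustrated with a deliberately simple z-score detector (a toy sketch with invented numbers, not any vendor's method; production systems use far richer models such as the recurrent networks mentioned above):

```python
# Toy z-score anomaly detector: events far from the historical mean are
# flagged as suspicious. The numbers are invented for illustration only.
import statistics

def flag_anomalies(history, events, threshold=3.0):
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return [e for e in events if abs(e - mu) / sigma > threshold]

# logins per hour observed historically vs. new activity
history = [12, 15, 11, 14, 13, 12, 16, 14]
print(flag_anomalies(history, [13, 15, 90]))  # prints [90]
```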
<p><span>Startups such as Darktrace, which pairs behavioral analytics with advanced mathematics to automatically detect abnormal behavior within organizations, and Cylance, which uses AI algorithms to stop malware and mitigate damage from zero-day attacks, are both working in the area of AI-powered cyber defense.</span></p>
<p><span>Deep Instinct, another cyber defense company, is a deep learning startup named "Most Disruptive Startup" at Nvidia's Silicon Valley event; it protects enterprises' endpoints, servers, and mobile devices.</span></p>
<p><strong>13. Compliance</strong></p>
<p><span>Compliance is the certification or confirmation that a person or organization meets the requirements of accepted practices, legislation, rules and regulations, standards, or the terms of a contract, and there is a significant industry that supports it.</span></p>
<p><span>We are now seeing the first wave of regulatory compliance solutions that use AI to deliver efficiency through automation and comprehensive risk coverage.</span></p>
<p><span>Examples of AI's use in compliance are appearing across the globe. For example, NLP (Natural Language Processing) solutions can scan regulatory text and match its patterns against a cluster of keywords to identify the changes that are relevant to an organization.</span></p>
<p><span>Capital stress testing solutions with predictive analytics and scenario builders can help companies stay compliant with regulatory capital requirements. And the volume of transactions flagged as possible instances of money laundering can be reduced as deep learning is used to apply increasingly sophisticated business rules to each one.</span></p>
<p><span>Companies operating in this area include a RegTech firm that matches documents to a corresponding business function, as well as an international compliance technology provider that supports financial services firms in dealing with financial crime, whose proprietary predictive analytics platform increases client acceptance rates while reducing fraud and manual reviews.</span></p>
<p><strong>14. Knowledge Worker Assistance.</strong></p>
<p><span>While some are genuinely concerned about AI replacing people in the workplace, let's not forget that AI technology also has the potential to vastly assist employees in their work, especially those in knowledge work.</span></p>
<p><span>The automation of knowledge work has been listed as the #2 most disruptive emerging technology trend.</span></p>
<p><span>The medical and legal professions, which rely heavily on knowledge workers, are where employees have been increasingly adopting AI as a diagnostic tool.</span></p>
<p><span>A growing number of companies are working on technologies in this field. Kim Technologies, whose aim is to empower knowledge workers who have little to no IT programming experience with the tools to create new workflows and document processes with the help of AI, is one of them. Kyndi is another, whose platform is designed to help knowledge workers process large amounts of information.</span></p>
<p><strong>15. Content Creation.</strong></p>
<p><span>Content creation now includes any material people contribute to the online world, such as videos, advertisements, articles, white papers, infographics, and other visual or written assets.</span></p>
<p><span>Brands like USA Today, Hearst, and CBS are already using AI to produce their content.</span></p>
<p><span>Wibbitz, a SaaS tool that helps publishers create videos from written content in minutes with AI video production technology, is a notable example of a solution in this field. Wordsmith is another tool, developed by Automated Insights, that applies NLP (Natural Language Processing) to generate news stories based on earnings data.</span></p>
<p><strong>16. Peer-to-Peer Networks.</strong></p>
<p><span>Peer-to-peer networks, in their purest form, are created when two or more PCs connect and share resources without the data passing through a server.</span></p>
<p><span>But peer-to-peer networks are also used by cryptocurrencies, and they have the potential to solve some of the world's most challenging problems by collecting and analyzing massive amounts of data, says Ben Hartman, CEO of Bet Capital LLC, to Entrepreneur.</span></p>
<p><span>Nano Vision, a startup that rewards users with cryptocurrency for their molecular data, aims to transform the way we approach threats to human health, such as superbugs, infectious diseases, and cancer, among others.</span></p>
<p><span>Another new player using peer-to-peer networks and AI is Presearch, a decentralized search engine powered by the community that rewards members with tokens for a more transparent search system.</span></p>
<p><strong>17. Emotion Recognition.</strong></p>
<p><span>This technology allows software to "read" the emotions on a human face using advanced image processing or audio data processing. We are now at the point where we can capture "micro-expressions," or subtle body language cues and vocal intonations that betray a person's feelings.</span></p>
<p><span>Law enforcement can use this technology to try to learn more about a person during an investigation. But it also has a wide range of applications for marketers.</span></p>
<p><span>A growing number of startups are working in this area. Beyond Verbal analyzes audio inputs to describe a person's character traits, including how positive, excited, angry, or moody they are. Viso uses emotion video analytics to inspire new product ideas, identify upgrades, and improve customer experience. And Affectiva's Emotion AI is used in the gaming, automotive, robotics, education, and healthcare sectors, among other fields, to apply facial coding and emotion analytics to face and voice data.</span></p>
<p><strong>18. Image Recognition.</strong></p>
<p><span>Image recognition is the process of identifying and detecting an object or feature in a digital image or video, and AI is increasingly being stacked on top of this technology to great effect.</span></p>
<p><span>AI can search social media platforms for photos and compare them to a wide range of data sets to decide which ones are most relevant during image searches.</span></p>
<p><span>Image recognition technology can also be used to detect license plates, diagnose disease, analyze customers and their opinions, and verify users based on their faces.</span></p>
<p><span>Clarifai provides image recognition systems for customers to detect near-duplicates and find similar uncategorized images.</span></p>
<p><span>SenseTime is one of the leaders in this sector; it develops facial recognition technology that can be applied to payment and photo analysis for credit card verification and other applications. And GumGum's mission is to unlock the value of images and videos produced across the web using AI technology.</span></p>
<p><strong>19. Marketing Automation.</strong></p>
<p><span>Marketing departments have benefitted greatly from AI so far, and there is strong faith placed in AI within this sector, for good reason: fifty-five percent of marketers are confident that AI will have a greater impact on their field than social media has had. What a statement.</span></p>
<p><span>Marketing automation allows companies to boost engagement and increase efficiency to grow revenue faster. It uses software to automate customer segmentation, customer data integration, and campaign management, and it streamlines repetitive tasks, allowing strategic minds to get back to doing what they do best.</span></p>
<p><span>Among the leaders in this field is Adext AI, whose audience management system can improve ad spend efficiency by as much as 83% in just ten days. The software automates all the processes of campaign management and optimization, making daily adjustments per ad to super-optimize campaigns, and it manages budgets across multiple platforms, over many different demographics and micro-segments per ad.</span></p>
<p><span style="font-size: 14pt;"><strong>Variance, Attractors and Behavior of Chaotic Statistical Systems</strong></span></p>
<p><em>Posted 2019-11-29 by Vincent Granville (https://www.datasciencecentral.com/profile/VincentGranville)</em></p>
<p>We study the properties of a typical chaotic system to derive general insights that apply to a large class of unusual statistical distributions. The purpose is to create a unified theory of these systems. These systems can be deterministic or random, yet due to their gentle chaotic nature, they exhibit the same behavior in both cases. They lead to new models with numerous applications in Fintech, cryptography, simulation and benchmarking tests of statistical hypotheses. They are also related to numeration systems. One of the highlights in this article is the discovery of a simple variance formula for an infinite sum of highly correlated random variables. We also try to find and characterize attractor distributions: these are the limiting distributions for the systems in question, just like the Gaussian attractor is the universal attractor with finite variance in the central limit theorem framework. Each of these systems is governed by a specific functional equation, typically a stochastic integral equation whose solutions are the attractors. This equation helps establish many of their properties. The material discussed here is state-of-the-art and original, yet presented in a format accessible to professionals with limited exposure to statistical science. Physicists, statisticians, data scientists and people interested in signal processing, chaos modeling, or dynamical systems will find this article particularly interesting. Connection to other similar chaotic systems is also discussed. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746584930?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746584930?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span style="font-size: 14pt;"><strong>1. The Geometric System: Definition and Properties</strong></span></p>
<p>We consider the infinite series </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745723191?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745723191?profile=RESIZE_710x" class="align-center"/></a>where <em>X</em>(1), <em>X</em>(2) and so on are independently and identically distributed random variables. We use the notation <em>X</em> to represent any of these random variables. Let <em>F</em> denote the distribution function. This system satisfies the following functional equation:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745729212?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745729212?profile=RESIZE_710x" class="align-center"/></a>The <em>k</em>-th moment of Z thus satisfies</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745733169?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745733169?profile=RESIZE_710x" class="align-center"/></a></p>
<p>This can be re-written as </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745735731?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745735731?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The latter formula can be used to compute all the moments recursively. In particular,</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745740012?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745740012?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Note that to have convergence, we need |E(<em>X</em>)| < 1. We also assume here that E(<em>X</em>^2) < 1, so that the variance exists.</p>
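<p>The moment formulas above are easy to check by simulation: iterating <em>z</em> ← <em>x</em> (1 + <em>z</em>) with fresh draws of <em>X</em> reproduces the series to any desired depth. A minimal Python sketch follows; the choice <em>X</em> ~ Uniform(0, 0.5), the truncation depth, and the sample size are illustrative assumptions, not part of the article.</p>

```python
import random

def sample_Z(sample_X, depth=40):
    """Approximate Z = X(1) + X(1)X(2) + X(1)X(2)X(3) + ... by iterating
    z <- x (1 + z) 'depth' times with fresh draws of X."""
    z = 0.0
    for _ in range(depth):
        z = sample_X() * (1.0 + z)
    return z

random.seed(42)
sample_X = lambda: random.uniform(0.0, 0.5)   # E(X) = 1/4, E(X^2) = 1/12

n = 100_000
zs = [sample_Z(sample_X) for _ in range(n)]
mean = sum(zs) / n
var = sum((z - mean) ** 2 for z in zs) / n

# The recursion E(Z^k) = E(X^k) E[(1 + Z)^k] gives, for this X:
#   E(Z)   = E(X) / (1 - E(X)) = 1/3
#   E(Z^2) = E(X^2) (1 + 2 E(Z)) / (1 - E(X^2)) = 5/33
#   Var(Z) = 5/33 - 1/9 = 4/99
```

<p>The simulated mean and variance should land close to 1/3 and 4/99 respectively.</p>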
<p><strong>1.1. Parameter estimation and test for independence</strong></p>
<p>It is pretty amazing that we were able to establish a formula for the variance, despite the fact that all the terms in the infinite series defining <em>Z</em> are highly correlated. If the terms were independent, as in <em>Z</em> = <em>X</em>(1) + <em>X</em>(2) <em>X</em>(3) + <em>X</em>(4) <em>X</em>(5) <em>X</em>(6) + ..., then we would have Var(<em>Z</em>) = Var(<em>X</em>) / [ (1 - E(<em>X</em>^2)) (1 - E^2(<em>X</em>)) ] instead. This leads to a new test to check whether the terms in the series in question are independent or come from the original model <em>Z</em> = <em>X</em>(1) + <em>X</em>(1) <em>X</em>(2) + <em>X</em>(1) <em>X</em>(2) <em>X</em>(3) + ... The statistic for the test is <em>T</em> = Var(<em>Z</em>) (1 - E(<em>X</em>^2)) (1 - E^2(<em>X</em>)) / Var(<em>X</em>), computed on the data set. It is expected to be equal to 1 only in case of independence, or if (1 - E(<em>X</em>))^2 = 1 - E^2(<em>X</em>). The latter can happen only if E(<em>X</em>) = 0 (resulting in E(<em>Z</em>) = 0) or if E(<em>X</em>) = 1. But E(<em>X</em>) = 1 can be excluded since |E(<em>X</em>)| < 1. Here the notation E^2(<em>X</em>) means the square of E(<em>X</em>).</p>
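<p>A quick way to see the test in action is to simulate both models and compute <em>T</em> on each sample. In the sketch below, the distribution of <em>X</em> (Uniform on (0, 0.5)), the sample size, and the truncation depths are illustrative assumptions; for this <em>X</em>, the moment recursion of section 1 gives Var(<em>Z</em>) = 4/99 in the correlated model, so <em>T</em> should come out near 5/3 there and near 1 in the independent model.</p>

```python
import random

random.seed(0)
sample_X = lambda: random.uniform(0.0, 0.5)
EX, EX2, VX = 0.25, 1.0 / 12, 1.0 / 48        # exact moments of Uniform(0, 0.5)

def z_correlated(depth=30):
    # Z = X(1) + X(1)X(2) + X(1)X(2)X(3) + ... : factors are reused
    z = 0.0
    for _ in range(depth):
        z = sample_X() * (1.0 + z)
    return z

def z_independent(depth=20):
    # Z = X(1) + X(2)X(3) + X(4)X(5)X(6) + ... : every factor is fresh
    total = 0.0
    for k in range(1, depth + 1):
        prod = 1.0
        for _ in range(k):
            prod *= sample_X()
        total += prod
    return total

def T(samples):
    m = sum(samples) / len(samples)
    v = sum((z - m) ** 2 for z in samples) / len(samples)
    return v * (1 - EX2) * (1 - EX ** 2) / VX

n = 50_000
t_indep = T([z_independent() for _ in range(n)])   # expected near 1
t_corr = T([z_correlated() for _ in range(n)])     # expected near 5/3 for this X
```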
<p>To estimate the mean and variance of <em>X</em> if <em>Z</em> = <em>X</em>(1) + <em>X</em>(1) <em>X</em>(2) + <em>X</em>(1) <em>X</em>(2) <em>X</em>(3) + ..., assuming you only observe <em>Z</em>, and <em>X</em> is a hidden variable, use the following formulas: </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3750344243?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3750344243?profile=RESIZE_710x" class="align-center"/></a></p>
<p>A consequence of this formula is that <em>E</em>(<em>Z</em>) > -1/2. For details, see <a href="https://stats.stackexchange.com/questions/438621/if-ex-1-and-ex21-can-we-have-1-ex2-1-ex2" target="_blank" rel="noopener">here</a>. </p>
<p><strong>1.2. Connection to the Fixed-Point Theorem</strong></p>
<p>Let <em>Z</em>(<em>k</em>) = <em>X</em>(<em>k</em>) + <em>X</em>(<em>k</em>) <em>X</em>(<em>k</em>+1) + <em>X</em>(<em>k</em>) <em>X</em>(<em>k</em>+1) <em>X</em>(<em>k</em>+2) + ... We have <em>Z</em>(<em>k</em>) = <em>X</em>(<em>k</em>) (1 + <em>Z</em>(<em>k</em>+1)). As <em>k</em> tends to infinity, <em>Z</em>(<em>k</em>) tends to <em>Z</em>. <span>The convergence is in distribution. So at the limit, <em>Z</em> and <em>X</em>(1 + <em>Z</em>) have the same distribution. Also, <em>X</em>(<em>k</em>) is independent of <em>Z</em>(<em>k</em>+1). In other words, the distribution of <em>Z</em> is a fixed point of the backward stochastic recurrence equation <em>Z</em>(<em>k</em>) = <em>X</em>(<em>k</em>) (1 + <em>Z</em>(<em>k</em>+1)). Solving for <em>Z</em> amounts to solving a stochastic recurrence equation.</span></p>
<p><span style="font-size: 14pt;"><strong>2. Geometric and Uniform Attractors</strong></span></p>
<p>We focus here on special cases, where <em>X</em> has a discrete distribution. In particular, we prove the following:</p>
<ul>
<li><em>Z</em> has a geometric distribution if and only if <em>X</em> has a Bernoulli distribution.</li>
<li><em>Z</em> has a uniform distribution on [-1, 1] if and only if <em>X</em> has the following distribution: P(<em>X</em> = -0.5) = P(<em>X</em> = 0.5) = 0.5.</li>
<li><em>Z</em> cannot have an arbitrary distribution.</li>
</ul>
<p><strong>2.1. General formula</strong></p>
<p>Here <em>X</em> and <em>Z</em> are discrete. Using the notation</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745809495?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745809495?profile=RESIZE_710x" class="align-center"/></a></p>
<p>we have, based on the definition of <em>Z</em>:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745805777?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745805777?profile=RESIZE_710x" class="align-center"/></a></p>
<p><strong>2.2. The geometric attractor</strong></p>
<p>It is easy to directly prove, using the definition of <em>Z</em>, that if <em>X</em> is Bernoulli(<em>p</em>), then <em>Z</em> is geometric with parameter 1 - <em>p</em>, that is, <em>P</em>(<em>Z</em> = <em>k</em>) = (1 - <em>p</em>) <em>p</em>^<em>k</em>, for <em>k</em> = 0, 1, and so on. Likewise, if <em>Z</em> is geometric, then <em>X</em> must be Bernoulli. To prove this, using the notation of section 2.1, let us assume that <em>P</em>(<em>Z</em> = <em>k</em>) = <em>q</em>(<em>k</em>) = <em>q</em>(0) (1 - <em>q</em>(0))^<em>k</em>, and that P(<em>X</em> = <em>k</em>) = <em>p</em>(<em>k</em>), for <em>k</em> = 0, 1, and so on. <span>The equation <em>p</em>(1)<em>p</em>(0) = <em>q</em>(1) = <em>q</em>(0) (1 - <em>q</em>(0)) combined with <em>p</em>(0) = <em>q</em>(0) yields <em>p</em>(1) = 1 - <em>q</em>(0). As a result, <em>p</em>(0) + <em>p</em>(1) = <em>q</em>(0) + (1 - <em>q</em>(0)) = 1. Thus if <em>k</em> > 1, then <em>P</em>(<em>X</em> = <em>k</em>) = <em>p</em>(<em>k</em>) = 0. This corresponds to a Bernoulli distribution for <em>X</em>. </span></p>
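<p>This equivalence is easy to verify numerically. With Bernoulli factors, every partial product <em>X</em>(1)...<em>X</em>(<em>n</em>) equals 1 exactly when the first <em>n</em> draws are all 1, so <em>Z</em> simply counts the number of leading 1's in the sequence. A short sketch, using the convention <em>P</em>(<em>X</em> = 1) = <em>p</em> and an illustrative value of <em>p</em>:</p>

```python
import random
from collections import Counter

random.seed(1)
p = 0.6                      # P(X = 1) = p, P(X = 0) = 1 - p

def sample_Z():
    # The series terminates at the first X(k) = 0, so Z is the
    # number of leading 1's among X(1), X(2), ...
    z = 0
    while random.random() < p:
        z += 1
    return z

n = 100_000
freq = Counter(sample_Z() for _ in range(n))
# Geometric with parameter 1 - p: P(Z = k) = (1 - p) p^k
for k in range(4):
    print(k, freq[k] / n, (1 - p) * p ** k)
```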
<p><strong>2.3. Not any distribution can be an attractor</strong></p>
<p><span>In the Central Limit Theorem (CLT) framework, by far the main attractor is the Gaussian attractor. The few other ones are all <a href="https://www.datasciencecentral.com/profiles/blogs/new-family-of-generalized-gaussian-distributions" target="_blank" rel="noopener">stable distributions</a> with infinite variance. Here we have far more attractors, for instance the geometric distribution. Yet not every distribution can be an attractor. For instance, if <em>Z</em> and <em>X</em> have a discrete distribution, using the formulas in section 2.1, we must have <em>p</em>(1) = <em>q</em>(1) / <em>p</em>(0) = <em>q</em>(1) / <em>q</em>(0), and since <em>p</em>(1) ≤ 1, we must have </span><em>q</em>(1) ≤ <em>q</em>(0) for <em>Z</em> to be an attractor. In short, any discrete distribution with <em>P</em>(<em>Z</em> = 0) < <em>P</em>(<em>Z</em> = 1) cannot be an attractor. Also, any distribution (discrete or continuous) with E(<em>Z</em>) < -1/2 cannot be an attractor: see section 1.1 for the explanation. </p>
<p>We also found in section 2.2 that there is only one distribution for <em>X</em> that leads to the geometric distribution for <em>Z</em>. Just like in the CLT framework, there is only one distribution for <em>X</em> leading to the Cauchy attractor: <em>X</em> must be Cauchy itself. This raises an interesting question: can two different distributions (for <em>X</em>) lead to the same attractor? The answer appears to be negative here (in contrast with the CLT framework), but I haven't proved it. In section 2.4, we show that there is only one possible distribution for <em>X</em> leading to the uniform attractor. </p>
<p><strong>2.4. The uniform attractor</strong></p>
<p><span>If <em>X</em> is such that <em>P</em>(<em>X</em> = -0.5) = <em>P</em>(<em>X</em> = 0.5) = 0.5, then <em>Z</em> is uniform on [-1, 1]. This is easy to prove using the formula for the moments, in section 1. In particular, <em>E</em>(<em>Z</em>^<em>k</em>) = 1 / (<em>k</em> + 1) if <em>k</em> is even, and 0 otherwise. Also, E[(1 + <em>Z</em>)^<em>k</em>] = 2^<em>k</em> / (<em>k</em> + 1). We also have E(<em>X</em>^<em>k</em>) = 1 / 2^<em>k</em> if <em>k</em> is even, and 0 otherwise. Thus the only way to satisfy E(<em>Z</em>^<em>k</em>) = E(<em>X</em>^<em>k</em>) E[(1 + <em>Z</em>)^<em>k</em>] is if all the moments of <em>Z</em> are those of a uniform distribution on [-1, 1]. Thus <em>Z</em> has the prescribed distribution. </span></p>
<p><span>To prove the converse, note that in order for <em>Z</em> to be uniform on [−1, 1], we must have E(<em>X</em>^<em>k</em>) = 1 / 2^<em>k</em> if <em>k</em> is even, and 0 otherwise. The only distribution having these moments is the distribution discussed at the very beginning of section 2.4.</span></p>
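<p>This case is also easy to check by simulation; the truncation depth and sample size below are illustrative choices.</p>

```python
import random

random.seed(2)

def sample_Z(depth=50):
    # X takes the values -0.5 and 0.5 with probability 0.5 each
    z = 0.0
    for _ in range(depth):
        z = random.choice([-0.5, 0.5]) * (1.0 + z)
    return z

n = 100_000
zs = [sample_Z() for _ in range(n)]
mean = sum(zs) / n
m2 = sum(z * z for z in zs) / n
frac_neg = sum(z < 0 for z in zs) / n
# Uniform on [-1, 1]: mean 0, E(Z^2) = 1/3, half the mass below 0,
# and |Z| never exceeds 1 since |Z| <= 1/2 + 1/4 + 1/8 + ... = 1
```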
<p><span style="font-size: 14pt;"><strong>3. Discrete <em>X</em> Resulting in a Gaussian-looking Attractor</strong></span></p>
<p><span>In section 2.2, we saw that if <em>X</em> is Bernoulli, then <em>Z</em> has a discrete distribution. By slightly modifying the Bernoulli distribution in section 2.4, we obtained a continuous uniform distribution for <em>Z</em>. Here we consider another discrete distribution for <em>X</em>, not very different from a Bernoulli, to obtain a continuous, Gaussian-looking distribution for <em>Z</em>. Specifically, we work with <em>X</em> defined by</span></p>
<p style="text-align: center;"><span><em>P</em>(<em>X</em> = -1 ) = <em>P</em>(<em>X</em> = -0.5) = <em>P</em>(<em>X</em> = 0.5) = P(<em>X</em> = 1) = 0.25.</span></p>
<p style="text-align: left;"><span>The functional equation in section 1 can be re-written as</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746162440?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746162440?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Here <em>f</em> represents the continuous density attached to <em>Z</em>. It satisfies <em>f</em>(<em>z</em>) = <em>f</em>(-<em>z</em>) and <em>f</em>(0) = <em>f</em>(1) = <em>f</em>(-1). We also have</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746167352?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746167352?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The empirical percentile distribution attached to <em>Z</em> is pictured below.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746169327?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746169327?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><span><strong>Figure 1</strong>: <em>Z percentiles if P(X = -1 ) = P(X = -0.5) = P(X = 0.5) = P(X = 1) = 0.25</em></span></p>
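<p>Figure 1 is straightforward to reproduce by simulation, and the first two moments can be checked against the formulas of section 1. A sketch, with illustrative depth and sample size:</p>

```python
import random

random.seed(3)
VALUES = [-1.0, -0.5, 0.5, 1.0]        # each with probability 0.25

def sample_Z(depth=60):
    z = 0.0
    for _ in range(depth):
        z = random.choice(VALUES) * (1.0 + z)
    return z

n = 100_000
zs = sorted(sample_Z() for _ in range(n))
mean = sum(zs) / n
var = sum(z * z for z in zs) / n - mean ** 2
percentiles = {q: zs[int(q * n)] for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
# Here E(X) = 0 and E(X^2) = 5/8, so E(Z) = 0 and
# Var(Z) = (5/8) / (1 - 5/8) = 5/3; the percentile curve is symmetric.
```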
<p><span>Despite appearances, it is possible that the distribution of <em>Z</em>, though visually very smooth, is differentiable nowhere, and that the density, at least in the classical sense, does not exist. I conjecture that the density does exist in this example. For instance, in a similar stochastic system based on infinite nested square roots, it was found that some attractors (the limiting distributions) are <em>almost smooth</em>, with a distribution function that is continuous yet differentiable nowhere: see <a href="https://www.datasciencecentral.com/profiles/blogs/math-fun-infinite-nested-radicals-of-random-variables" target="_blank" rel="noopener">here</a> for details. This is especially the case when the seed distribution (<em>X</em> in this case) is discrete, particularly when it takes on only a finite number of values. In some cases, even the support domain for <em>Z</em> has gaps, sometimes large, visible ones: see <a href="https://math.stackexchange.com/questions/1702353/distribution-of-infinite-nested-radicals-with-random-terms" target="_blank" rel="noopener">here</a>. However, here the distribution and its support domain look well behaved, and the distribution is even very well approximated by a uniform distribution for <em>z</em> between -1 and 1. </span></p>
<p><span>Similar distributions have been analyzed by David Bailey in his book <em>Experimental Mathematics in Action</em>, published in 2007. In particular, sections 5.2 and 5.3 (pages 114-137) are very relevant to this context. One of the densities he has studied, namely 2<em>qf</em>(<em>z</em>) = <em>f</em>((<em>z</em>-1) / <em>q</em>) + <em>f</em>((<em>z</em>+1) / <em>q</em>), is very similar to the one I studied in my article <a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">A Strange Family of Statistical Distributions</a>, and the functional equation for <em>f</em> is somewhat similar to the one discussed in this section. All these problems end up in attractors and functional equations like the one discussed here.</span></p>
<p><strong>3.1. Towards a numerical solution</strong></p>
<p>A possible way to find a numerical solution is as follows. Rather than focusing on the density <em>f</em>, we focus on its distribution. Build a sequence of distributions that are piecewise uniform on the support domain, starting in this particular case with</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746221343?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746221343?profile=RESIZE_710x" class="align-center"/></a></p>
<p>At iteration <em>n</em> > 1, the approximated solution (distribution) is piecewise linear on <em>n</em> disjoint contiguous intervals (these intervals eventually cover all the real numbers as <em>n</em> tends to infinity). Its value is always between 0 and 1, and it must be a strictly increasing function. It is chosen to minimize an error criterion, defined as</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746226553?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746226553?profile=RESIZE_710x" class="align-center"/></a></p>
<p>In short, you want the <em>n</em>-th iterate to be as close as possible to solving the theoretical functional equation of this system, while being piecewise uniform on <em>n</em> domains, with successive iterations reducing the error, eventually to zero as <em>n</em> tends to infinity. Here <em>X</em> has the distribution specified at the beginning of section 3, and <em>g</em>(<em>X</em>, <em>Z</em>) = <em>X</em> (1 + <em>Z</em>) is at the core of the functional equation defined at the very beginning of section 1. <span>Of course this assumes that the solution satisfying these constraints is unique. It also assumes that the algorithm in question converges to the solution. Techniques about how to solve this problem are described in my article <a href="https://www.datasciencecentral.com/profiles/blogs/decomposition-of-statistical-distributions-using-mixture-models-a" rel="nofollow noreferrer">New Perspectives on Statistical Distributions and Deep Learning</a>.</span></p>
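<p>A minimal numerical sketch of this idea: represent the distribution function <em>F</em> of <em>Z</em> on a grid and repeatedly apply the map sending <em>F</em> to the distribution of <em>X</em> (1 + <em>Z</em>), which is the functional equation of section 1 specialized to the four-point <em>X</em> of section 3. The grid bounds, resolution, and iteration count are illustrative choices, not the author's exact algorithm.</p>

```python
import bisect

SUPPORT = [(-1.0, 0.25), (-0.5, 0.25), (0.5, 0.25), (1.0, 0.25)]
L, M = 12.0, 2401
grid = [-L + 2 * L * i / (M - 1) for i in range(M)]

def interp(F, q):
    """Linearly interpolate the tabulated CDF, clipping outside the grid."""
    if q <= grid[0]:
        return 0.0
    if q >= grid[-1]:
        return 1.0
    i = bisect.bisect_right(grid, q) - 1
    t = (q - grid[i]) / (grid[i + 1] - grid[i])
    return F[i] + t * (F[i + 1] - F[i])

# start from the CDF of Uniform[-1, 1] and iterate F -> law of X (1 + Z);
# for x > 0, P(x(1+Z) <= z) = F(z/x - 1), and for x < 0 it is 1 - F(z/x - 1)
F = [min(1.0, max(0.0, (z + 1) / 2)) for z in grid]
for _ in range(120):
    F = [
        sum(
            p * (interp(F, z / x - 1) if x > 0 else 1 - interp(F, z / x - 1))
            for x, p in SUPPORT
        )
        for z in grid
    ]

# second moment from the numerical CDF; the theory of section 3 gives 5/3
m2 = sum(((grid[i] + grid[i + 1]) / 2) ** 2 * (F[i + 1] - F[i])
         for i in range(M - 1))
```

<p>Since the symmetric four-point <em>X</em> has E|<em>X</em>| = 3/4 < 1, each application of the map contracts the error, so a few dozen iterations suffice on this grid.</p>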
<p><span style="font-size: 14pt;"><strong>4. Special Cases with Continuous Distribution for <em>X</em></strong></span></p>
<p>This section can be skipped. It features two interesting exercises, with solutions, to help the reader dig deeper into the material presented so far.</p>
<p><b>4.1.</b> <strong>An almost perfect equality</strong></p>
<p>Let <em>X</em> = sin(<span>π</span><em>Y</em>) with <em>Y</em> being Normal(0, 1). Simulate Z, and estimate its variance based on 20,000 deviates. Prove that Var(<em>Z</em>) is not exactly equal to 1, despite strong empirical evidence suggesting that it is.</p>
<p><strong>Solution</strong></p>
<p>For simulations, you can use the techniques offered <a href="https://www.datasciencecentral.com/forum/topics/simulating-distributions-with-one-line-of-code" target="_blank" rel="noopener">here</a>. Note that E(<em>X</em>) = 0, thus Var(<em>Z</em>) = E(<em>X</em>^2) / (1 - E(<em>X</em>^2)). Also,</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745965385?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745965385?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Since E(<em>X</em>^2) = (1 - exp(-2<span>π</span>^2)) / 2, we get Var(<em>Z</em>) = (1 - exp(-2<span>π</span>^2)) / (1 + exp(-2<span>π</span>^2)) = 0.9999999946..., which differs from 1 by about 5.4 × 10^-9, an amount impossible to detect with 20,000 deviates. For the details of the computation, see <a href="https://www.wolframalpha.com/input/?i=integrate+x%5E2+sin%5E2+%28pi+x%29+exp%28-x%5E2%2F2%29+dx+from+x%3D-infty+to+infty" target="_blank" rel="noopener">here</a>. </p>
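<p>The exact value follows from the formula Var(<em>Z</em>) = E(<em>X</em>^2) / (1 - E(<em>X</em>^2)) above, together with the standard normal identity E[cos(<em>tY</em>)] = exp(-<em>t</em>^2/2). The simulation below, with its truncation depth and seed, is an illustrative sketch showing that 20,000 deviates cannot distinguish the result from 1.</p>

```python
import math
import random

# E[sin^2(pi Y)] = (1 - E[cos(2 pi Y)]) / 2 = (1 - exp(-2 pi^2)) / 2
a = math.exp(-2 * math.pi ** 2)
ex2 = (1.0 - a) / 2.0
var_exact = ex2 / (1.0 - ex2)       # = (1 - a) / (1 + a), extremely close to 1

# empirical variance from 20,000 truncated-series deviates
random.seed(4)
def sample_Z(depth=80):
    z = 0.0
    for _ in range(depth):
        z = math.sin(math.pi * random.gauss(0.0, 1.0)) * (1.0 + z)
    return z

zs = [sample_Z() for _ in range(20_000)]
m = sum(zs) / len(zs)
var_emp = sum((z - m) ** 2 for z in zs) / len(zs)
```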
<p><b>4.2.</b> <strong>Is the log-normal distribution an attractor?</strong></p>
<p>Using simulated values of <em>Z</em> based on <em>X</em> being log-normal, show that <em>Z</em> looks almost log-normal too. However, this is just a good approximation, not an exact solution. Prove that <em>Z</em> is not log-normal. </p>
<p><strong>Solution</strong></p>
<p>For simulations, you can use the techniques offered <a href="https://www.datasciencecentral.com/forum/topics/simulating-distributions-with-one-line-of-code" target="_blank" rel="noopener">here</a>. I solved this exercise using <em>X</em> = exp(<em>Y</em>) / 5, with <em>Y</em> being Normal(0, 1). Thus <em>X</em><span> is log-normal and has the following moments (see <a href="https://en.wikipedia.org/wiki/Log-normal_distribution" target="_blank" rel="noopener">here</a>): E(<em>X</em>^<em>k</em>) = exp(<em>k</em>^2 / 2) / 5^<em>k</em>, for <em>k</em> = 0, 1, 2, and so on. Since E(<em>Z</em>^<em>k</em>) = E(<em>X</em>^<em>k</em>) E[ (1 + <em>Z</em>)^<em>k </em>], it is possible to compute the exact value of E(<em>Z</em>^<em>k</em>). The approximate values are</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746096869?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746096869?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p>These values are obtained using the following formulas:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746094063?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746094063?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span>Now let us assume that <em>Z</em> is log-normal. We must have E(Z^<em>k</em>) = exp( <em>kμ</em> + 0.5 (<em>kσ</em>)^2 ) for some parameters <em>μ</em> and <em>σ</em>. Solving for exp( <em>μ</em> + 0.5 <em>σ</em>^2 ) = 0.4919678... and exp( 2<em>μ</em> + 0.5 (2<em>σ</em>)^2 ) = 0.8324035... yields <em>μ</em> = -1.3269649... and<em> </em>0.5 <em>σ</em>^2 = 0.61762300...</span></p>
<p><span>Now E(<em>Z</em>^3) = exp( 3<em>μ</em> + 0.5 (3<em>σ</em>)^2 ) = 4.8438607... This is very different from the value computed earlier (namely, E(<em>Z</em>^3) = 12.7967051...), and thus we must conclude that <em>Z</em> cannot be log-normal.</span></p>
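<p>The computation above can be replayed in a few lines from the moment recursion E(<em>Z</em>^<em>k</em>) = E(<em>X</em>^<em>k</em>) E[(1 + <em>Z</em>)^<em>k</em>], expanding the binomial and solving for E(<em>Z</em>^<em>k</em>) at each step:</p>

```python
import math

ex = lambda k: math.exp(k * k / 2) / 5 ** k     # E(X^k) for X = exp(Y)/5

# E(Z^k) (1 - E(X^k)) = E(X^k) * sum over j < k of C(k, j) E(Z^j)
mz = {0: 1.0}
for k in (1, 2, 3):
    lower = sum(math.comb(k, j) * mz[j] for j in range(k))
    mz[k] = ex(k) * lower / (1 - ex(k))

# fit a log-normal to the first two moments, then predict the third,
# using log E(Z^k) = k mu + 0.5 (k sigma)^2
log_m1, log_m2 = math.log(mz[1]), math.log(mz[2])
sigma2 = log_m2 - 2 * log_m1
mu = log_m1 - sigma2 / 2
m3_lognormal = math.exp(3 * mu + 4.5 * sigma2)
# mz[3] is about 12.8 while the log-normal fit predicts about 4.84
```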
<p><span style="font-size: 14pt;"><strong>5. Connection to Binary Digits and Singular Distributions</strong></span></p>
<p>Here we explore cases that are far more chaotic than the well behaved examples of sections 2 and 3. These cases have a common denominator with other chaotic statistical systems, and we explore some of these connections.</p>
<p><strong>5.1. Numbers made up of random digits</strong></p>
<p>Let's revisit the case discussed in section 2.4, with <span><em>P</em>(<em>X</em> = -0.5) = <em>P</em>(<em>X</em> = 0.5) = 0.5. We proved that the attractor <em>Z</em> has a uniform distribution on [-1, 1]. Now, let</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746379324?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746379324?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p><span>Here the <em>B</em>(<em>k</em>)'s are independent Bernoulli variables with parameter <em>p</em>, that is, <em>P</em>(<em>B</em>(<em>k</em>) = 1) = <em>p</em>. Note that the <em>B</em>(<em>k</em>)'s are the binary digits of a random number 1 + <em>Z</em> in [0, 2]. Then we have</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746386079?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746386079?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p>Conversely, for <em>k</em> <span>≥ 0, we have</span> </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746389004?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746389004?profile=RESIZE_710x" class="align-center"/></a></p>
<p>This establishes the reciprocity and one-to-one mapping between the binary numeration system and the representation of <em>Z</em> introduced at the beginning of section 1. If you pick a number <em>Z</em> at random, its binary digits are independently and identically distributed with a Bernoulli distribution of parameter <em>p</em> = 0.5. Such numbers, called <a href="https://en.wikipedia.org/wiki/Normal_number" target="_blank" rel="noopener">normal numbers</a>, are thus uniformly distributed, as the main result in section 2.4 suggests. Non-normal numbers are extremely rare: the set of non-normal numbers has Lebesgue measure zero. Examples arise when <em>p</em> <span>≠ 1/2. We explore this case in the next section. </span></p>
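<p>The exact mapping formulas are shown in the images above; the sketch below (with an illustrative truncation at <em>N</em> terms) demonstrates the correspondence numerically: the <em>n</em>-th digit is determined by the sign of the <em>n</em>-th partial product of the <em>X</em>(<em>k</em>)'s, and the digits reconstruct 1 + <em>Z</em> exactly, up to the truncation term 2^(-<em>N</em>).</p>

```python
import random

random.seed(7)
N = 40
signs = [random.choice([-1, 1]) for _ in range(N)]   # X(k) = signs[k] / 2

# Z (truncated): sum of the partial products of the X(k)'s
z, prod = 0.0, 1.0
for s in signs:
    prod *= s * 0.5
    z += prod

# digit B(n) = (1 + sign of the n-th partial product) / 2, in {0, 1}
bits, p = [], 1
for s in signs:
    p *= s
    bits.append((1 + p) // 2)

# 1 + Z = sum over n of B(n) 2^(1-n) + 2^(-N);
# the last term accounts for truncating the series after N factors
recon = sum(b * 2.0 ** (1 - k) for k, b in enumerate(bits, start=1)) + 2.0 ** -N
```

<p>Since all quantities involved are dyadic rationals, the identity holds exactly in floating point for moderate <em>N</em>.</p>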
<p><strong>5.2. Singular distributions</strong></p>
<p>In most cases, if <em>X</em> is discrete, then <em>Z</em>'s distribution is nowhere differentiable. <span>A typical example: if <em>P</em>(<em>X</em> = -0.5) = 0.25 and <em>P</em>(<em>X</em> = 0.5) = 0.75, then the percentile distribution for <em>Z</em>, shown below, clearly corresponds to a <a href="https://en.wikipedia.org/wiki/Singular_distribution" target="_blank" rel="noopener">singular distribution</a> on [-1, 1]. </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746407672?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746407672?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Z percentiles if P(X = -0.5) =0.25 and P(X = 0.5) = 0.75</em></p>
<p>This is the same type of distribution as the one featured in section 1 in this <a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">article</a>, associated with a random number that does not have 50% of its digits equal to 1 in base 2. </p>
<p>In such cases where the distribution of <em>Z</em> is difficult to identify, one might use the <a href="https://en.wikipedia.org/wiki/Mellin_transform" target="_blank" rel="noopener">Mellin transform</a> M(<em>Z</em>) to solve the functional equation, provided both <em>X</em> and <em>Z</em> are positive. Here we have M(<em>Z</em>) = M(<em>X</em>) M(1 + <em>Z</em>). Mellin transforms are similar to characteristic functions (themselves based on Fourier transforms). If <em>X</em> and <em>Y</em> are independent, then M(<em>XY</em>) = M(<em>X</em>) M(<em>Y</em>), while the characteristic function CF(<em>X</em> + <em>Y</em>) is equal to the product CF(<em>X</em>) CF(<em>Y</em>). See also the Wikipedia article on the product distribution, <a href="https://en.wikipedia.org/wiki/Product_distribution" target="_blank" rel="noopener">here</a>. However, as mentioned in section 3.2 in <a href="https://www.datasciencecentral.com/profiles/blogs/math-fun-infinite-nested-radicals-of-random-variables" target="_blank" rel="noopener">this article</a>, neither the Mellin transform nor the characteristic function may exist in such irregular cases.</p>
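<p>The multiplicative property of characteristic functions mentioned above is easy to verify empirically. The sketch below is my own illustration: it estimates CF(<em>X</em>) as the sample average of exp(i<em>tX</em>) and checks that CF(<em>X</em> + <em>Y</em>) is close to CF(<em>X</em>) CF(<em>Y</em>) for independent <em>X</em> and <em>Y</em>.</p>

```python
import cmath
import random

def empirical_cf(sample, t):
    """Empirical characteristic function: average of exp(i*t*x) over the sample."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)

random.seed(1)
n = 50_000
x = [random.gauss(0, 1) for _ in range(n)]     # X ~ Normal(0, 1)
y = [random.uniform(-1, 1) for _ in range(n)]  # Y ~ Uniform(-1, 1), independent of X
s = [a + b for a, b in zip(x, y)]              # X + Y

t = 0.7
lhs = empirical_cf(s, t)                       # CF(X + Y) at t
rhs = empirical_cf(x, t) * empirical_cf(y, t)  # CF(X) * CF(Y) at t
print(abs(lhs - rhs))  # small, within Monte Carlo error
```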
<p>By contrast with Figure 2, the chart below shows the percentile distribution (corresponding to a uniform distribution for <em>Z</em>) if P(<em>X</em> = -0.5) = P(<em>X</em> = 0.5) = 0.5.</p>
<p style="text-align: center;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746440427?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746440427?profile=RESIZE_710x" class="align-center"/></a><strong>Figure 3</strong><em>: Z percentiles if P(X = -0.5) = P(X = 0.5) = 0.5</em></p>
<p><strong>5.3. Connection to Infinite Random Products</strong></p>
<p>Here we discuss a system that looks totally different, yet is closely related to our discussion in section 5.1. The purpose is to show how striking the similarities are between what appear, at first glance, to be two unrelated systems. </p>
<p><span>Every real number <em>Z</em> in [1, 2] can be represented as a product of distinct factors of the form 1 + 1/2^<em>k</em>, that is:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746468803?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746468803?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p>The algorithm to find the <em>X</em>(<em>k</em>)'s is a simple version of the <a href="https://en.wikipedia.org/wiki/Greedy_algorithm" target="_blank" rel="noopener">greedy algorithm</a>. <span>It is also related to Feynman's algorithm, see </span><a href="https://en.wikipedia.org/wiki/Logarithm#Feynman's_algorithm">here</a>. If all the <em>X</em>(<em>k</em>)'s are equal to 1, then <em>Z</em> = 2.384231029... and 1 - 1/Z is Pell's constant: see <a href="https://experimentalmath.wordpress.com/mathematical-constants/">here</a>, <a href="http://mathworld.wolfram.com/PellConstant.html">here</a> and <a href="http://oeis.org/A141848">here</a>. A number can have two representations, a standard and a non-standard one, for instance:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746483493?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746483493?profile=RESIZE_710x" class="align-center"/></a></p>
<p>More precisely:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746484829?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746484829?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The standard one is on the left-hand side. As usual, we use the notation <em>X</em> to represent a generic <em>X</em>(<em>k</em>), since those are also (by definition) all independently and identically distributed. Below is the distribution (CDF) of <em>Z</em>, in blue, if <em>X</em> is Bernoulli(0.5). The red curve represents its approximation by a logarithmic function.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746491714?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746491714?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 4</strong>: <em>Z's CDF (blue) versus log approximation (red) for Bernouill(0.5) infinite products</em></p>
<p>The approximation for Z's CDF in Figure 4 is equal to (log <em>z</em>) / (log <span><em>λ</em>), and the support domain is [1, <em>λ</em>] where <em>λ</em> = 2.384231029... is the constant discussed earlier in section 5.3. Now this is becoming very interesting: compare this chart with the one obtained for continued radicals in section 2.2 <a href="https://www.datasciencecentral.com/profiles/blogs/math-fun-infinite-nested-radicals-of-random-variables" rel="nofollow noreferrer">in this article</a>. How similar! Below is the error term, that is, the difference between the log approximation and the exact CDF. </span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746506162?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746506162?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p style="text-align: center;"><strong>Figure 5</strong>: <em>Difference between the log approximation (red curve) and true CDF (blue curve) in Figure 4</em></p>
<p>Below is the percentile distribution if this time <em>P</em>(<em>X</em> = 0) = 5/6 and <em>P</em>(<em>X</em> = 1) = 1/6. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746512939?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746512939?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 6</strong>: <em>Percentile distribution, but this time P(X = 0) = 5/6 and P(X = 1) = 1/6</em></p>
<p>Figure 6 shows the exact same type of distribution as in Figure 2, despite the fact that the two systems appear unrelated. Now, if <em>P</em>(<em>X</em> = 0) = <em>P</em>(<em>X</em> = 1) = <em>P</em>(<em>X</em> = 2) = <em>P</em>(<em>X</em> = 3) = 0.25, then the percentile distribution for <em>Z</em> looks very smooth, and its inverse S shape (not displayed here) is very similar to that in Figure 1.</p>
<p><span style="font-size: 14pt;"><strong>6. A General Classification of Chaotic Statistical Distributions</strong></span></p>
<p>Over the last few years, I have analyzed many systems similar to the ones discussed here. A summary table can be found <a href="https://www.datasciencecentral.com/profiles/blogs/number-representation-systems-explained-in-one-picture" target="_blank" rel="noopener">here</a>. Many but not all are related to numeration systems. For details see my book <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">Applied Stochastic Processes</a> published in 2018, as well as Appendix B (Stochastic Processes and Organized Chaos) in my book <a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">Statistics: New Foundations, Toolbox, and Machine Learning Recipes</a> published in 2019. See also the following articles:</p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/math-fun-infinite-nested-radicals-of-random-variables" target="_blank" rel="noopener">Fun Math: Infinite Nested Radicals of Random Variables</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/a-strange-family-of-statistical-distributions" target="_blank" rel="noopener">A Strange Family of Statistical Distributions</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/some-fun-with-the-golden-ratio-time-series-and-number-theory" target="_blank" rel="noopener">Some Fun with Gentle Chaos, the Golden Ratio, and Stochastic Number...</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness" target="_blank" rel="noopener">Fascinating New Results in the Theory of Randomness</a></li>
</ul>
<p>I am currently studying a new system, based on the discrete difference operator. You can see my research progress <a href="https://stats.stackexchange.com/questions/437519/generalized-univariate-normal-distribution-with-k1-parameters" target="_blank" rel="noopener">here</a> and <a href="https://math.stackexchange.com/questions/3445421/limiting-distributions-associated-with-the-difference-operator" target="_blank" rel="noopener">here</a>. It will soon be published in a new article, and all these articles will soon be turned into a new book. I have discovered that all these systems share a number of common features: attractor distributions, functional equations, chaotic statistical distributions, fractal behavior, and auto-correlations (long range or quickly decaying). Below is a first attempt at classifying the various types of attractors found in these systems. </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3746572137?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3746572137?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 7</strong>: <em>very wild percentile distribution related to the infinite, scaled difference operator (source: <a href="https://math.stackexchange.com/questions/3445421/limiting-distributions-attractors-associated-with-the-discrete-difference-oper" target="_blank" rel="noopener">here</a>)</em></p>
<p>In many cases where <em>X</em> has a continuous distribution, the limiting distribution appears to be normal, or at least very smooth. If <em>X</em> has a discrete distribution, there are several possibilities, listed below. </p>
<ul>
<li>The limiting distribution can be normal or some other regular smooth distribution, either well known or not</li>
<li>The limiting distribution can look very smooth yet be nowhere differentiable, while remaining very close to a normal or related distribution (see Figure 1)<span> </span></li>
<li>The limiting distribution can look very un-smooth</li>
<li>The limiting distribution can be very peculiar, as in Figure 6, yet sometimes computable</li>
<li>The limiting distribution can be a piecewise linear mix of uniform distributions or some other weird mix like a staircase (see <a href="https://math.stackexchange.com/questions/3445421/limiting-distributions-attractors-associated-with-the-discrete-difference-oper" target="_blank" rel="noopener">here</a>) </li>
<li>The limiting distribution can be very wild, totally chaotic (see Figure 7 and also <a href="https://math.stackexchange.com/questions/1702353/distribution-of-infinite-nested-radicals-with-random-terms" target="_blank" rel="noopener">here</a>) as if it was a Fourier series with a bunch of missing terms, not unlike the<span> </span><a href="https://en.wikipedia.org/wiki/Weierstrass_function" rel="nofollow noreferrer">Weierstrass function</a>.</li>
</ul>
<p>In some of the chaotic cases, the distribution has a fractal behavior (see section 2.2 in<span> </span><a href="https://www.datasciencecentral.com/profiles/blogs/math-fun-infinite-nested-radicals-of-random-variables" target="_self">this article</a>.)</p>
<p></p>Visually Explained: Three Excel Core-Features Even Excel-Pros Don't Knowtag:www.datasciencecentral.com,2019-11-28:6448529:BlogPost:9113752019-11-28T15:10:49.000ZRafael Knuthhttps://www.datasciencecentral.com/profile/RafaelKnuth
<p><span>Over the last few years, Excel has been redesigned from the ground up. Currently, Microsoft is making the new Excel core-features available to every user, regardless of your Office 365 license. Due to Microsoft's naming conventions, it is easy to confuse the new features with existing ones: Power Query and Power Pivot, for instance, are not the same thing as the Pivot Tables you have likely been using for years.</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3745447239?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3745447239?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p><span><strong>Power Query (M-Language)</strong><br/></span><span>Data preparation is very time-consuming. Power Query allows you to cleanse large amounts of data and make them usable within Excel: for example, a large CSV log file with lots of missing values, weird formats, and errors. Power Query is UI-based and easy to learn, and if you want to go the extra mile, you can learn its M-Language, which enables you to write custom scripts to automate data cleansing.</span></p>
<p><span><strong>Power Pivot (DAX)</strong><br/></span><span>With Power Pivot, you can structure multiple tables into a star schema, with lookup tables and data tables. DAX (Data Analysis Expression) is a language that enables you to analyze data across your entire model. The learning curve is a bit steep, as you need to understand some important foundational concepts before you start using it. However, it's a good choice if you work with large and complex datasets.</span></p>
<p><span><strong>Cognitive Services</strong><br/></span><span>If you are a Citizen Data Scientist, and you want to consume machine learning and deep learning capabilities without building and training models yourself, Cognitive Services comes to your rescue. It's a collection of (cognitive) vision, knowledge, language, search, and speech services, which can be consumed from within Excel via a REST API service. You can, for example, load customer reviews into Cognitive Services, perform a sentiment analysis and load the results back into Excel.</span></p>
<p><span><strong>My take on the new Excel core-features</strong><br/></span><span>It's a double-edged sword. On the one hand, Excel is being ramped up for the AI- and big-data-driven world we live in. On the other hand, though, Excel is becoming even harder to use for non-experts. In addition to its two established languages, the regular Excel function language (which has no name) and VBA, you now have to familiarize yourself with two new languages: M-Language and DAX.</span></p>
<p><span><strong>What's the alternative?</strong><br/>Realistically speaking, there is no alternative to Excel in the business world. Embrace the new Excel core-features, but don't commit to them 100%. Consider using</span> <span>Python on Jupyter Notebooks in parallel, along with SQL, especially if you have a diverse team with non-technical staff, BI analysts, and data scientists who all need to work jointly on data.</span></p>
<p><span>After all, programming (especially Python) is not as hard to learn as most business people fear. If you can handle Excel, you will do well with Python. Just give yourself and your team time to learn. You will benefit greatly from Python libraries such as Pandas, Matplotlib, scikit-learn, and TensorFlow.</span></p>
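<p>To make this concrete, here is a deliberately tiny sketch (my own illustration, standard-library Python only, with made-up data) of the group-and-aggregate step that a Pivot Table performs; in Pandas the same operation is a one-liner such as <code>df.groupby("region")["sales"].sum()</code>.</p>

```python
import csv
import io
from collections import defaultdict

# Stand-in for a CSV export from a CRM system (hypothetical data):
raw = """region,sales
North,120
South,80
North,200
South,50
"""

# Group rows by region and sum the sales column -- the essence of a pivot table.
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["region"]] += float(row["sales"])

print(dict(totals))  # {'North': 320.0, 'South': 130.0}
```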
<p><em>I work in the field of Data & Technology Literacy. Please leave a comment, shoot me an email at rafael@knuthconcepts.com or reach out to me on <a href="https://www.linkedin.com/in/rafaelknuth/" target="_blank" rel="noopener">LinkedIn</a>.</em></p>New Family of Generalized Gaussian or Cauchy Distributionstag:www.datasciencecentral.com,2019-11-28:6448529:BlogPost:9107992019-11-28T05:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p>The standard definition of a generalized Gaussian distribution can be found <a href="https://en.wikipedia.org/wiki/Generalized_normal_distribution" target="_blank" rel="noopener">here</a>. In this article, we explore a different type of generalized univariate normal distributions that satisfies useful statistical properties, with interesting applications. This new class of distributions is defined by its characteristic function, and applications are discussed in the last section. These distributions are semi-stable (we define what this means below). In short, it is a much wider class than the <a href="https://en.wikipedia.org/wiki/Stable_distribution" target="_blank" rel="noopener">stable distributions</a> (the only stable distribution with a finite variance being the Gaussian one) and it encompasses all stable distributions as a subset. It is a sub-class of the <a href="https://en.wikipedia.org/wiki/Infinite_divisibility_(probability)" target="_blank" rel="noopener">divisible distributions</a>. The distinctions are as follows:</p>
<ul>
<li>Stable distributions are jointly stable both under addition and multiplication by a scalar</li>
<li>Semi-stable are separately stable under addition and multiplication by a scalar</li>
<li>Divisible distributions are stable under addition</li>
</ul>
<p><strong>1. New two-parameter distribution <em>G</em>(<em>a</em>, <em>b</em>): introduction, properties</strong></p>
<p>Semi-stable distributions can serve as a great introduction to explain the central limit theorem in a simple and elegant way. The family that we investigate here is governed by two parameters <em>a</em> and <em>b</em>. Distributions from that family are denoted as <em>G</em>(<em>a</em>, <em>b</em>). We focus exclusively on symmetrical distributions centered at zero.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744211039?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744211039?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 1</strong>: <em>A special G(a, b) distribution, see section 4</em></p>
<p>By definition, they satisfy the following properties:</p>
<ul>
<li>If <em>X</em> is a random variable with a <em>G</em>(<em>a</em>, <em>b</em>) distribution and <i>v</i> is a real number, then <em>vX</em> has a <em>G</em>(<em>av^<span>β</span></em>, <em>bv</em>) distribution.</li>
<li>If <em>X</em>(1), ..., <em>X</em>(<em>n</em>) are independently and identically distributed <em>G</em>(<em>a</em>, <em>b</em>), then <em>X</em>(1) + ... + <em>X</em>(<em>n</em>) is <em>G</em>(<em>na</em>, <em>b</em>). </li>
</ul>
<p>Here <span><em>β</em> ∈ [1, 2] is a fixed real number, not a parameter of the model. The notation <em>v^β</em> means <em>v</em> at power <em>β</em>. </span><span>As a consequence, we have the following result.</span></p>
<p><strong>2. Generalized central limit theorem</strong></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744121933?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744121933?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p>All these random variables have zero mean and are symmetrical. The case <span><em>β</em> = 2 corresponds to the standard central limit theorem. A fundamental consequence is that if <em>β</em> = 2, then <em>G</em>(<em>a</em>, 0) must be a Gaussian distribution. The notation <em>Z</em> ~ <em>G</em>(<em>a</em>,0) means that the distribution of <em>Z</em> is <em>G</em>(<em>a</em>, 0). The convergence is in distribution. </span></p>
<p><strong>3. Characteristic function</strong></p>
<p><span>The characteristic function <em>ψ</em>(t) of a random variable <em>X</em> uniquely defines its statistical distribution. We consider here CF's that have the following form:</span></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744136961?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744136961?profile=RESIZE_710x" class="align-center"/></a></p>
<p>with the following requirements:</p>
<ul>
<li><em>a</em> is strictly positive</li>
<li><em>h</em> is an even, real-valued function, thus <em>h</em>(<em>t</em>) = <em>h</em>(-<em>t</em>)</li>
<li><em>h</em> is bounded, and the minimum value of <em>h</em> is strictly above zero</li>
<li><em>h</em> is such that it yields a proper CF (one that is <a href="https://en.wikipedia.org/wiki/Positive-definite_function" target="_blank" rel="noopener">positive-definite</a>, according to <a href="https://en.wikipedia.org/wiki/Bochner%27s_theorem" target="_blank" rel="noopener">Bochner's theorem</a>)</li>
</ul>
<p><span>In particular, if <em>β</em> = 1, we are dealing with generalized Cauchy distributions. Here we focus on <em>β</em> = 2, corresponding to generalized Gaussian distributions. If <em>β</em> = 2 and <em>b</em> = 0, then <em>G</em>(<em>a</em>, <em>b</em>) is Gaussian. If <em>β</em> = 1 and <em>b</em> = 0, then <em>G</em>(<em>a</em>, <em>b</em>) is Cauchy. </span></p>
<p><strong>4. Density: special cases, moments, mathematical conjecture</strong></p>
<p>The density is obtained by inverting the characteristic function, in other words, by computing its inverse Fourier transform. Since the density is also symmetrical and centered at zero, no complex numbers are involved, and it simplifies to </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744183880?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744183880?profile=RESIZE_710x" class="align-center"/></a></p>
<p>By construction, it always integrates to 1. If <span><em>β</em> = 1, none of the moments exist. If <em>β</em> is strictly between 1 and 2, the mean is zero but higher moments do not exist. If <em>β</em> = 2 (the case we are interested in) then all the moments exist, and all the odd moments are zero. If <em>β</em> > 2, it cannot be a density. </span></p>
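<p>The inversion step can be illustrated numerically. Below is my own sketch; since the author's <em>h</em> is given in the figures above, the standard normal CF ψ(<em>t</em>) = exp(-<em>t</em>²/2) serves as a stand-in. For an even, real-valued CF, the inverse Fourier transform reduces to <em>f</em>(<em>x</em>) = (1/π) ∫ ψ(<em>t</em>) cos(<em>tx</em>) d<em>t</em>, integrated over <em>t</em> ≥ 0.</p>

```python
import math

def density_from_cf(psi, x, t_max=30.0, n=30_000):
    """Numerically invert an even, real characteristic function:
    f(x) = (1/pi) * integral_{0}^{inf} psi(t) * cos(t*x) dt  (trapezoidal rule)."""
    h = t_max / n
    total = 0.5 * (psi(0.0) + psi(t_max) * math.cos(t_max * x))
    total += sum(psi(i * h) * math.cos(i * h * x) for i in range(1, n))
    return total * h / math.pi

psi_gauss = lambda t: math.exp(-t * t / 2)  # CF of the standard normal distribution

f0 = density_from_cf(psi_gauss, 0.0)
print(f0)  # ~0.39894, i.e. 1/sqrt(2*pi): the standard normal density at 0
```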
<p><span>The only thing that needs to be checked, to guarantee that we are dealing with a proper density, is that <em>f</em> must be positive everywhere. In order for this to be true, the function <em>h</em> must be carefully chosen. We are interested in the following special case exclusively:</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744203370?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744203370?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p>Here <em>α</em> and <em>λ</em> are two fixed positive real numbers: just like <span><em>β</em>,<em> </em></span>they are not parameters of the model. Based on empirical evidence, I conjecture the following, assuming <span><em>β</em></span> = 2:</p>
<ul>
<li>If <em>α </em>> 2 and <em>λ</em> < 1, then the density is positive everywhere: it is a proper density</li>
<li>If <em>α </em>> 4 and <em>λ</em> < 2, then the density is positive everywhere: it is a proper density</li>
</ul>
<p>The density <em>G</em>(1, 1) corresponding to <em>α</em> = 2, <span><em>β</em> = 2, </span>and <em>λ</em> = 1 is pictured in Figure 1. By contrast, if <em>α =</em> <em>λ</em> = <span><em>β</em> =<em> </em></span>2, then <em>f</em>(13.56) = <em>f</em>(-13.56) = <span>-0.000003388 is the absolute minimum for <em>f</em>. It is below zero, thus <em>f</em> is not a density. </span></p>
<p>All the moments can easily be derived from the characteristic function. Odd moments are zero, and for even moments, we have:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744224804?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744224804?profile=RESIZE_710x" class="align-center"/></a></p>
<p>In particular, assuming <span><em>β</em></span> = 2, Var(<em>X</em>) = 2 <em>α a</em>.<em> </em>It does not depend on λ. </p>
<p><strong>5. Simulations</strong></p>
<p>It is not easy to simulate deviates from <em>G</em>(<em>a</em>, <em>b</em>) using traditional methods based on the characteristic function, such as <a href="https://www.sciencedirect.com/science/article/pii/0898122181900389" target="_blank" rel="noopener">this one</a> developed by Luc Devroye in 1980. I propose here a technique that I believe is simpler. First you need to compute the density function. The following program does that job: </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744297828?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744297828?profile=RESIZE_710x" class="align-center"/></a></p>
<p>You can download the source code <a href="https://storage.ning.com/topology/rest/1.0/file/get/3744911023?profile=original" target="_blank" rel="noopener">here</a>. It computes the density <em>f</em>(<em>x</em>) for 4,000 values of <em>x</em> equally spaced between -20 and +20. The next step is to compute the empirical cumulative distribution based on these 4,000 values: this is straightforward. Then you can use the classic <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling" target="_blank" rel="noopener">inverse transform method</a>, using the empirical inverse cumulative distribution, to generate the deviates in question. </p>
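<p>The recipe in this paragraph, tabulate the density on a grid, accumulate it into a cumulative distribution, then invert, can be sketched as follows (my own illustration; the standard normal density stands in for the author's computed <em>f</em>):</p>

```python
import bisect
import math
import random

# Step 1: tabulate the density at 4,000 equally spaced points in [-20, 20].
n = 4000
xs = [-20 + 40 * i / (n - 1) for i in range(n)]
fs = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]  # stand-in density

# Step 2: running sum -> empirical cumulative distribution, normalized to end at 1.
dx = xs[1] - xs[0]
cdf, acc = [], 0.0
for f in fs:
    acc += f * dx
    cdf.append(acc)
cdf = [c / cdf[-1] for c in cdf]

# Step 3: inverse transform sampling -- push U ~ Uniform(0, 1) through the
# inverse of the tabulated cumulative distribution.
def draw(rng=random):
    return xs[bisect.bisect_left(cdf, rng.random())]

random.seed(3)
deviates = [draw() for _ in range(100_000)]
mean = sum(deviates) / len(deviates)
print(round(mean, 3))  # near 0 for this symmetric stand-in density
```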
<p>Another way to simulate this type of distribution is to compute its moments (easy with Mathematica or WolframAlpha) using the formula at the bottom of section 4, see also <a href="https://stats.stackexchange.com/questions/141652/constructing-a-continuous-distribution-to-match-m-moments" target="_blank" rel="noopener">here</a>. Then use the Momentify R package, available <a href="https://statisfaction.wordpress.com/2014/09/20/momentify/" target="_blank" rel="noopener">here</a>. </p>
<p><strong>6. Weakly semi-stable distributions</strong></p>
<p>The semi-stable distributions introduced here satisfy this property: if <em>X</em>(1), ..., <em>X</em>(<em>n</em>) are independently and identically distributed <em>G</em>(<em>a</em>, <em>b</em>), then <em>X</em>(1) + ... + <em>X</em>(<em>n</em>) is <em>G</em>(<em>na</em>, <em>b</em>). A weaker requirement would be that <em>X</em>(1) + <em>X</em>(2) is <em>G</em>(2<em>a</em>, <em>b</em>), resulting in a potentially larger class of distributions. Note that even with this weaker form, the generalized central limit theorem in section 2 may be preserved, unchanged. </p>
<p>We have:</p>
<ul>
<li><em>X</em>(1) + <em>X</em>(2) is <em>G</em>(2<em>a</em>, <em>b</em>)</li>
<li><em>G</em>(2<em>a</em>, <em>b</em>) + <em>G</em>(2<em>a</em>, <em>b</em>) is <em>G</em>(4<em>a</em>, <em>b</em>)</li>
<li><em>G</em>(4<em>a</em>, <em>b</em>) + <em>G</em>(4<em>a</em>, <em>b</em>) is <em>G</em>(8<em>a</em>, <em>b</em>)</li>
</ul>
<p>And so on. Let <em>m</em> = 2^<em>n</em>. Clearly, the main theorem in section 2 is preserved, unchanged, if you replace <em>n</em> by <em>m</em>. Thus if there is convergence, it must hold for any <em>n</em>, not just for those that are a power of 2. And the limiting distribution <em>Z</em> will also be <em>G</em>(<em>a</em>, 0). </p>
<p><strong>7. Counter-example</strong></p>
<p>Building distribution families that meet the requirements of section 1 is not that easy. We try here to build such distributions directly from the density rather than from the characteristic function, and we show that it does not work. The family in question is also governed by two parameters <em>a</em> and <em>b</em> with <i>b</i> > 0. These distributions are also symmetrical and centered at 0, and defined by the following density:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744410942?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744410942?profile=RESIZE_710x" class="align-center"/></a></p>
<p>The characteristic function is as follows:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744430440?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744430440?profile=RESIZE_710x" class="align-center"/></a></p>
<p>If <em>X</em>(1), <em>X</em>(2) are independently and identically distributed, from that same family of distributions, then<a href="https://storage.ning.com/topology/rest/1.0/file/get/3744861059?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744861059?profile=RESIZE_710x" class="align-center"/></a></p>
<p>Clearly, the distribution of <em>X</em>(1) + <em>X</em>(2) cannot belong to the same family as <em>X</em>, and thus we are dealing with a family of distributions that is not even weakly semi-stable. The distribution of <em>X</em>, with <em>a</em> = <em>b</em> = 1, is pictured below:</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744876004?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744876004?profile=RESIZE_710x" class="align-center"/></a></p>
<p style="text-align: center;"><strong>Figure 2</strong>: <em>Counter-example</em></p>
<p><strong>8. Applications and conclusions</strong></p>
<p>Stable distributions with fat tails and infinite variance, such as Cauchy or Levy, have been used in financials models when the Gaussian law fails, see <a href="https://en.wikipedia.org/wiki/Financial_models_with_long-tailed_distributions_and_volatility_clustering" target="_blank" rel="noopener">here</a>. The class of distribution proposed here generalizes these stable distributions, yet still offers properties that make them easy to handle especially as far as the asymptotic behavior is concerned. The case <span><em>β</em> < 2 provides a fat tail and infinite variance. These distributions are also used in geopolitics, economics, and risk modeling: see <a href="https://en.wikipedia.org/wiki/Fat-tailed_distribution" target="_blank" rel="noopener">here</a>. The case <em>β</em> = 2 encompasses the Gaussian distribution, and yields a finite variance. </span></p>
<p><span>Likewise, divisible distributions such as the Poisson have been used in a number of contexts. Most recently, new divisible distributions were devised to model counts in data sets (see <a href="https://www.tandfonline.com/doi/abs/10.1080/03610926.2018.1433847" target="_blank" rel="noopener">here</a>). Our class of semi-stable or weakly semi-stable distributions is narrower but has nicer properties, making its members good candidates when you need a distribution that is more stable than divisible distributions, yet more flexible and varied than stable distributions: in other words, something in between divisible and stable distributions, offering the best of both worlds.</span></p>
<p><strong>Related article</strong></p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics" target="_blank" rel="noopener">New Perspective on the Central Limit Theorem and Statistical Testing</a> (also discussing stable distributions)</li>
</ul>
<p></p>Top 7 Data Science Use Cases in Administrationtag:www.datasciencecentral.com,2019-11-27:6448529:BlogPost:9110502019-11-27T15:27:20.000ZIgor Bobriakovhttps://www.datasciencecentral.com/profile/IogrBobriakov
<p><span style="font-weight: 400;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/3744004297?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3744004297?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p><span style="font-weight: 400;">In this age, a successful business requires new approaches to data management. Modern advances in data science provide efficient solutions for numerous use cases.</span></p>
<p><span style="font-weight: 400;">Data science embraces a broad spectrum of tasks in the sphere of public administration and brings new opportunities to this field. Let us consider the key data science use cases in administration to clarify the term through practical examples.</span></p>
<h1><strong>Fraud detection</strong></h1>
<p><span style="font-weight: 400;">Considerable attention is paid to fraud detection and prevention in the sphere of public administration. When it comes to the public sector and administration, the key threats are corruption, bribery, forgery, and misuse of authority. Despite all prevention and detection measures, this issue remains topical.</span></p>
<p><span style="font-weight: 400;">Nowadays, fraudsters use sophisticated techniques to steal personal data and money, or to forge documents. Therefore, public authorities and agencies have to be inventive in this struggle. Fraud is often regarded as the main issue undermining the effectiveness of governmental policies and the procedures of public administration offices. One of the most important measures to take is real-time detection of fraud attacks. First of all, it is essential to develop an end-to-end strategy to fight fraud. This strategy should cover both internal and external processes, which then become subjects for real-time data analytics. The consolidated data is crucial for processing, detecting, and preventing fraudulent activities.</span></p>
<p><span style="font-weight: 400;">Identifying and assessing potential risks are among the key factors influencing the success of fraud management. Real-time analytics and prediction techniques are a must in the public administration toolkit. </span></p>
<h1><strong>Managing customer data</strong></h1>
<p><span style="font-weight: 400;">An essential function of all governmental institutions is to maintain people's trusted information and valuable data insights in complete security. In turn, managing this data is a top task in the sphere of public administration.</span></p>
<p><span style="font-weight: 400;">Managing and properly using this vast amount of data for the benefit of the whole state may be complicated and challenging. A new trend in data management is the application of blockchain technologies, which enable secure access to public data for governmental agencies thanks to their specific approach to data storage. A blockchain stores data in the form of blocks; these blocks are organized into a chain, and after that no actor can change or delete any of them. Instead, the blocks may be verified or managed.</span></p>
<p><span style="font-weight: 400;">The most vivid example of a blockchain application for data management is land registry transactions. Detailed information on sales and registration transactions is automatically recorded, reducing paper documentation and the time spent on land registration.</span></p>
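<p>The hash-linking idea behind such a ledger can be sketched in a few lines of Python (a toy illustration, not a production blockchain; the land-registry record fields are invented). Each block stores the hash of its predecessor, so altering or deleting any earlier block invalidates every later link:</p>

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents (including the previous block's hash)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, record):
    """Append a record, linking it to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"record": record, "prev_hash": prev})

def verify(chain):
    """Recompute every link; any altered or deleted block breaks the chain."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
add_block(chain, {"parcel": "A-17", "action": "registered", "owner": "Alice"})
add_block(chain, {"parcel": "A-17", "action": "sold", "owner": "Bob"})
print(verify(chain))                     # True: chain is intact
chain[0]["record"]["owner"] = "Mallory"  # tamper with an earlier block
print(verify(chain))                     # False: the link to block 0 no longer matches
```

<p>This is why actors can verify the registry but cannot quietly rewrite it: any change to an earlier transaction breaks every subsequent hash link.</p>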
<p><span style="font-weight: 400;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/965166116?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/965166116?profile=RESIZE_710x" class="align-center"/></a></span></p>
<h1><strong>Industry knowledge</strong></h1>
<p><span style="font-weight: 400;">People working within the field of public administration are leaders by nature. Their daily work is aimed at making societies, public services, and management more advanced and efficient. </span></p>
<p><span style="font-weight: 400;">The public sector is well aware of big data advancements and does not miss a chance to implement them for its benefit. Big data brings advanced analytics, an increase in efficiency, and data-based decisions to public administration. The key trends in public administration are prioritizing the Internet of Things, encouraging private-public partnerships, using predictive modeling, personalizing public services, and applying behavioral science. The combination of these factors creates high demand for professionals capable of combining knowledge of law, governance, communication, psychology, finance, management, and data science. </span></p>
<h1><strong>Reporting</strong></h1>
<p><span style="font-weight: 400;">Administrative reports are documents representing the general status and progress of operations concerning some type or dimension of work. A report aims to highlight the current state, the progress made by the time of reporting, and the main milestones achieved. These reports used to be a massive pile of documents, and compiling them required a lot of time and effort due to the amount of data and the multiple sources of that data.</span></p>
<p><span style="font-weight: 400;">Modern techniques and smart data algorithms have brought automation to this process and made it far more efficient and less time-consuming. Advanced smart solutions and platforms allow building reports of various complexity within minutes. Such tools offer data collection, tracking, and analysis, as well as data visualization and customization. The traditional statistical approach is no longer enough to take into account all the parameters and produce a report that can contribute to the overall decision-making process.</span></p>
<p><span style="font-weight: 400;">Besides, modern tools support access-control plugins to prevent theft or fraud.</span></p>
<p><span style="font-weight: 400;">A key task of public administration is to ensure the implementation of governmental policies. The success of that implementation must be proved by careful reporting and reliable data, the amount of which is enormous on the scale of a whole country. </span></p>
<h1><strong>Real-time analytics</strong></h1>
<p><span style="font-weight: 400;">The public sector has successfully adopted big data and enjoys the benefits of real-time data analytics. Big data solutions and smart algorithms have become an integral part of public administration.</span></p>
<p><span style="font-weight: 400;">The sector of public administration involves various time-sensitive operations and transactions; thus the support of real-time intelligence is crucial. A key benefit of real-time analytics is the capability to conduct constant monitoring and gain a greater extent of control over processes and operations. The progress or general status of transactions or processes may be tracked according to various indicators and within different time frames.</span></p>
<p><span style="font-weight: 400;">Besides, there is now high demand for real-time data transmission as well. Many institutions use real-time data analytics to detect improper payments.</span></p>
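<p>A minimal sketch of such screening, assuming nothing about any particular institution's systems: each incoming payment is compared, via a z-score, against the payments seen so far, and strong outliers are flagged for review:</p>

```python
from statistics import mean, stdev

def flag_improper(payments, threshold=3.0):
    """Flag payments whose amount deviates strongly from the history so far.

    A deliberately simple streaming rule: each new payment is compared, via a
    z-score, against the mean and standard deviation of earlier payments.
    """
    flagged = []
    history = []
    for p in payments:
        if len(history) >= 2:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(p - mu) / sigma > threshold:
                flagged.append(p)
        history.append(p)
    return flagged

payments = [120, 135, 118, 122, 131, 125, 9800, 129]
print(flag_improper(payments))  # [9800]
```

<p>Real systems add per-vendor baselines, seasonality, and human review queues, but the core idea is the same: score each transaction against what the stream has looked like so far.</p>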
<p><span style="font-weight: 400;">Real-time analytics enables faster and far more efficient decision making.</span></p>
<p><span style="font-weight: 400;"><a href="https://storage.ning.com/topology/rest/1.0/file/get/965172249?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/965172249?profile=RESIZE_710x" class="align-center"/></a></span></p>
<h1><strong>Robotization</strong></h1>
<p><span style="font-weight: 400;">Robots are actively taking on more responsibilities every day. They are winning more trust from people and proving to be extraordinarily efficient and successful at completing routine tasks. There is an opinion, or even fear, that robots will replace people in many positions and that some professions will become extinct shortly.</span></p>
<p><span style="font-weight: 400;">The sphere of public administration is one of those fields where robotization finds broad application. </span></p>
<p><span style="font-weight: 400;">Here are several functionalities that may be successfully covered by the robots:</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Automated line recognition</span></li>
</ul>
<p><span style="font-weight: 400;">Recognition software makes robots capable of extracting essential information from invoices and performing automatic data entry.</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Matching and booking</span></li>
</ul>
<p><span style="font-weight: 400;">Robots can easily match invoices to receipts, contracts, etc., and handle minor discrepancies and mistakes.</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Automated notifications</span></li>
</ul>
<p><span style="font-weight: 400;">Various processes need approval or some other action to be finalized. Robots send automatic notifications and report the status of operations and transactions in real time. </span></p>
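<p>The matching-and-booking functionality described above can be sketched as a simple amount-matching rule (a toy example with invented identifiers and a hypothetical tolerance parameter; real systems also match on vendor, date, and line items):</p>

```python
def match_invoices(invoices, receipts, tolerance=0.01):
    """Match each invoice to an unused receipt with a sufficiently close amount.

    `tolerance` is the fraction of the invoice amount treated as a minor,
    acceptable discrepancy (e.g. rounding or small fee differences).
    """
    matches = {}
    used = set()
    for inv_id, inv_amount in invoices.items():
        for rec_id, rec_amount in receipts.items():
            if rec_id in used:
                continue
            if abs(inv_amount - rec_amount) <= tolerance * inv_amount:
                matches[inv_id] = rec_id
                used.add(rec_id)
                break
    return matches

invoices = {"INV-1": 250.00, "INV-2": 1200.00, "INV-3": 75.50}
receipts = {"RCT-9": 1199.40, "RCT-5": 250.00, "RCT-7": 980.00}
print(match_invoices(invoices, receipts))
# {'INV-1': 'RCT-5', 'INV-2': 'RCT-9'}
```

<p>INV-2 matches RCT-9 despite a 0.60 difference because it falls within the 1% tolerance, which is exactly the "minor discrepancies" a matching robot is expected to absorb.</p>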
<h1><strong>Process automation</strong></h1>
<p><span style="font-weight: 400;">The sphere of public administration has always been burdened with multiple procedures and paperwork. Due to the rapid development of technology, people have moved away from recording everything on paper. However, the tradition of keeping everything documented remains. </span></p>
<p><span style="font-weight: 400;">Thanks to the automation that technological advancements have brought to our lives, it is now possible to perform multiple operations and keep an eye on various processes in one click. Robotic process automation reduces the staff's workload by entrusting tasks to intelligent machines. Robots and AI-powered software are widely used to mimic and complete repeatable tasks. In addition, these techniques are expected to alleviate some security concerns as well. </span></p>
<p><span style="font-weight: 400;">Automation of administrative tasks brings considerable relief to managers and businesspeople who need to be reactive and responsive to new challenges every minute. Here are just a few examples of tasks that may be candidates for automation.</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Email sorting and responding</span></li>
</ul>
<p><span style="font-weight: 400;">Smart labeling systems easily sort emails and mark their level of importance, making the working process more efficient and productive.</span></p>
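<p>A minimal sketch of such a labeling rule (the keyword lists and label names are invented for illustration; production systems typically learn these rules from labeled mail rather than hard-coding them):</p>

```python
def label_email(subject, body):
    """Assign a priority label from simple keyword rules."""
    text = (subject + " " + body).lower()
    if any(w in text for w in ("urgent", "asap", "deadline")):
        return "high"
    if any(w in text for w in ("invoice", "meeting", "report")):
        return "normal"
    if any(w in text for w in ("newsletter", "unsubscribe")):
        return "low"
    return "review"  # no rule matched: leave for a human to triage

print(label_email("URGENT: budget deadline", "Please respond today."))  # high
print(label_email("Weekly newsletter", "Click to unsubscribe."))        # low
```
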
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Scheduling</span></li>
</ul>
<p><span style="font-weight: 400;">Synchronizing multiple calendars and meetings makes it possible to plan working time and use it to the full.</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Auto-payments</span></li>
</ul>
<p><span style="font-weight: 400;">Smart systems are capable of managing payment procedures to reduce delays.</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Reporting</span></li>
</ul>
<p><span style="font-weight: 400;">Overviews of completed cases and reports may be easily created by smart automation software for the convenience of managers and employees. </span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Calculations.</span></li>
</ul>
<h1><span style="font-weight: 400;"><strong>Conclusion</strong> </span></h1>
<p><span style="font-weight: 400;">Public administration is carried out in the public interest. As a result of the introduction of digital technologies into various spheres of human activity, public expectations have moved forward, and the sphere of public administration is adjusting to the rapidly developing digital environment.</span></p>
<p><span style="font-weight: 400;">The overall trend is digital openness and process automation while keeping a sufficient security level. Improving governmental policies and administration services requires the application of data science.</span></p>
<p><span style="font-weight: 400;">It is essential to remember that traditional governmental models are being replaced with digital ones under the pressure of the immense development of IT and Internet culture. The digitalization of public administration allows services to be delivered faster and more securely. </span></p>The Likelihood Principle, the MVUE, Ghosts, Cakes and Elvestag:www.datasciencecentral.com,2019-11-26:6448529:BlogPost:9110042019-11-26T22:44:39.000ZDavid Harrishttps://www.datasciencecentral.com/profile/DavidHarris
<p>In my prior blog post, I wrote of a clever elf that could predict the outcome of a mathematically fair process roughly ninety percent of the time. Actually, it is ninety-three percent of the time, and why it is ninety-three percent instead of ninety is also important.</p>
<p>The purpose of the prior blog post was to illustrate the weakness of using the minimum variance unbiased estimator (MVUE) in applied finance. Nonetheless, that raises a more general question of when and why it should be used, or when a Bayesian or likelihood-based method should be applied instead. Fortunately, the prior blog post provides a way of looking at the problem.</p>
<p>Fisher’s likelihood-based methods, Pearson and Neyman’s frequency-based methods, and Laplace’s method of inverse probability really are at odds with one another. Indeed, much of the literature of the mid-twentieth century had a polemical ring to it. Unfortunately, what ended up coming about was a hybridization of the tools, so it can be challenging to see how the differing properties matter.</p>
<p>In fact, each type of tool was created to solve different kinds of problems. It should be unsurprising that each excels in some places and may even be problematic in others.</p>
<p>In the prior blog post, the clever elf had perfect knowledge of the outcome of a mathematically fair process in eighty percent of the cases and was superior in thirteen of the remaining twenty percentage points because the MVUE violates the likelihood principle. Of course, that is by design. It is supposed to violate the likelihood principle. It could not display its virtues if it tried to conform. Nonetheless, it forces a split between Pearson and Neyman on one side and Fisher and Laplace on the other.</p>
<p>In most statistical disagreements, the split is among methods built around null hypothesis methods and Bayesian methods. In this case, Fisher’s method will sit on Laplace’s side of the fence rather than Pearson and Neyman’s. The goal of this post is neither to defend nor to attack the likelihood principle. Others have done that with great technical skill.</p>
<p>This post is to provide a background of the issues separate from a technical presentation of it. While this post could muck around in measure theory, the goal is to extend the example of the cakes so the differences can be made apparent. As it happens, there is a clean and clear break in the information used between the methodologies.</p>
<p>The likelihood principle is divisive in the field of probability and statistics. Some adherents to the principle argue that it rules out Pearson and Neyman’s methodology entirely. Opponents either say that its construction is flawed in some way or, simply state that for most practical problems, no one need care about the difference because orthodox procedures work often enough in practical situations. Yet these positions illustrate why not knowing the core arguments could cause a data scientist or subject matter expert to choose the wrong method.</p>
<p>The likelihood principle follows from two separate tenets that are not individually controversial or at least not very controversial. There has been work to explore it, such as by Berger and Wolpert. There has also been work to refute it and to refute the refutations. See, for example, Deborah Mayo’s work. So far, no one has generated an argument so convincing that the opposing sides believe that the discussion is even close to being closed. It remains a fertile field for graduate students to produce research and advancements.</p>
<p>The first element of the likelihood principle is the sufficiency principle. No one tends to dispute it. The second is the conditionality principle. It tends to be the source of any contention. We will only consider the discrete case here, but for a discussion of the continuous case, see Berger and Wolpert’s work on it listed below.</p>
<p>A little informally, the weak conditionality principle supposes that two possible experiments could take place regarding a parameter. In Birnbaum’s original formulation, he considered two possible experiments that could be chosen, each with a probability of one-half. The conditionality principle states that all of the evidence regarding the parameter comes only from the experiment that was actually performed. The experiment that did not happen plays no role. That last sentence is the source of the controversy.</p>
<p>Imagine that a representative sample is chosen from a population to measure the heights of members. There will be several experiments performed by two research groups for many different studies over many unrelated topics. </p>
<p>The lab has two types of equipment that can be used. The first is a carpenter’s tape that is accurate to 1/8<sup>th</sup> of an inch (3.175 mm), while the other is a carpenter’s tape that is accurate to 1 millimeter. A coin is tossed to determine which team gets which carpenter’s tape.</p>
<p>The conditionality principle states that the results of the experiment only depend on the accuracy of the instrument used and the members of the sample and that the information that would have been collected by using the other device or a different sample has to be ignored. To most people, that would be obvious, but that is the controversial part.</p>
<p>Pearson and Neyman’s methods choose the optimal solution before seeing any data. Any randomness that impacts the process must be accounted for, and so the results that could have been obtained but were not are supposed to affect the form of the solution. </p>
<p>Pearson and Neyman’s algorithm is optimal, having never seen the data, but may not be optimal after seeing the data. There can exist an element of the sample space that would cause Pearson and Neyman’s method to produce poor results. The guarantee is for good average results upon repetition over the sample space, not good results in any one experiment.</p>
<p>There are examples of pathological results in the literature where a Frequentist and a Bayesian statistician can draw radically different solutions with the same data. To understand another place that a violation of the likelihood principle may occur, consider the lowly t-test.</p>
<p>Imagine a more straightforward case where the lab only had one piece of equipment, and it was accurate to 1 millimeter. If a result is significant, then the sample statistic is as extreme or more extreme than what one would expect if the null is true. It compares the result to the set of all possible samples that could have been taken if the null is true. </p>
<p>Of course, more extreme values were not observed for the sample mean. If a more extreme result were found, then that would have been the result and not the one actually observed. What if the result is the most extreme result any person has ever seen? Can someone really argue that the tail probability is meaningful?</p>
<p>The conditionality principle says that if you didn’t see it, then you do not have information about it. You cannot use samples that were not seen to perform inference. That excludes all t-, F-, z-tests, and most Frequentist tests because they are conditioned on a model that assumes that certain things are real that have never been observed.</p>
<p>A big difference between Laplace and Fisher on one side and Pearson and Neyman on the other is whether all the evidence that you have about a parameter is in the sample, or whether samples unseen must be included as well.</p>
<p>The non-controversial part of the likelihood principle is the sufficiency principle. The sufficiency principle is a core element of all methods. It states something pretty obvious.</p>
<p>Imagine you are performing a set of experiments to gather evidence about some parameter, and you are going to use a statistic <a href="https://storage.ning.com/topology/rest/1.0/file/get/3742899904?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742899904?profile=RESIZE_710x" class="align-full"/></a>, where t is a statistic sufficient for the parameter. Then if you conducted two experiments and the statistics were equal, <a href="https://storage.ning.com/topology/rest/1.0/file/get/3742902531?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742902531?profile=RESIZE_710x" class="align-full"/></a>, then the evidence about the parameter in both experiments is equal.</p>
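<p>For a concrete instance of the sufficiency principle, consider i.i.d. Bernoulli sampling, where the number of successes is sufficient for the success probability. Two different samples with the same success count yield identical likelihood functions, and hence identical evidence about the parameter (a small illustrative sketch by the editor, not from the original post):</p>

```python
def bernoulli_likelihood(sample, theta):
    """Likelihood of an i.i.d. Bernoulli sample at success probability theta."""
    k = sum(sample)  # number of successes: the sufficient statistic
    n = len(sample)
    return theta ** k * (1 - theta) ** (n - k)

# Two different samples with the same sufficient statistic (3 successes in 6).
a = [1, 1, 1, 0, 0, 0]
b = [0, 1, 0, 1, 0, 1]
for theta in (0.2, 0.5, 0.8):
    print(theta, bernoulli_likelihood(a, theta), bernoulli_likelihood(b, theta))
# The likelihoods agree at every theta: the ordering of the observations adds
# no evidence beyond the success count, exactly as the sufficiency principle asserts.
```
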
<p>When the two principles are combined, Birnbaum asserts that the likelihood principle follows from it. The math lands on the following proposition. If you are performing an experiment, then the evidence about a parameter should depend only on the experiment actually conducted and the data observed through the likelihood function.</p>
<p>In other words, Fisher’s likelihood function and Laplace’s Likelihood function are the only functions that contain all the information from the experiment. If you do not use the method of maximum likelihood or the method of inverse probability, then you are not using all of the information. You are surrendering information if you choose something else. </p>
<p>Before we look at the ghostly cakes again, there are two reasons to rule out Fisher’s method of maximum likelihood. The first is that, as mentioned in the prior blog post, there is not a unique maximum likelihood estimator in this case. The second, however, is that Fisher’s method isn’t designed so that you could make a decision from knowing its value. It is designed for epistemological inference. The p-value does not imply that there is an action that would follow from knowing it. It is designed to provide knowledge, not to prescribe actions or behaviors.</p>
<p>If you use Fisher’s method as he intended, then the p-value is the weight of the evidence against your idea. It doesn’t have an alternate hypothesis or an automatic cut-off. Either you decide that the weight is enough that you should further investigate the phenomenon, or it isn’t enough, and you go on with life investigating other things.</p>
<p>In the prior blog post, the engineer was attempting to find the center of the cake using Cartesian coordinates. The purpose was to take an action that is cutting a cake through a particular point. </p>
<p>She had a blade that was long enough regardless of where the cake sat that was anchored in the origin. In practice, her only real concern was the angle, but not the distance. Even though two Cartesian dimensions were measured, only one is used in polar coordinates, the angle.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3742892108?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742892108?profile=RESIZE_710x" class="align-full"/></a></p>
<p>The clever elf, however, was using a Bayesian method, and the likelihood was based on the distance between points as well as the angles. As such, it had to use both dimensions to get a result. The reason the MVUE was less precise is that it violates the likelihood principle and throws away information.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3742893197?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742893197?profile=RESIZE_710x" class="align-full"/></a></p>
<p>It is here we can take another look at our ghostly cakes by leaving the Cartesian coordinate system and moving over to polar coordinates so we can see the source of the information loss directly. This difference can be seen in the sampling distribution of the two tools by angle.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3742893749?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742893749?profile=RESIZE_710x" class="align-full"/></a></p>
<p>Why bother with the MVUE at all? After all, when it doesn’t match Fisher’s method of maximum likelihood, then it must be a method of mediocre likelihood. What does the MVUE have to offer that the Bayesian posterior density cannot?</p>
<p>The MVUE is an estimator that comes with an insurance policy. It also permits answers to questions that the Bayesian posterior cannot answer.</p>
<p>Neither Laplace’s Bayesian nor Fisher’s Likelihood methods directly concern themselves with either precision or accuracy. Both tend to be biased methods, but that is in part because neither method cares about bias. An unbiased estimator solves a type of problem that a biased estimator cannot solve.</p>
<p>Imagine an infinite number of parallel universes where each is slightly different. A method is either unbiased and accurate or biased and inaccurate. For someone trying to determine which world they live in, the use of a Bayesian method implies they will always tend to believe they live in one of the nearby universes, but never find which one is their own, except by chance.</p>
<p>Using Pearson and Neyman’s method also allows a guarantee against the frequency of false positives and a way to control for false negatives. Such assurance can be valuable, particularly in manufacturing quality control. That assurance extends to confidence, tolerance, and predictive intervals. Such a guarantee of coverage also holds value in academia.</p>
<p>Finally, under mild circumstances such as correct instrumentation and methodology, Pearson and Neyman’s method allows for a solution to inferences in two ways that are unavailable to the Bayesian approach.</p>
<p>First, frequency methods allow for a complete form of reasoning that is not generally available to Bayesian methods. Bayesian methods lack a null hypothesis and are not restricted to two hypotheses; there should be one hypothesis for every possible way the world could be. Unfortunately, it is possible that the set of Bayesian hypotheses does not contain the true model of the world.</p>
<p>Before Einstein and relativity, there could not have been a hypothesis that included curvature in space-time, so a Bayesian test would have found the closest fit to reality but would also have been wrong. Without knowing about relativity, a null hypothesis could test whether Newton’s laws were valid for the orbit of Mercury and discover that they were not. That does not mean a good solution exists from current knowledge, but it would show that there is something wrong with Newton’s laws.</p>
<p>Additionally, Bayesian methods have no good solution to solve a sharp null hypothesis. A null hypothesis is sharp if it is of the form <a href="https://storage.ning.com/topology/rest/1.0/file/get/3742906382?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742906382?profile=RESIZE_710x" class="align-full"/></a>. Although there is a Bayesian solution in the discrete case, in the continuous case, there cannot be one because it would have zero measure. If it is assumed that <a href="https://storage.ning.com/topology/rest/1.0/file/get/3742907315?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742907315?profile=RESIZE_710x" class="align-full"/></a>, then the world should work in a specific manner. If that is the real question, then a Bayesian solution cannot provide a really good solution. There are Bayesian approximations but no exact answer.</p>
<p>In applied finance, only Bayesian methods make sense for data scientists and subject matter experts, but in academic finance, there is a strong argument for Frequentist methods. The insurance policy is paid for in information loss, but it provides benefits unavailable to other methods.</p>
<p>If the engineer in the prior blog post had been using polar coordinates rather than Cartesian coordinates, there would not be a need to measure the distance from the origin to find the MVUE because the blade was built to be long enough. The Bayesian method would have required the measurement of the distance from the origin.</p>
<p>At a certain level, it seems strange that adding any variable to the observed angles could improve the information about the angle alone, yet it does. The difference between the MVUE and the posterior mean is obvious: the likelihood function requires knowledge of the distances. Even though the distance estimator is not used in cutting the cake, and even though there are errors in estimating the distance to the center from each point, the increase in information substantially outweighs the added source of error. Overall, the noise is reduced by the added information.</p>
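<p>A small simulation, loosely inspired by the cake example (this is the editor's sketch, not the author's actual model), makes the point numerically. When estimating the direction from the origin to a center, averaging the observed angles alone is noisier than taking the angle of the centroid, which implicitly uses the distance information as well:</p>

```python
import math
import random

random.seed(7)

def rms(values):
    """Root-mean-square of a list of angle errors (the true angle is zero)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def compare_estimators(center=(4.0, 0.0), sigma=1.5, n=15, trials=4000):
    """Estimate the direction from the origin to `center` two ways.

    'angle-only' averages the angles of the noisy points; 'centroid' takes
    the angle of the mean point, implicitly using distance information too.
    """
    angle_only, centroid = [], []
    for _ in range(trials):
        pts = [(random.gauss(center[0], sigma), random.gauss(center[1], sigma))
               for _ in range(n)]
        angle_only.append(sum(math.atan2(y, x) for x, y in pts) / n)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        centroid.append(math.atan2(my, mx))
    return rms(angle_only), rms(centroid)

rms_angle_only, rms_centroid = compare_estimators()
print(f"angle-only RMS error: {rms_angle_only:.4f}")
print(f"centroid   RMS error: {rms_centroid:.4f}")
```

<p>On this simulation the centroid estimator is consistently tighter, mirroring the claim that discarding the distances surrenders information about the angle.</p>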
<p>Finally, the informational differences should act as a warning to model builders. Information that can be discarded using a Frequency-based model may be critical information in a Bayesian model. Models like the CAPM come apart in a Bayesian construction, implying that a different approach is required by the subject matter expert. The data scientist performing the implementation will have differences too.</p>
<p>Models of finance tend to be generic and have a cookbook feel. That is because the decision tree implicit in financial modeling centers on finding the right tool for the right mix of general issues. Concerns about heteroskedasticity, autocorrelation, and other symptoms or pathologies all but vanish in a Bayesian methodology. Instead, the headaches often revolve around combinatorics, finding local extreme values, and the data generation process. To take advantage of all available information, the model has to contain it. In the ghostly cakes, more than the angle is necessary. The model needs to be proposed and measured.</p>
<p></p>
<p>Berger, J. O., & Wolpert, R. L. (1984). <em>The likelihood principle</em>. Hayward, Calif: Institute of Mathematical Statistics.</p>
<p>Mayo, D. G. (2014). On the Birnbaum argument for the strong likelihood principle. <em>Statistical Science</em>, 29(2), 227-239.</p>An easy way to evaluate the probability of winning a commercial opportunitytag:www.datasciencecentral.com,2019-11-26:6448529:BlogPost:9107622019-11-26T12:05:05.000ZPablo Gutierrezhttps://www.datasciencecentral.com/profile/PabloGutierrez
<p>Whenever we visit a client and present a proposal, we start wondering whether the customer will accept or reject it. Usually, the customer will analyze our proposal, compare it with competitors’ offers, and make a decision.</p>
<p>In order to build our commercial forecast system, we need to assign a probability to every proposal we have presented and derive a numerical value for each of them.<br/> One way of doing this is to multiply the value of the proposal by the probability of winning it.</p>
<p>Expected_income = proposal_value * proposal_probability</p>
<p>But how do we assign a probability to the different opportunities? <br/> Most commercial departments calculate the probability of winning an opportunity using the knowledge, experience, and instinct of the team.</p>
<p>Is there any other way to calculate the chances of an opportunity being successful?</p>
<p>The answer is yes. Using logistic regression, one of the most popular techniques in machine learning, it is possible to train a model that calculates the probability that a commercial opportunity will be successful.</p>
<p>Logistic regression can be applied to many variables, but to keep it simple we will use only one: the time (in days) that the opportunity has been “alive” since it was created.</p>
<p>We choose this variable because most purchasing departments need a certain amount of time to analyze proposals and make a decision for a specific kind of product; after this time, the opportunity does not have much chance of being successful.</p>
<p>To train the model, we use historical information and prepare the data so that the X column holds the time an opportunity was “alive” and the Y column holds “1” or “0” depending on whether the opportunity was won.</p>
<p>Using R, we can perform this regression very easily:<br/> <em>fit <- glm(result ~ time, data = Products, family = 'binomial')</em></p>
<p>R will store the coefficient of the independent variable and the intercept, so we will be able to construct the expression z = a*time + intercept.</p>
<p>Finally, we calculate the time-dependent probability as p(t) = sigmoid(z).</p>
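<p>As a quick sketch of how p(t) = sigmoid(a*time + intercept) behaves, here it is in Python with hypothetical coefficients; in practice, the slope and intercept come from the glm() fit above, not from these made-up numbers.</p>

```python
import math

# Hypothetical coefficients -- in a real system they come from the
# fitted glm() model. A negative slope means opportunities fade as
# they age.
a, intercept = -0.05, 2.0

def win_probability(days_alive):
    """p(t) = sigmoid(a*t + intercept)."""
    z = a * days_alive + intercept
    return 1.0 / (1.0 + math.exp(-z))

print(win_probability(10))   # fresh opportunity: high probability
print(win_probability(120))  # stale opportunity: low probability
```

Multiplying this probability by the proposal value then gives the expected income for each opportunity in the forecast.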
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3742219732?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742219732?profile=RESIZE_710x" class="align-full"/></a>We should obtain a trend like this. The final shape of the curve will depend on the nature of the market.</p>
<p></p>
<p></p>How Spotify know a lot about you using machine learning and AI.tag:www.datasciencecentral.com,2019-11-26:6448529:BlogPost:9109492019-11-26T08:30:00.000ZSameer Nigamhttps://www.datasciencecentral.com/profile/SameerNigam
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3742021452?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3742021452?profile=RESIZE_710x" class="align-full"/></a></p>
<h3><strong>In this article, we will talk about</strong></h3>
<ol>
<li><strong>How Spotify is using Artificial Intelligence and Machine Learning to enhance the experience of listeners?</strong></li>
<li><strong>How it is helping artists and creators?</strong></li>
<li><strong>Which machine learning, loss function, training model technologies Spotify uses in its different applications.</strong></li>
<li><strong>What Spotify is planning to do in the upcoming future?</strong></li>
</ol>
<p>Spotify is a music streaming service founded in 2006. It had its official launch in India in February 2019, and it already had millions of subscribers signed up. Spotify is known for its user experience and music recommendations, and it is continuously improving. It uses artificial intelligence, machine learning, and big data to improve and personalize the music experience for its listeners.</p>
<p>Spotify needs no introduction; it is one of the best music streaming services on the market. But what excites us the most are the amazing ways it enhances the user experience.</p>
<h3><strong>How Spotify is using Artificial Intelligence and Machine Learning to enhance the user experience of listeners?</strong></h3>
<p>We are all familiar with "Discover Weekly," a personalized playlist unique to each user. Spotify uses artificial intelligence and machine learning algorithms to generate the playlist. It learns from your music preferences, your streaming history, and how many times you listened to a particular song. Everyone's Discover Weekly is different, and it can even differ at different times of the day.</p>
<p></p>
<p><img src="http://galvanize-wp.s3.amazonaws.com/wp-content/uploads/2016/08/18104334/Screen-Shot-2016-08-17-at-4.48.55-PM.png?profile=RESIZE_710x" class="align-full"/></p>
<p></p>
<p>When you are listening to music, Spotify monitors whether you listen to the whole song or just skip through it. Over time, it builds up an understanding of the type of music you like. It even dissects that music by beats per minute, style, the type of voices, and so on. This helps users who don't have the time, energy, or skills to create their own playlists get playlists matched to their interests.</p>
<p>The more you listen, the more data Spotify gets about you, the better its algorithm becomes at modeling your kind of music, and hence the more personal your listening journey becomes.</p>
<p>In a later section, we will discuss the in-depth working of this system.</p>
<h3>How it is helping artists and creators?</h3>
<p><a href="https://miro.medium.com/max/928/0*zl0-pZtZzslGC-R8." target="_blank" rel="noopener"><img src="https://miro.medium.com/max/928/0*zl0-pZtZzslGC-R8.?profile=RESIZE_710x" class="align-full"/></a></p>
<p>The traditional music industry had one problem: new creators had to struggle to reach an audience, even if they created music that people would like. Spotify's music recommendation system uses machine learning to learn about your song preferences, and it predicts and recommends new songs that you probably haven't listened to but will like.</p>
<p>This gives music creators a chance to become known and gives listeners songs they will like. It makes both listeners and creators happy, and it especially helps creators become the best version of themselves: they don't have to go through hurdles to get recognized, and they can focus on creating music.</p>
<h3><strong>Which machine learning, loss function, training model technologies Spotify uses in its different applications.</strong></h3>
<p>First, Spotify collects as much data as it can and tries to make sense of it in different ways. It creates many shared models representing the data, which are used in many different applications.</p>
<p>And some of them are discussed below:</p>
<p><strong>1. Guess the missing track from a playlist.</strong></p>
<p>Spotify has millions of playlists, and it filters out the ones that are relevant for training. Selection is an important factor here because training on every available playlist will definitely not give better results.</p>
<p>What it does is remove a song from a particular playlist and then try to guess which track is missing using the context of other playlists. It uses a Word2vec-type algorithm.</p>
<p>Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located close to one another in the space.</p>
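<p>Training an actual Word2vec model is beyond a short snippet, but the core intuition (tracks that share playlist "contexts" end up similar) can be sketched with a simplified co-occurrence stand-in; the playlists and track names below are made up for illustration.</p>

```python
import numpy as np

# Toy playlists with hypothetical tracks. Word2vec treats each playlist
# like a sentence and each track like a word; here we use raw
# co-occurrence counts as a simplified stand-in for learned embeddings.
playlists = [
    ["trackA", "trackB", "trackC"],
    ["trackA", "trackB", "trackD"],
    ["trackE", "trackF"],
]

tracks = sorted({t for p in playlists for t in p})
idx = {t: i for i, t in enumerate(tracks)}

# Co-occurrence matrix: how often two tracks share a playlist.
co = np.zeros((len(tracks), len(tracks)))
for p in playlists:
    for a in p:
        for b in p:
            if a != b:
                co[idx[a], idx[b]] += 1

def similarity(a, b):
    """Cosine similarity between two tracks' co-occurrence rows."""
    va, vb = co[idx[a]], co[idx[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

# trackA and trackB share contexts, so they score higher than
# trackA and trackE, which never appear together.
print(similarity("trackA", "trackB"))
print(similarity("trackA", "trackE"))
```

A real embedding model replaces these sparse count rows with dense learned vectors, but the geometry it exploits is the same.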
<p>Out of that, they get a cloud of similarities between playlists, tracks, and artists, and they try to map how artists' music styles are close to each other, or how an album is close to a particular listener's music taste.</p>
<p>2. <strong>Spotify Home screen</strong>: The Spotify Home screen uses a machine learning algorithm known as BaRT.</p>
<p>BaRT is described as a Bayesian Additive Regression Trees model, a Bayesian “sum-of-trees” model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior.</p>
<p>In Spotify, BaRT is used to predict a wide range of different shelves; a shelf could be “Made for you” or recommendations related to your recent listening history.</p>
<p><a href="https://cpb-ap-se2.wpmucdn.com/blogs.unimelb.edu.au/dist/3/41/files/2018/08/spotify_algo_green-1wtl1k5.png" target="_blank" rel="noopener"><img src="https://cpb-ap-se2.wpmucdn.com/blogs.unimelb.edu.au/dist/3/41/files/2018/08/spotify_algo_green-1wtl1k5.png?profile=RESIZE_710x" class="align-full"/></a><strong>How does it give a personalized experience?</strong></p>
<p>The BaRT algorithm works in a very interesting way to learn about its users.</p>
<ul>
<li>It is <strong>optimized for >30-second streams</strong>. If you listen to a song for more than 30 seconds, the system counts that as a signal of your interest.</li>
<li>They <strong>retrain the model once a day</strong> based on the interaction data collected.</li>
<li>The <strong>system is built to de-bias for positional bias</strong>: a click on something near the top is worth less, and a click near the bottom is worth more.</li>
</ul>
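<p>The three rules above can be sketched as a small labeling step for interaction logs. Everything here (the threshold, the linear position weighting, and the data) is a hypothetical illustration, not Spotify's actual pipeline.</p>

```python
# Hypothetical sketch of turning raw interaction logs into training
# examples for a home-screen recommender, following the rules above.
STREAM_THRESHOLD_SECONDS = 30  # streams past 30s count as positive

def label_interaction(seconds_streamed):
    """1 = positive signal (streamed > 30s), 0 = implicit negative."""
    return 1 if seconds_streamed > STREAM_THRESHOLD_SECONDS else 0

def position_weight(position, n_positions):
    """De-bias clicks by position: items near the top are easy to
    click, so interactions lower down carry more weight (a simple
    linear scheme, assumed for illustration)."""
    return (position + 1) / n_positions

log = [  # (shelf position, seconds streamed) -- made-up data
    (0, 45), (1, 12), (4, 200),
]
examples = [
    (label_interaction(s), position_weight(pos, 5)) for pos, s in log
]
print(examples)
```

Retraining once a day then amounts to rebuilding this labeled set from the latest day of logs and refitting the model on it.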
<p><strong>3. Search Bars on Spotify</strong></p>
<p>Whenever a user searches for a query, Spotify scores the candidates using signals such as item popularity, whether the user has searched for the item before, the similarity of the item to the user's taste, and the distance between the prefix query and the matched items. The ranking model is trained on search interaction logs, using search sessions that end in a success action as positive examples. All these predictions happen in just milliseconds.</p>
<p><strong>So how does the ranking algorithm get its data?</strong></p>
<p>It takes the following into account and assigns a score on that basis.</p>
<ul>
<li><strong>Search results seen by the users in the past.</strong></li>
<li><strong>Successful Interactions in the past.</strong></li>
<li><strong>Score (4 for a success item, 2 for a related item, and 0 for everything else).</strong></li>
</ul>
<p>The <strong>loss function</strong> used in this system is a <strong>listwise loss function</strong>.</p>
<p>The listwise approach addresses the ranking problem in the following way. In learning, it takes ranked lists of objects (e.g., ranked lists of documents in IR) as instances and trains a ranking function through the minimization of a listwise loss function defined on the predicted list and the ground truth list. The listwise approach captures the ranking problem, particularly in IR, in a conceptually more natural way than previous work.</p>
<p>The <strong>training model used is LambdaMART</strong>, <strong>maximizing NDCG</strong> (averaged over the training dataset) <strong>using GBDT</strong> (gradient-boosted decision trees).</p>
<p>Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving each item a numerical score (4 for a success item, 2 for a related item, and 0 for everything else). The ranking model's purpose is to rank, i.e. produce a permutation of items in new, unseen lists in a way that is "similar" to the rankings in the training data in some sense.</p>
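<p>The NDCG metric that LambdaMART maximizes can be computed directly from those relevance grades. This sketch uses the common exponential-gain formulation (an assumption; other gain/discount variants exist) with the 4/2/0 grading scheme from the text.</p>

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances)
    )

def ndcg(relevances):
    """DCG normalized by the ideal (sorted-descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Grades from the scheme above: 4 = success item, 2 = related, 0 = other.
print(ndcg([4, 2, 0, 0]))  # success item ranked first: ideal ordering
print(ndcg([0, 0, 2, 4]))  # success item buried: lower NDCG
```

Ranking the success item first yields NDCG = 1; burying it at the bottom of the list drives the score down, which is exactly the behavior the training objective penalizes.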
<h3><strong>What helped Spotify achieve this level?</strong></h3>
<p>Spotify describes its successful implementation of machine learning in a <a href="https://www.youtube.com/watch?v=2VvM98flwq0" rel="noopener nofollow" target="_blank">Hyperight AB</a> keynote in the following ways.</p>
<ol>
<li><strong>The large volume of playlists created by the users.</strong></li>
<li><strong>The emotion attached to the user in creating those playlists.</strong></li>
<li><strong>9 years of continuous iteration and hard work.</strong></li>
<li><strong>Team of user researcher, data scientist, and data engineer.</strong></li>
</ol>
<h3><strong>What Spotify is planning to do in the upcoming future?</strong></h3>
<p>In a <a href="https://www.youtube.com/watch?v=g4jJ8BsuJ4g" rel="noopener nofollow" target="_blank">video</a>, <a href="https://www.linkedin.com/in/bernardmarr/" target="_blank" rel="noopener">Bernard Marr</a> shared what he learned when he met with Spotify's data science team. They revealed that:</p>
<ol>
<li>They will combine this data with other data sources like GPS location, age, and work patterns: for example, whether you are commuting to work in the morning or coming back from it, listening to music at home in the evening, or picking music for the gym.</li>
<li>Also, once connected to a fitness tracker band or an Apple Watch, they will know what your pulse rate is and what type of music will help you.</li>
<li>In the upcoming future, they will be using machine learning and Artificial intelligence to automate their music recommendations. </li>
<li>You won't have to pick a playlist manually when you are traveling, going to the gym, or lifting weights; Spotify will know what songs you will like at that moment.</li>
<li>This will be a great implementation of AI providing real value to their customers.</li>
</ol>Difference Between Standard Deviation and Standard Error in One Picturetag:www.datasciencecentral.com,2019-11-25:6448529:BlogPost:9109342019-11-25T22:21:48.000ZStephanie Glenhttps://www.datasciencecentral.com/profile/StephanieGlen
<p>The <a href="https://www.statisticshowto.datasciencecentral.com/what-is-the-standard-error-of-a-sample/" target="_blank" rel="noopener">standard error</a> is really just a type of <a href="https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/standard-deviation/" target="_blank" rel="noopener">standard deviation</a>. For this simple example, I've used three samples as an illustration of how the standard deviation and standard error differ as they relate to <a href="https://www.statisticshowto.datasciencecentral.com/sample-mean/" target="_blank" rel="noopener">sample means</a>.</p>
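<p>A small simulation makes the distinction concrete: the standard deviation describes the spread of individual observations, while the standard error (SE = SD/√n) describes the spread of sample means and shrinks as the sample grows. The population parameters below are made up for illustration.</p>

```python
import math
import random

# Hypothetical population: mean 100, SD 15. Draw one sample of n = 50.
random.seed(42)
population_mean, population_sd, n = 100.0, 15.0, 50

sample = [random.gauss(population_mean, population_sd) for _ in range(n)]
mean = sum(sample) / n

# Sample standard deviation: spread of individual observations.
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# Standard error of the mean: how much sample means vary around
# the population mean; always smaller than the SD for n > 1.
se = sd / math.sqrt(n)

print(f"sample SD = {sd:.2f}, standard error of the mean = {se:.2f}")
```

Quadrupling the sample size halves the standard error while leaving the standard deviation roughly unchanged, which is why the two are so often confused but behave so differently.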
<p></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3741388417?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3741388417?profile=RESIZE_710x" class="align-full"/></a></p>The Logjam in AI/ML Platforms is About to Complicate Your Lifetag:www.datasciencecentral.com,2019-11-25:6448529:BlogPost:9106572019-11-25T18:44:41.000ZWilliam Vorhieshttps://www.datasciencecentral.com/profile/WilliamVorhies
<p><strong><em>Summary:</em></strong><em> Too many solutions. We are at an inflection point where too many vendors are offering too many solutions for moving our AI/ML models to production. The very real risk is duplication of effort, fragmentation of our data science resources, and incurring unintended new technical debt as we bind ourselves to platforms that have hidden assumptions or limitations in how they approach problems.</em></p>
<p> </p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3741157490?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3741157490?profile=RESIZE_710x" width="300" class="align-right"/></a>Remember when our biggest problem was getting our models off of data science platforms and into production? Well, the market is nothing if not efficient, and hundreds of platform companies have been laboring away to help solve your pain point.</p>
<p>The problem arising for the CDO, CAO or any other CXX is trying to decide which and how many of these you need. And the problem is exacerbated by the fact that adopting any one of these solutions:</p>
<ul>
<li>Creates a cadre of champions advocating for their particular platform.</li>
<li>Creates more technical debt by locking in certain styles of solutions or even APIs without fully understanding the blind spots of any given platform or the rigidity of the solution.</li>
<li>Means that your data science team will now need to spread out over a variety of platforms each of which requires some technical depth and probably increases cost.</li>
<li>Adds confusion because some of these platforms present themselves as general purpose (Intelligent Automation), some are unique to particular processes (CRM, Marketing Automation, Content Management, Supply Chain Management), and some are enhanced features promoted by your existing ERPs like PeopleSoft, Oracle, and SAP, all suggesting duplication of effort.</li>
</ul>
<p>The problem is frankly mind-boggling: not only which combination of platforms makes the best and most efficient sense for your company, but at a more fundamental level, who should be making these decisions.</p>
<p>The answer in the short run is that the problem isn’t recognized, no one is trying to rationalize these tools, and the net result is significantly diluting the effectiveness of your AI/ML strategy.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>Victims of Our Own Success</strong></span></p>
<p>After many years of advocating for companies to adopt and implement AI/ML strategies we’re finally getting some traction. The good folks at McKinsey recently published their <a href="https://www.mckinsey.com/featured-insights/artificial-intelligence/global-ai-survey-ai-proves-its-worth-but-few-scale-impact?cid=other-eml-alt-mip-mck&hlkid=9551985a87694455862463669c8647dd&hctky=11403569&hdpid=5b270382-a9ca-4baf-b803-e4cb233f2619"><em><u>2019 Global AI Adoption Survey</u></em></a>, and while there are champions and laggards the overall pace of adoption is impressive.</p>
<p>I’ll only offer one chart that shows that pretty much all adopters are seeing meaningful cost reductions and revenue increases from their projects.</p>
<p> <a href="https://storage.ning.com/topology/rest/1.0/file/get/3741158395?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3741158395?profile=RESIZE_710x" width="500" class="align-center"/></a></p>
<p>The main benefit of this chart is to organize these improvements around the major processes of the organization showing that all of them are benefiting. </p>
<p>The downside of this is that every single one of these mega-processes already has a platform with a toe-hold in your organization. As we add AI, every one calls out for support from your data science resources that are in danger of becoming increasingly fragmented across these systems.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>About the Major Platforms with AI/ML Enhancement</strong></span></p>
<p>This isn’t intended to be a comprehensive evaluation or even to offer a single solution. More like which rocks to look under in evaluating the problem.</p>
<p>The major decision support platforms that immediately come to mind with AI/ML enhancements on the marketing and sales side are these:</p>
<ul>
<li>CRM</li>
<li>Content Management</li>
<li>Marketing Automation</li>
</ul>
<p>The scope of all three of these can overlap significantly depending on the vendors. It’s likely that CRM and Marketing automation already have NLP, NLU, and chatbot functionality. The underlying text and language capabilities have largely been commoditized but not the logic trees that drive your customer interface chatbots.</p>
<p>The largest exposure I see here however is in the scoring models that drive customer acquisition, upsell, cross sell, and churn prevention. If these exist in different systems do you even have an inventory of what they are and where they are? Are the results of closely duplicated scoring models even similar?</p>
<p>Supply Chain Management, HRIS, and most financial and risk mitigation systems are a bit easier to categorize so that you know exactly which AI/ML models are at work in each.</p>
<p> </p>
<p><span style="font-size: 12pt;"><strong>RPA and Intelligent Automation (IA) – Growing Fast and Frequently Misleading</strong></span></p>
<p>One of the major market developments creating complexity here are the rapidly expanding RPA and IA platforms which promise to be general purpose and can cross many of the processes currently handled by the specialty platforms above. Should you let them?</p>
<p>Unfortunately there is a huge variation in capabilities among these platforms that’s made even more complicated by vendors engaged in AI-washing (claiming there’s AI inside when there are only simple rules). There are standard definitions for these platforms but it’s not unusual to see diagrams like this one that attempts to rename all of AI/ML as Intelligent Automation.</p>
<p> <a href="https://storage.ning.com/topology/rest/1.0/file/get/3741159617?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3741159617?profile=RESIZE_710x" width="450" class="align-center"/></a></p>
<p><a href="https://www.datasciencecentral.com/profiles/blogs/is-robotic-process-automation-rpa-really-ai"><em><u>The technical literature tells a different story</u></em></a>: RPA is a rules-driven system originally intended primarily for moving information from one system to another where APIs don't exist. IA is an elaboration on this that does employ what we might call commodity AI/ML, especially in image and text recognition.</p>
<p>A good short example is automated processing of invoices. An RPA solution could find incoming invoices in email streams presumably by looking in the title line and moving those to a separate folder without human intervention. </p>
<p>The Intelligent Automation version uses image and text recognition models to extract the specific invoice details from highly unstructured vendor invoices and move that data to specific fields in your financial system so the invoices can be processed.</p>
<p>It is frequently said that both RPA and IA are more about ‘tasks’ than what we think of as ‘processes’ but there is a strong movement afoot among IA vendors to build platforms that can use text and image AI/ML and even ML scoring models to automate broader processes. This is where the potential overlap with your existing systems becomes a challenge and requires rationalizing before committing to teams of internal IA experts supported by your data science resources.</p>
<p>A quick look at the McKinsey adoption study shows that RPA is the most frequently adopted 'AI/ML' tool. Your challenge is to determine whether this is actually AI/ML or simple rules-driven automation, and to what extent you should encourage its adoption.</p>
<p>Sorry to leave you with a problem without offering a solution. Finding the right C-Level exec in your organization to be charged with the efficient use of data science is the first challenge. </p>
<p>The second is to inventory all the platforms that require data science support; identify any duplication, blind spots, and rigidity in solution approach; and then formulate a policy that optimizes the benefits of AI/ML while also optimizing efficiency. And since we are in that Wild West period of vendor claims, particularly watch out for AI/ML that isn't there, that requires too much support, or that claims to cut across many mega-processes where domain expertise is as important as the embedded AI/ML tools.</p>
<p> </p>
<p> </p>
<p><a href="https://www.datasciencecentral.com/profiles/blog/list?user=0h5qapp2gbuf8"><em><u>Other articles by Bill Vorhies</u></em></a></p>
<p> </p>
<p>About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2 million times.</p>
<p>He can be reached at:</p>
<p><a href="mailto:Bill@DataScienceCentral.com">Bill@DataScienceCentral.com</a> <span>or</span> <a href="mailto:Bill@Data-Magnum.com">Bill@Data-Magnum.com</a></p>
<p><span> </span></p>What hires make the best data scientists?tag:www.datasciencecentral.com,2019-11-25:6448529:BlogPost:9107452019-11-25T17:30:00.000ZLucas Fincohttps://www.datasciencecentral.com/profile/LucasFinco
<p>(I will give you a hint. It’s in the name.)</p>
<p><a href="https://solarsystem.nasa.gov/system/people/detail_images/2041_PitmanK1-full.jpg"></a><a href="https://solarsystem.nasa.gov/system/people/detail_images/2041_PitmanK1-full.jpg" target="_blank" rel="noopener"><img src="https://solarsystem.nasa.gov/system/people/detail_images/2041_PitmanK1-full.jpg?profile=RESIZE_710x" class="align-full" style="padding: 4px;"/></a><a href="https://solarsystem.nasa.gov/system/people/detail_images/2041_PitmanK1-full.jpg"></a></p>
<p></p>
<p><strong>Scientists!</strong></p>
<p>This post is intended as a response to an interesting discussion on LinkedIn started by analytics manager at Ford, Michael Cavaretta. <a href="https://www.linkedin.com/posts/michael-cavaretta-ph-d-795a965_industry40-datasciencejobs-manufacturing-activity-6576451636602945536-i8PE">https://www.linkedin.com/posts/michael-cavaretta-ph-d-795a965_industry40-datasciencejobs-manufacturing-activity-6576451636602945536-i8PE</a><span>.</span> He asked what hires make the best data scientists.</p>
<p>In my real-world experience in the data science field, I find that physicists make the best data scientists. Why? Because physicists work at the intersection of math and reality. They have been trained to hypothesize ideas about how things work, and then to design experiments to test these hypotheses, gather the data, interpret it, and communicate the results. This is mainly what we are asking data scientists to do. (I do not believe model building is the sole role of data scientists, but that’s for another post.) Physicists can also work independently and can learn new techniques on their own.</p>
<p>While I specifically call out physicists from my experience, all physical scientists (chemists, astrophysicists, and even, sometimes, biologists) should have the same skill sets and expertise. Physical scientists perform best when you have a lot of data and you do not know what to do with it. They are good with uncertainty and can propose hypotheses to move analyses forward when working in a vacuum.</p>
<p>This doesn’t mean you cannot hire individuals with different educational backgrounds. Here are some cases when it would be a good idea to hire:</p>
<ul>
<li>Social Scientists – When investigating human behavior. Social scientists have experience with different tool sets than physical scientists, but the process is the same and statistics is still statistics. I find individuals with these backgrounds excel at survey analysis, behavioral analysis, and marketing applications.</li>
<li>Engineers – While engineers study a lot of physics and math and are quite competent in both fields, engineers are trained to apply discoveries in pure science disciplines in the real world. They work with certainty and well-defined systems, or very bad things could happen (Think of engineering a bridge… how do you want that engineer to think about it? Would you want them to try some things and see what works? No.). For this reason, engineers tend to struggle with the “unknown” involved in a lot of data science work. However, engineers working as data scientists can still excel, especially in industrial applications and operations research work, where conditions are controlled and systems well defined.</li>
<li>Computer Scientists – Computer scientists work at the interface of the computer and the real world and are great for developing the applications and data structures that surround data scientists. They also tend to be very good at optimizing data science algorithms and deploying them optimally. However, they tend to struggle with what is actually inside the data and what that data means. They make good data engineers, architects, developers of ML-consuming applications, and ML model deployers and optimizers, but not necessarily good data analysts.</li>
<li>Anyone with a questioning attitude and critical thinking skills – I am not trying to be a naysayer and tell people they cannot be a data scientist. I believe almost anyone can become a great data scientist. The fundamental characteristics are a questioning attitude, critical thinking skills, and a constant curiosity to learn.</li>
</ul>
<p>What background do you think makes for the best data scientist? Let me know in the comments below.</p>List of Quantum Cloudstag:www.datasciencecentral.com,2019-11-25:6448529:BlogPost:9108212019-11-25T16:30:00.000ZRobert R. Tuccihttps://www.datasciencecentral.com/profile/RobertRTucci
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3740404862?profile=original" target="_blank" rel="noopener"><img class="align-full" src="https://storage.ning.com/topology/rest/1.0/file/get/3740404862?profile=RESIZE_710x"/></a>Clouds for doing quantum computing are becoming increasingly popular. Here is a list, with links, of quantum clouds that already exist or are imminent. All are commercial but usually free for small jobs and open to the public. Most use open-source quantum computing software, but some have opted to keep their software proprietary. In alphabetical order; ✅ indicates working quantum computing hardware.</p>
<ol>
<li><a href="http://artiste-qb.net/bayesforge/">Bayesforge on Amazon</a>. See also <a href="https://market.cloud.tencent.com/products/8513#">Bayesforge on Tencent</a></li>
<li><a href="https://cloud.dwavesys.com/leap/">DWave "Leap"</a>✅</li>
<li>Google (promised. Their quantum cloud will probably be closely linked to <a href="https://colab.research.google.com/notebooks/welcome.ipynb">Google "Colab"</a>)✅</li>
<li><a href="https://www.ibm.com/quantum-computing/technology/experience">IBM "Quantum Experience"</a>✅</li>
<li>IonQ (promised, so far via Azure, probably will eventually use other cloud service providers too)✅</li>
<li><a href="https://azure.microsoft.com/en-us/services/quantum/">Microsoft "Azure Quantum"</a> (promised)</li>
<li>PsiQuantum (not yet promised but highly likely due to $230M funding)✅</li>
<li><a href="https://forge.qcware.com">QCWare "Forge"</a></li>
<li><a href="https://qutech.nl/quantum-inspire/">QuTech "Quantum Inspire"</a></li>
<li><a href="https://www.hyrax.ai/">Rahko "Hyrax"</a></li>
<li><a href="https://www.rigetti.com/qcs">Rigetti "QCS"</a>✅</li>
<li><a href="https://quantumcomputing.com/">Strangeworks</a></li>
<li>Xanadu AI (promised)</li>
<li><a href="https://www.zapatacomputing.com/orquestra">Zapata "Orquestra"</a> (promised)</li>
</ol>
<p>footnote: I have definite opinions about the strengths and weaknesses of each of these quantum clouds which I will gladly share with you, but only by private channels because this topic is very controversial. I've worked on quantum computing for almost 2 decades, so my opinions on this subject are very well informed. Full disclosure: I used to work for Artiste-qb.net, the authors of Bayesforge, but I no longer do.</p>Critical tools used in the Data Science Domaintag:www.datasciencecentral.com,2019-11-25:6448529:BlogPost:9109042019-11-25T12:30:00.000ZSimran Agarwalhttps://www.datasciencecentral.com/profile/SimranAgarwal
<p>Data Scientists help find insights about the market and help make products better. They are responsible for analyzing and handling massive amounts of structured and unstructured data, and they require various tools to do so. Some of the tools Data Scientists use to carry out their data operations are mentioned below.<br/></p>
<p><span style="font-size: 12pt;"><strong>1. SAS-</strong></span> <br/> Designed for statistical operations, SAS is proprietary (closed-source) software that is used to analyze data. SAS uses the Base SAS programming language, which is generally used for statistical modeling. It offers a number of statistical libraries and tools that can be used for modeling and organizing data. While SAS is highly reliable, it is also quite expensive and thus is used mainly by larger enterprises. <br/> <br/> <span style="font-size: 12pt;"><strong>2. Apache Spark-</strong></span> <br/> Spark has been specifically designed to handle both batch processing and stream processing. It is one of the most widely used Data Science tools and comes with various APIs that help data scientists make powerful predictions with the data given to them. It stands out among big-data platforms because it can process real-time data, unlike analytics tools that only process batches of historical data, and it can perform operations up to 100 times faster than MapReduce. <br/> <br/> <span style="font-size: 12pt;"><strong>3. BigML-</strong></span> <br/> BigML provides a fully interactive, cloud-based GUI environment that can be used for running various ML algorithms. Through BigML, companies can apply Machine Learning across various parts of the business, for example in sales forecasting and product innovation. It also specializes in predictive modeling. <br/> <br/> <span style="font-size: 12pt;"><strong>4. D3.js-</strong></span><br/> D3.js is a JavaScript library that allows you to build interactive visualizations and analyses of data in your web browser. JavaScript is used mainly as a client-side scripting language. One of the powerful features of D3.js is its use of animated transitions. It also makes documents dynamic by allowing updates on the client side and actively using changes in the data to update visualizations in the browser. 
It can be very beneficial for Data Scientists working on IoT-based devices. <br/> <br/> <span style="font-size: 12pt;"><strong>5. MATLAB-</strong></span> <br/> MATLAB is closed-source software that facilitates matrix operations, algorithmic implementation, and statistical modeling of data; it is a multi-paradigm numerical computing environment used for processing mathematical information. <br/> MATLAB is used to simulate neural networks and fuzzy logic in Data Science. Powerful visualizations can be created using the MATLAB graphics library. It is also used for image and signal processing. <br/> <br/> <span style="font-size: 12pt;"><strong>6. Tableau-</strong></span><br/> Tableau is Data Visualization software packed with powerful graphics for building interactive visualizations. One of the important aspects of Tableau is its ability to interface with databases, spreadsheets, Online Analytical Processing cubes, etc. It can also be used for plotting longitudes and latitudes on a map. It is enterprise software, but comes with a free version called Tableau Public.<br/> <br/> <span style="font-size: 12pt;"><strong>7. Matplotlib-</strong></span><br/> Matplotlib is one of the most popular tools for generating graphs from analyzed data. It is a plotting and visualization library developed for Python and is used to plot complex graphs, such as bar plots and histograms, with a few simple lines of code. It is preferred over other contemporary tools and is ideal for beginners learning data visualization with Python. In fact, NASA used Matplotlib to illustrate data visualizations during the landing of the Phoenix spacecraft. <br/> <br/> <span style="font-size: 12pt;"><strong>8. 
NLTK-</strong></span><br/> One of the most popular fields in Data Science is Natural Language Processing (NLP), which deals with the development of statistical models that help computers understand human language. The Natural Language Toolkit (NLTK) is a collection of Python libraries developed for this particular purpose. Word segmentation, speech recognition, and machine translation are some of its applications. <br/> <br/> <span style="font-size: 12pt;"><strong>9. Scikit-learn-</strong></span> <br/> A Python-based library, Scikit-learn is used for implementing ML algorithms. It is widely used for analysis and data science because it is easy to use. It makes complex ML algorithms accessible and is therefore used in situations that require rapid prototyping. It is also an ideal platform for research requiring basic ML. Scikit-learn builds on several underlying Python libraries such as NumPy, Matplotlib, etc.</p>
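<p>As a brief illustration of the rapid prototyping Scikit-learn enables, here is a minimal sketch (the dataset and model choices are illustrative, not prescriptive) that trains and evaluates a classifier in a handful of lines:</p>

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the small iris dataset bundled with Scikit-learn
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for an honest accuracy estimate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a classifier and evaluate it: constructor, fit, score
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```

<p>The same three calls (constructor, <code>fit</code>, <code>score</code>) work for nearly every estimator in the library, which is what makes swapping algorithms during prototyping so cheap.</p>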
<p><span style="font-size: 12pt;"><strong>10. TensorFlow-</strong></span></p>
<p>A standard tool for Machine Learning, TensorFlow is widely used for advanced ML algorithms like Deep Learning. It was named TensorFlow after tensors, which are multidimensional arrays. It is an open-source toolkit and is known for its performance and high computational abilities. It can run on CPUs and GPUs, and it has also emerged on powerful TPU platforms, which gives TensorFlow an unprecedented edge in terms of processing power for advanced ML algorithms. Thanks to this processing ability, it has a variety of applications such as image classification, drug discovery, speech recognition, etc. <br/> <br/> <span style="font-size: 12pt;"><em>This is a brief explanation about the various Data Science tools that are available today. Read more <a href="https://www.greatlearning.in/blog/what-is-data-science/" target="_blank" rel="noopener">here</a>.</em></span></p>Microsoft Dynamics 365: How it Changes a Manufacturing Business For the Bettertag:www.datasciencecentral.com,2019-11-25:6448529:BlogPost:9104942019-11-25T07:00:00.000ZRyan Williamsonhttps://www.datasciencecentral.com/profile/RyanWilliamson
<p>If there’s one thing that’s common to all businesses across all industries in the world, it’s that the customer is always the primary focus. While this holds for all sectors and companies, it can be somewhat more relevant to certain types of businesses. Take the manufacturing industry, for example. As more and more manufacturing companies seek to adopt an increasingly customer-focused path forward, they realize the need for modern solutions and technologies that can enable the requisite transformation and also fortify their existing endeavors.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3740396978?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3740396978?profile=RESIZE_710x" class="align-center"/></a>It should come as no surprise that there is an endless list of tools and technologies that claim to offer the capabilities to support the digital transformation manufacturing industries have their sights set on. But in this figurative sea of options, there’s one name that is widely revered as the silver bullet: Microsoft Dynamics 365. That is because it empowers manufacturing businesses to close the rift that typically results from siloed enterprise solutions such as ERP, CRM, and more. And since data has also become pivotal to this pursuit, Dynamics 365 also brings along relevant capabilities such as data intelligence and analytics. So, let’s take a closer look at exactly how this robust tool from Microsoft is driving transformation across the global manufacturing industry.</p>
<ol>
<li><strong>Enhanced efficiency</strong>: The staff’s productivity is the bedrock of any successful manufacturing business, but it can be surprisingly challenging to achieve. With Dynamics 365, companies gain access to machine learning-powered functionalities, analytics, and collaboration features that ensure the company’s employees have all the tools they need to do their jobs without a hitch.</li>
<li><strong>Better supply chain management</strong>: The key to precise inventory management in the manufacturing world lies in two things: knowing exactly where the inventory is and the ability to make successful predictions about demand. To that end, Microsoft Dynamics 365 helps companies by streamlining warehouse management, quality control, distribution strategy, and procurement of raw materials, among myriad other things.</li>
<li><strong>Top-notch sales management</strong>: Dynamics 365, of course, offers a world of features and functionalities, but there are some that trump others in importance, such as the ones meant for better sales management. The tools use CRM-based insights and more to help companies to focus on achieving better sales and, thus, improved customer satisfaction.</li>
<li><strong>Improved customer service</strong>: Dynamics 365 helps companies engage with new-age customers, implement a more customer-focused approach, offer several value-added services, and more. And you know what all this helps with? Yes, ensuring high levels of customer satisfaction.</li>
</ol>
<p>Times are changing and, frankly, at a mind-numbing pace. No matter how challenging this may seem, to remain unbeaten in such a continually evolving environment brimming with competition, businesses must adopt modern solutions, like Dynamics 365, that help them organize operations, enhance employee productivity, and more. So, what’s the quickest way to get started with that, you ask? Well, that would be getting in touch with a trusted <a href="https://www.rishabhsoft.com/microsoft/dynamics-crm-consulting-development" target="_blank" rel="noopener">Microsoft Dynamics 365 consulting</a> firm as early as possible.</p>
<p><strong><span style="font-size: 14pt;">Introduction</span></strong></p>
<p>I was formerly (1998-2016) a Senior Research Fellow in the Institute of Educational Technology (IET) at the Open University (OU) in the UK. It was in that context that I first started thinking about the potential of Learning Analytics in my field, which is Accessibility of eLearning and Disabled Student Support. Looking back through my work-related blog (<a href="https://martyncooper.wordpress.com/">https://martyncooper.wordpress.com/</a>) I can see early evidence of that thinking going back to 2014. </p>
<p></p>
<p><span style="font-size: 14pt;"><strong>Early Thinking</strong></span></p>
<p>I posted a SlideShare on "Models of Disability, Models of Learning, Accessibility and Learning Technologies" that illustrated a point with the example of learning analytics; see Slide 10 of: <a href="https://www.slideshare.net/martyncooper/models-of-disability-models-of-learning-accessibility-calrg2014">https://www.slideshare.net/martyncooper/models-of-disability-models-of-learning-accessibility-calrg2014</a>. What I had come to realise is that, provided the institution collects data on which of its students have declared a disability, you can use that information, with general learning analytics approaches, to improve access for disabled students and to target support for those who are underperforming or are even predicted to underperform. The OU did collect that information, so I began exploring what was actually possible here using historic data.</p>
<p></p>
<p><span style="font-size: 14pt;"><strong>Initial Paper Introducing the Field to the World</strong></span></p>
<p>In 2016, working with a couple of colleagues, we wrote what we believe to be a seminal paper introducing these approaches to the world. This paper is freely available from the Open University's repository at: <a href="http://oro.open.ac.uk/45313/">http://oro.open.ac.uk/45313/</a>. If you are interested in this field then this paper would be a good place to start finding out more.</p>
<p></p>
<p><span style="font-size: 14pt;"><strong>Ongoing Work</strong></span></p>
<p>The same team that wrote the above-mentioned paper with a couple of additional authors are in the process of undertaking some data mock-ups and writing a journal paper for the Journal of Learning Analytics (<a href="https://www.solaresearch.org/journal/">https://www.solaresearch.org/journal/</a>) entitled: "Data Science Promoting Inclusion in Education". It is anticipated that this will be submitted for peer review in February 2020.</p>
<p></p>
<p><strong><span style="font-size: 14pt;">Conclusion</span></strong></p>
<p>This blog post has briefly introduced the topic of Learning Analytics for accessibility and the targeted support of disabled students. It has pointed to published resources that have discussed this topic. If you are interested in joining the small existing team working in this field then please post as such in the comments below. Any other questions or points on the topic are of course also welcome.</p>Weekly Digest, November 25tag:www.datasciencecentral.com,2019-11-24:6448529:BlogPost:9105812019-11-24T23:30:00.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><span>Monday newsletter published by Data Science Central. Previous editions can be found <a href="https://www.datasciencecentral.com/page/previous-digests" target="_blank" rel="noopener">here</a>. The contribution flagged with a + is our selection for the picture of the week. To subscribe, <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">follow this link</a>. </span><br/></p>
<p><strong>Announcement</strong></p>
<ul>
<li><span>If you need people to process some portion of the big data that feeds your artificial intelligence, you need a reliable workforce. More businesses are using in-house staff, contractors & crowdsourcing to get this work done. <a href="https://dsc.news/32IRj1t" target="_blank" rel="noopener">Download this whitepaper</a> and learn how to determine what kind of AI data work to outsource and best practices to <a href="https://dsc.news/32IRj1t" target="_blank" rel="noopener">Accelerate and Scale High-Quality Data for AI</a>.</span></li>
</ul>
<div><div><p><span><strong>Featured Resources and Technical Contributions </strong></span></p>
<ul>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/orchestrating-dynamic-reports-in-python-and-r-with-rmd-files">Orchestrating Dynamic Reports in Python and R with Rmd Files</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/why-including-effect-size-and-knowing-your-statistical-power-are">Why Including Effect Size and Knowing your Statistical Power </a>are Important +</span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/fun-with-maps-part-1">Fun with Maps: Part 1</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/quantum-supremacy-by-google-explained-in-easy-way">Quantum Supremacy by Google. Explained in Easy way</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/machine-learning-s-limits-part-1-why-machine-learning-works-in">Why machine learning works in some cases and not in others</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/what-is-map-reduce-programming-and-how-does-it-work">What is Map Reduce Programming and How Does it Work</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/kaggle-s-rachel-tatman-on-what-to-do-when-applying-deep-learning">What to do when applying Deep Learning is overkill</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/a-guide-to-predictive-analysis-in-r-1">A Guide to Predictive Analysis in R</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/importance-of-big-data-analytics-tool-hadoop">Top Hadoop Certifications</a></span></li>
</ul>
<p><span><strong>Featured Articles</strong></span></p>
<ul>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/how-ai-is-manipulating-economics-to-create-appreciating-assets">How AI Is Manipulating Economics to Create Appreciating Assets</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/building-the-high-performing-team-for-enterprise-analytics">Building the High Performing Team for Enterprise Data Analytics</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-implement-digital-transformation-in-an-enterprise-using">How to implement Digital Transformation in an Enterprise using AI?</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/visually-explained-how-can-executives-make-sense-of-machine">Visually Explained: How Can Executives Make Sense of Machine Learning </a>& Deep Learning?</span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/gartner-and-forrester-begin-to-weigh-in-on-automated-machine-lear">Gartner and Forrester Weigh in on Automated Machine Learning</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/why-did-your-chatbot-fail-miserably">Why did your chatbot fail miserably?</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/need-a-fast-safe-amp-flexible-app-go-for-cloud-based-mobile-apps">About Cloud Based Mobile Apps</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/top-7-data-science-use-cases-in-trust-and-security-1">Top 7 Data Science Use Cases in Trust and Security</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/how-to-leverage-deep-learning-for-automation-of-mobile">How To Leverage Deep Learning For Automation Of Mobile Applications</a></span></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/blog-descriptive-versus-predictive-analytics">Descriptive Versus Predictive Analytics</a></span></li>
</ul>
<p><span><strong>Picture of the Week</strong></span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/3739933953?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3739933953?profile=RESIZE_710x" class="align-center"/></a></span></p>
<p style="text-align: center;"><span><em>Source: article flagged with a + </em></span></p>
<p style="text-align: center;"></p>
<p><strong>From our Sponsors</strong></p>
<ul>
<li><span><a href="https://dsc.news/32IRj1t" target="_blank" rel="noopener">Accelerate and Scale Quality Data for AI</a><span> </span>- Whitepaper</span></li>
<li><span><a href="https://dsc.news/33n52Mp" target="_blank" rel="noopener">Tune Your Machine Learning Algorithm</a><span> </span>- Dec 5</span></li>
<li><span><a href="https://dsc.news/33L5bcL" target="_blank" rel="noopener">Automating Regulatory Compliance with Data Wrangling</a><span> </span>- Dec 10</span></li>
<li><span><a href="https://dsc.news/2qSPyBw" target="_blank" rel="noopener">ML/AI Models: Continuous Integration & Deployment</a><span> </span>- Dec 11</span></li>
<li><span><a href="https://dsc.news/2pw1kl2" target="_blank" rel="noopener">Real-Time Analytics at Scale with High Velocity Data</a><span> </span>- Dec 12</span></li>
<li><span><a href="https://dsc.news/2QBhyEC" target="_blank" rel="noopener">From Degas to Dashboards: Lessons of the Great Masters</a><span> </span>- Dec 17</span></li>
<li><a href="https://dsc.news/2J5nrFB" target="_blank" rel="noopener"></a><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/data-science-analytics-experts-converge-at-mads-west" target="_blank" rel="noopener">Data Science & Analytics Experts Converge at MADS West</a></li>
<li><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/realizing-the-benefits-of-automated-machine-learning" target="_blank" rel="noopener">Realizing the benefits of Automated Machine Learning</a></li>
<li><span><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/ud-analytics-master-s-online-gmat-waiver-avail" target="_blank" rel="noopener">UD Analytics Master's Online — GMAT Waiver Avail.</a></span></li>
<li><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/an-ms-in-data-science-online-for-any-background" target="_blank" rel="noopener">An MS in Data Science Online for Any Background</a></li>
<li><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/build-an-ai-center-of-excellence-ebook" target="_blank" rel="noopener">Build an AI Center of Excellence | eBook</a></li>
<li><a href="https://www.datasciencecentral.com/group/announcements/forum/topics/data-revolution-join-this-virtual-forum" target="_blank" rel="noopener">Data Revolution: Join this virtual forum</a></li>
</ul>
<p><strong>New Books and Resources for DSC Members</strong><span> </span>- [<a href="https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members">See Full List</a>]</p>
<ul>
<li><span><a href="https://dsc.news/2IyZgPk" rel="noopener" target="_blank">Getting Started with TensorFlow 2.0</a></span></li>
<li><a href="https://dsc.news/2pZ2aXt" rel="noopener" target="_blank">Online Encyclopedia of Statistical Science</a></li>
<li><a href="https://dsc.news/2IByRkm" rel="noopener" target="_blank">Statistics -- New Foundations, Toolbox, and Machine Learning Recipes</a></li>
<li><span><a href="https://dsc.news/2EbQCo4" rel="noopener" target="_blank">Classification and Regression In a Weekend</a></span></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" rel="noopener" target="_blank">Applied Stochastic Processes</a></li>
<li><span><a href="https://www.datasciencecentral.com/profiles/blogs/new-book-enterprise-ai-an-applications-perspective" rel="noopener" target="_blank">Enterprise AI - An Applications Perspective</a></span></li>
</ul>
<p style="text-align: center;"></p>
<p><span>To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click <a href="https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter" target="_blank" rel="noopener">here</a>. Follow us: <a href="https://twitter.com/DataScienceCtrl">Twitter</a> | <a href="https://www.facebook.com/DataScienceCentralCommunity/">Facebook</a>.</span></p>
</div>
</div>
How to implement Digital Transformation in an Enterprise using artificial intelligence?tag:www.datasciencecentral.com,2019-11-24:6448529:BlogPost:9106092019-11-24T17:54:35.000Zajit jaokarhttps://www.datasciencecentral.com/profile/ajitjaokar
<p></p>
<p> <a href="https://storage.ning.com/topology/rest/1.0/file/get/3739574798?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3739574798?profile=RESIZE_710x" class="align-full"/></a></p>
<p> </p>
<p>Digital transformation is getting some traction now.</p>
<p>There are many definitions of digital transformation.</p>
<p>For example, according to salesforce <a href="https://www.salesforce.com/products/platform/what-is-digital-transformation/">digital transformation</a> is - <em>Digital transformation is the process of using digital technologies to create new — or modify existing — business processes, culture, and customer experiences to meet changing business and market requirements. This reimagining of business in the digital age is digital transformation.</em></p>
<p>But how exactly do you implement digital transformation in an enterprise using artificial intelligence?</p>
<p>In a recent workshop, I proposed a set of steps for implementing digital transformation in an enterprise using artificial intelligence, as a <a href="https://en.wikipedia.org/wiki/Thought_experiment">gedankenexperiment (thought experiment)</a>.</p>
<p>The steps are</p>
<ol>
<li>For an existing process in an enterprise, break down the steps of the process</li>
<li>For each step, model it as best you can using data and algorithms (supervised, unsupervised, reinforcement)</li>
<li>Then remove humans entirely from the process</li>
<li>Automate the process as much as you can by using the models in the previous step</li>
<li>Then bring back the humans as experts in specific steps of the process as needed i.e. rethink humans as experts</li>
<li>Then constantly think of improving the new process by learning from experience (improving the models with the human experts and the models working together)</li>
</ol>
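<p>The steps above can be sketched in code. The following Python sketch is purely a hypothetical illustration (the step names, confidence threshold, and toy models are invented for the example): each process step is handled by a model, and a human expert is brought back in only where the model is not confident, as in steps 4 and 5.</p>

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    # The model returns a (decision, confidence) pair for a given case
    model: Callable[[dict], tuple]
    threshold: float = 0.8  # below this confidence, escalate to a human

def run_process(steps, case, human_expert):
    """Run each step with its model; escalate low-confidence steps to a human."""
    decisions = []
    for step in steps:
        decision, confidence = step.model(case)
        if confidence < step.threshold:
            # Bring humans back as experts where the model is unsure
            decision = human_expert(step.name, case)
        decisions.append((step.name, decision))
    return decisions

# Toy models for two steps of a hypothetical invoice-approval process
steps = [
    Step("classify", lambda c: ("invoice", 0.95)),
    Step("approve",  lambda c: ("approve", 0.60)),  # low confidence
]
result = run_process(steps, {"amount": 1200}, lambda name, c: "expert-review")
print(result)  # the low-confidence 'approve' step was escalated to the expert
```

<p>Improving the models over time (step 6) would then mean logging the escalated cases and the experts' decisions, and retraining the per-step models on that feedback.</p>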
<p>The steps are radical because you remove humans and then add them back.</p>
<p>However, this thought experiment approach forces you to rethink the process from first principles, with AI at the centre.</p>
<p>Comments welcome</p>
<p>Image source: <a href="https://erenow.net/biographies/einstein-for-dummies/10.php">Einstein for Dummies</a> – based on one of Einstein's best-known thought experiments, i.e. how the world would look from a beam of light. You can follow me on <a href="https://www.linkedin.com/in/ajitjaokar/">LinkedIn at Ajit Jaokar</a>. </p>
<p><em>Originally posted by Igor Bobriakov.</em></p>
<p class="justifyfull" dir="ltr"><span>Data science has become a widely used term and a buzzword as well. It is a broad field representing a combination of multiple disciplines. However, there are adjacent areas that deserve proper attention and should not be confused with data science. One of them is decision science. Its importance should not be underestimated, so it is useful to know the actual differences and peculiarities of these two fields. Data science and decision science are related but still separate fields, so at some points, it might be hard to compare them directly.</span></p>
<p class="justifyfull" dir="ltr"><span>In general, a data scientist is a specialist who finds insights in data after that data has been collected, processed, and structured by a data engineer. A decision scientist treats data as a tool for making decisions and solving business problems. To demonstrate other differences, we prepared an infographic that contrasts data science and decision science across several criteria. Let’s dive right in.</span></p>
<p class="justifyfull" dir="ltr"><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/2055076333?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/2055076333?profile=RESIZE_710x" class="align-full"/></a></span></p>
<p class="justifyfull" dir="ltr"><span>In terms of definition, data science appears to be an interdisciplinary field that uses scientific algorithms, methods, techniques and various approaches to extract valuable insights. Thus, its primary purpose is to reveal the insights from data for further application to the benefit of the various industries. In contrast, decision science is an application of a complex of quantitative techniques to the decision-making process. Its purpose is to apply the data-driven insights in combination with the elements of cognitive science to policies planning and development. So, data is equally important for both, yet the mechanisms are quite different.</span></p>
<p class="justifyfull" dir="ltr"><span>Now, let's move on to the areas of application. Data science is applied in numerous industries like retail, FMCG, entertainment, media, healthcare, insurance, telecommunication, finance, travel, manufacturing, agriculture, sports, etc. Decision science touches more theoretical areas of business and management, law and education, environmental regulation, military science, public health, and public policy.</span></p>
<p class="justifyfull" dir="ltr"><span>Critical challenges the specialists face in these areas also vary. For instance, data scientists struggle with problems of dirty data, difficulties in sourcing development, security issues, etc. Decision scientists search for new ways to overcome the lack of reliable data, difficulties caused by complex data environments, and the complexity of applied techniques. They should possess knowledge in math, finance, and analytics to make the right decisions.</span></p>
<p class="justifyfull" dir="ltr"><span>Finally, let’s consider future trends shedding light on the further development and prospects of data science and decision science. According to our expectations, data science will continue its way towards automation, further evolution, and extensive use of chatbots and virtual assistants. There will be widespread use of augmented reality elements, further robotization of industries, and increasing popularity of reinforcement learning. In contrast, decision science will continue to move us towards automated decision-making and data empowerment. For sure, it is going to achieve vital importance and broad application in industries, which will result in increasing demand for specialists.</span></p>
<h2 dir="ltr"><strong>Conclusion</strong></h2>
<p class="justifyfull" dir="ltr"><span>Data science can be a crucial component of decision science, and quite often business owners rely on data science as a solution to all their problems and worries. However, it is not enough to use data science alone. The truth is somewhere in between data science and decision science.</span></p>
<p class="justifyfull" dir="ltr"><span>We attempted to show our vision of the commonalities, differences, and specific features of data science and decision science. If you have some ideas or thoughts related to this infographic, feel free to share them for further discussion in the comment section.</span></p>How to Build Customize Maps in Power BI?tag:www.datasciencecentral.com,2019-11-24:6448529:BlogPost:9102792019-11-24T17:43:39.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><em>Originally posted by Deepak Kumar Gupta. Another interesting article on this topic can be found <a href="https://medium.com/weareservian/power-bi-custom-maps-part-ii-shape-map-939873da3f66" target="_blank" rel="noopener">here</a>.</em></p>
<div class="mceTemp">In today’s data era, it is very important to represent data in a way that suits every type of user, technical or non-technical. The use of data visualization is increasing exponentially. We have various data visualization tools available in the market, such as QlikView, Power BI, Tableau, and Google Data Studio.</div>
<div class="mceTemp"></div>
<p>Power BI lets you create a custom visual from any image, such as maps or floor plans, not restricted to geographical maps. Visuals can be styled dynamically based on the values of measures, for example by colour. Synoptic Design by OKViz helps users accomplish this task.</p>
<p>This tutorial will guide you through a step by step process of how to use the synoptic panel in Power BI.</p>
<p><strong><a href="https://storage.ning.com/topology/rest/1.0/file/get/3739562468?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3739562468?profile=RESIZE_710x" class="align-center"/></a></strong></p>
<p><strong>Step 1:<span> </span></strong>Download any image of your choice or requirement. For example, a hex map of the USA as shown below:</p>
<p><strong>Step 2:<span> </span></strong>Open a web browser (Google Chrome, Internet Explorer, etc) and go to<span> </span><a href="http://synoptic.design/"><strong>http://synoptic.design</strong></a><span> </span>and drag your chosen image<span> </span>on<span> </span>the canvas.</p>
<p><strong>Step 3:</strong> Now you can select the areas of the map you want to label. Place the cursor on the canvas and it will automatically select the area and create a yellow coloured boundary. Alternatively, you can write the area coordinates and it will display the selected area on the canvas.</p>
<p><strong>Step 4:</strong> Like step 3, you can select multiple areas and label them as per your requirements.</p>
<p><strong>Step 5:</strong> Once you are done selecting and labeling areas, you can export the image by clicking the Export to Power BI button in the bottom-right corner of the website.</p>
<p>Once you select Export to Power BI, a pop-up will open; you can then right-click the image and save it as an “.svg” file on your desktop.</p>
<p><strong>Step 6:</strong> Open Power BI on your desktop, import your dataset into a .pbix file, and import Synoptic Design by OKViz from the store. Then select a category and a measure and drag them to the visualization.</p>
<p><strong>Step 7:</strong> You can change the intensity of the colors for different states by applying rules as required by the business problem.<img class="aligncenter wp-image-683 size-large img-responsive" src="http://www.reckonanalytics.com/wp-content/uploads/2018/02/10-1-1024x486.png" alt="" width="688" height="327"/></p>
<p>You can play with the visualization as per your requirement and change the variables and states accordingly.</p>
<p><em>Read more <a href="http://www.reckonanalytics.com/how-to-build-customize-maps-in-power-bi/" target="_blank" rel="noopener">here</a>.</em> <em> </em></p>Email Classification into relevant labels using Neural Networkstag:www.datasciencecentral.com,2019-11-24:6448529:BlogPost:9106052019-11-24T17:30:27.000ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p><em>Originally posted by Deepak Kumar Gupta.</em></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3739551027?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3739551027?profile=RESIZE_710x" class="align-center"/></a></p>
<p><strong>Abstract</strong></p>
<p>In the real world, many online shopping websites or service providers have a single email ID to which customers can send their queries, concerns, etc. At the back end, the service provider receives millions of emails every week; how can they identify which email belongs to a particular department? This paper presents an artificial neural network (ANN) model that is used to solve this problem, and experiments are carried out on a user’s personal Gmail email dataset. This problem can be generalised as typical Text Classification or Categorization [8].</p>
<p><strong>Keywords:</strong> Artificial Neural Network, Email Classification, Natural Computing, Text Categorization</p>
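<p>The paper's full model is described in the linked article. As a rough, hand-rolled illustration of the underlying idea only, here is a minimal single-layer network (logistic regression) over bag-of-words features; the toy emails, department labels, and hyperparameters below are invented for the sketch and differ from the paper's actual architecture and data:</p>
<pre><code>import numpy as np

# Toy labeled emails: 0 = billing department, 1 = shipping department
emails = [
    ("invoice payment charge refund", 0),
    ("charge card payment overdue invoice", 0),
    ("package delivery tracking delayed", 1),
    ("tracking number shipment package lost", 1),
]

# Build a bag-of-words vocabulary from the training set
vocab = sorted({w for text, _ in emails for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

X = np.array([vectorize(t) for t, _ in emails])
y = np.array([label for _, label in emails], dtype=float)

# Single-layer network trained by gradient descent on cross-entropy loss
rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=len(vocab))
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
    grad = p - y                            # gradient of the loss w.r.t. logits
    w -= 0.5 * (X.T @ grad) / len(y)
    b -= 0.5 * grad.mean()

def classify(text):
    return int(1.0 / (1.0 + np.exp(-(vectorize(text) @ w + b))) > 0.5)

print(classify("where is my package tracking says delayed"))       # 1 (shipping)
print(classify("please refund the duplicate charge on my invoice"))  # 0 (billing)
</code></pre>
<p>A production system would use richer features (TF-IDF, embeddings), more labels, and a deeper network, but the routing principle is the same.</p>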
<p><em>View full article <a href="https://arxiv.org/abs/1802.03971" target="_blank" rel="noopener">here</a>.</em></p>What is Map Reduce Programming and How Does it Worktag:www.datasciencecentral.com,2019-11-24:6448529:BlogPost:8432682019-11-24T05:30:00.000ZDivya Singhhttps://www.datasciencecentral.com/profile/DivyaSingh456
<h3>Introduction</h3>
<p>Data Science is the study of extracting meaningful insights from data using various tools and techniques for the growth of the business. Despite its inception at the time when computers came into the picture, the recent hype is a result of the huge amount of unstructured data being generated and the unprecedented computational capacity that modern computers possess.</p>
<p>However, there is a lot of misconception among the masses about the true meaning of this field, with many of the opinion that it is about predicting future outcomes from data. Though predictive analytics is a part of Data Science, it is certainly not all of what Data Science stands for. In an analytics project, the first and foremost task is to build the pipeline and get the relevant data so that predictive analytics can be performed later on. The professional responsible for building such ETL pipelines and creating the system for flawless data flow is the Data Engineer, and this field is known as Data Engineering.</p>
<p>Over the years the role of Data Engineers has evolved a lot. Previously it was about building Relational Database Management Systems using Structured Query Language or running ETL jobs. These days, the plethora of unstructured data from a multitude of sources has resulted in the advent of Big Data, which is nothing but different forms of voluminous data that carry a lot of information if mined properly.</p>
<p>Now, the biggest challenge professionals face is analysing these huge terabytes of data, which traditional file storage systems are incapable of handling. This problem was resolved by Hadoop, an open-source Apache framework built to process large data in the form of clusters. Hadoop has several components that take care of the data, and one such component is known as Map Reduce.</p>
<p> </p>
<h3>What is Hadoop?</h3>
<p>Created by Doug Cutting and Mike Cafarella in 2006, Hadoop facilitates distributed storage and processing of huge data sets in the form of parallel clusters. HDFS, or Hadoop Distributed File System, is the storage component of Hadoop, where files in different formats can be stored and processed using the Map Reduce programming which we cover later in this article.</p>
<p>The HDFS runs on large clusters and follows a master/slave architecture. The metadata of a file, i.e., information about the relative position of the file in the node, is managed by the NameNode, which is the master; several DataNodes store the actual data. Some of the other components of Hadoop are –</p>
<ul>
<li>Yarn – It manages the resources and performs job scheduling.</li>
<li>Hive – It allows users to write SQL-like queries to analyse the data.</li>
<li>Sqoop – Used for to-and-fro structured data transfer between the Hadoop Distributed File System and the Relational Database Management System.</li>
<li>Flume – Similar to Sqoop but it facilitates the transfer of unstructured and semi-structured data between the HDFS and the source.</li>
<li>Kafka – A messaging platform of Hadoop.</li>
<li>Mahout – Used to perform Machine Learning operations on big data.</li>
</ul>
<p>Hadoop is a vast concept and a detailed explanation of each component is beyond the scope of this blog. However, we will dive into one of its components – Map Reduce – and understand how it works.</p>
<p> </p>
<h3>What is Map Reduce Programming</h3>
<p>Map Reduce is the programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster, i.e. if you write a job using the MapReduce framework and a thousand machines are available, the job could potentially run on all thousand machines.</p>
<p>Big Data is not stored in a traditional fashion in HDFS. The data gets divided into chunks of small blocks which are stored in respective DataNodes. The complete data is never present in one centralized location, so a client application cannot process the information right away. A framework is therefore needed that can handle the data residing as blocks in the respective DataNodes, move the processing to that data, and bring back the result. In a nutshell, data is processed in parallel, which makes processing faster.</p>
<p>To improve performance and efficiency, the idea of parallelization was developed: the process is automated and executed concurrently. The fragmented instructions can run on a single machine or on different CPUs. To gain direct disk access, multiple computers use a SAN (Storage Area Network), a common type of clustered file system, unlike distributed file systems, which send the data over the network.</p>
<p>One term that is common in this master/slave architecture of data processing is Load Balancing, where the tasks are spread among the processors to avoid overloading any DataNode. Dynamic balancers provide more flexibility than static balancers.</p>
<p>The Map Reduce algorithm operates in three phases – the Mapper phase, the Sort and Shuffle phase, and the Reducer phase. It provided Google engineers with an abstraction for performing basic computation while hiding the details of fault tolerance, parallelization, and load balancing.</p>
<ul>
<li>Map Phase – In this stage, the input data is mapped into intermediate key-value pairs on all the mappers assigned to the data.</li>
<li>Shuffle and Sort Phase – This phase acts as a bridge between the Map and the Reduce phase to decrease the computation time. The data here is shuffled and sorted simultaneously based on the keys, i.e., all intermediate values from the mapper phase are grouped together with respect to the keys and passed on to the reduce function.</li>
<li>Reduce Phase– The sorted data is the input to the Reducer which aggregates the value corresponding to each key and produces the desired output.</li>
</ul>
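<p>The three phases above can be sketched with the classic word-count example. This is a toy single-process illustration in plain Python, not actual Hadoop code, where real mappers and reducers run distributed across DataNodes:</p>
<pre><code>from collections import defaultdict

# Map phase: emit an intermediate (word, 1) pair for every word in every input split
def map_phase(splits):
    for text in splits:
        for word in text.split():
            yield (word, 1)

# Shuffle and sort phase: group intermediate values by key, sorted by key
def shuffle_sort(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

# Reduce phase: aggregate the values for each key
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data big cluster", "data node data block"]
counts = reduce_phase(shuffle_sort(map_phase(splits)))
print(counts)  # {'big': 2, 'block': 1, 'cluster': 1, 'data': 3, 'node': 1}
</code></pre>
<p>Each input split would normally live on a different DataNode, and the shuffle would move intermediate pairs between machines; the per-phase logic, however, is exactly this simple.</p>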
<p> </p>
<h3>How Map Reduce works</h3>
<ul>
<li>Across multiple machines, the Map invocations are distributed and the input data is automatically partitioned into M pieces of size sixteen to sixty four megabytes per piece. On a cluster of machines, many copies of the program are then started up.</li>
</ul>
<ul>
<li>Among the copies, one is the master copy while the rest are the slave copies. The master assigns M map and R reduce tasks to the slaves. Any idle worker would be assigned a task by the master.</li>
</ul>
<ul>
<li>The map task worker would read the contents of the input and pass key-value pairs to the Map function defined by the user. In the memory buffer, the intermediate key-value pairs would be produced.</li>
</ul>
<ul>
<li>To the local disk, the buffered pairs are written in a periodic fashion. The partitioning function then partitions them into R regions. The master would forward the location of the buffered key-value pairs to the reduce workers.</li>
</ul>
<ul>
<li>The buffered data is read by the reduce workers after getting the location from the master. Once it is read, the data is sorted based on the intermediate keys grouping similar occurrences together.</li>
</ul>
<ul>
<li>The Reduce function defined by the user receives a set of intermediate values corresponding to each unique intermediate key that it encounters. The final output file would consist of the appended output from the Reduce function.</li>
</ul>
<ul>
<li>The user program is woken up by the Master once all the Map and Reduce tasks are completed. In the R output files, the successful MapReduce execution output could be found.</li>
</ul>
<ul>
<li>Each and every worker’s aliveness is checked by the master during execution by sending periodic pings. If any worker does not respond to the ping, it is marked as failed after a certain period of time, and its previous work is reset.</li>
</ul>
<ul>
<li>In case of failures, map tasks which are completed would be re-executed, as their output would be inaccessible on the failed worker’s local disk. Outputs which are stored in the global file system need not be re-executed.</li>
</ul>
<p> </p>
<p><strong>Some of the examples of Map Reduce programming are –</strong></p>
<ul>
<li>Map Reduce programming could count the frequencies of URL accesses. The web page logs would be processed by the Map function and stored as output, say &lt;URL, 1&gt;, which would be processed by the Reduce function by adding the values for the same URL and outputting its count.</li>
</ul>
<ul>
<li>Map Reduce programming could also be used to parse documents and count the number of words corresponding to each document.</li>
</ul>
<ul>
<li>For a given URL, the list of all the associated source URLs could be obtained with the help of Map Reduce.</li>
</ul>
<ul>
<li>Map Reduce programming could be used to calculate a per-host term vector. The Map function would create a hostname and term vector pair for each document, which would be processed by the Reduce function, which in turn would remove the less frequent terms and emit a final &lt;hostname, term vector&gt; pair.</li>
</ul>
<p> </p>
<h3>Conclusion</h3>
<p>Data Engineering is a key step in any Data Science project and Map Reduce is undoubtedly an essential part of it. In this article we gave a brief intuition about Big Data, provided an overview of Hadoop, explained Map Reduce programming and its workflow, and presented a few real-life applications of Map Reduce programming as well.</p>
<p><em>Read more <a href="http://dimensionless.in/blog">here</a>.</em></p>
How AI Is Manipulating Economics to Create Appreciating Assetstag:www.datasciencecentral.com,2019-11-24:6448529:BlogPost:9103662019-11-24T03:42:31.000ZBill Schmarzohttps://www.datasciencecentral.com/profile/BillSchmarzo
<p><em>“If you buy a Tesla today, I believe you're buying an <strong>appreciating</strong> asset, not a <strong>depreciating</strong> asset.”</em> – Elon Musk</p>
<p> </p>
<p>Think about that statement for a second…you’re buying an <strong><em>appreciating</em></strong> asset, not a <strong><em>depreciating</em></strong> asset.<span> </span>And what is driving the <strong><em>appreciation</em></strong> of that asset?<span> </span> It’s likely courtesy of Tesla’s FSD (Full Self-Driving) Deep Reinforcement Learning Autopilot brain.<span> </span> Tesla cars become “smarter” and consequently more valuable with every mile each of the 400,000 Autopilot-equipped cars are driven.</p>
<p>Imagine a mindset of leveraging Deep Reinforcement Learning with new operational data to create products (vehicles, trains, cranes, compressors, chillers, turbines, drills) that appreciate with usage because the products are getting more reliable, more predictive, more efficient, more effective, safer and consequently more valuable. That’s H-U-G-E!</p>
<p>An asset that appreciates in value through usage and learning is yet another example of how a leading organization can exploit the unique characteristics of digital assets that not only never deplete or wear out but can be used across an unlimited number of use cases at a near zero marginal cost.</p>
<p>Let’s talk about what this means to YOUR organization!</p>
<h1><strong>Autonomous Vehicles Require a Mobile IoT Platform</strong></h1>
<p>Autonomous vehicles generate vast amounts of data courtesy of the external facing cameras, radar, <a href="https://en.wikipedia.org/wiki/Lidar">LIDAR</a>, ultrasonic sensors and GPS.<span> </span> Consequently, autonomous vehicles basically require a modern Internet of Things (IoT) technology architecture that supports both IoT Edge analytics as well as a Core analytics where combinations of Machine Learning and Deep Reinforcement Learning make the decisions that guide the successful operations of the vehicle (see Figure 1).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3738751440?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3738751440?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure</strong> <strong><span>1</span></strong><strong>: <span> </span>Autonomous Vehicle IoT Architecture</strong></p>
<p>An autonomous vehicle is comprised of multiple edge devices capturing and analyzing real-time time series data at the subsystem level (braking, navigation, climate control, transmission, engine, suspension).<span> </span> And then a centralized advanced analytics model (like Tesla’s FSD) that aggregates and integrates the data in real-time from the different edge subsystems to optimize the performance, reliability, costs, emissions and safety of the autonomous vehicle.</p>
<h1><strong>New Monetization Opportunities</strong></h1>
<p>The real-time object detection capabilities of an autonomous vehicle are not only necessary to ensure the safe and efficient operations of the vehicle, but introduce new opportunities to monetize the customer, product and operational insights gathered during the autonomous vehicle’s interaction with its environment (see Figure 2).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3738751653?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3738751653?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure</strong> <strong><span>2</span></strong><strong>: Environmental View from Autonomous Vehicle</strong></p>
<p>Imagine the new opportunities to monetize new customer, product and operational insights based upon the autonomous vehicle’s interaction with its environment. What is Tesla learning about the environment in which it operates?<span> </span> What sorts of customer, product and operational insights is Tesla learning as it goes about its normal operations?<span> </span></p>
<p>Here are some monetization ideas that came from my University of San Francisco Big Data MBA class:</p>
<ul>
<li><span>Real-time traffic congestion, accidents and navigational recommendations</span></li>
<li><span>Road maintenance recommendations</span></li>
<li><span>Popular destinations across multiple demographic and geographical dimensions</span></li>
<li><span>Branded vehicles operating around them and the condition of those vehicles</span></li>
<li><span>Number of cars in a store, office or factory parking lot</span></li>
<li><span>Location of available parking spots</span></li>
<li><span>Location of derelict and/or stolen vehicles</span></li>
<li><span>Illegal garbage or waste dumping</span></li>
<li><span>And more…</span></li>
</ul>
<p>As a homework assignment, please send me your monetization ideas (twitter @schmarzo).</p>
<h1><strong>Basic Introduction to Real-time Object Detection<a name="_ftnref1"><span>[1]</span></a></strong></h1>
<p>One critical feature of the autonomous vehicle is its ability to leverage video analytics for real-time object detection in identifying the location and movement of surrounding objects.<span> </span> A recent commercial shows how Microsoft is using object detection (AI) to identify snow leopards (see Figure 3).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3738751870?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3738751870?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure</strong> <strong><span>3</span></strong><strong>:</strong> <a href="https://www.b2bnn.com/2019/09/microsoft-snow-leopard-tv-commercial/"><strong>Microsoft TV commercial uses snow leopards to demonstrate business case for AI</strong></a></p>
<p>Object detection identifies the location and movement of objects in a view using a Neural Network technique called <a href="https://medium.com/@alittlepain833/simple-understanding-of-mask-rcnn-134b5b330e95">Mask R-CNN</a> (Region Convolutional Neural Network). Mask R-CNN proposes a bunch of boxes within an image and checks if any of these boxes (or regions) contain an object. Mask R-CNN is an instance segmentation model that enables pixel-level analysis when segmenting a scene or view into individual objects, i.e. individual cars, pedestrians, stop lights, street signs, and bicyclists.</p>
<p>Figure 4 is an example of how an object detection algorithm works. Each object in the image, from a person to a kite, have been located and identified with a certain level of precision.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3738752189?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3738752189?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure</strong> <strong><span>4</span></strong><strong>: Object Detection Example</strong></p>
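<p>One detail behind outputs like Figure 4: R-CNN-style detectors propose many overlapping candidate boxes, and duplicate detections of the same object are pruned with non-maximum suppression (NMS), a standard post-processing step. A minimal sketch follows; the boxes and scores are made up for illustration:</p>
<pre><code>import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring box, drop others that overlap it heavily, repeat
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

# Three overlapping "pedestrian" detections plus one distant "car" detection
boxes = [(10, 10, 50, 80), (12, 12, 52, 82), (11, 9, 49, 79), (200, 40, 260, 90)]
scores = [0.92, 0.85, 0.60, 0.88]
print(nms(boxes, scores))  # [0, 3] -- one surviving box per object
</code></pre>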
<p>Mask R-CNN is a combination of <a href="https://medium.com/@alittlepain833/simple-understanding-of-mask-rcnn-134b5b330e95">Faster R-CNN</a> that does object detection (class + bounding box) and <a href="https://www.quora.com/How-is-Fully-Convolutional-Network-FCN-different-from-the-original-Convolutional-Neural-Network-CNN">FCN (Fully Convolutional Network</a>) that does pixel-wise boundary analysis (see Figure 5).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3738752394?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3738752394?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure</strong> <strong><span>5</span></strong><strong>:<span> </span> Faster R-CNN Processing Flow</strong></p>
<p>We pass an image to the network, and it is then sent through various convolutions and pooling layers. Finally, we get the output in the form of the object’s class (pedestrian, vehicle, traffic light, bicyclist, etc.).</p>
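<p>To make "convolutions and pooling layers" concrete, here is a minimal NumPy sketch of one convolution followed by max pooling. A real network such as Faster R-CNN stacks many such layers with kernels learned from data; the hand-picked edge filter below is purely illustrative:</p>
<pre><code>import numpy as np

def conv2d(image, kernel):
    # Valid 2-D convolution (cross-correlation, as in most CNN libraries)
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    # Non-overlapping max pooling: keep the largest activation per window
    h, w = x.shape
    return x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1.0, -1.0]])  # simple horizontal edge filter
features = max_pool(conv2d(image, edge))
print(features.shape)  # (3, 2) -- the feature map shrinks at each stage
</code></pre>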
<h1><strong>Exploiting the Economic Value of Learning</strong></h1>
<p>The Elon Musk quote is yet another example of a company that is exploiting the unique economic characteristics of data and analytics – assets that never wear out or deplete, that can be used across an unlimited number of use cases at near zero marginal cost, and that acquire more value – become more predictive, more accurate, more complete – through use!</p>
<p>As I discussed in the ground-breaking blog “<a href="https://www.linkedin.com/pulse/digital-transformation-economies-learning-more-than-scale-schmarzo/">In Digital Transformation, Economies of Learning More Powerful than Economies of Scale</a>”, the “Economies of Learning” are more powerful than “Economies of Scale” because of the ability to build powerful AI models that learn and re-deploy those learnings within digital assets faster and at lower risk (see Figure 6).</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3738752620?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3738752620?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure</strong> <strong><span>6</span></strong><strong>:</strong> <a href="https://www.linkedin.com/pulse/digital-transformation-economies-learning-more-than-scale-schmarzo/"><strong>Schmarzo Economic Digital Asset Valuation Theorem</strong></a></p>
<p>The “<a href="https://www.linkedin.com/pulse/digital-transformation-economies-learning-more-than-scale-schmarzo/">Schmarzo Economic Digital Asset Theorem</a>” yields three key economic “effects”:</p>
<ul>
<li><strong><span>Effect #1: Marginal Costs Flatten.</span></strong> <span>Since data never depletes, never wears out and can be reused at near zero marginal cost, the marginal costs of sharing and reusing “curated” data and packaged analytic modules (Solution Cores) flatten.</span></li>
<li><strong><span>Effect #2: Economic Value Grows.</span></strong> <span>Sharing and reusing the data and packaged Analytic Modules (Solution Cores) shrinks time-to-value and de-risks subsequent use cases.</span></li>
<li><strong><span>Effect #3: Economic Value Accelerates</span></strong><span>. The economic value of previous use cases increases through the accumulated appreciation (learning) of the packaged analytic modules (Solution Cores), which lifts the value of all associated use cases. I will now call this the “Elon Musk Tesla Effect”!</span></li>
</ul>
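<p>A quick numeric sketch makes the three effects concrete. The figures below are purely illustrative assumptions (a 100-unit cost for the first use case, a flat 5-unit reuse cost, and 10% value appreciation per deployment), not numbers from the theorem itself:</p>

```python
# Effect #1: after the first use case, each reuse of the curated data and
# Solution Cores costs a flat 5 units instead of the full 100-unit build.
# Effects #2 and #3: the value each use case delivers grows 10% per
# deployment as the shared analytic modules learn.
build_cost, reuse_cost, base_value, learning_rate = 100, 5, 50, 0.10

total_cost = 0
total_value = 0.0
for n in range(1, 6):  # five successive use cases
    total_cost += build_cost if n == 1 else reuse_cost
    total_value += base_value * (1 + learning_rate) ** (n - 1)
    print(f"use case {n}: cumulative cost {total_cost}, "
          f"cumulative value {total_value:.1f}")
```

<p>Under these assumptions, cumulative cost climbs only from 100 to 120 across five use cases while cumulative value compounds past 300 – costs flatten while value accelerates.</p>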
<h1><strong>Summary</strong></h1>
<p><em>“If you buy a Tesla today, I believe you're buying an <strong>appreciating</strong> asset, not a <strong>depreciating</strong> asset.”</em> – Elon Musk</p>
<p>This aspirational statement may very well be the most powerful insight that I have heard during this age of digital transformation. Not only are data and analytics assets that never deplete, never wear out and can be used across an unlimited number of use cases at near zero marginal cost – they power digital transformation by appreciating (not depreciating) from use. How can that happen? Because the models that power digital transformation, like TensorFlow for Google or FSD for Tesla, actually increase in value through usage and learning: these unique digital assets become more reliable, more predictive, more efficient, more effective, safer and consequently more valuable as they learn from every interaction and transaction.</p>
<p>“<em>The Economies of Learning are more powerful than the Economies of Scale</em>” – Bill Schmarzo</p>
<p>Yep, I can feel that <a href="https://www.linkedin.com/pulse/economic-value-learning-why-google-open-sourced-bill-schmarzo/">Nobel Prize in Economics</a> getting closer every day!</p>
<p> </p>
<p>By the way, this marks the end of another illuminating year of teaching with Professor Mouwafac Sidaoui at the University of San Francisco (not UCSF!). The students were engaging, inquisitive and fearless. What more could I ask? These are tomorrow’s business and society leaders! We are in good hands.</p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/3738752973?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/3738752973?profile=RESIZE_710x" class="align-full"/></a></p>
<p><strong>Figure 7: Class of 2019 University of San Francisco Big Data MBA visiting Hitachi</strong></p>
<p><a name="_ftn1"><span>[1]</span></a> To learn more about RCNN and Object Detection, check out these sources:</p>
<ul>
<li><a href="https://www.analyticsvidhya.com/blog/2019/08/introduction-slimyolov3-real-time-object-detection/"><span>Real-Time Object Detection using SlimYOLOv3 - A Detailed Introduction</span></a></li>
<li><a href="https://www.analyticsvidhya.com/blog/2018/10/a-step-by-step-introduction-to-the-basic-object-detection-algorithms-part-1/"><span>A Step-by-Step Introduction to the Basic Object Detection Algorithms (Part 1)</span></a></li>
</ul>
<p> </p>