This post is based on two insightful threads I read online (see References below). Based on these, I address the question of "the difference between statistics and data science".

Traditionally, most people, including me, would say that "statistics came first and data science builds upon statistics". This chain of thought is valid but, as you will see below, it misses a much bigger picture: that of emphasis. Note that here we discuss a purist approach for the sake of learning. In practice, the domains and the tools are converging.

The two main differences between a purist statistical approach and a data science approach are the use of big data (common in data science) and the use of inferential statistics (common in statistics).

With this background, here are some ways in which a purist statistical approach differs from the typical data science approach:

Small data: We are so used to the world of big data that we do not fully appreciate that another world exists, that of "small data". In some domains small data is very common, especially in medicine and clinical trials, because the procedures are risky and expensive. So you end up with only 20 or 30 samples (small data). This leads to a greater reliance on inferential statistics.

The use of inferential statistics: Inferential statistics use a random sample of data taken from a population to describe and make inferences about the population. Inferential statistics are valuable when examination of each member of an entire population is not convenient or possible. For example, to measure the diameter of each nail that is manufactured in a mill is impractical. You can measure the diameters of a representative random sample of nails. You can use the information from the sample to make generalizations about the diameters of all of the nails. (Source: Minitab.) Statistics makes more use of the inferential / frequentist approach because of small data sizes (as above).

Increased reliance on domain knowledge: The first two points also lead to a greater reliance on domain knowledge in statistics, for example in the choice of features.

Confirmatory data analysis: Exploratory data analysis is complemented by confirmatory data analysis.

Statistical tests: Increased reliance on statistical tests, many of which are domain specific.

Interpretable models: Statistics needs interpretable models as opposed to black box models.

Automation: Data science emphasises automation, in contrast to statistics, which involves greater manual intervention due to the above factors (such as the increased use of domain knowledge).

Handling outliers and imputation: Much greater emphasis on manual correction of outliers and on imputation (missing values).

To conclude, the difference in approaches originates from the use of small data. The above is a purist view; in practice, tools and techniques across the domains are more fluid. The references are listed below (including the comments on these threads).

Image source: the pioneering statistician George Box and his book An Accidental Statistician, which made me think that we are all accidental statisticians!

References

Isaac Faber on LinkedIn: "If I had to guess, I would say that currently there is one order of magnitude (10x) more #python users in #datascience compared to #R."
Adrian Olszewski on Quora: "Why do so many statisticians not want to become data-scientists"
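
As a small illustration of the nail example from the inferential statistics point above, here is a minimal sketch of estimating a population mean from a random sample. Everything in it is an assumption made for illustration: the simulated population of diameters, the sample size of 200, and the helper name mean_confidence_interval are hypothetical and do not come from the original threads. It uses a normal approximation for the interval; with only 20 or 30 samples, a t-based interval would be the more usual choice and would be slightly wider.

```python
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower, upper) for the population mean.

    Uses a normal approximation, which is reasonable for larger samples.
    Hypothetical helper written for this illustration only.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / n ** 0.5
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_err
    return mean, mean - margin, mean + margin


# Simulated population of nail diameters in mm (made-up numbers, not real data).
rng = random.Random(42)
population = [rng.gauss(3.0, 0.05) for _ in range(100_000)]

# Measure only a random sample instead of every nail.
sample = rng.sample(population, 200)
mean, lower, upper = mean_confidence_interval(sample)
print(f"sample mean = {mean:.4f} mm, 95% CI = ({lower:.4f}, {upper:.4f}) mm")
```

The interval is the "generalization" step: it is a statement about the diameters of all the nails, made after measuring only 200 of them.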


Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.

Announcement

Find your Data Scientist today with a special Buy 1 and Get 50% off the second Job Posting By February 29th - Data Scientists are a rare breed, and AnalyticTalent / Data Science Central is the largest community of its kind, with a million+ members that engage in discussions, trends and best practices. It is the only job board devoted to its own scientific community. Learn more and get the promo code here.

Featured Resources and Technical Contributions

Which masters or PhD program should I choose for Data Science or AI?
Regularization in Machine Learning
Popular Programming Languages: 1960 to 2020
Dataframe Storage Efficiency in Python-Pandas
The difference between Statistics and Data Science
The 10 Deep Learning Methods AI Practitioners Need to Apply
Top 19 Data Science Interview Questions for Beginners
Drawing Attention to Climate Change With Interactive Generative Art
IRIS ML Toolkit can now use IntegratedML

Featured Articles

How the Use of Design Thinking Prevents Rushing into Solution Mode +
Likely, unlikely, certain and impossible
AI for Retail in 2020: 12 Real-World Use Cases
Advanced Analytic Platforms – Changes in the Leaderboard 2020
Driving Stakeholder Engagement with an Interactive Hypothesis Development Canvas
Ability to generalize - A measure of intelligence?
Cloud Adoption: Myths Enterprises Must Steer Clear Of
AI-Powered Big Data and It’s Business Impacts
How Machine Learning Is Changing the World

Picture of the Week

Source: article flagged with a +

From our Sponsors

Trends in Social Network Analysis - DSC Podcast
How a Physics-Driven Analytics Platform Detects Reliability Threats - Feb 26
Developing and Testing Shiny Apps - March 12
How to get more of your models to production
The Complete Guide to Data Acquisition For ML

New Books and Resources for DSC Members - [See Full List]

Getting Started with TensorFlow 2.0
Online Encyclopedia of Statistical Science
Statistics -- New Foundations, Toolbox, and Machine Learning Recipes
Classification and Regression In a Weekend
Applied Stochastic Processes
Enterprise AI - An Applications Perspective

To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us.