No single “perfect” method exists for filling in missing data; you can view this one picture as a **starting point** with some suggestions, rather than an absolute rule. You may want to decide beforehand whether you care about statistical power or uncertainty; if you do, you'll want to…

Added by Stephanie Glen on August 12, 2020 at 6:54am — No Comments
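
To see why the choice of method matters for power and uncertainty, here is a minimal sketch comparing two simple strategies: listwise deletion (drop incomplete records) and mean imputation. The sample values and function names are hypothetical and are not taken from the picture itself.

```python
from statistics import mean, variance

# Hypothetical sample; None marks a missing observation.
observed = [4.0, None, 7.0, 5.0, None, 9.0, 6.0, None]

def listwise_delete(values):
    """Keep only the complete observations (smaller n, so less power)."""
    return [v for v in values if v is not None]

def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

complete = listwise_delete(observed)
imputed = mean_impute(observed)

# Sample size: deletion shrinks it, imputation preserves it.
print(len(complete), len(imputed))

# Spread: mean imputation leaves the mean alone but shrinks the variance.
print(variance(complete), variance(imputed))
```

Mean imputation keeps every record, but because each filled-in value sits exactly at the mean, the data look less variable than they really are. That is one reason to decide up front how much you care about uncertainty.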

This one picture shows **what areas of calculus and linear algebra** are most useful for data scientists.

If you read any article worth its salt on the topic *Math Needed for Data Science*, you'll see calculus mentioned. Calculus (and its closely related counterpart, linear algebra) has some very narrow (but very useful) applications to data science. If you have a decent algebra background (which I'm assuming you do, if you're a data scientist!) then you can learn…

Added by Stephanie Glen on July 31, 2020 at 9:09am — 2 Comments

P-values and critical values are so similar that they are often confused. They both do the same thing: enable you to reject, or fail to reject, the null hypothesis in a test. But they differ in *how* you get to make that decision. In other words, they are two different approaches to the same result. This picture sums up the p-value vs. critical value approaches.…

Added by Stephanie Glen on July 26, 2020 at 7:42am — No Comments
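
The two approaches described above can be sketched for a two-sided, one-sample z-test. This is an illustration under assumed inputs (the sample mean, hypothesized mean, known standard deviation, and sample size are all made up), not a reproduction of the picture.

```python
from statistics import NormalDist

def z_test_two_ways(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test, decided by both approaches."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / n ** 0.5)

    # Critical value approach: compare the test statistic to a cutoff.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_critical = abs(z) > z_crit

    # P-value approach: compare a tail probability to alpha.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha

    return z, z_crit, p_value, reject_by_critical, reject_by_p

# Hypothetical numbers: sample mean 52, null mean 50, sigma 6, n = 36.
z, z_crit, p_value, by_crit, by_p = z_test_two_ways(52.0, 50.0, 6.0, 36)
print(z, z_crit, p_value, by_crit, by_p)
```

The critical value approach works on the scale of the test statistic, while the p-value approach works on the scale of probability. For the same test and the same alpha, the two decisions agree.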

If you scour the internet for "ANOVA vs Regression", you might be confused by the results. Are they the same? Or aren't they? The answer is that they *can* be the same procedure, if you set them up to be that way. But there are differences between the two methods. This one picture sums up those differences.

Added by Stephanie Glen on July 15, 2020 at 12:13pm — No Comments
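
As a rough sketch of how the two *can* be the same procedure, the snippet below runs a one-way ANOVA on two groups, then a simple regression on a 0/1 dummy variable that encodes group membership. The data and function names are hypothetical; the point is only that both setups produce the same F statistic.

```python
from statistics import mean

# Hypothetical outcome measurements for two groups.
group_a = [5.1, 4.8, 6.0, 5.5, 5.2]
group_b = [6.3, 6.9, 5.8, 7.1, 6.4]

def anova_f(groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def regression_f(x, y):
    """Overall F statistic for a least-squares regression of y on one x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ss_model = sum((f - y_bar) ** 2 for f in fitted)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return (ss_model / 1) / (ss_resid / (len(y) - 2))

# Regression setup: a 0/1 dummy variable stands in for the group label.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

f_anova = anova_f([group_a, group_b])
f_reg = regression_f(x, y)
print(f_anova, f_reg)
```

With a dummy-coded predictor, the regression's fitted values are just the group means, which is why the two F statistics match.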

The following graphic is based on Sam Priddy's excellent DSC/Tableau Webinar How to Accelerate and Scale Your Data Science Workflows. Sam covered many interesting points for organizing, analyzing and presenting data--including which graph is best suited for different data types. This graphic is an overview of some of Sam's points. For more…

Added by Stephanie Glen on July 8, 2020 at 9:02am — No Comments

**Math and statistics are vital components of any data scientist's toolbox.** While some view statistics as a type of math, the reality is that they are completely different subjects. Math is all about numbers and concrete answers, while statistics is making sense of numbers via educated "guesses." This one picture, based on Rossman et al.'s essay Some Key…

Added by Stephanie Glen on June 29, 2020 at 2:30pm — No Comments

If you've spent any time with modeling data, you'll know that there are many pitfalls to be had when it comes to data presentation (I addressed some common pitfalls in Misleading Graphs Part 1). Misleading graphs can be the result of incorrect data collection, ignorance of the basic "rules" of data presentation (like labeling axes), or even deliberate attempts to mislead. A fourth…

Added by Stephanie Glen on June 18, 2020 at 6:00am — No Comments

"Data Scientist" is 2020's equivalent of the rocket scientist of the 1950s: mysterious, sexy, and well-paid. But are you actually a "scientist"? While "data science" isn't fully defined yet as an academic subject (National Academies of Sciences, Engineering, and Medicine, 2018), more and more **evidence seems to point to it being more of an art than a science. …**

Added by Stephanie Glen on June 11, 2020 at 7:00am — 2 Comments

**Misleading graphs abound on the internet**. Sometimes they are deliberately misleading; other times the people creating the graphs don't fully understand the data they are presenting. "Classic" cases of misleading graphs include leaving out data, not labeling data properly, or skipping numbers on the vertical axis.

I came across the following misleading graphic in a…

Added by Stephanie Glen on May 31, 2020 at 8:00am — 1 Comment

**Naming conventions are often quite different in statistics and data science**, which causes quite a bit of confusion. Part of the problem with naming conventions is that "...*data science is the child of statistics and computer science*" (Blei & Smyth, 2017). In essence, data science then is the child of two parents who speak different languages. In one sense, this makes the job of the data scientist not only to apply the knowledge from both…

Added by Stephanie Glen on May 24, 2020 at 12:08pm — No Comments

**Regression** and **classification** are both supervised machine learning techniques that use known data to make predictions. Where they differ is in what type of question you want to answer, and how your output data is structured. For example, do you want discrete, categorical answer choices, like yes/no, or a range of possible values from 0 to 100? This one picture shows the basic differences between the two methods.…

Added by Stephanie Glen on May 17, 2020 at 5:58am — No Comments
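
To make the difference in outputs concrete, here is a small sketch that trains both kinds of model on the same input. The data are invented, and the two models (a least-squares line and a nearest-centroid classifier) are stand-ins chosen for brevity, not methods taken from the picture.

```python
from statistics import mean

# Hypothetical training data: hours studied, exam score, pass/fail label.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [35.0, 45.0, 50.0, 65.0, 70.0, 85.0]
passed = ["no", "no", "no", "yes", "yes", "yes"]

def fit_line(x, y):
    """Regression: returns a function that predicts a continuous value."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda value: intercept + slope * value

def fit_nearest_centroid(x, labels):
    """Classification: returns a function that predicts a category."""
    centroids = {
        label: mean(a for a, l in zip(x, labels) if l == label)
        for label in set(labels)
    }
    return lambda value: min(centroids, key=lambda c: abs(value - centroids[c]))

predict_score = fit_line(hours, scores)
predict_pass = fit_nearest_centroid(hours, passed)

print(predict_score(4.5))  # a number on a continuous scale
print(predict_pass(4.5))   # one of the discrete labels
```

The input is the same in both cases; what changes is the question being asked and therefore the shape of the answer.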

Inference and prediction are two often-confused terms, perhaps in part because they are not mutually exclusive. Both provide pieces of the "*What is data telling me*?" puzzle. In fact, **many inferential questions are raised as a result of predictions**: For example, you might *predict* how input variables X, Y, and Z affect an output variable B. Then you can…

Added by Stephanie Glen on May 10, 2020 at 6:12am — 1 Comment

There are a few **key differences between the Binomial, Poisson and Hypergeometric Distributions**. These distributions are used in data science anywhere there are dichotomous variables (like yes/no, pass/fail). This one picture sums up the major differences.…

Added by Stephanie Glen on April 30, 2020 at 9:45am — No Comments
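
For reference, the three probability mass functions can be written out directly. This is a minimal sketch using the standard textbook formulas; the function names and example parameters are my own choices for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k events in an interval when events occur at average rate lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def hypergeometric_pmf(k, population, successes, draws):
    """P(k successes in draws made without replacement from a finite population)."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )

# Probability of exactly 2 "successes" under each model (hypothetical parameters).
print(binomial_pmf(2, 10, 0.2))           # fixed trials, constant p
print(poisson_pmf(2, 2.0))                # counts at a constant average rate
print(hypergeometric_pmf(2, 50, 10, 10))  # sampling without replacement
```

The signatures hint at the differences: the binomial needs a fixed number of trials and a constant probability, the Poisson needs only an average rate, and the hypergeometric needs the size and makeup of a finite population.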

Data mining includes statistics and elements of statistical analysis. Some people describe the two as interconnected; others see them as being on a continuum. This one picture shows an overview of how statistics and…

Added by Stephanie Glen on April 26, 2020 at 11:04am — 1 Comment

If you plug "statistics interview questions" into a search engine, you're going to get hundreds of questions and answers. And if your interview is looming in a few days, trudging through (and trying to memorize) hundreds of questions probably isn't your idea of a fun weekend. And if you're looking to be a shoo-in, having the perfect answer to every question might not be your best plan of attack. Why? **Because that's what everyone else is doing.**

So how do you stand out…

Added by Stephanie Glen on April 18, 2020 at 5:10am — 2 Comments

Hypothesis testing can be an overwhelming topic to grasp if you're new to the subject. As well as dealing with all of the different terminology, you have to perform **several steps** to run a test. Even if you use software, you have decisions to make at each step, such as what you're testing in the first place and what kind of wiggle room for error you're…

Added by Stephanie Glen on April 6, 2020 at 12:58pm — No Comments

In a previous blog post, I created a flow chart showing how to choose a statistical test from a dozen different tests. While researching the article, I came across a short and sweet version which only includes four of the more basic tests:…

Added by Stephanie Glen on March 30, 2020 at 6:30am — 1 Comment

At first glance, the Lognormal, Weibull, and Gamma distributions look quite similar to each other. Selecting between the three models is "quite difficult" (Siswadi & Quesenberry), and the problem of testing which distribution is the best fit for data has been studied by a multitude of researchers.

If all the models fit the data fairly…

Added by Stephanie Glen on March 27, 2020 at 7:30am — No Comments

If you've been keeping up on the statistics for Covid-19 in the last week (and who hasn't?), you've probably noticed a **wide variety of projections for deaths in the United States,** ranging from the "best-case" scenario (327 people) to the "doomsday" figure (2.2 million). Recent statistics published include:

**327 to 1.6 million** (Former CDC director Tom Frieden, cited in the…

Added by Stephanie Glen on March 21, 2020 at 8:00am — 1 Comment

My original intent with this article was to write about **how to understand statistics** in general. However, with the global pandemic on everyone's minds right now, it seems blithe to write an article on understanding statistics without a nod to current events. If you're uncomfortable or unfamiliar with statistics, you might find the facts and figures surrounding Covid-19 hard to decipher. Let's break down the key statistics into plain English and shed a little light on a few…

Added by Stephanie Glen on March 17, 2020 at 6:30am — 5 Comments

