Stephanie Glen's Blog (89)

Types of Variables in Data Science in One Picture

While there are several dozen different types of possible variables, all can be categorized into a few basic areas. This simple graphic shows you how they are related, with a few examples of each type. 

More info:…


Added by Stephanie Glen on October 17, 2020 at 4:00pm — No Comments

5 Rules of Probability in One Picture (Cat and Dog Edition)

Knowledge of the basic rules of probability is a must-have for any data scientist. But if you're a visual learner like me, learning the algebraic representations of the 5 basic rules of probability (e.g. P(A) + P(not A) = 1) is a challenge. I've never been very good at memorizing formulas, but images stick in my head like earworms. Whenever I come across a new formula, I try to make it visual with a picture or doodle. The following picture shows some of the images I created to…


Added by Stephanie Glen on October 10, 2020 at 8:33am — No Comments

Why You Need to Know Those Probability Distributions

If you're in the beginning stages of your data science credential journey, you're about to take (or have already taken) a probability class. As part of that class, you're introduced to several different probability distributions, like the binomial distribution,…


Added by Stephanie Glen on September 30, 2020 at 3:00pm — No Comments

Correlation Coefficients in Data Science and Machine Learning (in One Picture)

In my first post on correlation coefficients, I outlined the differences between five popular coefficients: Pearson's,…


Added by Stephanie Glen on September 25, 2020 at 10:00am — No Comments

Stumped by Bayes' Theorem? Try This Simple Workaround

Bayes' Theorem formula.

Bayes' Theorem, which The Stanford Encyclopedia of Philosophy calls "...a simple mathematical formula," can be surprisingly difficult to actually solve. If you struggle with Bayesian logic, solving the "simple"…


Added by Stephanie Glen on September 15, 2020 at 11:29am — No Comments

MicroMasters: The Fast Way to Get Into Data Science

If the prospect of earning a master's degree in data science sounds too daunting (and expensive), then a MicroMasters might be a good fit for you. A MicroMasters is a "mini" master's degree, typically composed of four courses. The courses are offered at a fraction of the cost of a typical master's program (around a tenth of the cost), so they are a great way to get your feet wet and see if data science is right for you.

I'm actually enrolled in the MIT MicroMasters program…


Added by Stephanie Glen on September 9, 2020 at 8:30am — 1 Comment

Model Fitting Tests You've Probably Never Heard Of (In One Picture)

When choosing a statistical test, you generally want to go for one of the more well-known ones, like the chi-square goodness of fit test. That's because more people are going to be able to understand your results, and you have the backing of a slew of…


Added by Stephanie Glen on August 31, 2020 at 6:30am — No Comments

Real Life Applications of Logarithms in Data Science and Beyond

Ah, the logarithm. It's the black sheep of the mathematics family, loved by a few slide-rule wielding, grey-haired professors and math Olympiad competitors. For the rest of us, logarithms remain on the "I'll get back to understanding that later when I see the point" shelf. However,…


Added by Stephanie Glen on August 27, 2020 at 11:50am — No Comments

Why You Should Care About Hypothesis Statements

Last year, I posted an infographic titled "Hypothesis Tests in One Picture". But formulating a hypothesis statement can be tricky--and you need one to even start choosing tests. That's why I like this simplified graphic. …


Added by Stephanie Glen on August 20, 2020 at 8:00am — No Comments

How to Handle Missing Data

No one "perfect" method exists for filling in missing data. You can view this one picture as a starting point with some suggestions, rather than an absolute. You may want to decide beforehand if you care about statistical power or uncertainty; if you do, you'll want to…


Added by Stephanie Glen on August 12, 2020 at 6:54am — No Comments

Calculus For Data Science: What Do You Really Need to Know?

This one picture shows what areas of calculus and linear algebra are most useful for data scientists.

If you read any article worth its salt on the topic of math needed for data science, you'll see calculus mentioned. Calculus (and its closely related counterpart, linear algebra) has some very narrow (but very useful) applications to data science. If you have a decent algebra background (which I'm assuming you do, if you're a data scientist!) then you can learn…


Added by Stephanie Glen on July 31, 2020 at 9:00am — 2 Comments

P Value vs Critical Value

P-values and critical values are so similar that they are often confused. They both do the same thing: enable you to support or reject the null hypothesis in a test. But they differ in how you get to make that decision. In other words, they are two different approaches to the same result. This picture sums up the p value vs critical value approaches.…
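To make the "two approaches, same result" idea concrete, here is a minimal sketch. It assumes a two-sided z-test with a standard normal test statistic, and the function name `z_test_decisions` and the default `alpha` of 0.05 are illustrative choices, not something taken from the picture.

```python
from statistics import NormalDist


def z_test_decisions(z_stat, alpha=0.05):
    """Return (reject_by_p_value, reject_by_critical_value) for a two-sided z-test.

    Assumes the test statistic follows a standard normal distribution
    under the null hypothesis (an assumption made for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1

    # P-value approach: how likely is a statistic at least this extreme under H0?
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))
    reject_by_p = p_value < alpha

    # Critical-value approach: is the statistic beyond the cutoff implied by alpha?
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_crit = abs(z_stat) > z_crit

    return reject_by_p, reject_by_crit


for z in (0.5, 1.5, 2.5):
    print(z, z_test_decisions(z))
```

Both elements of each returned pair match, because the two comparisons are the same inequality viewed on different scales (probability vs. test statistic).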


Added by Stephanie Glen on July 26, 2020 at 7:42am — No Comments

ANOVA vs Regression in One Picture

If you scour the internet for "ANOVA vs Regression", you might be confused by the results. Are they the same? Or aren't they? The answer is that they can be the same procedure, if you set them up to be that way. But there are differences between the two methods. This one picture sums up those differences.
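As a rough illustration of how the two can be "the same procedure," the sketch below compares a one-way ANOVA on two groups with a simple linear regression on a 0/1 dummy variable. The data values and the helper names `anova_f` and `regression_f` are made up for this example; only the two-group case is covered.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def regression_f(x, y):
    """F statistic for the simple linear regression y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # least-squares slope
    b0 = y_bar - b1 * x_bar     # least-squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    ss_model = sum((f - y_bar) ** 2 for f in fitted)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return (ss_model / 1) / (ss_resid / (n - 2))


# Hypothetical measurements for two groups.
group_a = [4.1, 5.0, 5.6, 4.8]
group_b = [6.2, 6.9, 5.8, 7.1]

# Dummy coding: 0 marks group A, 1 marks group B.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

print(anova_f([group_a, group_b]), regression_f(x, y))
```

With dummy coding, the regression's fitted values are just the group means, so the model and residual sums of squares coincide with the between-group and within-group sums of squares, and the two F statistics agree up to floating-point rounding.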



Added by Stephanie Glen on July 15, 2020 at 12:13pm — No Comments

How to Communicate Data

The following graphic is based on Sam Priddy's excellent DSC/Tableau Webinar How to Accelerate and Scale Your Data Science Workflows. Sam covered many interesting points for organizing, analyzing and presenting data--including which graph is best suited for different data types. This graphic is an overview of some of Sam's points. For more…


Added by Stephanie Glen on July 8, 2020 at 9:02am — No Comments

Math vs. Statistics in One Picture

Math and statistics are vital components of any data scientist's toolbox. While some view statistics as a type of math, the reality is that they are completely different subjects. Math is all about numbers and concrete answers, while statistics is making sense of numbers via educated "guesses." This one picture, based on Rossman et al.'s essay Some Key…


Added by Stephanie Glen on June 29, 2020 at 2:30pm — No Comments

Misleading Graphs Part 2: Ladders, Spaghetti, and Other Ways to Ruin a Graph

If you've spent any time with modeling data, you'll know that there are many pitfalls to be had when it comes to data presentation (I addressed some common pitfalls in Misleading Graphs Part 1). Misleading graphs can be the result of incorrect data collection, ignorance of the basic "rules" of data presentation (like labeling axes), or even deliberate attempts to mislead. A fourth…


Added by Stephanie Glen on June 18, 2020 at 6:00am — No Comments

You're a Data Artist, not a Data Scientist

"Data Scientist" is 2020's equivalent of the rocket scientist of the 1950s: mysterious, sexy, and well-paid. But are you actually a "scientist"? While "data science" isn't fully defined yet as an academic subject (National Academies of Sciences, Engineering, and Medicine, 2018), more and more evidence seems to point to it being more of an art than a science. …


Added by Stephanie Glen on June 11, 2020 at 7:00am — 2 Comments

Misleading Graphs Part 1: Avoid These Common Mistakes

Misleading graphs abound on the internet. Sometimes they are deliberately misleading; other times the people creating the graphs don't fully understand the data they are presenting. "Classic" cases of misleading graphs include leaving out data, not labeling data properly, or skipping numbers on the vertical axis.

I came across the following misleading graphic in a…


Added by Stephanie Glen on May 31, 2020 at 8:00am — 1 Comment

Statistics Used in Data Science (A Dictionary in One Picture)

Naming conventions are often quite different in statistics and data science, which causes quite a bit of confusion. Part of the problem with naming conventions is that "data science is the child of statistics and computer science" (Blei & Smyth, 2017). In essence, data science then is the child of two parents who speak different languages. In one sense, this makes the job of the data scientist not only to apply the knowledge from both…


Added by Stephanie Glen on May 24, 2020 at 12:08pm — No Comments

Difference Between Classification and Regression in One Picture

Regression and classification are both supervised machine learning techniques that use known data to make predictions. Where they differ is in what type of question you want to answer, and how your output data is structured. For example, do you want discrete, categorical answer choices, like yes/no, or a range of possible values from 0 to 100? This one picture shows the basic differences between the two methods.…


Added by Stephanie Glen on May 17, 2020 at 5:58am — No Comments


