Stephanie Glen's Blog (32)

Z Test / T Test in One Picture

The following picture shows the differences between the Z Test and T Test. Not sure which one to use? Find out more here:…

Added by Stephanie Glen on August 13, 2019 at 10:30am — No Comments

Machine Learning in Hospitals: Easing Wait Times in the ER

Like many emergency rooms in the United Kingdom, the A&E department at Salford Royal NHS Foundation Trust, Greater Manchester, faces high congestion. This results in treatment delays and access issues. The Data Science team at the Northern Care Alliance (NCA) National Health Service (NHS) Group of hospitals is implementing support mechanisms to ease wait times, using machine learning and regression to…

Added by Stephanie Glen on August 5, 2019 at 5:29am — 1 Comment

Regression Analysis in One Picture

The basic idea behind regression analysis is to take a set of data and use that data to make predictions. A useful first step is to make a scatter plot to see the rough shape of your data.…
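
As a rough sketch of that idea (with made-up numbers), a straight line can be fit to the scatter by least squares and used for prediction:

```python
import numpy as np

# Made-up data: hours studied vs. exam score
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Fit y = slope * x + intercept by ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)

def predict(hours):
    """Predict an exam score from hours studied, using the fitted line."""
    return slope * hours + intercept
```

Real data would of course be noisier; the scatter plot is what tells you whether a straight line is even a sensible shape to fit.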

Added by Stephanie Glen on July 31, 2019 at 4:00am — No Comments

Comparing Model Evaluation Techniques Part 3: Regression Models

In my previous posts, I compared model evaluation techniques using Statistical Tools & Tests and commonly used Classification and Clustering evaluation techniques.

In this post, I'll take a look at how you can compare regression models. Comparing…
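
One hedged sketch of such a comparison (with made-up predictions): two common regression metrics, MAE and RMSE, can rank the same pair of models differently, because RMSE penalizes large errors more heavily.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error; squaring magnifies large misses."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = np.array([3.0, 5.0, 7.0, 9.0])
model_a = np.array([2.5, 5.5, 6.5, 9.5])   # small, even errors
model_b = np.array([3.0, 5.0, 7.0, 13.0])  # one large error

# RMSE punishes model_b's single big miss much more than MAE does
```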

Added by Stephanie Glen on July 24, 2019 at 3:12pm — No Comments

Comparing Model Evaluation Techniques Part 2: Classification and Clustering

In part 1, I compared a few model evaluation techniques that fall under the umbrella of 'general statistical tools and tests'. Here in Part 2 I compare three of the more popular model evaluation techniques for classification and clustering: confusion…
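
As a minimal illustration of the first of those techniques, a confusion matrix for a binary classifier is just a tally of (actual, predicted) pairs; this sketch uses made-up labels:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) pairs for a binary classifier."""
    return Counter(zip(y_true, y_pred))

# Made-up labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tp = cm[(1, 1)]  # true positives
tn = cm[(0, 0)]  # true negatives
fp = cm[(0, 1)]  # false positives
fn = cm[(1, 0)]  # false negatives
```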

Added by Stephanie Glen on July 21, 2019 at 9:47am — No Comments

Comparing Model Evaluation Techniques Part 1: Statistical Tools & Tests

Evaluating a model is just as important as creating the model in the first place. Even if you use the most statistically sound tools to create your model, the end result may not be what you expected. Which metric you use to test your model depends on the type of data you’re working with and your comfort level with statistics.

Model evaluation techniques answer three main questions:

  1. How well does your model match your data (in other words, what is the…

Added by Stephanie Glen on July 10, 2019 at 5:30am — 1 Comment

Model evaluation techniques in one picture

The sheer number of model evaluation techniques available to assess how good your model is can be completely overwhelming. As well as the oft-used confidence intervals, confusion matrix and…

Added by Stephanie Glen on June 29, 2019 at 7:38am — No Comments

Comparing Classifiers: Decision Trees, K-NN & Naive Bayes

A myriad of options exists for classification. In general, there isn't a single "best" option for every situation. That said, three popular classification methods (Decision Trees, k-NN and Naive Bayes) can be tweaked for practically every situation.

Overview

Naive Bayes and k-NN are both examples of supervised learning (where the…
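
A toy sketch of one of these methods, k-NN, with made-up 2-D points: classify a query point by majority vote among its k nearest neighbours.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training points: two well-separated groups
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]
```

Choosing k (and the distance measure) is exactly the kind of tweaking the post refers to.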

Added by Stephanie Glen on June 19, 2019 at 6:49am — No Comments

Assumptions of Linear Regression in One Picture

If any of the main assumptions of linear regression is violated, any results or forecasts that you glean from your data will be extremely biased, inefficient or misleading. Navigating all of the different assumptions and the recommended ways to test them can be overwhelming (for example, normality alone has more than half a dozen testing options).
This image highlights the assumptions and the most common testing options.…
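
As one illustration of those testing options (using randomly generated data, and the Shapiro-Wilk test as just one example of a normality check), the residuals of a fitted line can be tested like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up data: a true linear relationship plus normal noise
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)

# Fit the line, then test whether the residuals look normally distributed
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

stat, p_value = stats.shapiro(residuals)
# A large p-value means no evidence against the normality assumption
```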

Added by Stephanie Glen on June 15, 2019 at 7:53am — No Comments

Alternatives to R-squared (with pluses and minuses)

R-squared can help you answer the question "How does my model perform, compared to a naive model?". However, R-squared is far from a perfect tool. Probably the main issue is that every data set contains a certain amount of unexplainable data. R-squared can't tell the difference between the explainable and the…
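
One popular alternative is adjusted R-squared, which penalizes the plain statistic for the number of predictors; a minimal sketch:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for a model with p predictors fit to n observations.
    Unlike plain R-squared, it can fall when a useless predictor is added."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R-squared of 0.9, but more predictors means a lower adjusted value
one_predictor = adjusted_r_squared(0.9, n=100, p=1)
ten_predictors = adjusted_r_squared(0.9, n=100, p=10)
```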

Added by Stephanie Glen on June 10, 2019 at 5:30am — No Comments

R-Squared in One Picture

R-squared measures how well your data fits a regression line. More specifically, it's how much variation in the…
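
That idea can be written out directly as one minus the ratio of unexplained to total variation; a small sketch with made-up values:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Fraction of the variation in y explained by the fitted values."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total variation
    return float(1 - ss_res / ss_tot)

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = y.copy()                   # a perfect fit explains everything
baseline = np.full(4, np.mean(y))    # predicting the mean explains nothing
```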

Added by Stephanie Glen on May 31, 2019 at 8:00am — No Comments

Cross Validation in One Picture

Cross Validation explained in one simple picture. The method shown here is k-fold cross validation, where data is split into k folds (in this example, 5 folds). Blue balls represent training data; 1/k (i.e., 1/5) of the balls are held back for model testing.

Monte Carlo cross validation works the same way, except that the balls would be chosen with replacement. In other words, it would be possible for a ball to appear in more than one sample.…
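
A minimal sketch of the k-fold split itself, in plain Python, with a small made-up data list standing in for the balls:

```python
def k_fold_splits(data, k):
    """Split `data` into k folds; each fold takes one turn as the test set
    while the remaining k-1 folds form the training set."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
splits = list(k_fold_splits(data, 5))  # 5 train/test pairs, 8/2 items each
```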

Added by Stephanie Glen on May 25, 2019 at 8:30am — No Comments

Confidence Intervals in One Picture

Confidence intervals (CIs) tell you how much uncertainty a statistic has. The intervals are connected to confidence levels and the two terms are easily confused, especially if you're new to statistics. Confidence Intervals in One Picture is an intro to CIs, and explains how each part interacts with margins of error and where the different components come…
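
As a small numeric sketch (made-up sample statistics, and z = 1.96 for a 95% confidence level), a CI is just the statistic plus or minus its margin of error:

```python
import math

def confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Approximate CI for a mean; z = 1.96 gives roughly 95% confidence."""
    margin_of_error = z * sample_sd / math.sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Hypothetical sample: mean 100, standard deviation 15, n = 36
low, high = confidence_interval(sample_mean=100.0, sample_sd=15.0, n=36)
```

A higher confidence level means a larger z and therefore a wider interval, which is exactly how the level and the interval are connected.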

Added by Stephanie Glen on May 17, 2019 at 10:00am — No Comments

The Lifecycle of Data

The lifecycle of data travels through six phases.

The lifecycle "wheel" isn't set in stone. While it's common to move through the phases in order, it's possible to move in either direction (i.e. forward, backward) at any stage in the cycle. Work can also happen in several phases at the same time, or you can skip over…

Added by Stephanie Glen on May 6, 2019 at 10:00am — 4 Comments

Determining Number of Clusters in One Picture

If you want to determine the optimal number of clusters in your analysis, you're faced with an overwhelming number of (mostly subjective) choices. Note that there's no "best" method, no "correct" k, and there isn't even a consensus as to the definition of what a "cluster" is. With that said, this picture focuses on three popular methods that should fit almost every need: Silhouette, Elbow, and Gap Statistic.…
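
A hedged sketch of the Elbow method (made-up two-blob data; assumes scikit-learn is available): fit k-means for several values of k and look for where the within-cluster sum of squares (inertia) stops dropping sharply.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs, so the "elbow" should appear at k = 2
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in (1, 2, 3, 4)}
# The elbow is the k where the drop in inertia levels off
```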

Added by Stephanie Glen on April 28, 2019 at 12:30am — No Comments

Naive Bayes in One Picture

Naive Bayes is a deceptively simple way to find answers to probability questions that involve many inputs. For example, if you're a website owner, you might be interested to know the probability that a visitor will make a purchase. That question has a lot of "what-ifs", including time on page, pages visited, and prior visits. Naive Bayes essentially allows you to take the raw inputs (i.e. historical data), sort the data into more meaningful chunks, and input them into a formula. …
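
A toy sketch of that sorting-and-combining idea, with a made-up visit log and just two yes/no inputs (the "naive" part is treating the inputs as independent):

```python
# Made-up visit log: (returning_visitor, viewed_pricing_page, purchased)
visits = [
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (0, 1, 1),
    (0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0),
]

def naive_bayes_score(returning, viewed_pricing, purchased):
    """P(class) * P(each feature | class), assuming independent features."""
    rows = [v for v in visits if v[2] == purchased]
    prior = len(rows) / len(visits)
    p_ret = sum(v[0] == returning for v in rows) / len(rows)
    p_view = sum(v[1] == viewed_pricing for v in rows) / len(rows)
    return prior * p_ret * p_view

def predict_purchase(returning, viewed_pricing):
    """Pick whichever class (purchase / no purchase) scores higher."""
    return int(naive_bayes_score(returning, viewed_pricing, 1)
               > naive_bayes_score(returning, viewed_pricing, 0))
```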

Added by Stephanie Glen on April 25, 2019 at 10:00am — No Comments

Bayes Theorem in One Picture

Bayes’ Theorem is a way to calculate conditional probability. The formula itself is simple to calculate, but it can be challenging to fit the right pieces into the puzzle. The first challenge comes from defining your event (A) and test (B); the second is rephrasing your question so that you can work backwards: turning P(A|B) into P(B|A). The following image shows a…
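
The rephrasing step can be written out directly; a small sketch with made-up screening numbers, where A = has the disease and B = tests positive:

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)"""
    return p_b_given_a * p_a / p_b

# Made-up screening numbers
p_disease = 0.01             # P(A): 1% of people have the disease
p_pos_given_disease = 0.95   # P(B|A): test catches 95% of cases
p_pos_given_healthy = 0.05   # false-positive rate among the healthy

# Total P(B): positives from the sick plus positives from the healthy
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = bayes(p_pos_given_disease, p_disease, p_pos)
```

Even with a fairly accurate test, the rarity of the disease keeps P(A|B) surprisingly low here, which is why working backwards carefully matters.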

Added by Stephanie Glen on April 12, 2019 at 6:30am — No Comments

A/B testing in One Picture

A non-technical look at A/B testing, based on Dan Siroker & Pete Koomen's book, A/B Testing: The Most Powerful Way to Turn Clicks Into Customers.

Perhaps the two most important points:

  1. Make sure you are testing a clear hypothesis. For example, "Will adding a photo to the landing page…
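
A hedged sketch of how such a test might be scored, with made-up conversion counts and a standard two-proportion z statistic (one common choice, not necessarily the book's):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for comparing two conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical test: variant B converts 120/1000 visitors vs. A's 100/1000
z = two_proportion_z(100, 1000, 120, 1000)
```

With these numbers z comes out around 1.43, below the usual 1.96 cutoff, so the difference would not be called significant at the 95% level despite the higher raw rate.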

Added by Stephanie Glen on April 3, 2019 at 4:30pm — No Comments

Ensemble Methods in One Picture

Ensemble methods take several machine learning techniques and combine them into one predictive model. It is a two-step process:

  1. Generate the Base Learners: Choose any combination of base learners, based on accuracy and diversity. Each base learner can produce more than one predictive model, if you change variables such as case weights, guidance parameters, or input space partitions.
  2. Combine Estimates from the Base…
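
A minimal sketch of the second step, using the simplest combination rule (majority vote) and made-up base-learner outputs:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine base-learner predictions by simple majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base learners voting on a single case
base_learner_votes = ["spam", "spam", "ham"]
combined = majority_vote(base_learner_votes)
```

Weighted voting or averaging (for regression) are the usual refinements of this rule.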

Added by Stephanie Glen on March 27, 2019 at 3:30pm — No Comments

© 2019   Data Science Central ®