Bayes’ Theorem is a way to calculate conditional probability. The formula is very simple to calculate, but it can be challenging to fit the right pieces into the puzzle. The first challenge comes from defining your event (A) and test (B); the second challenge is rephrasing your question so that you can work backwards: turning P(A|B) into P(B|A). The following image shows a…
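As a rough sketch of the "working backwards" step, the Python snippet below computes P(A|B) from P(B|A) for a made-up screening test. The function name `posterior` and the numbers (1% prior, 90% chance of a positive test given the event, 5% chance of a positive test without it) are illustrative assumptions, not values from the post.

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) via Bayes' Theorem.

    prior            -- P(A), probability of the event before testing
    p_b_given_a      -- P(B|A), probability of a positive test given the event
    p_b_given_not_a  -- P(B|not A), probability of a positive test without the event
    """
    # Total probability of a positive test, P(B)
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b


# Hypothetical numbers, chosen only for illustration
p = posterior(prior=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(p, 3))
```

With these assumed inputs the posterior comes out at roughly 15%, far below the 90% figure for P(B|A), because the event itself is rare.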

Added by Stephanie Glen on April 12, 2019 at 6:30am — No Comments

A non-technical look at A/B testing, based on Dan Siroker & Pete Koomen's book, *A/B Testing: The Most Powerful Way to Turn Clicks Into Customers*.

Perhaps the two most important points:

**Make sure you are testing a clear hypothesis.** For example, "Will adding a photo to the landing page…

Added by Stephanie Glen on April 3, 2019 at 4:30pm — No Comments

Ensemble methods take several machine learning techniques and combine them into one predictive model. It is a two-step process:

- **Generate the Base Learners:** Choose any combination of base learners, based on accuracy and diversity. Each base learner can produce more than one predictive model, if you change variables such as case weights, guidance parameters, or input space partitions.
- **Combine Estimates from the Base…**
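The two steps can be sketched in a few lines of Python. This is only one possible illustration: the toy threshold learners, the helper names `make_threshold_learner` and `majority_vote`, and the choice of majority voting as the combining rule are all assumptions made for the example, not the method described in the post.

```python
from collections import Counter


def make_threshold_learner(feature_index, threshold):
    """Build a toy base learner: predicts 1 if one feature exceeds a threshold."""
    def predict(x):
        return 1 if x[feature_index] > threshold else 0
    return predict


def majority_vote(learners, x):
    """Combine the base learners' predictions by taking the most common label."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]


# Step 1: generate base learners (hypothetical thresholds, not fitted to data)
learners = [
    make_threshold_learner(0, 0.5),
    make_threshold_learner(1, 0.5),
    make_threshold_learner(2, 0.5),
]

# Step 2: combine their estimates into one prediction
print(majority_vote(learners, [0.9, 0.8, 0.1]))  # two of three learners vote 1
print(majority_vote(learners, [0.2, 0.7, 0.3]))  # two of three learners vote 0
```

Using an odd number of binary learners avoids tied votes; weighted voting or averaging predicted probabilities are common alternatives.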

Added by Stephanie Glen on March 27, 2019 at 3:30pm — No Comments

SVMs (Support Vector Machines) are a way to classify data by finding the optimal line, plane, or hyperplane that separates the data. In 2D, the separation is a line; in 3D, it's a plane; in higher dimensions, it's a hyperplane. For simplicity, the following picture shows how SVM works for a two-dimensional set.


Added by Stephanie Glen on March 25, 2019 at 10:30am — 1 Comment

Logistic regression fits data to a particular equation (the S-shaped logistic curve, rather than a straight line) so you can make predictions for your data. This type of regression is a good choice when modeling binary variables, which occur frequently in real life (e.g. work or don't work, marry or don't marry, buy a house or rent...). The logistic regression model is…
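To make the prediction step concrete, here is a minimal Python sketch of how a fitted logistic regression model turns predictor values into a probability for a binary outcome. The function name `predict_probability` and the intercept and coefficients are hypothetical, hand-picked for illustration rather than estimated from data.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Return P(y = 1 | features) under a logistic regression model."""
    # Linear combination of the predictors (the log-odds)
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    # The logistic (sigmoid) function maps log-odds to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical, hand-picked parameters (not estimated from any data set)
intercept = -3.0
coefficients = [0.05, 1.2]

p = predict_probability(intercept, coefficients, [40, 1])
print(p > 0.5)  # classify as the positive outcome if probability exceeds 0.5
```

The 0.5 cut-off is itself an assumption; in practice the threshold is often tuned to the cost of false positives versus false negatives.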

Added by Stephanie Glen on March 22, 2019 at 11:30am — No Comments

This is a simple overview of the k-NN process. Perhaps the most challenging step is finding a *k* that's "just right". The square root of n can put you in the ballpark, but ideally you should use a training set (i.e. a nicely categorized set) to find a "*k*" that works for your data. Remove a few categorized data points and make them "unknowns", testing a few values for *k* to see what works.…
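The hold-out idea can be sketched in plain Python. Everything here is an assumption made for illustration: the tiny two-cluster data set, the helper names `knn_predict` and `accuracy_for_k`, the Euclidean distance, and the candidate values of *k*.

```python
import math
from collections import Counter


def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_for_k(train, held_out, k):
    """Fraction of held-out points whose known label k-NN recovers."""
    hits = sum(knn_predict(train, x, k) == label for x, label in held_out)
    return hits / len(held_out)


# Hypothetical labeled data: two loose clusters in 2D
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.3), "a"), ((1.1, 1.4), "a"),
    ((1.5, 1.0), "a"), ((4.0, 4.0), "b"), ((4.2, 3.7), "b"), ((3.8, 4.3), "b"),
    ((4.1, 4.4), "b"), ((3.6, 4.0), "b"),
]

# Hold out a few categorized points and treat them as "unknowns"
held_out = [data[0], data[5]]
train = [item for item in data if item not in held_out]

# Use sqrt(n) as a ballpark, then compare a few odd values of k
start = round(math.sqrt(len(train)))
for k in (1, 3, 5):
    print(k, accuracy_for_k(train, held_out, k))
print("sqrt(n) starting point:", start)
```

Odd values of *k* are used so that a two-class vote cannot tie. On real data the accuracies would differ across *k*, and repeating the hold-out over several splits gives a steadier comparison.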

Added by Stephanie Glen on March 18, 2019 at 10:30am — 1 Comment

Determining sample sizes is a challenging undertaking. For simplicity, I've limited this picture to one of the most common testing situations: **testing for differences in means**. Some assumptions have been made (for example, normality and…

Added by Stephanie Glen on March 17, 2019 at 7:00am — 4 Comments

The EM algorithm finds maximum-likelihood estimates for model parameters when you have incomplete data. The "E-Step" finds probabilities for the assignment of data points, based on a set of hypothesized probability…

Added by Stephanie Glen on March 12, 2019 at 7:00pm — 1 Comment

With a ROC curve, you're trying to find a good model that optimizes the trade-off between the False Positive Rate (FPR) and…

Added by Stephanie Glen on March 9, 2019 at 9:00am — No Comments

There are dozens of different hypothesis tests, so choosing one can be a little overwhelming. The good news is that one of the more popular tests will usually do the trick, unless you have unusual data or are working within very specific guidelines (e.g. in medical research). The following picture shows several tests for a single population, and what…

Added by Stephanie Glen on March 7, 2019 at 7:30am — No Comments

In the nascent field of Data Science, myths abound. Here are my top 10, scoured from the internet (where better to find a myth or two?).

This one is only *part* myth. Historically, women have been discouraged from entering the computing sciences for many reasons unrelated to talent (see my previous post,…

Added by Stephanie Glen on March 2, 2019 at 6:58am — 3 Comments

Whether you can call yourself a data scientist if you can't code is as hotly debated as Brexit. Type the question "Can you be a Data Scientist without coding?" into Google and you'll get a hundred different answers. Opinions vary wildly depending on whether the author is a coder or a non-coder. Search the job listings, and you won't find a definitive answer there either. A Glassdoor survey on the …

Added by Stephanie Glen on February 23, 2019 at 10:30am — 2 Comments

Grab a copy of *The Elements of Statistical Learning* ("the machine learning bible") and you might be a little overwhelmed by the mathematics. For example, this equation (p.34), for a cubic smoothing spline, might send shivers down your spine if math isn't your forte:…

Added by Stephanie Glen on February 17, 2019 at 7:34am — 2 Comments

I'm a female mathematician, statistician, and data scientist. How did I get to be all of those things? I wish I could say "I did well in school" or "I always loved computers." But the reality is, in middle/high school, I was bored stiff. I excelled in art, skipping classes, and bombing exams. At 16, I dropped out of school to begin an illustrious career in office cleaning.

The odds were stacked against me entering the computing industry for many…

Added by Stephanie Glen on February 8, 2019 at 8:30am — 3 Comments

**Logistic regression (LR)** models estimate the probability of a binary response, based on one or more predictor variables. Unlike linear regression models, the dependent variables are categorical. LR has become very popular, perhaps because of the wide availability of the procedure in software. Although LR is a good choice for many situations, it doesn't work well for *all* situations. For example:

- In propensity score analysis where there are many…

Added by Stephanie Glen on February 2, 2019 at 6:55am — No Comments

© 2019 Data Science Central ®
