Whenever we make a business decision, whether in product, marketing or sales, we are testing a hypothesis. In the end, we make assumptions that guide our actions. When we decide to implement the next feature or run a campaign, we are hypothesizing that this particular action will have a positive impact on the goal we have set. The goal could be revenue, signups, the time it takes a customer to start using the product, or anything else you care about.

You might wonder why I’m stating the obvious. There’s a lot of buzz around statistics, data science, machine learning and so on. What these fields actually do is codify, in a scientific way, processes that everyone in a business already carries out. This more scientific approach has some serious advantages, and that is what justifies the hype. In the end, though, it is just a more formal way of doing things you already do. By showing how closely established business practices relate to this scientific approach, I hope to make it more accessible and less intimidating to anyone without a technical or scientific background.

Let’s start with a simple use case. Based on our metrics, we believe that when new users sign up for the first time, it takes them a long time to figure out what to do with the product. This is easy to measure with a tool like Mixpanel, Segment or Intercom. We capture the events that users generate in the product and measure the time between two subsequent events. In this case, the first event would be the login and the second could be any other event or action in the product.
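The following is a minimal Python sketch of that measurement. It assumes the events have already been exported as (user, event name, timestamp) tuples. The user IDs, event names and timestamps are hypothetical and do not follow the export format of Mixpanel, Segment or Intercom:

```python
from datetime import datetime

# Hypothetical exported events: (user_id, event_name, ISO-8601 timestamp).
events = [
    ("u1", "login", "2019-03-01T10:00:00"),
    ("u1", "create_project", "2019-03-01T10:12:00"),
    ("u2", "login", "2019-03-01T11:00:00"),
    ("u2", "invite_teammate", "2019-03-01T11:03:30"),
    ("u3", "login", "2019-03-01T12:00:00"),  # u3 never did anything else
]


def time_to_first_action(events):
    """Return {user_id: seconds from first login to the next event}."""
    by_user = {}
    for user, name, ts in events:
        by_user.setdefault(user, []).append((datetime.fromisoformat(ts), name))
    durations = {}
    for user, user_events in by_user.items():
        user_events.sort()  # chronological order per user
        for i, (ts, name) in enumerate(user_events):
            if name == "login":
                # Users who never act after logging in are left out.
                if i + 1 < len(user_events):
                    next_ts = user_events[i + 1][0]
                    durations[user] = (next_ts - ts).total_seconds()
                break  # only the first login counts
    return durations


print(time_to_first_action(events))  # {'u1': 720.0, 'u2': 210.0}
```

A user who logs in but never does anything else produces no duration here. In practice you would want to count such users separately rather than silently dropping them.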

To reduce the time it takes users to start using the product, we decide to introduce an intro video: when she logs in for the first time, the video explains what the product has to offer. Hopefully, after watching it, the user will better understand the product’s capabilities and will start using it sooner.

So what a product manager would do is the following:

- Implement and deploy video playback for new users
- Create a new funnel containing the login event and any subsequent event
- Measure the time it takes a new signup to do anything
- Compare it with how long it took in the past

In other words, we are comparing two funnels. One option is to build them from two different groups of people over the same period: the new signups who saw the video and those who didn’t. Alternatively, the product manager could build the same funnel for signups that happened before the video was introduced and compare it with the funnel for signups that happened after.
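Both comparisons come down to splitting users into two groups and comparing their mean time to first action. Here is a sketch with made-up records. The field names and the `VIDEO_LAUNCH` date are hypothetical:

```python
from datetime import date
from statistics import mean

# Hypothetical per-user records: signup date, whether the intro video was
# shown, and seconds from login to first action. All values are made up.
users = [
    {"signup": date(2019, 2, 20), "saw_video": False, "seconds": 900},
    {"signup": date(2019, 2, 25), "saw_video": False, "seconds": 780},
    {"signup": date(2019, 3, 2), "saw_video": False, "seconds": 840},
    {"signup": date(2019, 3, 3), "saw_video": True, "seconds": 420},
    {"signup": date(2019, 3, 5), "saw_video": True, "seconds": 510},
]
VIDEO_LAUNCH = date(2019, 3, 1)  # assumed launch date of the video


def compare(users, in_treatment):
    """Mean seconds to first action: (treatment group, everyone else)."""
    treated = [u["seconds"] for u in users if in_treatment(u)]
    control = [u["seconds"] for u in users if not in_treatment(u)]
    return mean(treated), mean(control)


# Option 1: same period, split by who actually saw the video.
print(compare(users, lambda u: u["saw_video"]))
# Option 2: before/after, split by signup date relative to the launch.
print(compare(users, lambda u: u["signup"] >= VIDEO_LAUNCH))
```

The two splits do not have to agree. In this toy data, the user who signed up on 2 March but never saw the video counts as "after the video" in the before/after comparison. A before/after comparison can also pick up anything else that changed between the two periods. That is one reason the same-period split is generally easier to interpret.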

So far the product manager has done the following:

- Made an observation: new signups take a long time to start using the product
- Made a hypothesis: a video would reduce that time
- Took action: implemented a video that plays when a user first signs up
- Found a way to check whether it worked: created and compared funnels

There’s a very basic concept in statistics called “hypothesis testing”.

*“Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true.”*

Statistical hypothesis testing also has four steps:

- State the hypotheses. Define the *Null* and *Alternative* hypotheses.
- Formulate an analysis plan. Find a test statistic suitable for your problem.
- Analyze your data. Compute the p-value based on the statistic you have chosen.
- Interpret the results. Compare the p-value to an acceptable significance level.
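The four steps can be sketched end to end in Python. This sketch uses a permutation test, chosen because it needs only the standard library, rather than a textbook test such as Student’s t-test. The idea is that under the null hypothesis the group labels are interchangeable. Shuffling them many times therefore shows how often a difference at least as large as the observed one arises by chance. The data are made up, and `permutation_test` is a hypothetical helper, not a library function:

```python
import random


def permutation_test(control, treatment, n_resamples=10_000, seed=0):
    """One-sided permutation test of H1: mean(treatment) < mean(control).

    Returns an estimated p-value.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    # Step 2: the test statistic is the difference in mean times.
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under H0 the "video" / "no video" labels are interchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff <= observed:
            at_least_as_extreme += 1
    # Step 3: estimated p-value (the +1 keeps it from being exactly 0).
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Step 1: H0 = the video makes no difference; H1 = it reduces the mean time.
# Made-up seconds from login to first action, for illustration only.
no_video = [900, 780, 840, 1020, 660, 870, 950, 720]
with_video = [420, 510, 600, 480, 700, 390, 560, 530]

p_value = permutation_test(no_video, with_video)
alpha = 0.05  # Step 4: significance level, fixed before looking at the data
print(f"p = {p_value:.4f}, reject H0: {p_value < alpha}")
```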

The above might sound a bit intimidating, but in the end it is simply the formal way statistics offers to carry out the same steps the product manager took. Let’s see how they relate.

1. The *Null Hypothesis* is that the mean time between the two funnel steps is the same whether or not the video is played to new signups. The *Alternative Hypothesis* is that the mean time will be smaller after the video is introduced. This is exactly what the product manager is measuring by comparing the two funnels described earlier.
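In standard notation, let $\mu$ denote the mean time between the two funnel steps for each group of new signups. The two hypotheses are then:

$$
H_0:\ \mu_{\text{video}} = \mu_{\text{no video}}
\qquad
H_1:\ \mu_{\text{video}} < \mu_{\text{no video}}
$$

The alternative is one-sided because only a reduction in time would count as the video working.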

2. Instead of creating funnels, a statistician would choose a test statistic based on the nature of the measured events and use it to calculate the p-value in step 3. This gets a bit technical, but it is really just a different tool that replaces the funnels.
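One standard choice for comparing two mean times is Welch’s t-statistic, a variant of Student’s t-test that does not assume the two groups have equal variances. The sketch below computes only the statistic and its degrees of freedom with the standard library. Turning them into a p-value needs the CDF of the t distribution, which the Python standard library does not provide. SciPy’s `scipy.stats.ttest_ind` with `equal_var=False` performs the full test. The data and the `welch_t` helper are hypothetical:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treatment):
    """Welch's t-statistic for mean(treatment) - mean(control), plus the
    Welch-Satterthwaite degrees of freedom."""
    n_t, n_c = len(treatment), len(control)
    # Squared standard error of each group's mean (sample variance / n).
    se2_t = variance(treatment) / n_t
    se2_c = variance(control) / n_c
    t = (mean(treatment) - mean(control)) / sqrt(se2_t + se2_c)
    df = (se2_t + se2_c) ** 2 / (se2_t ** 2 / (n_t - 1) + se2_c ** 2 / (n_c - 1))
    return t, df


# Made-up seconds from login to first action, for illustration only.
no_video = [900, 780, 840, 1020, 660, 870, 950, 720]
with_video = [420, 510, 600, 480, 700, 390, 560, 530]
t, df = welch_t(no_video, with_video)
print(f"t = {t:.2f} with about {df:.1f} degrees of freedom")
```

A strongly negative t means the video group is faster than chance alone would easily explain. That is the direction the alternative hypothesis predicts.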

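3. Instead of comparing the mean times directly and making a qualitative assessment, we compare the p-value against a threshold chosen in advance, the significance level, to judge whether the hypothesis is likely to be true. If the p-value falls below that threshold, we reject the null hypothesis in favor of the alternative.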
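For the one-sided alternative described in step 1 (the video lowers the mean time), the p-value and the decision rule can be written as follows. Here $T$ is the chosen test statistic, oriented so that smaller values favor the video. $t_{\text{obs}}$ is its value on the observed data, and $\alpha$ is the significance level, often set to 0.05:

$$
p = P\left(T \le t_{\text{obs}} \mid H_0\right),
\qquad
\text{reject } H_0 \text{ if } p < \alpha
$$

A small p-value means that if the video truly made no difference, a result as extreme as the one observed would rarely occur.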

Now, we need to pay close attention to the word “likely”. When the product manager uses funnels, she is making a qualitative assessment of whether the hypothesis holds. Nothing is certain. But if, for example, she observes a big difference between the mean times, she will feel comfortable saying that introducing the video was the right choice. If the difference is marginal, she will go back to the whiteboard with the team to find new ways to fix the problem. When we use statistics, the word “likely” still applies, because the results carry no absolute certainty either. The major difference is that statistical techniques quantify this certainty (or lack of it) and produce it through a fully controlled process.
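A yes/no decision is not the only way to quantify that uncertainty. A common alternative is a confidence interval for the difference in mean times. The sketch below uses a percentile bootstrap: it resamples each group with replacement and reads the interval off the resulting differences. The data and the `bootstrap_ci` helper are hypothetical:

```python
import random


def bootstrap_ci(control, treatment, n_resamples=5_000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lo_idx = int(n_resamples * (1 - level) / 2)
    hi_idx = int(n_resamples * (1 + level) / 2) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Made-up seconds from login to first action, for illustration only.
no_video = [900, 780, 840, 1020, 660, 870, 950, 720]
with_video = [420, 510, 600, 480, 700, 390, 560, 530]
low, high = bootstrap_ci(no_video, with_video)
print(f"95% CI for the change in mean seconds: [{low:.0f}, {high:.0f}]")
```

An interval lying entirely below zero is consistent with the video reducing the time. Its width shows how precisely the data pin down the size of that reduction, which is exactly the information a qualitative funnel comparison leaves implicit.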

Yes, there are quite a few new terms here, but I hope I have shown how closely statistical techniques relate to the everyday work of someone who is not a statistician. Of course, the product manager will not take over the work of a data scientist, start running chi-square and Student’s t-tests, or write down confidence intervals instead of product roadmaps. But understanding how a data scientist’s toolkit relates to our own work will help us embrace it, feel more comfortable with what it has to offer, and recognize its limitations.

*Originally posted on Medium by Kostas Pardalis, CEO of Blendo.*

© 2019 Data Science Central