“*What if we add these variables?*” is a deadly type of question that can ruin your analytic project. While curiosity is a data scientist’s best friend, it comes with a curse. Some call it *analysis paralysis*, others just *over-analysis*, but I call these situations “*analytic rabbit holes*”. When you start any data science project, be it in-depth statistical research, a machine learning model, or a simple business analysis, certain steps are always involved. Some sources make them more granular, some more general, but the view described here makes the most sense from a real-world business perspective.

The process goes as follows: a data scientist defines a hypothesis, explores the data, and gains insights that help explain the hypothesis better. After this step the loop begins: new information makes it possible to refine the hypothesis and start “digging deeper”, repeating the data exploration, the insight generation and… the refinement of the hypothesis once again. It’s important to be conscious of this loop from the very beginning. Falling into an analytic rabbit hole starts here if one thing isn’t defined: the decision that the analysis supports.

If the decision is not defined, or it’s not the main goal of the analytic investigation, the project will go down the rabbit hole. Why? Because over-analysis begins when the data scientist starts focusing on the hypothesis instead of the decision. While the two might look very similar, in reality this is the fundamental difference between a successful data science project and an “analytic rabbit hole”. I am going to describe the two approaches and how one leads to success while the other is doomed to fail.

**Hypothesis-focused.** As the data exploration goes on, the hypothesis is constantly refined and new insights are discovered. The curse of this process is that, since the goal is to find the perfect answer to the hypothesis, a data scientist will fall into many traps, such as spurious correlations, where relationships are “discovered” between variables that are correlated but unrelated. Eventually the sheer breadth of ways of analyzing and cutting through the data starts to have its side effect: the hypothesis is broken into sub-segments, each of which has a series of data points, assumptions and conflicting conclusions of its own. A typical end for this project is a happy data scientist presenting these immense findings to a non-technical team who get lost in the details before the data scientist reaches the second bullet point. The question that knocks this effort down goes something like this: “Can we do something about it?” That’s it. Weeks spent, and one question derails the whole effort.
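To make the spurious-correlation trap concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than data from a real project: the target and all 200 candidate variables are pure random noise, and the 0.28 cutoff is roughly the 5% significance threshold for a correlation computed on 50 rows. The function names `pearson` and `count_spurious` are made up for this example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_spurious(n_rows=50, n_candidates=200, threshold=0.28, seed=0):
    """Count noise variables that look 'significantly' correlated with a noise target."""
    rng = random.Random(seed)
    # The target is random noise: by construction nothing can explain it.
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_candidates):
        # Each "what if we add this variable?" is another unrelated noise column.
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(target, candidate)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Noise variables passing the cutoff:", count_spurious())
```

With a 5% cutoff, about one in twenty unrelated variables is expected to pass by chance alone, so the more variables that get added to the exploration, the more “findings” appear that no decision can be built on.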

**Decision-focused.** The focus of this exploration is to find ways to influence and improve a decision, and to test as soon as possible whether it moves the needle. Then, and only then, can the hypothesis be refined. This doesn’t close the analytic loop, but it ensures that the data scientist’s focus is on discovering insights that can improve the impact of the underlying decision. In this case the focus is on how the project’s output affects the environment, and both the data scientist and the business can learn from how the environment responds to the data-refined actions. Hypothesis testing without any actual intervention that uses the generated insights is a perfect example of an analytic rabbit hole.
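One simple way to check whether an action “moves the needle” is to compare an outcome rate between a group that received the data-informed action and a group that did not. The sketch below uses a standard two-proportion z-test; the function name `two_proportion_z` and the counts in the usage example are hypothetical and only illustrate the mechanics, not results from any real project.

```python
from math import sqrt


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (difference in rates, z-score) for group B versus group A."""
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under the assumption that the action changed nothing.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    diff = rate_b - rate_a
    return diff, diff / std_err


if __name__ == "__main__":
    # Hypothetical counts: A = business as usual, B = data-informed action.
    diff, z = two_proportion_z(success_a=100, n_a=1000, success_b=130, n_b=1000)
    print(f"difference in rate: {diff:.3f}, z-score: {z:.2f}")
```

A z-score beyond roughly 2 in absolute value is the conventional sign that the difference is unlikely to be noise. The point of the sketch is the order of work: act first on a small scale, measure the response, and only then go back to refine the hypothesis.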

While this may sound trivial, the amount of time data scientists waste on hypothesis-focused projects is incredibly high. Left unchallenged, this hypothesis-focused philosophy might even ruin careers, and in other cases it can destroy the trust placed in the data science department. And believe me, it’s very tempting to wake up your inner geek and fall into the analytic rabbit hole every time you are handed a very cool and interesting hypothesis.

A data scientist’s gut feeling says that the main task of the job is to answer complex questions and gain in-depth insights. In reality it’s all about solving problems, and the only way to solve a problem is to act on it. Our goal as data scientists is to support tough and complex decisions with actionable, data-based recommendations. We are the ultimate internal consultants who drive actions through insights. And action with some insights is always better than no action with all the insights that could ever be discovered. So never forget to ask yourself one question: “What is the decision that this analysis supports?” It might save the project, and maybe even your career as a data scientist.

© 2019 Data Science Central
