Data scientists should always ask: are there other, more reasonable assumptions that would explain the observations?

I often see beautiful models, with wonderful logic, statistics, and mathematical equations offered as supporting evidence for an incredible conclusion or predictive technique. Sometimes the predictive results seem too good to be true, yet are purportedly the output of an unbiased model. When I ask about the assumptions built into the model, I usually get vague and shifting responses and an effort to change the subject. When I insist on reviewing the specific assumptions in detail, the trouble begins.

I suggest that models be judged by whether their assumptions are reasonable, dubious, or untestable, not only by their predictive results (even a broken clock is right twice a day). Simplifying assumptions usually make a model unrealistic and disconnected from the real world.

I often get the feeling that a modeler intentionally searched for assumptions that would create a specific result. Searching for assumptions that produce a desired result is not acceptable data science practice. Bad assumptions have consequences: the freedom to select any assumptions allows the creation of a model to support any result.
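The danger is easy to demonstrate. In this minimal sketch (entirely synthetic data; the "assumptions" here are hypothetical choices of which observations to keep), the two variables are independent noise by construction, yet searching over enough data-selection assumptions reliably produces a strong apparent correlation:

```python
import random

random.seed(42)
n = 40
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]  # independent of x by construction

def corr(xs, ys):
    # Pearson correlation coefficient, stdlib only
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

best = 0.0
for _ in range(2000):
    # Each trial is a different "assumption" about which points are valid.
    keep = random.sample(range(n), 15)
    r = corr([x[i] for i in keep], [y[i] for i in keep])
    best = max(best, abs(r))

print(f"full-data |r| = {abs(corr(x, y)):.2f}; best cherry-picked |r| = {best:.2f}")
```

The full dataset shows roughly zero correlation, but the best assumption found by the search looks like a meaningful relationship. A modeler who reports only the winning assumption has supported a result that the data do not contain.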

Models can help us understand complex phenomena and improve prediction accuracy within certain margins of error. Yet predictive models usually work only under limited circumstances and for a limited time (until they stop working). One should always be skeptical of the usefulness of predictive models in high-causal-density environments (e.g., human behavior, climate, finance, etc.).

Data scientists should use models properly: to gain understanding of complex phenomena when no real alternatives are available. All models need to be subjected to rigorous empirical tests; otherwise they create an illusion of reality that leads to data science malpractice and bad consequences.
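One basic empirical test is out-of-sample validation. The sketch below (synthetic data; both models are hypothetical stand-ins) fits a simple least-squares line and a "memorizing" model that merely looks up training answers. In-sample, the memorizer looks perfect; on held-out data, the illusion collapses:

```python
import random

random.seed(0)
xs = [i / 10 for i in range(100)]
data = [(x, 2 * x + random.gauss(0, 1)) for x in xs]  # true relation: y = 2x + noise
random.shuffle(data)
train, hold = data[:70], data[70:]

def fit_line(pts):
    # Ordinary least squares for y = a*x + b
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, pts):
    return sum((model(x) - y) ** 2 for x, y in pts) / len(pts)

line = fit_line(train)
lookup = dict(train)
mean_y = sum(y for _, y in train) / len(train)
memo = lambda x: lookup.get(x, mean_y)  # memorizes training data exactly

print(f"line:      train MSE {mse(line, train):.2f}  holdout MSE {mse(line, hold):.2f}")
print(f"memorizer: train MSE {mse(memo, train):.2f}  holdout MSE {mse(memo, hold):.2f}")
```

The memorizer's training error is exactly zero, a result "too good to be true", while its holdout error is far worse than the simple line's. Judging models only on in-sample fit rewards exactly the kind of model this article warns against.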

© 2021 TechTarget, Inc.
