The Fallacies of Data Science
Adnan Masood, PhD. & David Lazar

  1. Correlation = Causation, and Big Data = Information and Insights because Data Context Doesn't Matter.
  2. The random nature of the event drives the distribution, therefore the likely distribution also drives the events.
  3. Base Rate Fallacy only applies to small data-sets.
  4. Data dredging is negatively correlated with data size, i.e. the number of spurious correlations decreases with the number of dimensions of a data-set.
  5. In Data Science, past performance implies Future Results! Modeling assumptions can be held as absolute truths after experiments, and variables are normally distributed unless otherwise specified.
  6. Random sampling in experiment design and hypothesis testing is optional. Of course real-world data sets don't have cross-validation "leakage".
  7. Extrapolating beyond the range of training data, especially in the case of time series data, is fine provided the data-set is large enough.
  8. Strong Evidence is the same as Proof! Prediction intervals and confidence intervals are the same thing, just like statistical significance and practical significance.
  9. Measurement Doesn't Change the System. Increasing the number of features increases the model's significance and accuracy.
  10. Over/under-fitting of a model can be addressed irrespective of the bias-variance trade-off.
  11. Bonus: Renaming your Analytics dept. to Data Science dept. gives you a data science discipline & specialty overnight.
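
To see why item 4 is a fallacy, here is a minimal sketch, not part of the original list, that counts "strong" pairwise correlations among columns of pure noise. The function names (`pearson`, `count_spurious`), the 30-row sample size, and the 0.4 threshold are all illustrative assumptions. Because every column is independent random data, any correlation that clears the threshold is spurious by construction, and the count can only grow as more columns are added.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_spurious(n_features, n_rows=30, threshold=0.4, seed=0):
    """Count column pairs of independent noise with |r| above threshold.

    n_rows and threshold are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    # Each column is independent Gaussian noise: no real relationship exists.
    cols = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]
    hits = 0
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if abs(pearson(cols[i], cols[j])) > threshold:
                hits += 1
    return hits


if __name__ == "__main__":
    for k in (5, 10, 20, 40):
        pairs = k * (k - 1) // 2
        print(f"{k:>3} features, {pairs:>4} pairs: "
              f"{count_spurious(k)} spurious correlations")
```

The number of pairs grows quadratically with the number of features, so a fixed threshold lets through more chance correlations as the data-set gets wider, which is the opposite of what item 4 claims.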

Thanks to Dr. Jim Java for reading the earlier draft and providing comments.


© 2019 Data Science Central ®
