Discover, Access, Distill: The Essence of Data Science

Here's one of the main differences between data engineering and data science: ETL (Extract / Transform / Load) is for data engineers, or sometimes data architects or DBAs.

DAD (Discover / Access / Distill) is for data scientists. Sometimes data engineers do DAD and sometimes data scientists do ETL, but this is rather rare, and when it happens it is purely internal: the data engineer does a bit of statistical analysis to optimize some database processes, or the data scientist does a bit of database management to maintain a small, local, private database of summarized information (usually not used in production mode, though there are exceptions).

What DAD means:

  • Discover: Find and identify the sources of good data, and the metrics. Sometimes this requires the data to be created (working with data engineers and business analysts).
  • Access: Access the data, sometimes via an API, a web crawler, an Internet download, or a database query, and sometimes in-memory within a database.
  • Distill: Extract the essence from the data: the information that leads to decisions, actions, and increased ROI (such as determining optimum bid prices in an automated bidding system). This involves exploring the data (creating a data dictionary, exploratory analysis), cleaning it (removing impurities), refining it (data summarization, sometimes with multiple layers of summarization, also called hierarchical summarization), and analyzing it with statistical analyses, both automated and manual (sometimes including tasks such as experimental design, which can take place even before the Access stage).
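
To make the Distill step a bit more concrete, here is a minimal Python sketch of cleaning followed by two layers of summarization. It is an illustration only, not a prescribed method: the records in RAW_BIDS, the field names (campaign, keyword, bid, clicks), the cleaning rule, and the helper functions are all hypothetical and chosen for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records, as they might arrive from the Access step.
RAW_BIDS = [
    {"campaign": "A", "keyword": "shoes", "bid": 0.50, "clicks": 10},
    {"campaign": "A", "keyword": "shoes", "bid": 0.70, "clicks": 14},
    {"campaign": "A", "keyword": "boots", "bid": 0.40, "clicks": None},  # missing value
    {"campaign": "B", "keyword": "hats", "bid": -1.0, "clicks": 3},      # invalid bid
    {"campaign": "B", "keyword": "hats", "bid": 0.30, "clicks": 6},
]


def clean(records):
    """Cleaning: drop records with missing clicks or a non-positive bid
    (an assumed rule, picked only for this example)."""
    return [r for r in records if r["clicks"] is not None and r["bid"] > 0]


def summarize_by_keyword(records):
    """First summarization layer: one row per (campaign, keyword)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["campaign"], r["keyword"])].append(r)
    return {
        key: {
            "mean_bid": mean(r["bid"] for r in rows),
            "clicks": sum(r["clicks"] for r in rows),
        }
        for key, rows in groups.items()
    }


def roll_up_to_campaign(keyword_summary):
    """Second layer: total clicks per campaign, computed from the first layer."""
    totals = defaultdict(int)
    for (campaign, _keyword), stats in keyword_summary.items():
        totals[campaign] += stats["clicks"]
    return dict(totals)


cleaned = clean(RAW_BIDS)
by_keyword = summarize_by_keyword(cleaned)
by_campaign = roll_up_to_campaign(by_keyword)
```

The second layer is built from the first rather than from the raw records, which is the idea behind hierarchical summarization: each level only needs the compact summary beneath it. In a real system the cleaning rule and the summary statistics would depend on the metrics identified in the Discover step.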

The last step might or might not require statistical modeling (many predictors are now model-independent), presenting results to management (less important if the purpose is to design a machine-to-machine communication system, in which case a proof-of-concept or prototype might be required first), or integrating results into some automated process. Documentation is always part of all these steps.
