Difference between data engineers and data scientists

Here's one of the main differences:

  • ETL (Extract / Transform / Load) is for data engineers, or sometimes data architects or DBAs
  • DAD (Discover / Access / Distill) is for data scientists.

Sometimes data engineers do DAD, and sometimes data scientists do ETL, but it's rather rare, and when they do, it's purely internal: the data engineer does a bit of statistical analysis to optimize some database processes, or the data scientist does a bit of database management to maintain a small, local, private database of summarized information (usually not used in production mode, though there are exceptions).

Let me explain what DAD means:

Discover: Find and identify the sources of good data, and the metrics. Sometimes request that the data be created (working with data engineers and business analysts).

Access: Access the data, sometimes via an API, a web crawler, an Internet download, or a database access, and sometimes in-memory within a database.
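To make the database route concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `clicks` table, its columns, and the `load_rows` helper are made up for illustration; an in-memory SQLite database simply stands in for whatever source is actually available.

```python
import sqlite3

def load_rows(conn):
    """Return (keyword, clicks) rows from a hypothetical `clicks` table."""
    query = "SELECT keyword, clicks FROM clicks ORDER BY keyword"
    return conn.execute(query).fetchall()

# An in-memory SQLite database stands in for the real data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (keyword TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("shoes", 10), ("hats", 5)],
)
rows = load_rows(conn)
conn.close()
```

An API, crawler, or file download would replace the body of `load_rows`, while the rest of the analysis keeps working on plain rows.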

Distill: Extract the essence from the data, the stuff that leads to decisions, increased ROI, and actions (such as determining optimum bid prices in an automated bidding system). This involves:

  • exploring the data (creating a data dictionary, exploratory analysis), cleaning (removing impurities), and refining (data summarization, sometimes in multiple layers, that is, hierarchical summarization)
  • analyzing: statistical analyses, both automated and manual (sometimes including tasks such as experimental design, which can take place even before the Access stage). This might or might not require statistical modeling.
  • presenting results or integrating results in some automated process
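The exploring, cleaning, and refining steps can be sketched roughly as follows, using only the Python standard library. The records, field names, and cleaning rules are all invented for illustration; real data would call for its own rules.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (campaign, keyword, bid, clicks). None = missing.
FIELDS = ("campaign", "keyword", "bid", "clicks")
RAW = [
    ("A", "shoes", 0.50, 10),
    ("A", "shoes", 0.70, 14),
    ("A", "boots", None, 3),   # missing bid
    ("B", "hats", 0.20, 5),
    ("B", "hats", 0.40, -1),   # negative clicks: an impurity
]

def data_dictionary(rows):
    """Explore: per field, list the observed types and count missing values."""
    info = {}
    for i, name in enumerate(FIELDS):
        values = [r[i] for r in rows]
        info[name] = {
            "types": sorted({type(v).__name__ for v in values if v is not None}),
            "missing": sum(v is None for v in values),
        }
    return info

def clean(rows):
    """Clean: drop rows with a missing bid or a negative click count."""
    return [r for r in rows if r[2] is not None and r[3] >= 0]

def summarize(rows):
    """Refine: summarize by keyword, then roll up to campaign level."""
    grouped = defaultdict(list)
    for campaign, keyword, bid, clicks in rows:
        grouped[(campaign, keyword)].append((bid, clicks))
    keyword_level = {
        key: {
            "mean_bid": mean(b for b, _ in vals),
            "clicks": sum(c for _, c in vals),
        }
        for key, vals in grouped.items()
    }
    campaign_level = defaultdict(int)
    for (campaign, _), stats in keyword_level.items():
        campaign_level[campaign] += stats["clicks"]
    return keyword_level, dict(campaign_level)

dictionary = data_dictionary(RAW)
cleaned = clean(RAW)
keyword_level, campaign_level = summarize(cleaned)
```

The two dictionaries returned by `summarize` are one simple form of hierarchical summarization: the campaign totals are built from the keyword summaries rather than from the raw rows.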


