
Why Every Data Scientist Needs A Data Engineer

This article was written by Laurel Brunk.

Data scientists spend most of their time (up to 79%!) on the part of their job they hate most.

The Role of a Data Scientist

Once an organization has a data scientist, however, what then? How do they cultivate an environment that maximizes that person’s skills and makes them want to stay?

Consider first what an average data scientist does all day:

  • Builds training sets (3% of the time)
  • Cleans and organizes data (60%)
  • Collects data sets (19%)
  • Mines for data patterns (9%)
  • Refines algorithms (4%)
  • Other (5%)

Here’s where we see just how un-sexy the role has become: an overwhelming majority of data scientists agree that collecting data sets and cleaning and organizing them are their least favorite parts of the job. Worse, collecting and organizing data has nothing to do with insights; it’s simply data preparation. It takes a high level of skill to do, but it’s not data science.

Companies could free up to 79% of their data scientists’ time for analysis (the 60% spent cleaning and organizing data plus the 19% spent collecting it) by having someone else prepare the data. Not only would companies derive more value from every extra moment spent on insights, but they would also enable their data scientists to do what they love.
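The 79% figure can be checked against the breakdown listed earlier. The short sketch below simply adds up the percentages from that list; the variable names are illustrative, and which tasks count as "preparation" follows the article's own classification.

```python
# Time breakdown quoted in the list above (percent of a data scientist's day).
time_share = {
    "Builds training sets": 3,
    "Cleans and organizes data": 60,
    "Collects data sets": 19,
    "Mines for data patterns": 9,
    "Refines algorithms": 4,
    "Other": 5,
}

# Tasks the article classifies as data preparation rather than data science.
preparation_tasks = ["Cleans and organizes data", "Collects data sets"]

total = sum(time_share.values())
preparation_share = sum(time_share[task] for task in preparation_tasks)
remaining_share = total - preparation_share

print(f"total: {total}%")                    # total: 100%
print(f"preparation: {preparation_share}%")  # preparation: 79%
print(f"everything else: {remaining_share}%")  # everything else: 21%
```

In other words, only about a fifth of the day is left for everything that is not data preparation.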

Data preparation, therefore, should be assigned to the correct role: the data engineer.

The Role of a Data Engineer

The need for data engineering is growing, too. In “The Rise of the Data Engineer,” Maxime Beauchemin, “data engineer extraordinaire” at Airbnb, writes about how he joined Facebook as a business intelligence engineer in 2011 and left as a data engineer two years later. The need for more complex, code-based ETL and changing data modeling drove the demand for data engineering.

So what is data engineering, exactly? It’s the act of accessing, processing, enriching, cleaning, and otherwise orchestrating data for analysis. Beauchemin puts it like this: “Data engineers build tools, infrastructure, frameworks, and services. In smaller companies — where no data infrastructure team has yet been formalized — the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure.”

In other words, data engineering alone doesn’t reveal insights; it readies your data to be analyzed reliably. By whom? The data scientist or analyst.
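To make that division of labor concrete, here is a minimal sketch of a preparation step handing clean data to an analysis step. It is not taken from the article: the record fields (`name`, `amount`), the sample rows, and the helper names `clean_record` and `prepare` are all hypothetical, and a real pipeline would usually rely on dedicated ETL tooling rather than hand-written loops.

```python
from typing import Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Normalize one raw record; return None if it cannot be used."""
    name = (raw.get("name") or "").strip()
    amount_text = (raw.get("amount") or "").strip().replace(",", "")
    if not name or not amount_text:
        return None
    try:
        amount = float(amount_text)
    except ValueError:
        # Unparseable amounts (e.g. "n/a") are dropped, not guessed.
        return None
    return {"name": name.title(), "amount": amount}


def prepare(raw_records: list) -> list:
    """Data engineering step: clean every record and drop duplicates."""
    cleaned = []
    seen = set()
    for raw in raw_records:
        record = clean_record(raw)
        if record is None:
            continue
        key = (record["name"], record["amount"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical raw input with the usual problems: stray whitespace,
# thousands separators, missing values, and a duplicate row.
raw_records = [
    {"name": "  alice ", "amount": "1,200.50"},
    {"name": "Bob", "amount": "n/a"},
    {"name": "alice", "amount": "1200.5"},
    {"name": "", "amount": "10"},
    {"name": "carol", "amount": "75"},
]

prepared = prepare(raw_records)

# Data science step: analysis can now assume well-formed input.
total_amount = sum(record["amount"] for record in prepared)
print(prepared)
print(total_amount)
```

The point of the sketch is the boundary: everything inside `prepare` is preparation work, and only the final sum stands in for the analysis the data scientist actually wants to do.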


Comment by EDDISON HAYDEN LEWIS on November 24, 2019 at 12:29pm

Could the role of the Data Scientist synergize with the Data Engineer? 

Some Data Engineers would have progressed to the expertise of the Data Scientist.

Comment by jwork.ORG on October 21, 2019 at 2:01pm

I think "data scientist" and "data engineer" are the same notion. Data scientists do not discover laws, so they are not really "scientists" in the traditional sense. The tasks attributed to "data science" in this image are data engineering tasks too.

© 2019 Data Science Central ®
