
Why Every Data Scientist Needs A Data Engineer

This article was written by Laurel Brunk.

Data scientists spend most of their time (up to 79%!) on the part of their job they hate most.


The Role of a Data Scientist

Once an organization has hired a data scientist, what then? How can it cultivate an environment that makes the most of that person’s skills and makes them want to stay?

Consider first what an average data scientist does all day:

  • Builds training sets (3% of the time)
  • Cleans and organizes data (60%)
  • Collects data sets (19%)
  • Mines for data patterns (9%)
  • Refines algorithms (4%)
  • Other (5%)

Here’s where we see just how un-sexy the role has become, because an overwhelming majority of data scientists agree that collecting data sets and cleaning and organizing them is their least favorite part of the job. Worse, collecting and organizing data has absolutely nothing to do with insights; it’s simply data preparation. It takes a high level of skill to do, but it’s not data science.

Companies could free up as much as 79% of their data scientists’ time for analysis by having someone else prepare the data. Not only would companies derive more value from every extra moment spent on insights, but they would also let their data scientists do the work they love.
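The 79% figure comes straight from the time breakdown listed earlier: it is the sum of the two data-preparation categories, collecting data sets and cleaning and organizing data. The short Python sketch below makes that tally explicit. The category labels are paraphrased from the list, and the split into "preparation" versus "everything else" follows the article's own classification; nothing beyond those published percentages is assumed.

```python
# Average data scientist's working time, in percent, as listed in this article.
time_split = {
    "building training sets": 3,
    "cleaning and organizing data": 60,
    "collecting data sets": 19,
    "mining for data patterns": 9,
    "refining algorithms": 4,
    "other": 5,
}

# Tasks the article classifies as data preparation rather than data science.
preparation_tasks = ("cleaning and organizing data", "collecting data sets")

# Share of time spent on preparation, and the share left over for everything else.
preparation_share = sum(time_split[task] for task in preparation_tasks)
remaining_share = sum(time_split.values()) - preparation_share

print(f"Data preparation: {preparation_share}%")  # Data preparation: 79%
print(f"Everything else: {remaining_share}%")     # Everything else: 21%
```

In other words, handing preparation to someone else could, at most, leave the data scientist with the full 100% instead of the roughly 21% currently spent on analysis-related work and other tasks.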

Data preparation, therefore, belongs with the role designed for it: the data engineer.

The Role of a Data Engineer

The need for data engineering is growing, too. In “The Rise of the Data Engineer,” Maxime Beauchemin, “data engineer extraordinaire” at Airbnb, describes joining Facebook as a business intelligence engineer in 2011 and leaving as a data engineer two years later. The need for more complex, code-based ETL and for evolving approaches to data modeling drove the demand for data engineering.

So what is data engineering, exactly? It’s the work of accessing, processing, enriching, cleaning and otherwise orchestrating data so that it is ready for analysis. Beauchemin puts it like this: “Data engineers build tools, infrastructure, frameworks, and services. In smaller companies — where no data infrastructure team has yet been formalized — the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure.”

In other words, data engineering alone doesn’t reveal insights; it readies your data to be analyzed reliably. By whom? The data scientist or analyst.
