I wrote about this long ago (see here in 2014), and so did many other practitioners. This new post offers, I think, a more mature and coherent view of the various data scientist roles in industry, now that things are getting clearer for most hiring managers, and of how these scientists interact with each other and with other teams. It is also a short read for the busy professional.
Source for picture: AnalyticsInsight
There are all sorts of data scientists.
- Some are BI analysts and rarely code. They even use GUIs to access databases, so they don't write SQL queries themselves (the tool does that for them), though they must understand the database schema. But they are the ones who define metrics and work with management to identify data sources or to create data. They also design data dashboards and visualizations with various end users in mind, ranging from security, finance, sales, and marketing to executives. Many have an MBA degree.
- Data engineers get the requirements from these BI analysts to set up the data pipelines and have the data flow throughout the company and outside, with little pieces (usually summarized data) ending up on various employee laptops for analysis or reporting. They work with sys admins to set up data access, customized for each type of user. They are familiar with data warehousing and the different types of cloud infrastructure (internal, external, hybrid), and they know how to optimize data transfers and storage, balancing speed with cost and security. They are very familiar with how the Internet works, as well as with data integration and standardization. They are good at programming and at deploying systems designed by the third type of data scientist, described below. Sometimes, particularly for senior roles, they are called data architects.
- Machine learning data scientists design and monitor predictive and scoring systems. They have an advanced degree and are experts in all types of data (big, small, real-time, unstructured, etc.). They perform a lot of algorithm design, testing, fine-tuning, and maintenance. They know how to select and compare tools and vendors, and how to decide between home-made machine learning and existing tools (vendor or open source). They usually develop prototypes or proofs of concept that eventually get implemented in production mode by data engineers. Their programming languages of choice are Python and R. The senior ones have learned how to automate mundane tasks.
- Data analysts are junior data scientists doing a lot of number crunching, data cleaning, and working on one-time analyses and usually short-term projects. They interact with and support BI or ML data scientists. They sometimes use more advanced statistical modeling techniques.
Depending on the size of the company, these roles can overlap. Often, an employee is given a job title that does not match what she is doing (typically "data scientist" for a job that is actually "data analyst").
For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn, or visit my old web page here.