
Passive vs. active data scientists

While the skills required or expected of data scientists vary with the organization or domain they work in, being a data scientist is not merely a matter of owning a set of skills, but also of having a certain mindset. In that sense I can differentiate between passive data scientists and active data scientists.

Passive data scientists use the data they receive, or perform basic tasks to collect data stored in one or several central sources. Once they have the data, they analyze them rigorously and turn them into insights. Such passive data scientists need strong skills in statistics, applied mathematics, or applied machine learning, as well as a basic understanding of the environment in which they work.

Active data scientists, on the other hand, work to learn the environment in which they operate, identify all relevant sources of data, and collect all data that can be used to extract knowledge. Some of these data might not be structured, and in many cases the data might not even come in a digital format. As data analytics experts, they identify the flow of all possible data, whether digital or not, and design analytic models that can analyze them. Like passive data scientists, active data scientists need knowledge of statistics and machine learning, but they also need good programming and scripting skills; the ability to parse and render data and to work with multiple data formats, both structured and unstructured; familiarity with NLP and signal processing (machine vision, audio analysis); and the ability to work with relational and non-relational databases. They have to understand their environment beyond merely the data in order to identify the right data sources and design solutions in light of the business problems.

The training of active data scientists is much more comprehensive than that of passive data scientists. It should include statistics and applied machine learning, but also substantial training in computer science, such as programming, data structures, database systems, operating systems, algorithm design, signal processing, and theoretical machine learning. It requires knowledge of a wide variety of data analytics tools that can read, process, and integrate various kinds of data.

In addition to these technical skills, active data scientists also need training in business or marketing, depending on the environment in which they work. That combination of skills is much more difficult to acquire, and therefore active data scientists are rarely the product of academic training alone. Academia must take substantial action to meet that challenge and train true active data scientists.