The Rise of the Dual Data Scientist / Machine Learning Engineer

There are thousands of articles explaining the differences between a data scientist and a machine learning engineer. Data science gets broken down even further, with data analysts contrasted with researchers. Professionals skilled in all these domains are called unicorns and are believed not to exist. Indeed, they may not work for companies, and they are often ignored when applying for a job. This article explains how to become one, and the benefits that come with it, both for yourself and for employers.

The Silo Mentality

Funneling people into narrow, specialized roles happens both in education and in industry. How many professionals have a degree spanning ML engineering, statistics, operations research and marketing, to name one of the potential combinations? Such curricula don’t exist. You would have to accumulate multiple degrees, which is prohibitive in both cost and time. And once in the workforce, you face the same compartmentalization. Sure, you may start as a data scientist or business analyst and become an MLOps engineer, or the other way around. But you can’t be all of them at once unless you work for multiple employers. Managers and HR cannot handle this. They wish such people existed, and they recognize the benefits. But tunnel vision and long tradition prevent this from happening. Also, HR lacks the experience to detect and assess the value of a “unicorn”. Over time, unicorns stop applying.

Why You Need a Dual Engineer / Scientist

Companies realized a while back that hiring separate R, SQL and Python programmers is inefficient. Now they want people who know all three languages, or they abandon R over time. Likewise, they realize that many data scientists are unable to write production code and deal with the full data pipeline. Hiring now favors MLOps professionals over scientists. But many ML engineers do not have a good grasp of the statistics and business thinking behind the scenes. This can result in faulty or inadequate solutions.

Why not hire someone who masters both? I discuss later how to become one and where to find them (clue: in the job applications that you receive, for instance). A team of two is often described by the equation 1 + 1 > 2. But hiring a dual engineer / scientist is an example where you can have 1 > 1 + 1. I illustrate this in the next section.

These dual professionals are full stack ML engineers, or full stack analysts. The term “full stack” was first coined to describe software engineers and web developers who master both back-end and front-end.

When 1 Is Bigger Than 1 + 1

I train software engineers to add ML science to their professional background. Each has their own programming environment, and brings their own datasets to the classroom. My code, which involves complex libraries such as TensorFlow, has to work smoothly on all platforms and datasets. My background is research, but I spend more time designing scalable and generic solutions than inventing state-of-the-art algorithms. Running SDKs in my cloud, and dealing with incompatible libraries, complex data structures, or metadata, is part of my daily routine. I hire engineers to help on occasion. But because I know both research and engineering, I am able to develop better solutions faster. The integration of engineering and research takes place in one brain (mine), not two. Intricacies in each are resolved jointly via fast human intelligence: all connections and neurons are in a single brain optimized for this dual interaction. Thus 1 is bigger than 1 + 1 here. And less expensive.

Becoming a Dual Engineer / Scientist

It took me many years to become good enough at both. You don’t need hyper-specialization in each role. The trick is to continuously practice the two for long enough. This approach is more efficient than being a scientist for 5 years, then an engineer for 5 years. You could compare it to learning two languages, like English and French: do it simultaneously from day one (at birth) and you will soon outperform, in each language, people who acquired the two languages sequentially. The issue: there are very few corporate jobs or classes offering this opportunity. But you can learn one on the job, and the other during your leisure time, working on real projects in both. My ML science classes are also a good investment for engineers.

Then you will have more job options. Or, if you work as a scientist, you can automate your tasks like an engineer and work very little (or for multiple employers!). If you become an independent consultant or start-up founder, or even a hacker, this dual experience will be invaluable.

Where to Find These Unicorns?

I like to say that unicorns are those who excel in just one job category, not two or three. Most of my connections (executives, founders, consultants) would qualify as unicorns in the usual sense. In my circle, it is the norm. Also, do you really want to hire one? They can scare hiring managers who are reluctant to hire people more competent than they are. In addition, HR may be unable to identify them due to a limited horizon, faulty applicant tracking systems, and an inability to connect the dots in a unicorn resume. The rise of generative AI may help with this, leaving the task of finding and hiring unicorns to tools like GPT, trained on LinkedIn profiles and GitHub portfolios.

About the Author

Vincent Granville is a pioneering data scientist and machine learning expert, founder of MLTechniques.com and co-founder of Data Science Central (acquired by TechTarget in 2020), former VC-funded executive, author and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, InfoSpace. Vincent is also a former post-doc at Cambridge University, and the National Institute of Statistical Sciences (NISS).

Vincent published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of “Synthetic Data and Generative AI” (Elsevier). He lives in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory.