Ten years ago, data was something an analyst reviewed and handed over to people who were going to use it. Now, businesses run on data, with automated processes, machine learning models, and hundreds, sometimes thousands, of people in the organization using data daily.
Spurred by advances in AI, the data space has been exploding. Companies are investing heavily in data and data infrastructure and putting data to use in the business, whether through analytics or through large language models that power different parts of the business or the customer-facing experience.
But it’s also easy to get distracted by the hype, which can impede progress. “Data scientist is kind of like a not well-defined job description,” says Dr. Ingo Mierswa, an industry-veteran computer scientist and founder of Altair RapidMiner. “It’s straightforward,” he reasons; “A software engineer writes code for programs. And if you’re into event marketing, it’s quite clear that you’ll be working on events for your organization. But for data scientists, it’s often very unclear what this person actually does.”
Is this person spending most of their time understanding the business, identifying additional use cases, and mapping those use cases and business problems to analytical approaches? Or is the person actually sitting down and working with dirty data to get it into better shape, which is closer to data engineering? Or maybe they are focusing only on machine learning models and optimizing them, or taking those models and integrating them into other business applications.
What is this person going to do? Maybe there’s no machine learning involved at all, in which case the role is more like that of a traditional data analyst.
Mierswa and two other data science leaders share insights for both hiring decision-makers and data scientists seeking opportunities.
Keep an open mind
“Data scientists can do all of those things, but not every data scientist is equally good for all of them,” Mierswa elaborates: “When someone comes to me saying, ‘Hey, Ingo, I need a data scientist,’ the first thing I ask is, ‘Why do you need one? What is actually the problem you want to solve? What are the specific needs you have? Like things you think you may need—let’s actually have a conversation if that’s really the right profile or not?’ And one of the things I find interesting, many people fall into this trap and assume they’ll use machine learning for everything; but that’s typically not true for all the reasons. So, if you define the tasks that need to be solved, you also derive the skills you’re looking for in the process. It’s at this stage that I’m actually specifying if I’m hiring a data scientist, as I articulate these requirements. These details become part of the job description and are crucial in assessing people on those elements. Based on this,” he says, “when I’m hiring, I conduct a pretty thorough interview process.”
There are multiple stages in the interview process, as with any other job interview, but the technical assessment holds more weight because it’s closer to what a data scientist would actually do. “I care way more about conceptual understanding, communication skills, and the fact that this data scientist can truly take a business problem and go end-to-end,” Mierswa explains. “But most people mix things up: they hear data science and think they can play around all day with data and look for interesting stuff,” he cautions. “No, you cannot. This is not the profile of a data scientist.”
It requires a more deliberate effort to think clearly about a specific problem. And if you need an end-to-end solution, that doesn’t mean the data scientist is the one person who delivers all of it, but rather someone who can help move things forward in the right direction. “Taking this into consideration,” he says, “when we go into the technical assessment, I put much more focus on conversational skills, analytical thinking, critical thinking, and the understanding of data science concepts.”
Look for thinking skills
When a field is in its buzz phase, hype cycles follow: moments when certain skills or tools are fervently promoted as indispensable, and people rush to close the perceived skills gap. Mierswa pushes back on that, pointing out, “The truth is, data scientists in most cases don’t need programming skills. But let’s say one out of 50 data scientists I’m hiring is actually a coding data scientist, so this person has to have strong programming skills in addition to all the other things: communication skills, critical thinking, analytical thinking, etc.”
“It’s great if you have strong Python skills, some experience, and also if you are a good software engineer. But the good news is that it’s only one out of 50 and the other 49 data scientists don’t need any Python skills.” Dismissing the red herring, he says, “It seems like you’re setting up your data science organization in a way that you think you need to use and write code all the time. If you’re coding twice for the same problem, you’re not solving it efficiently because the true problem-solving in data science means creating reusable, not repetitive, solutions.”
“I always love people who have a good understanding of databases,” Mierswa remarks. “Some basic SQL always helps because that’s where your data is. And actually, I like to work with people who still have strong Excel, PowerPoint and other presentation skills. Why? Because that is going to be the format you’re going to share results with most of the time, even though some may dislike Excel.”
Let’s face it, Excel is still one of the most widely used analytics tools globally. “In our particular case,” Mierswa states, “we are always looking for people with these skills, coupled with the ability to think in workflows, as that’s what data science is in the end.” It doesn’t matter whether you code or not, but you need to be able to structure your thoughts. Workflow thinking means reasoning, ‘Now I’m doing this, and based on the outcome, I’ll proceed to the next step. Now I’m looping back…’ It is a kind of mathematical intuition: you have the mindset of somebody who understands algorithms, but you don’t need the technical skill of actually writing them down in code.
Workflow thinking is the technical skill you need to get ahead. It’s also always good if you can write some Python code and build ML workflows, but that you can learn when you need it. The way of thinking is something you have to bring. “I’m not investing years of time to change the way you think,” he says, “but I’m happy to invest in you to give you the right tools and train you in those tools.”
And you’d be remiss to skimp on AI. It’s a priority for data teams because generative AI will change everything. “If you’re still not familiar with what’s going on out there and how to actually work with large language models, you have to get up to speed immediately,” Mierswa urges every aspiring data scientist. “Set some time aside, even for personal projects, and learn data science the right way. Hack your way to getting up to speed on the general, in case you’re still not.”
Keep an eye on the landscape
Much of Mierswa’s optimism comes from his years of experience. More than a decade ago, he created Altair RapidMiner, a no-code data science platform, and his contributions have influenced the adoption and implementation of no-code data science and machine learning functionality in the industry. And that’s part of why he’s bullish. “If you think you know it already, you’re in the wrong field. This is also true for software engineers, but for data scientists, in my opinion,” he makes clear, “it’s almost more true. It’s more critical because things change very, very quickly in our field.”
And if you are just starting your career, you need to be hungry to learn new things. “I have zero interest,” he says. “If you want to be a scientist, go work for a university. Don’t go work for an enterprise because for most enterprises we don’t have enough room for researchers.”
It’s also important to note that the job title ‘data scientist’ is often used broadly. But the truth behind the ‘data scientist is a sexy job’ hype is that a wide gap, full of complexity, separates textbooks from reality. Unlike the neatly organized, structured data typical of academic coursework, real-world industry data can be messy and disorganized.
Varun Mandalapu, Senior Data Scientist in the Insurtech domain, says, “I see many candidates having misconceptions just by relying on job descriptions, which are mostly generic, and some also out of compulsion making poor decisions, hoping to make a career change.” Having hired data scientists across various levels, he shares, “I’ve been fortunate to work on the cutting edge of AI and have seen the stark contrast between the underlying science and the art of what is possible. And for data scientists, there’s never a one-size-fits-all process, whether within a company or even within a team.”
“We solve problems,” he shares, “and it is only in practice, working with competitive people, that we identify the underlying requirements, which are essential for data teams to understand how to solve the problem or we exacerbate the problem by hiring a wrong candidate.”
“And for this reason, when I am interviewing,” he continues, “I don’t want to always talk much technical first but scan their data science portfolio. I want to see instead what impact they made either in their primary occupation or by contributing to projects.”
Echoing Mierswa, Mandalapu also pointed out the problem of a myopic approach: each team has its own requirements. One team might be diving into machine learning for marketing, while another is developing AI-powered products. Even within the same team, roles can differ widely: one data scientist could be working on linear ML models, while another tackles dynamic adaptive models.
There is always going to be noise, and these distinctions aren’t always obvious in job postings. “It’s essential to recognize that hiring managers and data science teams should precisely articulate these needs,” he adds. “Any job interview is a great opportunity for candidates to showcase their data science skills but also a significantly important responsibility for the hiring managers to ensure that the role aligns with their expectations and career goals.”
When we asked about developing and retaining data science talent, Ravindra Patil, Ph.D., Senior Director of Data Science at Tredence, shared his insights on how leaders want to source and hire the right data science talent. He elaborated on the following traits he looks for when conducting interviews with candidates.
Solving Business Problems with Data Science Solutions: To hire candidates, I must be confident they can solve specific business challenges with data-driven solutions. In the interview, candidates should be able to describe how they have applied advanced data science techniques to solve industry and business problems and explain how their contributions have improved operational processes and decision-making.
Combining AI Models and Domain Expertise: One crucial trait for quality data scientists to possess is domain expertise. They understand industry trends, pain points, and enterprise goals and are able to think strategically about how to use their technical skills. Data scientists with domain knowledge can help create AI models that address real-world issues, automate tasks, enhance operational efficiency, and directly influence business decisions. They are able to do this work at pace, since business is speeding up and customer expectations are always increasing.
Using No-Code/Low-Code Tools in Entry-Level Roles: If I’m hiring for entry-level data science and AI roles, I’ll look for candidates’ experience with no-code and low-code platforms. These tools enable individuals with limited coding experience to build and deploy AI solutions more easily. But for complex projects, candidates need to possess traditional coding skills to ensure the reliability, scalability, and maintainability of the solutions.
Enabling Rapid Experimentation with Generative AI: Generative AI is now at the top of everyone’s wish list. Candidates who understand LLMs and the inner workings of retrieval-augmented generation (RAG), including multimodal RAG, are highly desirable. They can also play an instrumental role in developing cost-optimized solutions with advanced AI.
Scaling Data Across the Enterprise: Solving data problems at scale is another critical skill set data scientists must have, along with the ability to cleverly navigate data challenges using augmentation techniques. To democratize data across organizations, they must work effectively with growing data volumes, integrate both structured and unstructured data, collaborate closely with engineering teams to employ or develop cutting-edge tools for insights, ensure data accuracy, and address data sparsity issues. These capabilities empower teams to confidently make informed decisions.