The Problem with Data Science Interviews

The messiest job of the 21st century 

The interview process is likely the most daunting hurdle a data scientist will face in their career. The pressure and competition to land a data science job are intense. On top of this, the hype around data science has created widespread confusion about what the data science interview process should look like.

Data scientists have been the darling of the tech industry for about a decade now. In 2012, Harvard Business Review crowned data scientist the “sexiest job of the 21st century,” setting off a hype cycle that has yet to peak.

The influx of both job seekers and employers rushing to take advantage of the hype has muddied the data science talent pool.

Seeking to transition to this exciting new career, thousands of engineers, statisticians, and analysts are repositioning themselves as data scientists. Hundreds of boot camps and certificate programs are popping up to aid them in their quest. And dozens of hiring managers are struggling to select the most promising candidates from the crushing mass of professionals seeking their fortunes.

The problem with data science interviews

This data science gold rush has downstream effects on the interview process. Confused hiring managers subject data scientists to interviews that don’t match the data science skillset. Either out of confusion or in an attempt to attract talent, some hiring managers rebrand data analyst and data engineering roles as data science roles. Red flags range from interviews that focus more on computer science algorithms than on machine learning algorithms to interviews that spend more time on SQL than on scikit-learn.

One or two bad interviews like this can send a data scientist off to spend hours studying the wrong subjects. As a result, candidates end up overwhelmed by the wide array of subjects they’re expected to prove expertise in.

Compounding this, the demand for data scientists has risen faster than the supply of quality interview preparation. Interview questions (and sometimes answers) are easy to find in many places online, but it’s unclear how trustworthy they are.

The solution to data science interviews

Luckily, top companies are converging on a standard data science interview process. The solution for candidates is to be fully prepared for questions on these topics:

  • Machine learning
  • Python programming
  • Data wrangling
  • Analytical problem-solving
  • Statistics
  • Culture fit

The machine learning and statistics portions are deal breakers for most companies because these disciplines form the theoretical foundation of data science. Technical skills such as Python and SQL are easier to learn, but it’s disastrous if a data scientist is weak in statistics or machine learning.

For statistics, candidates should focus on introductory-level concepts such as:

  • Probability
  • Bayes’ theorem
  • Normal distributions
  • Central limit theorem
  • Hypothesis testing
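To make a few of these statistics topics concrete, here is a minimal Python sketch that uses only the standard library. It covers Bayes’ theorem applied to a hypothetical screening test, a simulation of the central limit theorem, and a two-sided one-sample z-test. The function names and all numbers are illustrative assumptions. The z-test also assumes the population standard deviation is known; when it is unknown, a t-test is the usual choice.

```python
import random
from statistics import NormalDist, mean, stdev


def bayes_posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' theorem."""
    # Law of total probability: P(pos) = P(pos | cond)P(cond) + P(pos | no cond)P(no cond)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


def sample_means(n: int, trials: int, seed: int = 0) -> list[float]:
    """Means of `trials` samples of size n drawn from a skewed Exponential(1) distribution."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]


def z_test_p_value(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Two-sided one-sample z-test of H0: mean == mu0, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical screening test: 1% prevalence, 99% sensitivity, 5% false positives.
print(round(bayes_posterior(0.01, 0.99, 0.05), 3))  # 0.167

# CLT: Exponential(1) has mean 1 and sd 1, so the means of size-50 samples should be
# roughly normal with mean ~1 and standard error ~1/sqrt(50) ~ 0.141.
means = sample_means(n=50, trials=2000, seed=1)
print(round(mean(means), 3), round(stdev(means), 3))

# Hypothetical observed mean 1.2 vs. H0 mean 1.0, with sigma = 1 and n = 100, gives z = 2.
print(round(z_test_p_value(1.2, 1.0, 1.0, 100), 4))  # 0.0455
```

The Bayes example shows a classic interview trap. Even with a 99% sensitive test, a positive result here implies only about a 17% chance of having the condition, because the condition is rare.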

For machine learning, candidates should know a range of topics with practical applications such as:

  • Bias-variance tradeoff
  • Curse of dimensionality
  • Cross-validation
  • Common loss functions
  • Appropriate model evaluation metrics
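Several of these machine learning topics fit in one small example. The sketch below uses only the Python standard library. It implements mean squared error and log loss (two common loss functions), precision and recall (two evaluation metrics for classification), and k-fold cross-validation. The model is k-nearest-neighbours regression on synthetic sine-wave data. It is only an illustrative choice, picked because its single parameter k makes the bias-variance tradeoff visible. All function names are hypothetical. The fold assignment simply takes every fifth point, which assumes the data is already in random order.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean squared error: a common regression loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy: a common classification loss (labels are 0/1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def knn_predict(train_x, train_y, x, k):
    """1-D k-nearest-neighbours regression: average the k closest targets."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def k_fold_cv_mse(xs, ys, k, n_folds=5):
    """Mean validation MSE across n_folds folds (assumes xs is in random order)."""
    n = len(xs)
    fold_errors = []
    for fold in range(n_folds):
        val_idx = list(range(fold, n, n_folds))  # every n_folds-th point is held out
        held_out = set(val_idx)
        train = [i for i in range(n) if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        preds = [knn_predict(train_x, train_y, xs[i], k) for i in val_idx]
        fold_errors.append(mse([ys[i] for i in val_idx], preds))
    return sum(fold_errors) / n_folds


def make_sine_data(n, noise, seed=0):
    """Hypothetical data: y = sin(x) + Gaussian noise, with x uniform on [0, 2*pi]."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


xs, ys = make_sine_data(n=300, noise=0.3, seed=42)
# Small k -> low bias, high variance; large k -> high bias, low variance.
for k in (1, 15, 150):
    print(f"k={k:>3}  CV MSE={k_fold_cv_mse(xs, ys, k):.3f}")
```

On data like this, both a very small k (high variance) and a very large k (high bias) should typically show higher cross-validated error than a moderate k. In practice, scikit-learn provides these pieces ready-made. Examples include `cross_val_score` in `sklearn.model_selection`, and `mean_squared_error`, `log_loss`, `precision_score`, and `recall_score` in `sklearn.metrics`.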

Because big companies are almost always hiring data scientists, candidates largely control the interview timeline. Preparation can take up to six months.

That said, there’s no point in dragging it out. How long it takes to prepare for a data science interview depends on a few factors:

1) Is this the candidate’s first data science interview? Data science interviews cover a lot of ground. They’re like engineering and analytics interviews rolled into one with some machine learning thrown in for good measure.

2) Is the interview with a big tech company? They tend to have especially rigorous interviews and a high bar for success.

3) How difficult were the practice questions to answer? The candidate must not only be familiar with the material but also be able to recall precise definitions and give concise answers.

Although it’s difficult to find a trusted source of quality interview preparation, some resources are starting to appear. Data scientists can accelerate their preparation with a site like Decode Data Science, which distills hundreds of interviews into archetypal questions and answers. I created it as a hiring manager with 8 years of experience in tech.