46 SQL Job Interview Questions for Data Scientists

This article was posted by Ankit Gupta.  

Introduction: 

If there is one language, every data science professional should know – it is SQL. SQL stands for Structured Query Language. It is a programming language used to access data from relational databases.

We conducted a skilltest to test our community on SQL, and it gave 2017 a kick start. A total of 1666 participants registered for the skilltest.

This test focuses on practical aspects and challenges people encounter while using Excel. In this article, we provide answers to the test questions. If you took the test, check out which areas need improvement. If you did not take the test, here is your opportunity to look at the questions and check your skill level independently.

Two sample questions from the skilltest

Overall Scores

Below is the distribution of scores, which will help you evaluate your performance:

You can assess your performance here. More than 700 people participated in the skilltest and the highest score was 41. Here are a few statistics about the distribution.

Overall distribution

Mean Score: 22.32

Median Score: 25

Mode Score: 27

This is an interesting distribution. I think we are seeing 3 different profiles of people here:

  • There are ~20 people who did not score at all. They either faced some technical problem or did not like the test or did not know SQL.
  • There is another population which looks to have normal distribution between scores 1 to 10. These people either started the competition late and hence could not get enough time or they are just beginners in SQL.
  • Third population looks to have distribution between scores 10 and 41 and looks like a representative of people in industry. For this group, mean is 25.8 and standard deviation ~ 6.5 . So any one with a score of more than 32 is in top 16% of the population.
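
The "top 16%" figure for the third group can be checked with a little arithmetic. The sketch below assumes that group is roughly normally distributed with the mean (25.8) and standard deviation (6.5) quoted above; the function name `fraction_above` is made up for illustration. One standard deviation above the mean is 25.8 + 6.5 = 32.3, and about 16% of a normal population lies beyond that point.

```python
from statistics import NormalDist

# Figures quoted in the article for the third group of participants.
MEAN = 25.8
STD_DEV = 6.5


def fraction_above(score, mean=MEAN, std_dev=STD_DEV):
    """Share of a normal population scoring above `score`.

    Assumes the group really is normally distributed, which the
    article suggests but does not verify.
    """
    return 1.0 - NormalDist(mu=mean, sigma=std_dev).cdf(score)


# One standard deviation above the mean.
one_sd_above = MEAN + STD_DEV

print(f"Cutoff: {one_sd_above:.1f}")
print(f"Share above cutoff: {fraction_above(one_sd_above):.1%}")
```

Since test scores are whole numbers, "more than 32" is a rounded version of this 32.3 cutoff, so the 16% is an approximation rather than an exact share.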

To see how much you scored and where you fit, click here. For more about SQL, click here.

Comment

Comment by Christopher Majka on January 18, 2017 at 8:56am

It makes sense.  We are still heavily involved in the ETL process, hence the use of SQL and other querying tools, such as SAS.

Comment by Javier Alonso on January 18, 2017 at 7:16am

And I agree with you in disagreeing with me. Every tool has a use, and SQL is perfectly suited to DB management, data extraction, table joining, etc., mainly in the ETL process.

The thing is, my students (managers rather than data scientists) focus on using graphical tools like KNIME or Orange to get fast (simple, uncomplicated) insight into their data. They are more or less past the ETL phase of analysis.

Paraphrasing the SciFi writer Robert Heinlein: "Data scientists are like butterflies... Endless variety."

Yours,   F.J.Alonso

Comment by Christopher Majka on January 18, 2017 at 6:51am

Javier - I respectfully disagree.  We use SQL daily in our analysis.  I suppose it depends on the data being analyzed and the industry you are in.  Regards

Comment by Javier Alonso on January 17, 2017 at 3:59pm

Second afterthought:

"There are ~20 people who did not score at all. They either faced some technical problem or did not like the test or did not know SQL."

Those we call 'outliers'. You should dispose of them in some way or another before facing analysis (ETL cycle).

"There is another population which looks to have normal distribution between scores 1 to 10. These people either started the competition late and hence could not get enough time or they are just beginners in SQL."

You can't say a part of a distribution is 'normal'. Either the whole distribution is normal or it isn't. This one seems skewed, I'd say.

"Third population looks to have distribution between scores 10 and 41 and looks like a representative of people in industry. For this group, mean is 25.8 and standard deviation ~ 6.5 . So any one with a score of more than 32 is in top 16% of the population."

Why do you think it is representative? Because they are the most numerous? Are your numbers (mean, sd.) referring to this segment or the whole population?

Descriptive stats are OK for a first glance at data, but do not lead to conclusions...

Yours,   F.J.Alonso

Comment by Javier Alonso on January 17, 2017 at 3:23pm

Afterthought:

"If there is one language, every data science professional should know – it is SQL"

Why? I learned SQL back in 1992 as a DBAdmin and user. It's been quite some time since I last used SQL in data analysis. Of course, it's still a good asset on your résumé (Curriculum Vitae), but I think it's much more interesting to have a sound knowledge of R, maybe Python, and tools like Tableau, KNIME and the like. Not to mention Machine Learning, Neural nets...

I've taught many courses in enterprise data analysis and never said a word about SQL. No one complained.

Yours,   F.J.Alonso

Comment by Javier Alonso on January 17, 2017 at 2:56pm

Sorry, SQL is not a programming language. It's a query language. It is not a programming language because it doesn't comply with what we call 'Turing completeness' (it's not 'Turing complete'), so its structure does not have a semantic closure in its context.

Besides that, "practical aspects and challenges people encounter while using Excel" has nothing to do with SQL.

Yours.  F.J.Alonso
