This article was posted by Ankit Gupta.
Introduction:
If there is one language, every data science professional should know – it is SQL. SQL stands for Structured Query Language. It is a programming language used to access data from relational databases.
We conducted a skilltest to test our community on SQL, and it gave 2017 a kick start. A total of 1666 participants registered for the skilltest.
This test focuses on practical aspects and challenges people encounter while using Excel. In this article, we provide answers to the test questions. If you took the test, check out which areas need improvement. If you did not take the test, here is your opportunity to look at the questions and check your skill level independently.
Two sample questions from the skilltest
Below is the distribution of scores, which will help you evaluate your performance:
You can assess your performance here. More than 700 people participated in the skilltest and the highest score was 41. Here are a few statistics about the distribution.
Overall distribution
Mean Score: 22.32
Median Score: 25
Mode Score: 27
This is an interesting distribution. I think we are seeing 3 different profiles of people here:
To see how much you scored and where you fit, click here. For more about SQL, click here.
Comment
It makes sense. We are still heavily involved in the ETL process, hence the use of SQL and other querying tools, such as SAS.
And I agree with you in disagreeing with me. Every tool has a use, and SQL is perfectly suited to DB management, data extraction, table joining, etc., mainly in the ETL process.
The thing is, my students (managers rather than data scientists) focus on graphical tools like KNIME or Orange to get fast (simple, uncomplicated) insight into their data. They are more or less past the ETL phase of analysis.
Paraphrasing the SciFi writer Robert Heinlein: "Data scientists are like butterflies... Endless variety."
Yours, F.J.Alonso
Javier - I respectfully disagree. We use SQL daily in our analysis. I suppose it depends on the data being analyzed and the industry you are in. Regards
Second afterthought:
"There are ~20 people who did not score at all. They either faced some technical problem or did not like the test or did not know SQL."
Those we call 'outliers'. You should dispose of them in some way or another before facing analysis (ETL cycle).
"There is another population which looks to have normal distribution between scores 1 to 10. These people either started the competition late and hence could not get enough time or they are just beginners in SQL."
You can't say a part of a distribution is 'normal'. Either the whole distribution is normal or it isn't. This one seems skewed, I'd say.
"Third population looks to have distribution between scores 10 and 41 and looks like a representative of people in industry. For this group, mean is 25.8 and standard deviation ~ 6.5. So any one with a score of more than 32 is in top 16% of the population."
Why do you think it is representative? Because they are the largest group? Do your numbers (mean, sd.) refer to this segment or to the whole population?
Descriptive stats are OK for a first glance at data, but do not lead to conclusions...
Yours, F.J.Alonso
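The "top 16%" figure quoted in the comment above appears to rest on the usual rule for a normal distribution: a score of about 32 is roughly one standard deviation (6.5) above the mean (25.8), and about 16% of a normal population lies beyond that point. A minimal Python sketch of that arithmetic follows. It assumes the scores in this group are approximately normal, which the article does not verify and the commenter questions. The function name share_above is hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Figures quoted from the article for the group scoring between 10 and 41.
MEAN = 25.8
STD_DEV = 6.5

# Assumption (not verified in the article): scores in this group are
# approximately normally distributed.
group = NormalDist(mu=MEAN, sigma=STD_DEV)


def share_above(score: float) -> float:
    """Fraction of the group expected to score above `score` under the normal assumption."""
    return 1.0 - group.cdf(score)


if __name__ == "__main__":
    one_sd_above = MEAN + STD_DEV  # about 32.3
    print(f"Share above {one_sd_above:.1f}: {share_above(one_sd_above):.1%}")
    print(f"Share above 32: {share_above(32):.1%}")
```

If the group is in fact skewed, as the comment suggests, the true share above 32 could differ noticeably from this estimate.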
Afterthought:
"If there is one language, every data science professional should know – it is SQL"
Why? I learned SQL back in 1992 as a DBAdmin and user. It's been quite some time since I last used SQL in data analysis. Of course, it's still a good asset on your résumé (Curriculum Vitae), but I think it's much more interesting to have a sound knowledge of R, maybe Python, and tools like Tableau, Knime and the like. Not to mention Machine Learning, Neural nets...
I've taught many courses in enterprise data analysis and never said a word about SQL. No one complained.
Yours, F.J.Alonso
Sorry, SQL is not a programming language. It's a query language. It is not a programming language because it doesn't comply with what we call 'Turing completeness' (it's not 'Turing complete'), so its structure does not have semantic closure in its context.
Besides that, "practical aspects and challenges people encounter while using Excel" has nothing to do with SQL.
Yours, F.J.Alonso
Posted 1 March 2021