This exercise was done to understand which software skills are in high demand for Data Science. The analysis was done by extracting job postings from popular online websites. The findings are interesting. R continues to be the most popular skill, found in about 71% of the postings. Python follows as a close second at about 63%. Surprisingly, in spite of all the talk about "Big Data Science", SQL comes in third at about 57%. This suggests that traditional RDBMSs continue to be the base for machine learning work today.
Details of the analysis can be found here: Analysis of Data Science Job postings. Below are some highlights. The following chart gives the relative frequency of skills. A frequency of 0.5 means the skill is found in 50% of the postings.
As seen, R, Python and SQL are the top 3 skills found. Java continues to be a favorite programming language. Interestingly, SQL trumps Hadoop in the skill list.
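The frequency measure described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the sample postings and the `skill_frequencies` helper are hypothetical, and each posting is assumed to have already been reduced to the set of skill keywords found in it.

```python
from collections import Counter

# Hypothetical sample data: each posting is the set of skill keywords found in it.
postings = [
    {"R", "python", "sql"},
    {"R", "hadoop", "java"},
    {"python", "sql"},
    {"R", "python", "sas"},
]


def skill_frequencies(postings):
    """Return, for each skill, the fraction of postings that mention it."""
    total = len(postings)
    if total == 0:
        return {}
    # Each posting is a set, so a skill is counted at most once per posting.
    counts = Counter(skill for posting in postings for skill in posting)
    return {skill: count / total for skill, count in counts.items()}


freqs = skill_frequencies(postings)
# Print skills from most to least frequent.
for skill, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{skill:8s} {freq:.2f}")
```

In this toy sample, `sql` appears in 2 of 4 postings, so its frequency is 0.5, which is read the same way as the values in the chart.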
Association rule mining (ARM) was done to find which skills occur together. The following rules are the results of ARM on this skill set:
     lhs             rhs            support    confidence  lift
1    {}           => {sas}          0.3469388  0.3469388   1.0000000
2    {}           => {java}         0.4081633  0.4081633   1.0000000
3    {}           => {hadoop}       0.4693878  0.4693878   1.0000000
4    {}           => {sql}          0.5714286  0.5714286   1.0000000
5    {}           => {python}       0.6326531  0.6326531   1.0000000
6    {}           => {R}            0.7142857  0.7142857   1.0000000
7    {tableau}    => {R}            0.1020408  1.0000000   1.4000000
8    {javascript} => {java}         0.1224490  1.0000000   2.4500000
9    {java}       => {javascript}   0.1224490  0.3000000   2.4500000
10   {javascript} => {sql}          0.1020408  0.8333333   1.4583333
11   {javascript} => {python}       0.1020408  0.8333333   1.3172043
12   {big data}   => {hadoop}       0.1020408  0.7142857   1.5217391
13   {spark}      => {hive}         0.1224490  0.8571429   3.2307692
14   {hive}       => {spark}        0.1224490  0.4615385   3.2307692
15   {spark}      => {hadoop}       0.1224490  0.8571429   1.8260870
16   {spark}      => {R}            0.1020408  0.7142857   1.0000000
17   {perl}       => {sql}          0.1224490  1.0000000   1.7500000
18   {perl}       => {python}       0.1224490  1.0000000   1.5806452
19   {perl}       => {R}            0.1020408  0.8333333   1.1666667
20   {mapreduce}  => {hive}         0.1020408  0.5555556   2.0940171
21   {hive}       => {mapreduce}    0.1020408  0.3846154   2.0940171
22   {mapreduce}  => {hadoop}       0.1632653  0.8888889   1.8937198
23   {hadoop}     => {mapreduce}    0.1632653  0.3478261   1.8937198
24   {mapreduce}  => {R}            0.1224490  0.6666667   0.9333333
25   {ruby}       => {java}         0.1020408  0.6250000   1.5312500
26   {ruby}       => {sql}          0.1632653  1.0000000   1.7500000
27   {ruby}       => {python}       0.1428571  0.8750000   1.3830645
28   {ruby}       => {R}            0.1020408  0.6250000   0.8750000
29   {pig}        => {hive}         0.1428571  0.7777778   2.9316239
30   {hive}       => {pig}          0.1428571  0.5384615   2.9316239
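For readers unfamiliar with the three columns, here is a minimal sketch of how support, confidence and lift are conventionally defined for a rule lhs => rhs. It is not the tool that produced the rules above: the `support` and `rule_metrics` helpers and the toy transactions are hypothetical, and the numbers they produce are unrelated to the job-posting data.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_metrics(lhs, rhs, transactions):
    """Return (support, confidence, lift) for the rule lhs => rhs."""
    supp_rule = support(set(lhs) | set(rhs), transactions)
    supp_lhs = support(lhs, transactions)
    supp_rhs = support(rhs, transactions)
    confidence = supp_rule / supp_lhs   # P(rhs | lhs)
    lift = confidence / supp_rhs        # > 1 means lhs and rhs co-occur more than by chance
    return supp_rule, confidence, lift


# Hypothetical toy transactions: one set of skills per posting.
transactions = [
    {"spark", "hive", "hadoop"},
    {"spark", "hive"},
    {"hive", "pig"},
    {"R", "python"},
    {"R", "sql"},
]

supp, conf, lift = rule_metrics({"spark"}, {"hive"}, transactions)
print(f"{{spark}} => {{hive}}: support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")

# An empty lhs is contained in every transaction, so confidence equals the
# support of the rhs and lift is 1, as in rules 1 to 6 of the table.
base_supp, base_conf, base_lift = rule_metrics(set(), {"R"}, transactions)
```

The same relation can be checked against the table: for rule 8, {javascript} => {java}, confidence 1.0 divided by the support of {java} (0.4081633, from rule 2) gives a lift of about 2.45. A lift below 1, as in rules 24 and 28, means the two skills appear together slightly less often than independence would predict.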
Comment
Thanks for your comments. I pulled the data as RSS feeds from popular websites: datasciencecentral, kaggle, glassdoor, etc. I did not filter for any countries, but I believe most of the postings are US based. SQL refers to the skill in general. It is the keyword that was searched for.
Hi Kumaran - Thank you for this. Just one question: does SQL as a category simply refer to the language itself, or also to related platforms such as Microsoft BI Suite (SSAS, SSIS, etc.)?
Hi,
Which websites did you consider?
Did you consider the job listings of all countries or some countries?
Thanks,
Rakesh
Posted 12 April 2021