Bridging the Skills Gap of New York City ~ A Case Study using Tableau and MySQL

Data Analytics remains incomplete without data visualization.

In a data analytics course, I learned how useful Tableau is for building and explaining a visual story that relies heavily on big data. As a student, I was given x-cases that required data retrieval, cleaning, manipulation, and analysis in order to make appropriate recommendations.

To start with, I have a double major in finance and international banking and had some knowledge of data analysis. I found Tableau's features quick to understand and simple to use. The journey from Excel to R to SQL and finally to Tableau was a great experience, and after my course I was excited to share a project that relied heavily on Tableau.

I divided the project into four sections:

  • The Question
  • The Answer
    • The Process
    • The Findings
  • The Limitations
  • The Conclusion

I first looked at the problem, worked out what the data was lacking, and then retrieved the data and created an entity-relationship diagram in MySQL Workbench. Since the focus was the NYC-MSA region, later trimmed to NYC only, I had to ensure that the data I sifted reflected only the problem in question. My findings are based on the data and on the way it was sorted and cleaned. In the retrieved data, I noticed a mismatch between the supply of and demand for skills across states. My Tableau worksheet also looked at the average posting duration across 11 states, which I kept to 35 days.

The graph below, made in Tableau, compares the average posting duration (in days) with the percentage of job postings across all states for which data was provided.

[Figure: Job postings across all states]


A recent Brookings study of the Burning Glass data found that nationally, the median duration of advertising for a STEM vacancy is more than twice as long as for a non-STEM vacancy. The case revolves around the ongoing debate about education and around Burning Glass, a leading labor market analytics firm.

The case aimed to:

  • Evaluate the NYC skills gap
  • Provide data-driven recommendations to the NYC Skills Coalition (NYCSC)
  • Help allocate $100 million over ten years for workforce development in New York City

According to The National Federation of Independent Business:

As of first-quarter 2017, 45% of small businesses reported that they were unable to find qualified applicants to fill job openings.


Evaluating the city's skills gap calls for a detailed roadmap and collaborative efforts by federal, state, and private foundations. To answer the question, I first process the data and then suggest measures based on the findings.


Before diving into the data set, I first try to understand what a skills gap means and why it exists. Simply put, it is the gap between the jobs available, with the skill set they require of an ideal candidate, and the supply of candidates who have that skill set. Gaps can arise from several factors, including too few available jobs and a lack of the proper skill set.


The data is in CSV format and covers different occupations, skills, and counties. Duplicates are removed and N/A values are converted to NULL values.
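The cleaning step can be sketched in Python as follows. This is a minimal illustration and not the code used in the project: the column names (`occupation`, `skill`, `county`) and the sample rows are assumptions, and Python's `None` stands in for SQL `NULL`.

```python
import csv
import io

# Hypothetical sample standing in for the case's CSV export.
RAW_CSV = """occupation,skill,county
Registered Nurse,Patient Care,Kings
Registered Nurse,Patient Care,Kings
Financial Analyst,N/A,New York
Software Developer,SQL,Queens
"""

def clean_rows(csv_text):
    """Drop exact duplicate rows and turn 'N/A' (or empty) cells into None."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        normalized = {
            key: (None if value is None or value.strip() in ("", "N/A")
                  else value.strip())
            for key, value in row.items()
        }
        fingerprint = tuple(normalized.items())
        if fingerprint not in seen:  # keep only the first copy of each row
            seen.add(fingerprint)
            cleaned.append(normalized)
    return cleaned

rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Normalizing before de-duplicating means that two rows differing only in how a missing value was written still count as duplicates.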

Using MySQL Workbench, I create an entity-relationship diagram (ERD) so that the data can be sorted and joined using primary and foreign keys.
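To make the role of primary and foreign keys concrete, here is a small sketch of what such a schema and join might look like. It is an assumption-laden illustration: an in-memory SQLite database stands in for MySQL, and the table names, column names, and rows are all hypothetical rather than taken from the actual ERD.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; every table and column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE county (
    county_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    state     TEXT NOT NULL
);
CREATE TABLE occupation (
    occupation_id INTEGER PRIMARY KEY,
    title         TEXT NOT NULL
);
CREATE TABLE job_posting (
    posting_id    INTEGER PRIMARY KEY,
    county_id     INTEGER NOT NULL REFERENCES county(county_id),
    occupation_id INTEGER NOT NULL REFERENCES occupation(occupation_id),
    duration_days INTEGER  -- NULL when the duration is unknown
);
""")

conn.executemany("INSERT INTO county VALUES (?, ?, ?)",
                 [(1, "Kings", "NY"), (2, "Queens", "NY")])
conn.executemany("INSERT INTO occupation VALUES (?, ?)",
                 [(10, "Registered Nurse"), (20, "Systems Analyst")])
conn.executemany("INSERT INTO job_posting VALUES (?, ?, ?, ?)",
                 [(100, 1, 10, 42), (101, 2, 20, None)])

# Join each posting to its county and occupation through the foreign keys.
joined = conn.execute("""
    SELECT c.name, o.title, p.duration_days
    FROM job_posting AS p
    JOIN county AS c ON c.county_id = p.county_id
    JOIN occupation AS o ON o.occupation_id = p.occupation_id
    ORDER BY p.posting_id
""").fetchall()
print(joined)
```

Keeping counties and occupations in their own tables means each name is stored once, and postings refer to them by key.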



After creating the ERD, I limit the information to New York City only. The five counties in NYC are Bronx, Kings, New York, Queens, and Richmond. I use R to perform this task and, with the available data, create a heat map in Tableau. The data shows that the number of job postings is largest in NYC. I then focus on the percentage of job postings across the tri-state area of New York, New Jersey, and Pennsylvania to analyze where New York stands.
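The county filter and the state-level percentages can be sketched as below. The project did this step in R; this Python version is only an illustration, and the posting records are made up for the example.

```python
from collections import Counter

NYC_COUNTIES = {"Bronx", "Kings", "New York", "Queens", "Richmond"}

# Hypothetical postings as (state, county) pairs.
postings = [
    ("NY", "Kings"), ("NY", "New York"), ("NY", "Queens"),
    ("NY", "Westchester"), ("NJ", "Hudson"), ("NJ", "Bergen"),
    ("PA", "Pike"), ("NY", "Bronx"),
]

def nyc_only(rows):
    """Keep only postings located in one of the five NYC counties."""
    return [(state, county) for state, county in rows
            if state == "NY" and county in NYC_COUNTIES]

def share_by_state(rows):
    """Return each state's percentage of all postings, rounded to one decimal."""
    counts = Counter(state for state, _ in rows)
    total = sum(counts.values())
    return {state: round(100 * n / total, 1) for state, n in counts.items()}

nyc_postings = nyc_only(postings)
state_shares = share_by_state(postings)
print(len(nyc_postings), state_shares)
```

Checking the state as well as the county name guards against same-named counties in other states.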

The findings in Tableau are given below.


I then take a deeper look at how long a given job posting takes to get filled. For this, I compare the number of job postings with the average posting duration by occupation (for New York City only). My findings show that New York City's gap is primarily driven by a scarcity of workers with certain skills. The MSA region code is 234.
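The comparison behind that chart is a simple aggregation: count the postings and average the posting duration for each occupation. A minimal sketch, using made-up records rather than the case data:

```python
from collections import defaultdict

# Hypothetical (occupation, days_posted) records.
records = [
    ("Registered Nurse", 50), ("Registered Nurse", 40),
    ("Financial Analyst", 30),
    ("Software Developer", 45), ("Software Developer", 35),
    ("Software Developer", 40),
]

def summarize(rows):
    """Return (occupation, posting_count, average_days), slowest to fill first."""
    durations = defaultdict(list)
    for occupation, days in rows:
        durations[occupation].append(days)
    summary = [
        (occupation, len(days), sum(days) / len(days))
        for occupation, days in durations.items()
    ]
    return sorted(summary, key=lambda item: item[2], reverse=True)

summary = summarize(records)
for occupation, count, avg_days in summary:
    print(f"{occupation}: {count} postings, {avg_days:.1f} days on average")
```

Occupations with a long average duration are the ones where a skills shortage is most plausible.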


The top jobs in NYC come from three industries: Healthcare, Information Technology, and Finance & Accounting. Using Tableau, I create a bubble chart to look at the top jobs.


I then analyze the top jobs in NYC in more depth by compiling the experience level, education, and certificate name for each. Alongside these, I look at the number of jobs and the average number of days they have been posted without being filled. The findings are given below.


The healthcare industry, it seems, has the highest demand for nursing managers and registered nurses, and the certificates required for these roles are listed below. There is a high chance that the training gap arises because nursing managers and registered nurses lack the necessary certificates, which makes them ineligible for the job postings.




Looking into the available data and simplifying the visuals through Tableau, I found a huge demand for entry-level financial analysts with a background in accounting and for tax managers with 3-5 years of work experience. Looking closely, the Certified Public Accountant (CPA) and Chartered Financial Analyst (CFA) credentials are popular choices for financial analysts looking to bridge the skills gap.




The third industry that could benefit from the funding is the information technology industry.

Through data sorting and with the help of Tableau, I was able to visualize the top job titles that could benefit: Business Intelligence Analyst, IT Project Manager, Software Developer/Engineer, and Systems Analyst. High demand for these titles can mean one of two things: either the demand is huge and is being met just as quickly, or a lack of skills is slowing the hiring down.

In the case of the IT industry, the latter seems to be true. Looking closely, certifications such as Series 7 and Project Management Professional (PMP) appear to add value to the resumes of candidates for the high-demand job titles. While this may be true for people at a mid or senior level, gaining such certificates at an early stage could put a deserving candidate at the top of the hiring list.



But the million-dollar question remains: what comes after a skills gap analysis, and what measures can local authorities take to bridge the gap and raise the employment rate in New York City? After a thorough literature review, I was able to design a couple of suggested measures that could enable corporations, government, and non-profit organizations to get more involved.


Note that paying competitive salaries may be one suggested measure, but the data used may or may not support it. The measure is based on the extensive secondary literature available.


Like most cases, the data provided suffers from some limitations, which may or may not support the above measures. One of the biggest limitations is the year of the data. Since no year is provided, the findings may not hold true for 2016 or 2017. Also, while cleaning and sorting, there was no way to capture how many incumbent, unemployed, or out-of-the-labor-force workers had the requisite skills to fill the in-demand jobs. Prudent decisions can be made only when the data, once cleaned, contains the components needed for a strong analysis. Salary was not taken into account, since it was sparse and much of it was missing simply due to the missing dates.

Sometimes changes in administration allow new labor laws and regulations to kick in. The recommendations drawn from the data set could also vary, since they do not account for real-time information or for additional information that may have affected the data set. I also looked at New York City as a whole, but I am sure there is room for future analysis in which looking at individual counties may show variation in the skills gap.



 Originally Published on Tableau Community