A number of interesting articles have recently discussed the skills a data scientist should or might have. The one entitled The 22 Skills of a Data Scientist is popular (see the 22 skills listed below, or click on the link to read the full article). Earlier this morning, I read another one on LinkedIn: Data Scientist – MUST have skills?. The picture below comes from this LinkedIn article. Some of these articles have been posted on our network by external bloggers, for instance skills you need to become a data scientist and Some software and skills that every Data Scientist should know. Popular ones include how to become a data scientist and Are You A Data Scientist?

[Image from the LinkedIn article]

I tend to have some level of disagreement with many of these authors. My disagreement can be summarized as follows:

  • Rather than defining data scientists by a bunch of skills that few employees possess (though many analytic executives possess all of them and more), it makes more sense to divide data scientists into multiple categories: data engineers, machine learning experts, modelers, business-oriented data scientists, researchers, domain experts, generalists, etc., each possessing a separate skillset. Google six categories of data scientists for details.
  • Also, you can train data scientists to have all the required skills. Colleges do a poor job at that, focusing instead on delivering siloed, outdated curricula and being out of touch with the real world. Some modern six-month training programs teach the foundations to self-learners; that's the purpose of our free data science apprenticeship, which uses a project-based approach (real-life projects), though there are other alternatives.

The 22 skills in question 

Would you add or remove anything from this great list created by Matt Reany? First, I'd categorize these skills. Then I would certainly add business acumen, domain expertise, hacking skills, presentation and listening skills, good judgment, not trusting models, the ability to work in a team or with clients, familiarity with all sorts of databases and file management systems, some data engineering, some data architecture and dashboard design, data detection, real-time analytics, data vendor expertise (vendor selection, benchmarking), and being the metric expert in your company (even deciding which metrics to track and how to collect the data).

  • Algorithms (ex: computational complexity, CS theory) DD, DR
  • Back-End Programming (ex: Java/Rails/Objective-C) DC, DD
  • Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS) DD, DR
  • Big and Distributed Data (ex: Hadoop, Map/Reduce) DB, DC, DD
  • Business (ex: management, business development, budgeting) DB
  • Classical Statistics (ex: general linear model, ANOVA) DB, DC, DR
  • Data Manipulation (ex: regexes, R, SAS, web scraping) DC, DR
  • Front-End Programming (ex: JavaScript, HTML, CSS) DC, DD
  • Graphical Models (ex: social networks, Bayes networks) DD, DR
  • Machine Learning (ex: decision trees, neural nets, SVM, clustering) DC, DD
  • Math (ex: linear algebra, real analysis, calculus) DD, DR
  • Optimization (ex: linear, integer, convex, global) DD, DR
  • Product Development (ex: design, project management) DB
  • Science (ex: experimental design, technical writing/publishing) DC, DR
  • Simulation (ex: discrete, agent-based, continuous) DD, DR
  • Spatial Statistics (ex: geographic covariates, GIS) DC, DR
  • Structured Data (ex: SQL, JSON, XML) DC, DD
  • Surveys and Marketing (ex: multinomial modeling) DC, DR
  • Systems Administration (ex: *nix, DBA, cloud tech.) DC, DD
  • Temporal Statistics (ex: forecasting, time-series analysis) DC, DR
  • Unstructured Data (ex: noSQL, text mining) DC, DD
  • Visualisation (ex: statistical graphics, mapping, web-based data‐viz) DC, DR
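
As a minimal sketch of the "categorize these skills" step, the two-letter codes attached to each entry can be used as grouping keys. The snippet below copies only a few entries from the list above, and treats the codes as opaque labels since the list itself does not define them. The names `SKILLS` and `group_by_code` are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# A few entries copied from the list above: (skill, codes attached to it).
# The codes are treated as opaque labels; their meaning is not assumed here.
SKILLS = [
    ("Algorithms", ["DD", "DR"]),
    ("Business", ["DB"]),
    ("Classical Statistics", ["DB", "DC", "DR"]),
    ("Machine Learning", ["DC", "DD"]),
    ("Structured Data", ["DC", "DD"]),
]


def group_by_code(skills):
    """Return a dict mapping each code to the sorted skills tagged with it."""
    groups = defaultdict(list)
    for name, codes in skills:
        for code in codes:
            groups[code].append(name)
    # Sort codes and skill names so the output order is deterministic.
    return {code: sorted(names) for code, names in sorted(groups.items())}


if __name__ == "__main__":
    for code, names in group_by_code(SKILLS).items():
        print(f"{code}: {', '.join(names)}")
```

Extending `SKILLS` to all 22 entries would show at a glance which skills cluster under each code, which is one possible starting point for the per-category skillsets argued for earlier.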


Comment by Vincent Granville on October 2, 2014 at 11:28am

Stone, I agree that you can learn many skills yourself. I learned Perl, SAS, C, C++, R, JavaScript, SQL, XML, time series, decision trees, naïve Bayes, Monte Carlo, and many more all by myself, focusing only on stuff useful for my projects. I eventually developed methodologies far more robust than what I had learned by myself or during my college and graduate years.

The funny thing with C and C++ is that I was in charge of the computer labs for math students during my PhD years. I was asked to decide which language to use for teaching, and since none of us in my math department used modern languages (we were Fortran and Pascal people), I decided to go ahead with C, and later C++, and learn it by myself... to eventually teach it to the students. This was more than 25 years ago. I even wrote image processing software in C during those years (for myself, but my colleagues also used it to process some of the digital satellite images we were working on). It loaded images in BMP format straight into memory, 20 times faster than Windows. I also used it to produce nice textures and to simulate colors that cannot be produced with a combination of red, green, and blue, such as metallic colors like gold.

Comment by Sione Palu on October 2, 2014 at 7:44am

One of the interns we hired six months ago (she now works full time) submitted a CV listing only one of the skills above (Monte Carlo simulation). She did a Physics MSc at University College London and ended up doing a three-month internship at CERN at the end of last year, after she completed her studies. She analyzed data generated from the Higgs boson experiment using Monte Carlo methods. We looked at her CV and I said to my CTO, bring her in for an interview. Her CV would probably have been put in the bin if she had applied for a job somewhere else, because it didn't show machine learning skills, statistics skills, or the majority of the 22 skills above. In the interview she wasn't convincing, but I read her thesis (which she sent with her application) and decided that she was worth hiring. I told my CTO that this 23-year-old physicist would add value to the company. My decision was based purely on what I read in her thesis. The mathematical models (from quantum electrodynamics) in her thesis were so complex that I couldn't finish reading it, which convinced me that anyone with a brain that can understand quantum mechanics at a very deep level will easily learn skills outside their specialty.

When she started, she had only programmed in Mathematica and Matlab. I quickly introduced her to Java programming and the Weka open source machine learning API. After two weeks she was able to write Java code. I hadn't shown her a single line of Java code; all I did was point her to some links on the net so she could start learning Java. I also pointed her to Java Colt (an open source Java linear algebra package from CERN) to experiment with matrix factorization (SVD), because I intended to have her experiment with text mining. I gave her a text mining task. She used Lucene and Colt (the SVD algorithm) to find semantic similarities between movie items. She came up with good results, which we're now extending further for our product development. She also used Weka for temporal pattern detection, and the code she wrote is now in our production system. Many topics were new to her, but she is now very proficient in them.

My role is to lead the data science team. When I (or other team members) find academic papers on Google Scholar that are relevant to us, I print copies and distribute one to each person for a quick discussion in the afternoon about prototyping the method, to see whether it performs better than our current model (as the paper's authors claim) or not. This physicist can read those academic papers (whether from machine learning, data mining, signal processing, etc.) and grasp the concepts without much difficulty. She can code the algorithm in the paper, and so on.

So, my whole point in mentioning this is that we've decided to hire based not primarily on the skillsets listed on someone's CV, but on whether the candidate has the ability to learn very difficult topics. This physicist on my data science team is a star performer. She keeps coming up with new ideas, either from a machine learning paper she has read or from her own original thinking.

I'm looking for my next data science interns, so hopefully I'll get one in two weeks' time. I've already sent out the ad to local universities here in New Zealand.

Comment by Shekar Manick on September 30, 2014 at 8:56am

It's obvious that there is a wide spectrum of Data Science skills. The more, the better, so even front-end scripting can be handy, but it is not mandatory. Pragmatically, Data Science skills can be assessed by their nature, usage, complexity, and the factors that differentiate Data Scientists from the rest. More appropriately, these skills can be categorized and ranked as 1. Hard-core / Essential, 2. Required / Important, and 3. Peripheral / Desired. For instance, Machine Learning/Math/Statistical modeling/R/Python would be Hard-core, Big Data/Hadoop/Domain would be Important, and Front-end programming would be Peripheral. Again, we do not need to ignore any skill set, but the emphasis should be on the core essentials.

Comment by Pavan Kumar on September 30, 2014 at 7:40am

So the logical conclusion is that only a subset of these skills will possibly be relevant to a data scientist position in a given context.

These posts do seem to portray a data scientist's work as that of a one-man army rather than a team effort. I hope I am not entirely correct.

Comment by Vincent Granville on September 30, 2014 at 7:31am

Front-end programming is useful in some cases, e.g. if you work for a digital publisher. In my case, I manipulate HTML, JavaScript, RSS feeds, XML, etc., though the emphasis is on automating content creation as much as possible. I don't really write HTML, but I write Perl scripts (back-end) that generate HTML content. So after all, maybe this is back-end.

On a different note, you can do interesting stuff with JavaScript, such as creating browser-independent apps, or apps that don't require an Internet connection.
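
The idea of a back-end script that emits HTML, rather than HTML written by hand, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python rather than the Perl mentioned above, and the function name `render_article_list` and the sample data are hypothetical.

```python
from html import escape


def render_article_list(title, items):
    """Build a small HTML fragment from (headline, url) pairs.

    All text and URLs are escaped so that characters such as & and <
    cannot break the generated markup.
    """
    rows = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for text, url in items
    )
    return f"<h2>{escape(title)}</h2>\n<ul>\n{rows}\n</ul>"


if __name__ == "__main__":
    # Hypothetical sample data, standing in for content pulled from a feed.
    articles = [
        ("How to become a data scientist", "http://example.com/a?id=1&ref=feed"),
        ("Skills <every> analyst needs", "http://example.com/b"),
    ]
    print(render_article_list("Latest articles", articles))
```

In a publishing workflow, the list of pairs would typically come from a database query or a parsed RSS feed, which is where the automation pays off.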

Comment by Shekar Manick on September 30, 2014 at 4:46am

Yes. Strictly speaking, front-end programming/scripting is not an essential, core Data Science skill.

Comment by Pavan Kumar on September 29, 2014 at 5:39pm

How does front-end programming help?

© 2019   Data Science Central ®