There have been a number of interesting articles recently discussing the skills a data scientist should or might have. The one entitled "The 22 Skills of a Data Scientist" is a popular one (see the 22 skills listed below, or click on the link to read the full article). Earlier this morning, I read another one on LinkedIn: "Data Scientist – MUST have skills?". The picture below comes from this LinkedIn article. Some of these articles have been posted on our network by external bloggers, for instance "Skills you need to become a data scientist" or "Some software and skills that every Data Scientist should know". Popular ones include "How to become a data scientist" and "Are You A Data Scientist?"
I tend to have some level of disagreement with many of these authors. My disagreement can be summarized as follows:
The 22 skills in question
Would you add some skills to, or remove some from, this great list created by Matt Reany? First, I'd categorize these skills. Then I would certainly add business acumen, domain expertise, hacking skills, presentation and listening skills, good judgment, not trusting models, the ability to work in a team or with clients, all sorts of databases and file management systems, some data engineering, some data architecture and dashboard design, data detection, real-time analytics, data vendor expertise (vendor selection, benchmarking), and being the metric expert in your company (even deciding which metrics to track and how to collect the data).
Comment
Stone, I agree that you can learn many skills yourself. I learned Perl, SAS, C, C++, R, JavaScript, SQL, XML, time series, decision trees, naïve Bayes, Monte Carlo, and many more all by myself, focusing only on what was useful for my projects. I eventually developed methodologies far more robust than what I had learned by myself or during my college and graduate years.
The funny thing with C and C++: I was in charge of the computer labs for math students during my PhD years, and I was asked to decide which language to use for teaching. Since none of us in my math department used modern languages (we were Fortran and Pascal people), I decided to go ahead with C, later with C++, and to learn it by myself... to eventually teach it to the students. This was more than 25 years ago. I even wrote image processing software in C during these years (for myself, but my colleagues also used it to process some of the digital satellite images we were working on). It loaded images in BMP format straight into memory, 20 times faster than Windows. I also used it to produce nice textures and simulate colors that cannot be produced with a combination of red, green, and blue, such as metallic colors like gold.
One of the interns we hired 6 months ago (she now works full time) submitted a CV listing only one of the skills above (Monte Carlo simulation). She did a physics MSc at University College London and ended up doing a 3-month internship at CERN at the end of last year, after completing her studies. She analyzed data generated by the Higgs boson experiment using Monte Carlo methods. We looked at her CV and I said to my CTO: bring her in for an interview. Her CV would probably have been put in the bin if she had applied for a job somewhere else, because it didn't show machine learning skills, statistics skills, or the majority of the 22 skills above. In the interview she wasn't convincing, but I read her thesis (which she had sent with her application) and decided that she was worth hiring. I told my CTO that this 23-year-old physicist would add value to the company. My decision was based purely on what I read in her thesis. The mathematical models (from quantum electrodynamics) in her thesis were so complex that I could hardly finish reading it, which convinced me that anyone with a brain that can understand quantum mechanics at a very deep level will easily learn skills outside their specialty.
When she started, she had only programmed in Mathematica and Matlab. I quickly introduced her to Java programming and the Weka open source machine learning API. After 2 weeks she was able to write Java code, even though I hadn't shown her a single line of it. All I did was point her to some links on the net so she could start learning Java. I also pointed her to Colt (an open source Java linear algebra package from CERN) to experiment with matrix factorization (SVD), because I intended to have her experiment with text mining. I gave her a text mining task. She used Lucene and Colt (the SVD algorithm) to find semantic similarities between movie items. She came up with good results, which we're now extending further for our product development. She also used Weka for temporal pattern detection, and the code she wrote is now in our production system. Many topics were new to her, but now she's very proficient in them.
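For readers unfamiliar with the idea, here is a minimal sketch of SVD-based semantic similarity. It is written in Python with NumPy rather than the Java, Lucene and Colt stack mentioned above, and it is not the code described in this comment. The movie titles, keyword descriptions, and the choice of k=2 latent dimensions are all made up for illustration.

```python
import numpy as np

# Hypothetical toy data: invented movie titles with invented keyword descriptions.
movies = {
    "Space Quest": "space alien ship galaxy",
    "Star Voyage": "space ship galaxy crew",
    "Love in Paris": "romance paris love wedding",
    "Wedding Bells": "romance love wedding family",
}


def term_document_matrix(docs):
    """Return (vocabulary, matrix) with one row per term and one column per document."""
    vocab = sorted({word for text in docs for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, text in enumerate(docs):
        for word in text.split():
            matrix[index[word], j] += 1.0
    return vocab, matrix


def latent_vectors(matrix, k):
    """Map each document to a k-dimensional vector using a truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


titles = list(movies)
_, tdm = term_document_matrix(list(movies.values()))
vectors = latent_vectors(tdm, k=2)


def similarity(title_a, title_b):
    """Cosine similarity between two movies in the latent space."""
    return cosine_similarity(vectors[titles.index(title_a)],
                             vectors[titles.index(title_b)])


print(similarity("Space Quest", "Star Voyage"))
print(similarity("Space Quest", "Love in Paris"))
```

In this toy setup the two space movies should score much higher with each other than with the romance movies. A real system would add tokenization, term weighting and a far larger vocabulary.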
My role is to lead the data science team. When I (or other team members) find academic papers on Google Scholar that are relevant to us, I print copies and distribute them to everyone for a quick discussion in the afternoon about prototyping the method, to see whether it performs better than our current model (according to the paper's authors) or not. This physicist can read those academic papers (whether from machine learning, data mining, signal processing, etc.) and grasp the concepts without much difficulty. She can code the algorithm in the paper, and so on.
So my whole point in mentioning this is that we've decided to hire based not primarily on the skill sets listed on someone's CV, but on whether the candidate has the ability to learn very difficult topics. This physicist in my data science team is a star performer. She keeps coming up with new ideas, either from a machine learning paper she has read or from her own original thinking.
I'm looking for my next data science intern, and hopefully I'll get one in two weeks' time. I've already sent the ad to local universities here in New Zealand.
It's obvious that there is a wide spectrum of data science skills. The more, the better, so even front-end scripting can be handy, but it is not mandatory. Pragmatically, data science skills can be assessed by their nature, usage, complexity, and the factors that differentiate data scientists from the rest. More specifically, these skills can be categorized and ranked as 1. Hard-core / Essential, 2. Required / Important, and 3. Peripheral / Desired. For instance, machine learning, math, statistical modeling, R, and Python are hard-core; big data, Hadoop, and domain knowledge are important; and front-end programming is peripheral. Again, we should not ignore any skill set, but the emphasis should be on the core essentials.
So the logical conclusion is that only a subset of these skills will possibly be relevant to a data scientist position in a given context.
These posts do seem to portray a data scientist's work as that of a one-man army rather than a team effort. I hope I am not entirely correct.
Front-end programming is useful in some cases, e.g. if you work for a digital publisher. In my case, I manipulate HTML, JavaScript, RSS feeds, XML, etc., though the emphasis is on automating content creation as much as possible. I don't really write HTML, but I write Perl scripts (back-end) that generate HTML content. So after all, maybe this is back-end.
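As a rough illustration of what "a back-end script that generates HTML" can look like, here is a minimal sketch. It uses Python instead of Perl, and the function name, article titles and URLs are hypothetical placeholders, not taken from any actual script.

```python
from html import escape


def render_article_list(articles):
    """Build an HTML unordered list from (title, url) pairs."""
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in articles
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical input; in practice this might come from a database or an RSS feed.
articles = [
    ("Skills & tools", "https://example.com/skills"),
    ("Time series basics", "https://example.com/time-series"),
]

page_fragment = render_article_list(articles)
print(page_fragment)
```

Escaping titles and URLs before inserting them into the markup keeps characters such as "&" or "<" in the source data from breaking the generated page.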
On a different note, you can do interesting stuff with JavaScript, such as creating browser-independent apps, or apps that don't require an Internet connection.
How does front-end programming help?
Posted 12 April 2021