
Interesting article posted recently by Ferris Jumah, Data and Products guy at LinkedIn. The author concludes that data scientists typically do the following:

  • Approach data with a mathematical mindset
  • Use a common language to access, explore and model data
  • Develop strong computer science and software engineering backgrounds

My belief is that this type of conclusion applies to only one type of data science: what I call low-level data science, that is, tactical (as opposed to strategic) data science. The article is definitely worth reading. It also features top skills ranked by importance: data mining, machine learning, R, Python, data analysis and so on. It vaguely reminds me of another article on the highest-paying programming skills. I also produced a similar list a while back, and it looks much like Ferris' list. However, I believe that there is another dimension to data science: its decisional aspects. This dimension is not captured by LinkedIn because data scientists rarely list these business skills in their LinkedIn profiles.

For instance, most of the skills that I use as a data scientist are different: domain expertise, business acumen, data intuition, use of vendor dashboards, finding the right data, drawing conclusions and applying results to my decision process to run a business. The systems that I develop (computational marketing, growth hacking) rely on a few principles: data-driven rather than model-driven, simplicity, robustness, scalability, efficiency, fast implementation. Some processes do not involve coding, but instead involve making tools communicate with each other, for instance:

  • making Google Analytics and Google AdWords communicate to automate keyword bidding and optimize conversions and ad spend;
  • identifying the top 10,000 most relevant keywords with significant volume, which is done without writing a single line of code, using vendor tools or APIs instead;
  • optimizing our mix of external RSS feeds and content in the same way, using for instance lists of top data scientists to follow on Twitter (though we have also produced our own home-made list) and their most shared content on various social networks.

My article on 10 types of data scientists brings a different, fresh perspective to this.

Finally, everything you can learn from a textbook (R, Python, machine learning, and so on) is at risk of being outsourced or automated. That's what vendors are trying to do, and so am I with my data science research lab. So if your skill set consists only of material available in textbooks, your career prospects don't look too good.

The picture is from the original article.



Comment by Jitin Kapila on January 8, 2015 at 10:09pm

Thank you very much for the insight. I agree with both of you, Dr. Granville and Mr. Pradyumna: both inventing our own techniques and making some random learning jumps are required for a newbie like me to grow.

Considering this, it seems I have a lot more to learn. In the end, I firmly believe that a mature data scientist's career is rooted in a never-ending learning process. I hope I'll land a job sometime soon so that learning doesn't scare me and I can learn more while earning well.

Comment by Vincent Granville on January 6, 2015 at 7:35am

Jitin, in my example (growth hacking), the techniques are taught nowhere. An advantage is that beginners know as much as experts, and you can compete successfully with the most senior professionals, even if you are a beginner. When you come up with ideas like that, you can compete with anyone. The tricky question is, how do you come up with stuff like that in the first place? I don't know. One thing that might help you is to read "my data science journey" (Google it).  It explains how I started, all the way back to my high school days.

Comment by Jitin Kapila on January 5, 2015 at 9:51pm

Well, thank you for the insight, Mr. Palu and Dr. Granville, but my point here is to ask how at least a newbie like me can grow into something substantial.

It's something of a fact that the more uncommon the idea or skill, the higher its value. But if someone wants to learn that skill, how should they go about it? Even Albert Einstein had to learn basic elementary maths at some point before giving us the theory of relativity.

Many of the data scientists I see come mainly from statistics, maths or computer science backgrounds, but someone like me, with a Mechanical major, has to learn at least the basics before jumping into this career. In the last 4 months I have learned R, SQL and Python in more detail, plus a few regression models and some supervised and unsupervised analysis tools (k-means, Naive Bayes, etc.), but it still appears to be impossible to become a "Data Scientist".

How can one quantify that he is a data scientist? In my case, I have no certificates and no background in this field. Does that mean I can't be a "Data Scientist"?

I just want to know what the basic mandatory skills are, those that can be acquired via textbooks, sites, blogs, online courses, etc., which one can learn to start a career journey toward becoming a "Data Scientist".

Comment by Vincent Granville on January 5, 2015 at 1:59pm

Sione, I tend to agree with you. But innovation has more potential than replication. Let's look at this very website. It has lots of traffic; you could say it's the site most visited by data scientists. Yet anyone could create copycats, and some do. No degree is required, and you can set up such a website from anywhere in the world. If it were not for some proprietary data science techniques (more specifically, growth hacking and optimized content generation), it would no longer exist, surpassed by other similar websites that grew by following a recipe found in textbooks. To put it differently, I leverage a recipe published nowhere to grow DSC.

At the other extreme, a guy who sets up a website, offering an API to forecast stock prices based on a regression model, will compete with thousands of others and has no chance to ever make any sustainable money, unless his regression has something special, something not found in textbooks.

The contrast between this textbook guy and me also shows up in job security and how much we earn: much more versus much less than average. The guy using advanced textbooks can expect his revenue and career prospects to be above average, until "advanced" becomes the new "normal". When this happens, he needs to adapt to keep up, and many actually adapt well to the change. By contrast, the guy still relying on, and clinging to, college textbook knowledge can expect his revenue and career prospects to be below average, and decreasing over time (among all data scientists).

Some money can be made by applying old tools to new problems. The new problems aren't described in standard data science textbooks, though, but maybe in a standard urban planning textbook. Connecting the dots will give you job security and nice dollars for a while. Sometimes, it means finding the right company to work for, with exciting projects such as optimizing time spent on the road at commute time, via better, data-driven urban planning. The techniques involved won't be college textbook material, but either advanced textbook level or, more frequently, proprietary (shareholders and VCs are reluctant to inject money into products that can easily be copied and for which no patent can be obtained; in short, they favor innovation over replication).

Comment by Sione Palu on January 5, 2015 at 11:47am

Textbook learning is still at the top of my list. Vincent, I believe your comment refers to the usual classroom textbooks, which are common everywhere. On that point I agree, but there are recommended textbooks with new topics that learners relying on typical classroom textbooks will never have been exposed to if they don't have a research background. I include here those experts who are knowledgeable in their fields with many years of expertise, which can be regarded as narrow.

I bet that more than 50% of members here acquired their knowledge via textbooks, myself included, which is not a bad thing at all. Textbooks give a learner a first step toward learning higher-level concepts. I think that the majority of data analysts or scientists are tool users (that is, they use SAS or other tools) who never implement algorithms from textbooks or peer-reviewed papers. I think that those are the data scientists you refer to, Vincent. I learn from a textbook to get a quick start on a specific topic, then I dig into the literature to find out whether a recent publication has come up with something that outperforms what I read in the textbook. One example is an excellent textbook (targeting signal and image processing engineers) published in the last 5 years on tensor factorization by one of the leading researchers on the topic, Dr. Cichocki. The pseudo-code is presented clearly, along with accompanying Matlab code for each algorithm included in the book. My team has a copy of this book. The algorithms are not limited to image and signal processing analysis, but apply to wider areas such as text mining, recommender systems, and so forth.

"Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation"

http://au.wiley.com/WileyCDA/WileyTitle/productCd-0470746661.html

My point here is that learning from textbooks is not a bad thing at all. It's only bad if the learner doesn't update or explore new concepts, and a good example of exploring new concepts is the textbook above. Tensor factorization is still relatively new in machine learning, even though the concepts of tensor calculus and multi-mode modeling have been around for over a hundred years; Einstein used tensor calculus in his General Theory of Relativity paper, published in 1916. It's only in the last decade or so (apart from 1 or 2 publications in the 1960s on tensors) that its application to data analysis has seen increased adoption. One can just do a search on Google Scholar and see how many papers have been published on tensors, in journals ranging from machine learning to data mining and signal processing. It's a difficult topic to grasp, but a textbook on tensors definitely helps.

Comment by Vincent Granville on January 5, 2015 at 8:38am

If you live in a country that pays very low salaries for these textbook skills, you should be fine for now. Textbook skills are easy to outsource to other countries, as anyone in the world can buy the book and learn the skills. They can also be automated by anyone with the know-how. So learning how to fully automate regression would be useful. But there are no textbooks on this; that's the problem with all new technologies. The potential for high earnings lasts only so long, until the technique becomes popular.

Another option is to exploit soft skills that you might have, like sales or marketing. Or domain expertise in specific fields.

Comment by Jitin Kapila on January 5, 2015 at 2:57am

I agree with what you say, Dr. Granville, but the point is: how can you teach or learn (as in my case) "high-level" data science?

I have been looking for courses and knowledge on how and what to learn to become a data scientist. Kindly help.

Comment by Sione Palu on January 3, 2015 at 4:10pm

Quote "So if your skill-set consists only of stuff available in textbooks, your career prospects don't look too good"

It looks like I fall into that category and am at risk in my long-term career prospects! My skill set consists of textbook material, which is a cut-down version of the papers published in journals.
