
Is Data Scientist the right career path for you?

According to Paco Nathan, a data scientist should:

  • prepare an analysis and visualization of an unknown data set, while impatient stakeholders watch over your shoulder and ask pointed questions; be prepared to make quantitative arguments about the confidence of the results
  • describe “loss function” and “regularization term” each in 25 words or less, with a compare/contrast of several examples, and show how to structure a range of trade-offs for model transparency, predictive power, and resource requirements
  • pitch a reorg proposal to an executive staff session which implies firing some ranking people
  • interview 3-4 different departments that are hostile to your project, to tease out the metadata for datasets that they’ve been reluctant to release
  • build, test, and deploy a mission-critical app with real-time SLAs, efficiently across a 1000+ node cluster
  • troubleshoot intermittent bugs in somebody else’s code which is at least 2000 lines long, without their assistance
  • leverage ensemble approaches to enhance a predictive model that you’re working on
  • work on a deadline in paired programming with people from 3-4 different fields completely disjoint from the work that you’ve done
  • learn to leverage the evolving PyData stack: IPython, Pandas, scikit-learn, etc.
  • learn how to lead an interdisciplinary team
  • get experience in 1+ domains outside of data/analytics/programming
  • get a good grounding in design and apply it to data visualization
  • do everything you can to become a better writer and speaker (outside of academic confs)
  • participate in meetups; publish blogs, presentations, etc. (hiring managers ignore resumes and look for published content online)
  • get a good grounding in abstract algebra, Bayesian stats, linear algebra, convex optimization
  • study up on algorithms and frameworks for streaming data (the bigger use cases on the horizon are not batch)
  • learn Scalding and functional programming with type safety
  • avoid Business Intelligence (like the plague)
  • avoid anything referred to as “The Hadoop Ecosystem” or “Hadoop as an OS”
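The second item asks for a compact description of a “loss function” and a “regularization term”. As a hedged illustration (not Paco's own example), here is a minimal Python sketch assuming plain linear regression with a squared-error loss and an L2 (ridge) penalty; all function names are hypothetical. The loss measures how badly the model fits the data, the penalty measures how large (complex) the model is, and the weight `lam` sets the trade-off between the two.

```python
def squared_error_loss(w, X, y):
    """Loss function: mean squared error of a linear model w over rows of X."""
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(wi * xi for wi, xi in zip(w, row))
        total += (prediction - target) ** 2
    return total / len(y)


def l2_penalty(w, lam):
    """Regularization term: penalizes large weights, scaled by lam >= 0."""
    return lam * sum(wi * wi for wi in w)


def regularized_objective(w, X, y, lam):
    """What training minimizes: fit to the data plus a complexity penalty."""
    return squared_error_loss(w, X, y) + l2_penalty(w, lam)


def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimizer of regularized_objective for one feature and no
    intercept: w = sum(x*y) / (sum(x^2) + n*lam). Larger lam shrinks w toward 0."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    for lam in (0.0, 1.0, 10.0):
        w = fit_ridge_1d(xs, ys, lam)
        objective = regularized_objective([w], [[x] for x in xs], ys, lam)
        print(f"lam={lam}: w={w:.3f}, objective={objective:.3f}")
```

With `lam = 0` the fit recovers the unregularized slope of 2; increasing `lam` gives up some training accuracy in exchange for a smaller, more conservative weight, which is the transparency-versus-predictive-power kind of trade-off the item mentions.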

Paco Nathan

Do you agree with this?

Vincent Granville replied: There are all sorts of data scientists. In my case, as an entrepreneur running a company on autopilot (no employees, 7-digit yearly revenue with 80% margins, and significant outsourcing to vendors), none of the above test questions apply, and I'd probably fail most of them. But I am a data scientist nevertheless, as well as a business / growth / data hacker.


Comment by Mark Meloon on February 9, 2015 at 9:24pm

Interesting that he doesn't include "Interview client, think deeply about their goals, then tactfully convince them that the problem they need to have solved is not the one they vehemently claim it is." That's Step Numero Uno of the CRISP-DM model, and if you don't get that correct then everything else is irrelevant. That skill is needed far more often than being able to perform paired programming with people in 34 different fields while simultaneously spinning plates and singing the alphabet song backward.

Comment by Eric on October 6, 2014 at 9:24pm

Yeah, functional programming is helpful here, but in my view we cannot avoid Business Intelligence; it is related to data in many ways too.

https://intellipaat.com/

Comment by Khurram on April 10, 2014 at 3:28pm

Thanks, Vincent, for sharing this. I know what Hadoop is, but I am curious about one point Paco raised: avoid anything referred to as “The Hadoop Ecosystem” or “Hadoop as an OS”. As a data scientist, why should we avoid Hadoop? It is not only a matter of architecture; the Hadoop ecosystem includes tools such as Pig, MapReduce, Hive, Oozie, and many more, which help you target and process data to get insight. If a data scientist is handy with such tools, they can run different experiments to answer different questions, can't they?

Comment by Vincent Granville on April 10, 2014 at 5:30am

Hadoop is just one implementation of a distributed architecture based on file management: tasks are split into smaller ones, distributed (usually over several servers in the cloud, or run on your own machine as a single node), and their results are merged. This is fundamental in computer science for processing large transactional data that produces hash tables too big to fit in memory. If you hate Hadoop (the standard today), you'll have to create some sort of Hadoop of your own.
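The split / distribute / merge cycle described above is the MapReduce pattern. The sketch below is a toy, single-process Python simulation of that pattern for a word count; it is not Hadoop's API, and all function names and parameters are hypothetical. Hashing each key to one reducer partition illustrates why no single machine needs a hash table holding every key.

```python
import zlib
from collections import defaultdict


def split(lines, chunk_size):
    """Split the input into chunks (roughly analogous to HDFS blocks)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map: emit (word, 1) pairs for one chunk; each chunk could run on its own node."""
    return [(word, 1) for line in chunk for word in line.lower().split()]


def partition(pairs, num_reducers):
    """Shuffle: route each key to exactly one reducer by hashing it, so each
    reducer only holds a hash table for its own share of the keys."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for word, count in pairs:
        buckets[zlib.crc32(word.encode("utf-8")) % num_reducers][word].append(count)
    return buckets


def reduce_bucket(bucket):
    """Reduce: merge the partial counts for every key in one partition."""
    return {word: sum(counts) for word, counts in bucket.items()}


def word_count(lines, chunk_size=2, num_reducers=3):
    """Run split -> map -> shuffle -> reduce and merge the reducers' outputs."""
    mapped = [pair for chunk in split(lines, chunk_size) for pair in map_chunk(chunk)]
    result = {}
    for bucket in partition(mapped, num_reducers):
        result.update(reduce_bucket(bucket))  # partitions hold disjoint keys
    return result


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat", "a cat ran"]))
```

In a real cluster each `map_chunk` and `reduce_bucket` call would run on a different node, with intermediate pairs written to disk between phases; here everything stays in memory for readability.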

Comment by Robert Mckeown on April 10, 2014 at 2:27am

Yeah, I too am curious about the Hadoop-related recommendations. Is it due to excessive buzzwordism, or some fundamental dislike of all things Hadoop? Or something else?

Comment by Chris on April 8, 2014 at 12:59pm

There is no doubt that data science is a fascinating path, but getting into it is not for everyone. I am looking for a mentor to follow my progress. I know there are plenty of free webinars online, e.g. http://www.gopivotal.com/big-data/webinar/stuck-in-traffic-saved-by...

but having access to resources remains a crucial factor. Thx! Chris

Comment by Matt Oates on April 6, 2014 at 12:37pm

Why avoid Business Intelligence? I know they are different disciplines, but I find them related... distant cousins, perhaps. :)

Comment by Khurram on March 31, 2014 at 6:40pm

Good work, Paco. Why avoid “The Hadoop Ecosystem” or “Hadoop as an OS”?

Comment by Kirk Borne on March 29, 2014 at 9:16am

@Vincent, I would love to get access to your algorithm. I can think of some fun things to do in my data science class with something like that!  Have you posted the algorithm anywhere for public access?  Thanks!!

Comment by Vincent Granville on March 29, 2014 at 8:29am

Hi Kirk, I updated Paco's picture: I drew it by hand, then scanned and uploaded it. Just kidding, I used data science (an algorithm) to turn it into what looks like a hand-drawn picture. I wonder whether it's possible (using data science) to distinguish transformed images like this one from genuinely hand-drawn versions.


© 2018 Data Science Central ®
