The Data Scientist: Elusive or Illusive?

Excitement around Big Data has significantly increased the demand for data scientists. This hot position has been hailed as the "Sexiest Job of the 21st Century" by the Harvard Business Review.

When we started in the advanced analytics space, we didn't use the term 'data scientist'; we were more focused on solving problems than on defining a job description. Our initial teams evolved through hands-on experience, adapting to challenges until they found methodologies that worked. As demand for these skills grows, however, recruiting individuals who will quickly add value to our teams is a constant challenge.

Quality data scientists may be elusive, but they do exist. From our experience in the trenches, we've found seven core characteristics of a successful data scientist. This is a developing view, but offers some insight into the position and the types of skills we are seeking for our team.

It's rare that any one individual is an expert in all of these areas. However, a successful data scientist understands enough to intelligently collaborate with experts who can fill gaps in their own knowledge and skills. Below, we describe these seven core competencies and the seven key collaborators of a successful data scientist.

A good data scientist:

1. Can design a data investigation and manage towards defined objectives

...and knows an: Experienced project manager

Many 'Big Data' projects fail because they do not set clearly defined objectives. With all the hype surrounding data analytics, there's often the false assumption that simply having a lot of data will magically produce valuable results. Frequently there is also the unrealistic expectation that a vendor's black box will easily spit out valuable answers. Such flashy products can make great tools, but without a good project blueprint these tools are just tools. 

Just like a science experiment, a Big Data project requires clearly defining the problem to be addressed and developing a targeted plan for quickly translating raw data into a solution. A data scientist also understands that the results of a complex experiment are frequently influenced by what data is studied and how it's measured, and is careful not to accidentally pre-determine an outcome simply through the way the experiment was designed. Without effective project management, the stakeholders (who are also likely holding the checkbook) will quickly become disillusioned by a lack of tangible results.
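To make the experiment-design point concrete, here is a minimal Python sketch (all data, probabilities and numbers are invented for illustration, not drawn from any real project) showing how a sampling choice can pre-determine the result: measuring satisfaction only among customers who completed a purchase inflates the apparent score.

```python
import random

random.seed(42)

# Hypothetical population of customer satisfaction scores (1-10).
population = [random.randint(1, 10) for _ in range(10_000)]

# Design flaw: unhappy customers rarely complete a purchase, so a
# study that samples only purchasers has baked in its own answer.
purchasers = [s for s in population if random.random() < (0.2 if s <= 4 else 0.9)]

print(f"True mean satisfaction:       {sum(population) / len(population):.2f}")
print(f"Purchasers-only measurement:  {sum(purchasers) / len(purchasers):.2f}")
```

The second number comes out noticeably higher than the first, even though nothing about the underlying customers changed; only the experiment's design did.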

 

2. Is comfortable with the fact that most data is a complete mess

...and knows a: DBA familiar with the data sources

'Big Data' is largely a misnomer. Many companies have been managing huge volumes of data for years, and storage technologies have largely kept pace with demand. The real challenge of Big Data is that most of it is a complete mess. Traditional quantitative analysts (aka quants), who really came to the fore during the '90s and early 2000s, are trained mostly to work with highly structured and very clean data (aka 'dream data'). Dumping messy data on the desks of traditional analysts has proven problematic. Most of the skill and effort in Big Data goes into parsing, cleaning, de-normalizing, re-normalizing, linking, indexing, interpreting and otherwise preparing all this messy data for analysis. A data scientist thrives on tackling this work, pulling together jumbled, disorganized data to solve a puzzle.
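As a small, hypothetical taste of that preparation work, the pandas sketch below (the column names and messy values are invented) parses, cleans and de-duplicates the kind of records that typically land on a data scientist's desk: inconsistent casing, stray whitespace, mixed date formats, duplicates and gaps.

```python
import pandas as pd

# Invented example of records as they often arrive from source systems.
raw = pd.DataFrame({
    "customer":   [" Acme Corp", "ACME CORP", "Beta LLC", None],
    "order_date": ["2013-07-01", "07/01/2013", "2013-07-02", "2013-07-03"],
    "amount":     ["1,200.50", "1,200.50", "350", None],
})

clean = (
    raw.dropna(subset=["customer"])  # drop rows with no usable customer
    .assign(
        customer=lambda d: d["customer"].str.strip().str.title(),
        # Parse each date individually so both formats resolve correctly.
        order_date=lambda d: d["order_date"].apply(pd.to_datetime),
        amount=lambda d: pd.to_numeric(d["amount"].str.replace(",", "")),
    )
    .drop_duplicates()  # the two differently-spelled Acme rows collapse
)
print(clean)
```

Four raw rows become two clean ones; that shrinkage, multiplied across millions of records and dozens of sources, is where most of the effort described above actually goes.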

Most of the data used in a project is already stored somewhere else—be that an e-mail server, a transaction database or event logs. A data scientist needs to partner with the owners of those systems and leverage an experienced DBA and/or infrastructure expert who can coordinate access to all this information and integrate it into the project's compute environment—either through direct feeds or a separate consolidated data store.
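In practice, that integration step often starts as a scripted extract into a project store. Below is a hedged Python sketch of the idea; the table, column and file names are all invented, and SQLite stands in for whatever source system and consolidated store a real project would use.

```python
import sqlite3
import pandas as pd

# Stand-in source system; in a real project a DBA would grant read
# access to a production replica or schedule a direct feed.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                         amount REAL, created_at TEXT);
    INSERT INTO orders VALUES (1, 101, 250.00, '2013-07-01'),
                              (2, 102, 99.50,  '2013-07-02');
""")

# Extract only the fields the investigation actually needs.
orders = pd.read_sql_query(
    "SELECT order_id, customer_id, amount, created_at FROM orders",
    source,
)

# Land the extract in the project's consolidated analysis store.
warehouse = sqlite3.connect("analysis.db")
orders.to_sql("orders", warehouse, if_exists="replace", index=False)
warehouse.close()
source.close()
```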

Click here to continue reading the remaining 5 characteristics...

Agree? Disagree? Have a different experience? 
Let us know! Post a comment or write to us directly at [email protected]

About the author: Nicholas Hartman is a Director at CKM Advisors specializing in leveraging digital data for performance improvement.

Tags: Analytics, Big Data, Data Scientist, Hiring, Recruitment, Talent


Comment by Nicholas Hartman on July 8, 2013 at 12:18pm

Thanks for your comments Doug.

"Gartner's research indicates that 1) Most Big Data projects fail because their requirements are too well defined. The highest value Big Data initiatives we see are experimental, opportunistic and creative. Ultimately they zero in on an objective, but they don't start out that way. It's the opposite with most DW/BI initiatives. We refer to this difference as 'the suits vs. the hoodies.'"

I would agree one can’t be too specific going into a project and indeed much of the value comes from the data pointing analysis in a direction that one didn’t even know previously existed. However, in our experience there is currently much less tolerance among senior executives for pure data fishing expeditions. Such tolerance certainly varies a bit by industry, but we’ve seen many institutions that experienced “disillusionment” (as Gartner puts it) after a clear value-generating objective fails to materialize from a fishing exercise run by an internal team or black box technology vendor.

At some point, and typically much sooner rather than later, the stakeholders funding the effort need to see things move towards a return on the investment. That doesn't mean the effort can't still be "experimental, opportunistic and creative," but there needs to be a clearly defined reason why that specific investment is being made in the first place (e.g., to increase process efficiency, reduce purchasing costs or increase customer retention). That objective can certainly change and evolve over time, but there's little appetite for data scientists who just want to dig a huge hole in the ground because perhaps they might dig up something valuable. Once a Big Data analytical group has established a strong track record, there's certainly an increased tolerance for fuzzier projects.

"2) Big Data increasingly is less about the volume of data than its variety and velocity (the '3Vs' I first defined over 12 years ago: http://goo.gl/wH3qG). Any notion that 'storage' or processing power alone relegates Big Data to a misnomer is itself a misconception. By 2:1 our clients tell us that the variety of data is the biggest challenge and biggest opportunity (over volume), and *this* is what requires strong data science skills (esp. data prep & integration) that can't be accommodated merely with technology."

I don’t think there is any disagreement here. This is largely what we addressed in our second characteristic of the Data Scientist. Technology can help, but there’s certainly no substitute for strong data wrangling skills.

Thanks again.

Comment by Doug Laney on July 8, 2013 at 8:33am

Interesting, but I respectfully disagree. Gartner's research indicates that 1) Most Big Data projects fail because their requirements are too well defined. The highest value Big Data initiatives we see are experimental, opportunistic and creative. Ultimately they zero in on an objective, but they don't start out that way. It's the opposite with most DW/BI initiatives. We refer to this difference as "the suits vs. the hoodies." 2) Big Data increasingly is less about the volume of data than its variety and velocity (the "3Vs" I first defined over 12 years ago: http://goo.gl/wH3qG). Any notion that "storage" or processing power alone relegates Big Data to a misnomer is itself a misconception. By 2:1 our clients tell us that the variety of data is the biggest challenge and biggest opportunity (over volume), and *this* is what requires strong data science skills (esp. data prep & integration) that can't be accommodated merely with technology.

--Doug Laney, VP Research, Gartner, @doug_laney 

Comment by Nicholas Hartman on July 8, 2013 at 6:33am

Agreed.

Certainly some of the areas (e.g., interacting with senior executives or understanding the typically fragmented ownership structure of systems and information within corporate IT) require some previous hands-on experience. We generally recruit a mix of new graduates and experienced hires.

In terms of identifying potential in new graduates, I would say we look for and discuss examples of applying analytical skills to actual applications. That could come through many different things: doing some analytics as part of a student society/club, internships, or small business operations on the side during school. It's certainly not easy to precisely define, but there's a certain x-factor that separates having the technical and mathematical knowledge around data analysis (evidenced by good grades) from applying it in the 'real world.' That said, sometimes one just needs to take a bit of a chance on someone and see how they work out with the team. We don't want to miss the opportunity to bring someone truly brilliant on board just because they don't yet have a huge amount of hands-on experience, although we'd want to see evidence that this is the sort of person who is eager and capable of quickly picking up new skills. Such individuals tend to thrive when 'thrown into the deep end.'

Thanks for your feedback.

Comment by Scott Eilerts on July 7, 2013 at 7:34am

This is a very good outline.  I find it ironic that most job postings/descriptions for "Data Scientist" do not read like this at all! 

These core traits imply a good data scientist needs to have worked for a few years to have an opportunity to both learn and then exploit these talents.  How do you identify this potential in new university graduates?
