
How to Raise a (Data) Scientist in the Xbox Age

Summary:  How will we convince our children to consider data science as a career?  And what will data science be like if they make that choice?

Image source: regentsboosters.com

An article with this title (minus the ‘Data’) appeared in the WSJ in December, written by Robert Scherrer, Chairman of the Physics and Astronomy department at Vanderbilt University.  As an educator and parent, he offers some interesting and humorous insights into how to get young people interested in science.  With the New Year approaching, it set me thinking about data science and how all you parents and near-parents might get your kids excited about our field.

Right now there is an explosion of employment opportunities matched by an explosion of educational programs, which will probably produce at least a fourfold, if not a tenfold, increase in the number of people adopting the Data Scientist label over the next ten years.  I’m leaving aside the fact that we don’t yet have good naming conventions for ‘senior’ and ‘junior’ data scientists, or labels that distinguish ‘data engineers’ from ‘data scientists’.  I’m betting this food fight doesn’t get resolved any time soon.  Everyone wants to be called a data scientist.

But within perhaps five to ten years this bubble of opportunity will likely have popped, and by then there will probably be enough qualified DS hands to meet demand.  Sure, DS will continue to grow, but without the hiring pressure that exists today.

And in about 10 years, many of today’s newly minted data scientists will have youngish children of their own and will no doubt be wondering how to guide them toward an interesting, fulfilling, and perhaps even lucrative career.  How can we encourage them?  What will DS opportunities look like 10 or 15 years from now?  Here’s a chance for a little speculation and a little introspection.

Let’s take that last question first.  What will data science look like in 10 or 15 years?  As a starting point, I turned to our own DataScienceCentral article in which five data science leaders offer their predictions about the future.  Their opinions, together with my own observations, led me to these conclusions:

There will still be plenty of opportunity.  I am constantly struck by how slowly data science has been adopted.  Most researchers say that only about 20% of companies have embraced it.  Those at the top of the food chain, including the Fortune 1000 and all of e-commerce, are obviously well ahead.  It may take another 10 years, but eventually we’ll get close to full adoption, and the market will keep expanding until that saturation point is reached.

The core disciplines of the job will remain largely the same.  Whether it’s scoring models, value or volume forecasts, recommenders, NLP, image recognition, IoT, or deep learning, the data science at the core of these tools will require the same skills and learning.  Yes, there will be advances in the size of data repositories, speed of access, and miniaturization, and probably most of all in connectivity and AI, but if you’re practicing today, you’re likely to find the world 10 years from now pretty familiar.  (I’m guessing quantum computers are more than 10 years away.)

Potential Disruptions.  It’s always wise to watch for something that might change our world in an unforeseen way.  The two I’d keep my eye on are ‘black box’ applications and the ‘citizen data scientist’.  To be sure we’re thinking of the same thing, I mean tools so simplified that a non-data scientist could master one in a relatively short time and use it to produce all the things I mentioned above (scoring models, forecasts, NLP, IoT, and similar).  We’re not talking point-and-shoot here, but perhaps something on the order of complexity of MS Project.

The demand for these tools is largely driven by the shortage of qualified data scientists.  There is probably also a small minority of middle managers who genuinely aspire to do this on their own.  I’m not talking about building an acceptable dashboard visualization.  I’m talking about a black box that creates production code that can be operationalized in a transactional system to score, predict, or recommend on the fly.

What continues to give me pause is that the developers behind this movement are solidly in the ‘good enough model’ camp.  Maybe that’s sufficient to start people down the path of adoption.  But if you’ve paid attention to how very small increases in model fitness can leverage up into large increases in campaign ROI, you know that real competitors will always be pursuing the best models, not just the good-enough kind.

Also, today there are so many ways to get the wrong answer from a black box if you don’t know the data science behind the tool that I personally regard them as more than a little dangerous.  Oh, you mean I can’t use categorical variables in segmentation.  Oh, you mean I have to normalize data to run a neural net.  Oh, you mean I have to figure out what to do about all that missing data to get a decent answer.  You get what I mean.
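To make those pitfalls concrete, here is a minimal sketch, using only the Python standard library, of three preparation steps a user of a black box may never see: filling in missing values, standardizing a numeric column (the kind of scaling a neural net typically needs), and turning a categorical column into numeric indicator columns.  The data, the column meanings, and the choice of mean imputation are hypothetical and for illustration only; they are not what any particular tool does, and mean imputation is just one of several defensible choices.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Turn a categorical column into 0/1 indicator columns, one per level."""
    levels = sorted(set(categories))
    return {level: [1 if c == level else 0 for c in categories] for level in levels}


# Hypothetical customer data: income has a missing value, region is categorical.
income = [40_000, None, 55_000, 70_000]
region = ["east", "west", "east", "north"]

income_ready = standardize(impute_mean(income))
region_ready = one_hot(region)
print(income_ready)
print(region_ready)
```

Each of these steps involves a judgment call (which imputation, which scaling, how to encode categories), and a tool that makes them silently can hand back a model that runs fine but answers the wrong question.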

Still, I don’t want to be a Luddite.  If you look at other fields like DNA analysis, chemistry, or physics, you can see that what once looked impossible is now closing in on point-and-shoot.  I think accuracy will always be the lever that a good data scientist wields, along with the creative application of tools and techniques to new business opportunities as they arise.

How to Encourage Young People to Explore Data Science

How to encourage young people to explore data science is a much tougher question.  In his article, Professor Scherrer recalls his own excitement (and dangerous experiments) with the chemistry sets that boys and girls could play with in his childhood.  He laments that chemistry sets have been neutered by safety concerns to the point that they simply don’t appeal to a child’s sense of adventure, laced with a little mischievous hope of achieving a result the set’s makers never intended.

Compare that with the hurdle facing any parent trying to get a child excited about math!  When the thrill of learning to code comes in adolescence, it often carries that same sense of naughty adventure: going somewhere or achieving something parents never intended.  But that still isn’t data science.

Where does that first thrill come from?  For me it was my first exposure to regression.  I was no longer a child, but I can still remember what an eye-opener it was to see that predictions about the future could be made with a little code and a little math.
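For readers who want to see just how little code and math that takes, here is a minimal sketch of simple linear regression (ordinary least squares with one predictor) in plain Python.  It uses the closed-form estimates slope = Sxy / Sxx and intercept = ȳ − slope · x̄, then extrapolates one period ahead.  The numbers are toy values invented purely for illustration, and the function names are hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Toy data, invented for illustration: period number vs. an observed quantity.
periods = [1, 2, 3, 4, 5]
observed = [2.9, 5.1, 7.0, 9.2, 10.8]

b0, b1 = fit_line(periods, observed)
forecast = predict(b0, b1, 6)  # extrapolate one period ahead
print(f"intercept={b0:.2f} slope={b1:.2f} forecast for period 6={forecast:.2f}")
```

A dozen lines like these are often enough to give a curious teenager that same aha moment: the fitted line summarizes the past, and plugging in a future period turns it into a prediction.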

Perhaps the impulse toward data science doesn’t come in childhood.  Perhaps it only comes after we’ve learned a little code and a little more about the world.  That shouldn’t stop data-science-savvy parents from showing their children what’s behind the curtain of that recommendation engine, or why they are receiving a particular targeted ad that seems mysteriously prescient.

Data science is likely to remain a premier and exciting profession for your children, as it may be for you.  It would be fun to hear what your own aha moment was, the one that turned you on to DS.  Let us hear from you in the comments to this article.


December 28, 2015

Bill Vorhies
Editorial Director, DSC