
Explaining Data Science to a Non-Data Scientist

Summary:  Explaining data science to a non-data scientist isn’t as easy as it sounds.  You may know a lot about math, tools, techniques, data, and computer architecture but the question is how do you explain this briefly without getting buried in the detail.  You might try this approach.

We’ve all been there.  You’re at a party or maybe striking up a conversation with that pretty girl at the bar and sooner or later the question comes up, “what do you do?”  Since you have what is reported to be the sexiest job in the world you proudly respond “I’m a data scientist”.

OK, what happens next depends on exactly what you say.  Do your fellow party goers hang on your every word in anticipation?  Do you, as they say, get the pretty girl’s digits?  You respond:

“I’m working with deep neural nets with dozens of hidden layers on cloud based TPUs using Tensorflow.  Right now I’m working to put bounding boxes around images of people so I can create multi-class deep learning models to predict their…”

Never mind.  Your host’s eyes have glazed over.  The cute girl has turned to the guy on her other side who looks like a personal trainer at your gym.  TMI! TMI!  How do you keep it simple and brief, and still explain to a non-data scientist the essence of what you do, without losing their interest in the first dozen words?  Next time you vow to keep it simple.

The next party comes.  You think, OK I’ll skip the specifics and just talk about the categories of tools that I use.  After the obligatory “I’m a data scientist” you continue:

“I use mathematical algorithms to answer questions in ranking, recommendation, classification, regression, clustering, and anomaly detection.  First we gather up massive data sets about the question we want to answer.  Getting that data and getting it ready for the algorithms is a whole different conversation.  But the fun part begins when I start creating models and testing them with different optimization methods like stochastic gradient descent to see which one is most accurate.  Then I score the unseen data…”

Never mind.  Same result.

After several years of trying, I’ve settled on a very simple explanation based mostly on Brandon Rohrer’s remarkable 2015 five-question explanation of machine learning.  Even with the additional complexity of Big Data and deep learning this is the explanation I’ve found most successful.  It basically has three parts following “I’m a data scientist”.

Part 1 You’re a Wizard

I help people answer questions or make predictions about what will happen in the future.  So data scientists are kind of like fortune tellers except that we do it with math and data.  And most important, unlike fortune tellers we can get the right answer pretty often.

Keep in mind that 50% accuracy is the same as a coin toss, so generally we’re pretty happy when we get the answer right about 70% of the time and sometimes we can get it right upwards of 90% of the time.
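For readers who want to see the coin-toss comparison in numbers, here is a minimal Python sketch.  The ten outcomes and the model's guesses are made up for illustration, and the 50% baseline only holds when the two answers are about equally common.

```python
import random

def accuracy(predictions, actuals):
    """Fraction of predictions that match what actually happened."""
    matches = sum(p == a for p, a in zip(predictions, actuals))
    return matches / len(actuals)

# Hypothetical outcomes for ten customers: 1 = bought, 0 = did not buy.
actuals = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
# Hypothetical guesses from a model for the same ten customers.
model_preds = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

model_accuracy = accuracy(model_preds, actuals)

# Baseline: guess by coin toss, repeated many times and averaged.
rng = random.Random(0)
trials = 10_000
coin_accuracy = sum(
    accuracy([rng.randint(0, 1) for _ in actuals], actuals)
    for _ in range(trials)
) / trials

print(f"model: {model_accuracy:.0%}, coin toss: about {coin_accuracy:.0%}")
```

The model only earns its keep if it lands clearly above the coin-toss line.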

(Ok I’m taking some liberties here but remember the audience).

Part 2 What You Work On is Easy to Understand – Sort of

There are really only five types of questions that all data scientists deal with.

  1. Is this A or B?
  2. Is this weird?
  3. How much – or – How many?
  4. How is this organized?
  5. What should I do next?

Now, if they’re still with you, you can move on to Part 3 for some examples – but keep it short.

Part 3 Some Examples – Keep it Short

  1. Is this A or B?

These questions are like predicting who will buy and who won’t.  Or with machines we might try to predict whether a machine is going to break down in the next week.

  2. Is this weird?

We help your bank and credit card company a lot with this type.  Is the transaction that just showed up on your credit card unusual enough for you that maybe we should make sure it was really you?  This is also where the world of cybersecurity comes in.  We can look at individual incoming signals from outside your system and flag the ones that look suspicious.
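If someone asks how a computer decides a charge is "weird", one very simple way to picture it is a distance-from-normal check.  The sketch below is only an illustration: the purchase history, the `is_weird` helper, and the threshold of three standard deviations are all assumptions, and real fraud systems use far richer signals than the amount alone.

```python
from statistics import mean, stdev

def is_weird(amount, history, threshold=3.0):
    """Flag an amount far from the cardholder's usual spending.

    'Far' means more than `threshold` standard deviations from the
    average of past purchases (a simple z-score rule).
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past purchase was identical, so any other amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Hypothetical recent purchases on one card, in dollars.
past_purchases = [42.0, 18.5, 63.2, 25.0, 37.8, 51.1, 29.9, 44.6]

print(is_weird(40.0, past_purchases))    # an ordinary charge
print(is_weird(1250.0, past_purchases))  # a charge far outside the usual range
```

The first charge sits comfortably inside the usual range and passes; the second is many times larger than anything in the history and gets flagged for a closer look.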

  3. How much – or – How many?

These questions are about numbers in the future.  What will the price of oil be next month?  What will be my sales in each of the next 12 months?

  4. How is this organized?

Turns out that a lot of data, particularly about people, naturally breaks into groups, but those groups aren’t necessarily easy to see without some math.  So if we’re going to recommend what movie to see, what music you might like, or even who you should consider dating, we’d answer those questions here.

  5. What should I do next?

Some of these questions have only a few logical answers.  For example, given two factors like potential sales and the cost of the sale, what’s the optimum combination of the two that maximizes profit?  The other types of questions here are even more interesting since they’re how we program self-driving cars, where the question might be: the light just turned yellow, should I brake or accelerate through?

Part 4

Well, there really isn’t any perfectly designed Part 4.  If you’ve been a great storyteller then maybe your audience is ready to ask you some questions.  Maybe it’s time to just listen and make room for the next speaker.

You’ve devoted thousands of hours to perfecting your skills.  You’re proud of your knowledge and can speak at length about math, tools, data, computer architecture, deep learning, IoT, and even AGI.  What I’ve found is that what most non-data scientists want is your elevator pitch.  So keep it simple, keep it brief, and maybe try this approach to still get across most of the magic in what you do.

Other articles by Bill Vorhies


About the author:  Bill is Contributing Editor for Data Science Central.  Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.  His articles have been read more than 2.1 million times.
