A Simple Introduction to Data Science

When you hear the term data scientist, what do you think of? If you’re like most people, you might picture something incredibly complex, full of statistical terms and programming languages that are beyond comprehension. You might think that only PhDs in computer science can do data science.

But if you peel back the layers, you’ll find that this isn’t the case. The term data scientist was coined by DJ Patil, who is now the Chief Data Scientist at the White House, but data science itself is just a 21st-century spin on mathematics that people have been doing for centuries. Big data, data science, and analytics are just fancy terms for using available information to gain insight and improve a business. Whether it’s a small Excel spreadsheet or 100 million records in a database, the goal is always the same: to find value.

You too can start down the path of data science, and learn a lot along the way. Let’s demonstrate with a simple example.

Step 1: Have a question or something you’re curious about.

In this spring’s NBA playoffs, Steph Curry and the Golden State Warriors were down 2 games to 1 against the Memphis Grizzlies, and Curry’s three-point shooting had dropped off in the previous two games, both of which the Warriors lost. Commentators were speculating: have the Grizzlies figured out Steph Curry? Can he bounce back and guide the Warriors to victory?

Step 2: Gather data that exists for your area of interest.

In this situation, we can use freely available data from basketball-reference.com. I simply took Steph Curry’s game log for the 2014-15 regular season and saved it as a CSV file (uploaded here if you want to download it). Here’s what the data looked like:

[Image: Steph Curry’s 2014-15 game log in CSV format]
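The original analysis was done in R, so the following is only a minimal Python sketch of this step, not the author’s code. It reads a game log like the one above and turns each row into a three-point percentage. The column names `3P` (makes) and `3PA` (attempts) are an assumption modeled on basketball-reference.com game logs; rename them to match your own file. The function name `load_three_point_pct` and the sample rows are hypothetical.

```python
import csv
import io

def load_three_point_pct(lines):
    """Return one three-point percentage (as a fraction) per game.

    `lines` is any iterable of CSV text lines, such as an open file.
    The column names "3P" (makes) and "3PA" (attempts) are ASSUMED;
    change them to match your own export. Games with zero or blank
    attempts are stored as None because the percentage is undefined.
    """
    pcts = []
    for row in csv.DictReader(lines):
        made = row["3P"].strip()
        attempts = row["3PA"].strip()
        if not attempts or int(attempts) == 0:
            pcts.append(None)
        else:
            pcts.append(int(made) / int(attempts))
    return pcts

# Hypothetical sample rows standing in for the real game log.
sample = io.StringIO(
    "Rk,Date,3P,3PA\n"
    "1,2014-10-29,2,6\n"
    "2,2014-11-01,5,10\n"
    "3,2014-11-02,0,0\n"
)
print(load_three_point_pct(sample))  # [0.3333333333333333, 0.5, None]
```

To read a real file, you could pass `open("curry_2014_15.csv", newline="")` (a hypothetical filename). Rows for games he didn’t play may contain text instead of numbers and would need to be filtered out first.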

Step 3: Analyze your data, using whichever software and method you prefer.

Data science can range from making simple bar graphs in Excel to running multivariable logistic regression on Hadoop. In this case, I’ll do some straightforward analysis of the data in R, which is free to download here.

For this analysis, I looked at his three-point percentage for all 82 regular season games and identified the games in which he shot 20 percent or less. I then averaged his three-point percentage across the games immediately following those low-shooting games.

Here’s the script I used to import the data, identify the relevant games, and do the calculation. It may look intimidating, but if you use available resources to learn each individual concept (importing data, loops, arrays, and so on), you could write something like this within a few days.

[Image: R script for the Steph Curry analysis]
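Because the R script survives only as an image, here is a minimal Python sketch of the calculation described in Step 3; it is not a reproduction of the author’s script. It assumes the per-game percentages are already in a list in schedule order, with `None` for games without three-point attempts. The function name `average_after_slumps` and the toy numbers are hypothetical.

```python
def average_after_slumps(pcts, threshold=0.20):
    """Average three-point percentage in games that immediately follow
    a low-shooting game (percentage at or below `threshold`).

    `pcts` holds one fraction per game in schedule order; None marks a
    game without attempts, which counts as neither a slump nor a
    follow-up. Returns None if there are no qualifying follow-up games.
    """
    follow_ups = []
    # Pair each game with the next one on the schedule.
    for today, next_game in zip(pcts, pcts[1:]):
        if today is not None and next_game is not None and today <= threshold:
            follow_ups.append(next_game)
    if not follow_ups:
        return None
    # Simple mean of the per-game percentages, as the text describes.
    return sum(follow_ups) / len(follow_ups)

# Toy season: slumps in games 2 and 4, followed by 40% and 50% nights.
toy = [0.50, 0.20, 0.40, 0.10, 0.50, 0.45]
print(average_after_slumps(toy))  # 0.45
```

This takes a plain mean of per-game percentages. An alternative is to pool total makes over total attempts in the follow-up games, which weights high-volume nights more heavily; the two numbers can differ.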

Step 4: Look at your analysis, interpret, and apply what you learned.

Based on the analysis, in games following his low-shooting games, Curry shot an average of 42.4 percent on three-pointers, far above the 20 percent cutoff that defined a low-shooting game. From this, you could confidently predict that Curry would regress to his mean and return to his All-Star form. In other words, history suggested he would bounce back and find his usually superb shooting stroke, with each additional three-pointer boosting the Warriors’ scoring and helping lead them to victory.

We all know what happened next.

[Image: Stephen Curry]
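To check regress-to-the-mean reasoning like Step 4’s on your own data, one simple approach (a sketch, not part of the original analysis) is to put three numbers side by side: the average in slump games, the average in the games right after them, and the season average. If the follow-up average lands much closer to the season average than to the slump level, that is consistent with regression to the mean rather than a lasting slump. The function name `regression_summary` and the toy numbers are hypothetical.

```python
from statistics import mean

def regression_summary(pcts, threshold=0.20):
    """Compare slump games, the games right after them, and the season.

    `pcts` is a list of per-game three-point percentages (fractions) in
    schedule order, with None for games without attempts. Any average
    that has no qualifying games is reported as None.
    """
    played = [p for p in pcts if p is not None]
    # (slump game, next game) pairs where both games have a percentage.
    pairs = [(a, b) for a, b in zip(pcts, pcts[1:])
             if a is not None and b is not None and a <= threshold]
    return {
        "season_avg": mean(played) if played else None,
        "slump_avg": mean(a for a, _ in pairs) if pairs else None,
        "next_game_avg": mean(b for _, b in pairs) if pairs else None,
    }

toy = [0.50, 0.20, 0.40, 0.10, 0.50, 0.45]
print(regression_summary(toy))
```

With the toy numbers, the slump games average 15 percent, while the follow-up games average 45 percent, above the roughly 36 percent season average, so the slumps do not carry over.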

In closing, even if R or this type of analysis isn’t your cup of tea, I encourage you to gather some data and see what you learn. You can start simply in Excel and then build your way up to more complex tools. Go forth, and what you find may surprise you.

About: Divya Parmar is a recent college graduate working in IT consulting.

Comment by Ananda Sagar Kommaluri on April 26, 2016 at 10:06pm

Good Article

Comment by Salil Sheth on August 13, 2015 at 2:32pm

I agree. Your writing is excellent.

Comment by Sanjeev Arora on August 13, 2015 at 7:12am

great article Divya, glad to see the simplicity of these concepts

© 2017 Data Science Central