The Fundamentals of Data Science

Guest blog post by Mic Farris. Mic is a Decision Science & Analytics Leader at CenturyLink.

Two of the biggest buzzwords in our industry are “big data” and “data science”. Big data attracts a lot of interest right now, and data science is fast becoming just as hot a topic.

I think there’s room to really define the science of data science – what are those fundamentals that are needed to make data science truly a science we can build upon?

What follows is one such set of fundamentals:

Fundamentals of Data Science

Introduction

The easiest thing for people within the big data / analytics / data science disciplines to do is to say “I do data science”. However, when it comes to data science fundamentals, we need to ask the following critical questions: What really is “data”? What are we trying to do with data? And how do we apply scientific principles to achieve our goals with data?

What is Data?
The Goal of Data Science
The Scientific Method

Probability and Statistics

The world is a probabilistic one, so we work with data that is probabilistic – meaning that, given a certain set of preconditions, data will appear to you in a specific way only part of the time. To apply data science properly, one must become familiar and comfortable with probability and statistics.

The Two Characteristics of Data
Examples of Statistical Data
Introduction to Probability
Probability Distributions
Connection with Statistical Distributions
Statistical Properties (Mean, Mode, Median, Moments, Standard Deviation, etc.)
Common Probability Distributions (Discrete, Binomial, Normal)
Other Probability Distributions (Chi-Square, Poisson)
Joint and Conditional Probabilities
Bayes’ Rule
Bayesian Inference
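
To make the conditional-probability and Bayes topics above concrete, here is a minimal Python sketch. It is an illustration only, not taken from the full article: the function names and the numbers (a 1% prior, a 90% chance of seeing the data when the hypothesis is true, 5% when it is false) are assumptions chosen for the example. It computes a posterior with Bayes' rule and then checks it against a simulation in which the data shows up only part of the time.

```python
import random


def bayes_posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | data) for a binary hypothesis H using Bayes' rule."""
    # Total probability of observing the data under either case.
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    return p_data_given_h * prior / evidence


def simulate_posterior(prior, p_data_given_h, p_data_given_not_h,
                       trials=200_000, seed=0):
    """Estimate P(H | data) by counting outcomes in repeated random trials."""
    rng = random.Random(seed)
    data_seen = 0
    data_seen_and_h = 0
    for _ in range(trials):
        h_is_true = rng.random() < prior
        p_data = p_data_given_h if h_is_true else p_data_given_not_h
        # The data appears only part of the time, even when H holds.
        if rng.random() < p_data:
            data_seen += 1
            if h_is_true:
                data_seen_and_h += 1
    return data_seen_and_h / data_seen


# Hypothetical numbers, chosen only for illustration.
posterior = bayes_posterior(0.01, 0.9, 0.05)
estimate = simulate_posterior(0.01, 0.9, 0.05)
print(f"exact posterior: {posterior:.4f}, simulated: {estimate:.4f}")
```

Even with data that is nine times out of ten present when the hypothesis is true, the posterior stays near 15% because the prior is so small. The simulated estimate should land close to the exact value, with some sampling noise.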

Decision Theory

This section is one of the key fundamentals of data science. Whether applied in scientific, engineering, or business fields, we are trying to make decisions using data. Data itself isn’t useful unless it’s telling us something, which means we’re making a decision about what it is telling us. How do we come up with those decisions? What are the factors that go into this decision-making process? What is the best method for making decisions with data? This section tells us…

Hypothesis Testing
Binary Hypothesis Test
Likelihood Ratio and Log Likelihood Ratio
Bayes Risk
Neyman-Pearson Criterion
Receiver Operating Characteristic (ROC) Curve
M-ary Hypothesis Test
Optimal Decision Making
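
As one way to picture the binary hypothesis test and log likelihood ratio listed above, the following Python sketch decides between two hypotheses about the mean of Gaussian data. It is a sketch under stated assumptions, not the article's own method: the function names, the Gaussian model with a known shared standard deviation, the equal error costs, and the sample values are all chosen here for illustration.

```python
import math


def log_likelihood_ratio(samples, mu0, mu1, sigma):
    """Sum of log[p(x | H1) / p(x | H0)] over i.i.d. Gaussian samples.

    Assumes H0: mean mu0 and H1: mean mu1, both with the same known sigma,
    so the normalizing constants cancel in the ratio.
    """
    return sum(((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)
               for x in samples)


def decide(samples, mu0, mu1, sigma, prior_h1=0.5):
    """Return 1 to choose H1, else 0.

    Uses the threshold P(H0) / P(H1), which minimizes the probability of
    error when both kinds of mistake are assumed to cost the same.
    """
    log_threshold = math.log((1.0 - prior_h1) / prior_h1)
    llr_value = log_likelihood_ratio(samples, mu0, mu1, sigma)
    return 1 if llr_value > log_threshold else 0


# Hypothetical observations, chosen only for illustration.
observations = [0.9, 1.4, 0.7, 1.1]
llr = log_likelihood_ratio(observations, mu0=0.0, mu1=1.0, sigma=1.0)
choice = decide(observations, mu0=0.0, mu1=1.0, sigma=1.0)
print(f"log likelihood ratio: {llr:.2f}, decision: H{choice}")
```

With equal priors the threshold is zero in the log domain, so these observations favor H1. Lowering `prior_h1` raises the threshold, and the same data can then lead to choosing H0, which is the trade-off that criteria such as Bayes risk and Neyman-Pearson make explicit.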

Content

The full article has the following additional sections, each with many interesting topics:

Probability and Statistics
Decision Theory
Estimation Theory
Coordinate Systems
Linear Transformations
Effects of Computation on Data
Prototype Coding / Programming
Graph Theory
Algorithms
Machine Learning

