Guest blog post by Mic Farris. Mic is a Decision Science & Analytics Leader at CenturyLink.
Two of the biggest buzzwords in our industry are “big data” and “data science”. Big Data seems to have a lot of interest right now, but Data Science is fast becoming a very hot topic.
I think there’s room to really define the science of data science – what are those fundamentals that are needed to make data science truly a science we can build upon?
What follows is such a set of fundamentals:
Fundamentals of Data Science
It is easy for people in the big data / analytics / data science disciplines to say “I do data science”. However, when it comes to data science fundamentals, we need to ask the following critical questions: What really is “data”, what are we trying to do with data, and how do we apply scientific principles to achieve our goals with data?
- What is Data?
- The Goal of Data Science
- The Scientific Method
Probability and Statistics
The world is a probabilistic one, so we work with data that is probabilistic – meaning that, given a certain set of preconditions, data will appear to you in a specific way only part of the time. To apply data science properly, one must become familiar and comfortable with probability and statistics.
- The Two Characteristics of Data
- Examples of Statistical Data
- Introduction to Probability
- Probability Distributions
- Connection with Statistical Distributions
- Statistical Properties (Mean, Mode, Median, Moments, Standard Deviation, etc.)
- Common Probability Distributions (Discrete, Binomial, Normal)
- Other Probability Distributions (Chi-Square, Poisson)
- Joint and Conditional Probabilities
- Bayes’ Rule
- Bayesian Inference
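To make the last few topics concrete, here is a minimal Python sketch of Bayes’ rule for a single yes/no hypothesis. The function name `posterior` and the numbers (a 1% prior, a 95% detection rate, a 5% false-alarm rate) are illustrative assumptions, not figures from the article.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) for a binary hypothesis H using Bayes' rule."""
    # Total probability of seeing the evidence, summed over both hypotheses.
    p_evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_evidence


# Illustrative numbers only (assumed, not from the article).
p = posterior(prior=0.01, p_e_given_h=0.95, p_e_given_not_h=0.05)
print(f"P(H | E) = {p:.3f}")  # prints "P(H | E) = 0.161"
```

Even with a fairly reliable test, the posterior stays low because the prior is low. That is the sense in which data appears “in a specific way only part of the time” and has to be interpreted probabilistically.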
Decision Theory
This section is one of the key fundamentals of data science. Whether applied in scientific, engineering, or business fields, we are trying to make decisions using data. Data itself isn’t useful unless it’s telling us something, which means we’re making a decision about what it is telling us. How do we come up with those decisions? What are the factors that go into this decision-making process? What is the best method for making decisions with data? This section tells us…
- Hypothesis Testing
- Binary Hypothesis Test
- Likelihood Ratio and Log Likelihood Ratio
- Bayes Risk
- Neyman-Pearson Criterion
- Receiver Operating Characteristic (ROC) Curve
- M-ary Hypothesis Test
- Optimal Decision Making
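As a rough illustration of the binary hypothesis test and log likelihood ratio listed above, the sketch below decides between two Gaussian hypotheses that differ only in their mean. The function names, the means (0 and 1), the standard deviation, and the sample values are all assumptions chosen for the example. The threshold is left as a parameter, since a Bayes risk formulation would set it from priors and costs, while the Neyman-Pearson criterion would set it from a target false-alarm rate.

```python
import math


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log of the normal density N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * std**2) - (x - mean) ** 2 / (2.0 * std**2)


def log_likelihood_ratio(samples, mean0: float, mean1: float, std: float) -> float:
    """Sum of log p(x | H1) - log p(x | H0) over independent samples."""
    return sum(
        gaussian_log_pdf(x, mean1, std) - gaussian_log_pdf(x, mean0, std)
        for x in samples
    )


def decide(samples, mean0=0.0, mean1=1.0, std=1.0, log_threshold=0.0) -> int:
    """Return 1 to choose H1, 0 to choose H0."""
    llr = log_likelihood_ratio(samples, mean0, mean1, std)
    return 1 if llr > log_threshold else 0


# Hypothetical observations, assumed for illustration.
observations = [0.9, 1.2, 0.4, 1.1]
llr = log_likelihood_ratio(observations, 0.0, 1.0, 1.0)
print(f"log likelihood ratio = {llr:.2f}, decision = H{decide(observations)}")
```

Sweeping `log_threshold` over a range of values and recording the detection and false-alarm rates at each setting is one way to trace out a Receiver Operating Characteristic (ROC) curve.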
The full article has the following additional sections, each with many interesting topics:
- Probability and Statistics
- Decision Theory
- Estimation Theory
- Coordinate Systems
- Linear Transformations
- Effects of Computation on Data
- Prototype Coding / Programming
- Graph Theory
- Machine Learning