Learning any new skill is hard. There are too many possibilities, and the goal seems massive and intimidating.

Enter the Pareto Principle.

The Pareto Principle, also known as the 80/20 rule, suggests that roughly 80 percent of results come from 20 percent of the effort. It can be applied to everything from business to language, and even to learning how to use R.

With just a few packages and commands, you can get a lot done. The rest is just practice. Here are a few topics to focus on, and to make it interesting, we'll use some National Football League (NFL) data.

**Data Manipulation**

To start off, I downloaded RStudio here and the dataset here. The data contains every NFL play from the first half of this season. Here’s what the data looked like after importing it into R.
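Here is a minimal sketch of the import step. It reads a few CSV rows with base R's `read.csv()` and inspects them with `str()` and `head()`. The rows and the column names `yardline`, `down` and `play_type` are made up for illustration. The real dataset's columns are likely named differently, and for a file on disk you would pass its path instead of `text =`.

```r
# Stand-in for the real play-by-play CSV: a few made-up rows with
# hypothetical column names (yardline, down, play_type)
csv_text <- "yardline,down,play_type
20,1,Pass
20,2,Run
50,1,Run
80,3,Pass"

# For a file on disk this would be read.csv("plays.csv"); the path is hypothetical
plays <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(plays)   # column names and types
head(plays)  # first few rows
```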

I want to see how teams choose between running and passing at different yard lines on the field. To do that, I'll have to reshape my data frame, essentially creating a pivot table in R. For that, I'll use the plyr package to count each type of play by yard line.
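A sketch of the counting step follows. It assumes a data frame `plays` with hypothetical columns `yardline` and `play_type`, filled here with toy rows rather than real NFL data. `plyr::count()` returns one row per combination of the listed variables, with the tally in a `freq` column. Base R's `xtabs()` then spreads those counts into a yard-line-by-play-type table. Using `xtabs()` for the pivot is just one option for illustration, and the original post's code may reshape the data differently.

```r
library(plyr)  # provides count()

# Toy play-by-play rows (made up; column names are hypothetical)
plays <- data.frame(
  yardline  = c(20, 20, 20, 50, 50, 80, 80, 80),
  play_type = c("Pass", "Run", "Pass", "Run", "Run", "Pass", "Pass", "Run"),
  stringsAsFactors = FALSE
)

# Long format: one row per (yardline, play_type) pair, counts in 'freq'
counts <- count(plays, vars = c("yardline", "play_type"))
print(counts)

# Wide "pivot table": rows are yard lines, columns are play types
pivot <- xtabs(freq ~ yardline + play_type, data = counts)
print(pivot)
```

Combinations that never occur, such as a pass from the 50 in this toy data, appear as 0 in the `xtabs()` table instead of being dropped.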

**Data Visualization**

Now that the data is set up correctly, I want to visualize it. I'll use ggplot2, one of the best-known R packages. We'll use a basic plot, but with a small twist that separates plays by down. Looking at the graph, we can see how play calling changes from 1st down to 3rd down.
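One way to separate plays by down is faceting: `facet_wrap(~ down)` draws one panel per down. The sketch below assumes a long-format data frame of counts with hypothetical columns `yardline`, `down`, `play_type` and `freq`. The frequencies are placeholders, not real NFL numbers, and the original post's plot may have been built differently.

```r
library(ggplot2)

# Placeholder counts in long format: one row per yard line, down and play type.
# Column names are hypothetical and the numbers are made up, not real NFL data.
counts <- expand.grid(
  yardline  = c(20, 50, 80),
  down      = 1:3,
  play_type = c("Pass", "Run"),
  stringsAsFactors = FALSE
)
counts$freq <- c(12, 10,  9, 14, 13, 11, 20, 18, 16,
                 15, 16, 14, 11, 12, 13,  4,  5,  6)

# One line per play type, with a separate panel for each down
p <- ggplot(counts, aes(x = yardline, y = freq, colour = play_type)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ down, labeller = label_both) +
  labs(x = "Yard line", y = "Number of plays", colour = "Play type")

print(p)  # draws the plot on the current graphics device
```

`label_both` titles each panel with both the variable name and its value (for example "down: 1"), which keeps the panels readable without a separate legend for downs.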

**Presentation**

Now that you have found something interesting, you need to present it. R Markdown (see here) lets you create HTML pages that can even be published on the web. In fact, I used R Markdown to create this post.
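For reference, a minimal R Markdown file is a YAML header followed by text and R code chunks. The sketch below writes one to a temporary file from R and knits it to HTML with `rmarkdown::render()`. It only attempts the render if the rmarkdown package and pandoc are both available. The title and contents are placeholders.

```r
# Placeholder R Markdown source: YAML header, some text, one R chunk
rmd_lines <- c(
  "---",
  "title: \"Run vs. pass by yard line\"",
  "output: html_document",
  "---",
  "",
  "Some narrative text about the analysis.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Knit to HTML only when rmarkdown and pandoc are installed
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

In RStudio the same result usually comes from opening the `.Rmd` file and clicking Knit.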

In summary, data analysis skills are high on many employers' wish lists. They may seem difficult, but with practice they are quite attainable. Don't be intimidated, and never stop learning.

*Note: The code for this post can be found here, and if you’re interested in more posts, you can find my blog here.*

© 2019 Data Science Central