
*This article was written by Daniel McAuley.*

I recently had the pleasure of speaking on a few panels about analytics to my fellow MBA students and alumni, as well as many Penn undergrads. After these talks, I’ve been asked for my advice on what the best resources are for someone coming from the business world (i.e., non-technical) who wants to develop the skills to become an effective data scientist. This post is an attempt to codify the advice I give and general resources I point people towards. Hopefully, this will make what I have learned accessible to more people and provide some guidance for those who realize that the future belongs to the empirically inclined (see below) but don’t know where to start their journey to becoming part of the club.

However, I would caution the reader that what I propose here is only a starting point on a journey towards really understanding the power of good data science. And, as Sean Taylor once told me, learn only what you need to accomplish your goal; if there are things on this list that you know you don’t need, then skip them. You won’t hurt my feelings. At its core, data science is really about curiosity, optimism, and continual learning, all of which are ongoing habits rather than boxes to be checked. Therefore, I expect this list to evolve as the tools themselves change and as I continue to discover more about data science itself.

**1. Linear Algebra**

Linear algebra is a topic that underlies a lot of the statistical techniques and machine learning algorithms that you will employ as a data scientist. I like to recommend a MOOC I took through Coursera years ago, Coding the Matrix: Linear Algebra through Computer Science Applications. As the name implies, the course teaches linear algebra in the context of computer science (specifically using Python, which lends itself well to data science). There is also an optional companion textbook that makes a great reference manual.
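To make the link between linear algebra and statistics concrete, here is a minimal sketch of ordinary least squares solved through the normal equations, (XᵀX)β = Xᵀy, in plain Python. The helper names (`transpose`, `matmul`, `solve_2x2`, `fit_line`) and the sample data are my own illustrations rather than anything taken from the course, and a real project would normally lean on a numerical library instead.

```python
# Ordinary least squares for a line y = intercept + slope * x,
# solved with the normal equations (X^T X) beta = X^T y.
# All helper names and data here are illustrative assumptions.

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve_2x2(a, b):
    """Solve the 2x2 system a @ beta = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular; the x values must not all be equal")
    b0, b1 = b[0][0], b[1][0]
    return [
        (b0 * a[1][1] - a[0][1] * b1) / det,
        (a[0][0] * b1 - a[1][0] * b0) / det,
    ]

def fit_line(xs, ys):
    """Return [intercept, slope] minimizing the squared error."""
    X = [[1.0, float(x)] for x in xs]   # design matrix with an intercept column
    y = [[float(v)] for v in ys]        # response as a column vector
    Xt = transpose(X)
    return solve_2x2(matmul(Xt, X), matmul(Xt, y))

intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

The same matrix view (a design matrix, a coefficient vector, and a system to solve) carries over to regression with many predictors, which is one reason the subject pays off early.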

**2. R**

Given that we use R at Wealthfront, I have a few resources that I think are important here. The first is R for Data Science, written by Garrett Grolemund and Hadley Wickham. It will be published in physical form in July 2016 but is available for free online now. If you only read one data science book, it should be this one.

Next up, our friend Hadley has also written Advanced R, which covers functional programming, metaprogramming, and performant code as well as the quirks of R.

Hadley is also responsible for some of the packages I use every day that make 90% of common data science tasks quicker and less verbose. I recommend checking out the following libraries; they will change the way you write code in R:

- ggplot2 — An implementation of the Grammar of Graphics in R
- devtools — Tools to make an R developer’s life easier
- dplyr — Plyr specialized for data frames: faster & with remote data stores
- purrr — Make your pure R function purrr with functional programming
- tidyr — Easily tidy data with spread and gather functions
- lubridate — Make working with dates in R just that little bit easier
- testthat — An R package to make testing fun

For extra credit, check out yet another of Hadley’s books: R Packages. This is a great follow-up resource for those of you who want to write reproducible, well-documented R code that other people can easily use (other people includes your future self!).

**3. SQL**

This is probably the easiest section of the guide as you can teach yourself most of SQL in a few hours. Code School has both introductory and intermediate courses that you can get through in an afternoon.

The Sequel to SQL covers everything from aggregate functions and joins to normalization and subqueries. And while mastering these skills takes practice, you can still get an idea of what SQL can and cannot do without too much work.
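To give a feel for a few of the topics mentioned above, the sketch below runs an aggregate query with a join and a query with a subquery against an in-memory SQLite database from Python. The `customers` and `orders` tables and their rows are hypothetical and exist only for illustration; they are not taken from the Code School courses.

```python
import sqlite3

# Build a tiny throwaway database in memory (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 100.0);
""")

# Join + aggregate functions: order count and total spend per customer.
# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL sum into 0.
totals = conn.execute("""
SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC
""").fetchall()

# Subquery: orders larger than the average order amount.
above_average = conn.execute("""
SELECT id, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders)
""").fetchall()

print(totals)
print(above_average)
conn.close()
```

Splitting customers and orders into two tables linked by `customer_id` is also a small instance of normalization: each customer's name is stored once rather than repeated on every order.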

**4. Bayesian Reasoning**


Without wading into the age-old Frequentist vs. Bayesian debate (or non-debate), I think that a solid foundation in Bayesian reasoning and statistics is a crucial part of any data scientist’s repertoire. For example, Bayesian reasoning underpins much of modern A/B testing and Bayesian methods are applied in many other areas of data science (and are generally covered less in introductory statistics courses).
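As a rough illustration of the A/B testing point, the sketch below uses a Beta-Binomial model: each variant's conversion rate gets a uniform Beta(1, 1) prior, the posterior is again a Beta distribution, and the probability that variant B beats variant A is estimated by sampling from both posteriors. The function name `prob_b_beats_a`, the prior, and the conversion counts are assumptions made for this example and do not describe any particular testing tool.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors.

    conv_* are conversion counts and n_* are visitor counts (hypothetical inputs).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        # Posterior for each rate: Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Made-up data: 40 of 1,000 visitors convert on A, 60 of 1,000 on B.
p = prob_b_beats_a(40, 1000, 60, 1000)
print(f"Estimated P(B beats A): {p:.3f}")
```

The output is a direct probability statement about the two variants, which is often easier to explain to a business audience than a p-value.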

John K. Kruschke has a great ability to break down complex material and convey it in a way that is intuitive and practical. Along with R for Data Science, his textbook, Doing Bayesian Data Analysis, is probably one of the best all-around resources for learning how to do data science in the R programming language.

Additionally, Kruschke’s blog makes a great companion resource to the textbook if you’re looking for more examples of problems to solve or answers to questions you still have after reading the book. And if a textbook isn’t exactly what you’re looking for, then Rasmus Bååth’s research blog, Publishable Stuff, is another great resource for learning about Bayesian approaches to problem-solving.

*To read the whole article, with the link for each resource, click here.*


Posted 10 May 2021
