- Write your first R code, and discover vectors, matrices, data frames and lists.
- Seven courses on Student's t-tests, ANOVA, correlation, regression, and more (26 hours).
How to detect fake data science
Any time you see a program dominated by ANOVA, t-tests, linear regression, and, generally speaking, stuff published in any statistics 101 textbook dating back to 1930 (when computers did not exist), you are not dealing with actual data science. While it is true that data science has many flavors and does involve a bit of old-fashioned statistical science, most of the statistical theory behind data science has been entirely rewritten in the last 10 years and, on many occasions, invented from scratch to solve big data problems. You can find the real stuff, for instance, in Dr. Granville's Wiley book and his upcoming Data Science 2.0 book (for free), as well as in DSC's data science research lab. The material can be understood by any engineer with limited or no statistical background. It is indeed designed for them, and for automation / black-box usage - something classical statistics has been unable to achieve so far.
Also, you don't need to know matrix algebra to practice modern data science. When you see 'matrices' in a data science program, it's a red flag, in my opinion.
More warnings about traditional statistical science
Some statisticians claim that what data scientists do is statistics, and that we are ignorant of this fact. There are many ways to solve a problem: sometimes the data science solution is identical to the statistical solution. But typically, the data science solution is far easier to understand and scale. An engineer, or someone familiar with algorithms or databases, or a business manager, will easily understand it. Data science, unlike statistics, has no mystery, and does not force you to choose between hundreds of procedures to solve a problem.
In some ways, data science is simpler, more unified, and more powerful than statistics; in other ways it is more complicated, as it requires strong domain expertise for successful implementation, as well as expertise in defining the right metrics and in chasing or designing the right data (along with data harvesting schedules).
It is interesting that Granville's article on predicting records was criticized by statisticians as being statistical science, while in fact you can understand the details without ever having attended a basic statistics class. Likewise, engineers extensively use concepts such as copulas - something that looks very much like statistical science - yet these have never been used by classical statisticians (they are not in any of their textbooks).
In short, some statisticians are isolating themselves more and more from the business world, while at the same time claiming that we - data scientists - are just statisticians with a new name. Nothing could be more wrong.
- Data Scientist Reveals his Growth Hacking Techniques
- 10 Modern Statistical Concepts Discovered by Data Scientists
- Top data science keywords on DSC
- 4 easy steps to becoming a data scientist
- 13 New Trends in Big Data and Data Science
- 22 tips for better data science
- Data Science Compared to 16 Analytic Disciplines
- How to detect spurious correlations, and how to find the real ones
- 17 short tutorials all data scientists should read (and practice)
- 10 types of data scientists
- 66 job interview questions for data scientists
- High versus low-level data science