Here's the introduction to the book.
Introduction
This book is a kind of “handbook” on data science and data scientists, and it contains information not found in traditional statistics, programming, or computer science textbooks. The author has compiled what he considers some of the most important information you will need for a career in data science, based on his 20+ years as a leader in the field. Much of the text was initially published over the last three years on the Data Science Central website, which is read by millions of visitors. The book shows how data science differs from related fields and the value it brings to organizations using big data.
This book has three components: a multi-layer discussion of what data science is and how it relates to other disciplines; technical applications of and for data science, including tutorials and case studies; and career resources for practicing and aspiring data scientists. Numerous career and training resources are included (such as data sets, web crawler source code, data videos, and how to build APIs) so you can start practicing data science today and quickly boost your career. Decision makers will find information to help them decide how to build a better analytic team, whether and when they need specialized solutions, and which ones will work best for their needs.
Who This Book Is For
This book is intended for data scientists and related professionals (such as business analysts, computer scientists, software engineers, data engineers and statisticians) who are interested in shifting to big data science careers. It is also for the college student studying a quantitative curriculum with the goal of becoming a future data scientist. Finally, it is for managers of data scientists, and people interested in creating a startup business or consultancy around data science.
These readers will find valuable information throughout the book, and specifically in the following chapters:
What This Book Covers
The technical part of this book covers core data science topics including:
The focus is on recent technology, so you will not find material about old techniques such as linear regression, except for anecdotal references. These are discussed at length in all standard books. Actually, there is some limited discussion of logistic-like regression in this book, but it is more about blending it with other classifiers and proposing a numerically stable, approximate algorithm; we mention that approximate solutions are often as good as the exact model, as no data fits a theoretical model perfectly.
Besides technology, the book provides useful career resources, including job interview questions (some are technical, some are not). Another important part is the case studies. Some have a statistical/machine learning flair, some have more of a business/decision science or operations research flair, and some have more of a data engineering flair.
Most of the time, I have favored topics that were posted recently and were very popular on Data Science Central (the leading community for data scientists), rather than topics that I am particularly attached to.
How This Book Is Structured
The book consists of three main sets of topics:
The book provides valuable career resources for potential and existing data scientists and related professionals (and their managers and their bosses) and, generally speaking, for all professionals dealing with increasingly bigger, more complex, and faster flowing data. The book also provides data science recipes, craftsmanship, concepts (many of them original and published for the first time), and case studies illustrating implementation methods and techniques that have proven successful in various domains for analyzing modern data, either manually or automatically.
What You Need to Use This Book
The book contains a few code samples, in either R or Perl. You can download Perl from http://www.activestate.com/activeperl/downloads and R from http://cran.r-project.org/bin/windows/base/. If you use a Windows machine, I would first install Cygwin, a Linux-like environment for Windows. You can get Cygwin at http://cygwin.com/install.html. Python is also available as open source and has a useful library called Pandas.
For most of the book, 1 or 2 years of college with some basic quantitative courses is enough for you to understand the content. The book does not require calculus or advanced math — indeed, it barely contains any mathematical formulas or symbols.
Yet some quite advanced material is described at a high level. A few technical notes spread throughout the book are for those who are more mathematically inclined and interested in digging deeper. Two years of calculus, statistics, and matrix theory at the college level are needed to understand these technical notes. Some source code (R, Perl) and datasets are provided, but the emphasis is not on coding.
This mixture of technical levels offers the opportunity for you to explore the depths of data science without advanced math knowledge. (A bit like the way Carl Sagan introduced astronomy to the mainstream public.)
Comment
With all of this said, what is the name of your book?
I am interested in taking the DSA. According to the website, I need to read the book and work on the project.
But can someone tell me how I can publish the project work? That is not clear.
I would highly appreciate it if someone who has taken this before could guide me on this.
Thanks & Regards,
Sree
Congratulations Vincent!
I have read the 1st chapter (sample) on Amazon and am very curious to read the rest of the book, hence I pre-ordered it.
Looking forward to the apprenticeship training.
I pre-ordered this as well. Looking forward to the apprenticeship.
Waiting for March 31st....
Congratulations Vincent, I found many potentially interesting chapters… Looking forward to reading it.
Congratulations on the book!
I have already pre-ordered this book on Amazon.
Thanks. Waiting to grab a copy.