This book is also part of our apprenticeship. Some of its content, along with new material, is available in a separate document called the Addendum. Click here to download the addendum. The book is available on Barnes and Noble. Also, read our article on strong correlations to see how various sections of the book apply to modern data science. If you are starting from zero, read my data science cheat sheet first: it will make the book much easier to read.

My second book - Data Science 2.0 - can be checked out here. The book described on this page is my first book.

About the Author

Dr. Vincent Granville is a visionary data scientist with 15 years of experience in big data, predictive modeling, and digital and business analytics. Vincent is widely recognized as the leading expert in scoring technology, fraud detection, and web traffic optimization and growth. Over the last ten years, he has worked in real-time credit card fraud detection with Visa, advertising mix optimization with CNET, change point detection with Microsoft, online user experience with Wells Fargo, search intelligence with InfoSpace, automated bidding with eBay, and click fraud detection with major search engines, ad networks and large advertising clients.

Most recently, Vincent launched Data Science Central, the leading social network for big data, business analytics and data science practitioners. Vincent is a former postdoctoral researcher at Cambridge University and the National Institute of Statistical Sciences. He was among the finalists at the Wharton School Business Plan Competition and at the Belgian Mathematical Olympiads. Vincent has published 40 papers in statistical journals (including Journal of the Royal Statistical Society - Series B, IEEE Transactions on Pattern Analysis and Machine Intelligence, and Journal of Number Theory) and a Wiley book on data science, and is an invited speaker at international conferences. He also developed a new data mining technology known as hidden decision trees, owns multiple patents, published the first data science eBook, and raised $6MM in start-up funding. Vincent is one of the top 20 big data influencers according to Forbes, and was also featured on CNN.

Introduction

To find out whether this book might be useful to you, read my introduction.

Table of Contents

  • Introduction xxi
  • Chapter 1 What Is Data Science? 1
  • Chapter 2 Big Data Is Different 41
  • Chapter 3 Becoming a Data Scientist 73
  • Chapter 4 Data Science Craftsmanship, Part I 109
  • Chapter 5 Data Science Craftsmanship, Part II 151
  • Chapter 6 Data Science Application Case Studies 195
  • Chapter 7 Launching Your New Data Science Career 255
  • Chapter 8 Data Science Resources 287
  • Index 299

Chapter 1 - What Is Data Science? 1

Real Versus Fake Data Science 2

  • Two Examples of Fake Data Science 5
  • The Face of the New University 6

The Data Scientist 9

  • Data Scientist Versus Data Engineer 9
  • Data Scientist Versus Statistician 11
  • Data Scientist Versus Business Analyst 12

Data Science Applications in 13 Real-World Scenarios 13

  • Scenario 1: DUI Arrests Decrease After End of State Monopoly on Liquor Sales 14
  • Scenario 2: Data Science and Intuition 15
  • Scenario 3: Data Glitch Turns Data Into Gibberish 18
  • Scenario 4: Regression in Unusual Spaces 19
  • Scenario 5: Analytics Versus Seduction to Boost Sales 20
  • Scenario 6: About Hidden Data 22
  • Scenario 7: High Crime Rates Caused by Gasoline Lead. Really? 23
  • Scenario 8: Boeing Dreamliner Problems 23
  • Scenario 9: Seven Tricky Sentences for NLP 24
  • Scenario 10: Data Scientists Dictate What We Eat? 25
  • Scenario 11: Increasing Amazon.com Sales with Better Relevancy 27
  • Scenario 12: Detecting Fake Profiles or Likes on Facebook 29
  • Scenario 13: Analytics for Restaurants 30

Data Science History, Pioneers, and Modern Trends 30

  • Statistics Will Experience a Renaissance 31
  • History and Pioneers 32
  • Modern Trends 34
  • Recent Q&A Discussions 35

Summary 39

Chapter 2 - Big Data Is Different 41

Two Big Data Issues 41

  • The Curse of Big Data 41
  • When Data Flows Too Fast 45

Examples of Big Data Techniques 51

  • Big Data Problem Epitomizing the Challenges of Data Science 51
  • Clustering and Taxonomy Creation for Massive Data Sets 53
  • Excel with 100 Million Rows 57

What MapReduce Can’t Do 60

  • The Problem 61
  • Three Solutions 61
  • Conclusion: When to Use MapReduce 63

Communication Issues 63

Data Science: The End of Statistics? 65

  • The Eight Worst Predictive Modeling Techniques 65
  • Marrying Computer Science, Statistics, and Domain Expertise 67

The Big Data Ecosystem 70

Summary 71

Chapter 3 - Becoming a Data Scientist 73

Key Features of Data Scientists 73

  • Data Scientist Roles 73
  • Horizontal Versus Vertical Data Scientist 75

Types of Data Scientists 78

  • Fake Data Scientist 78
  • Self-Made Data Scientist 78
  • Amateur Data Scientist 79
  • Extreme Data Scientist 80

Data Scientist Demographics 82

Training for Data Science 82

  • University Programs 82
  • Corporate and Association Training Programs 86
  • Free Training Programs 87

Data Scientist Career Paths 89

  • The Independent Consultant 89
  • The Entrepreneur 95

Summary 107

Chapter 4 - Data Science Craftsmanship, Part I 109

New Types of Metrics 110

  • Metrics to Optimize Digital Marketing Campaigns 111
  • Metrics for Fraud Detection 112

Choosing Proper Analytics Tools 113

  • Analytics Software 114
  • Visualization Tools 115
  • Real-Time Products 116
  • Programming Languages 117

Visualization 118

  • Producing Data Videos with R 118
  • More Sophisticated Videos 122

Statistical Modeling Without Models 122

  • What Is a Statistical Model Without Modeling? 123
  • How Does the Algorithm Work? 124
  • Source Code to Produce the Data Sets 125

Three Classes of Metrics: Centrality, Volatility, Bumpiness 125

  • Relationships Among Centrality, Volatility, and Bumpiness 125
  • Defining Bumpiness 126
  • Bumpiness Computation in Excel 127
  • Uses of Bumpiness Coefficients 128

Statistical Clustering for Big Data 129

Correlation and R-Squared for Big Data 130

  • A New Family of Rank Correlations 132
  • Asymptotic Distribution and Normalization 134

Computational Complexity 137

  • Computing q(n) 137
  • A Theoretical Solution 140

Structured Coefficient 140

Identifying the Number of Clusters 141

  • Methodology 142
  • Example 143

Internet Topology Mapping 143

Securing Communications: Data Encoding 147

Summary 149

Chapter 5 - Data Science Craftsmanship, Part II 151

Data Dictionary 152

  • What Is a Data Dictionary? 152
  • Building a Data Dictionary 152

Hidden Decision Trees 153

  • Implementation 155
  • Example: Scoring Internet Traffic 156
  • Conclusion 158

Model-Free Confidence Intervals 158

  • Methodology 158
  • The Analyticbridge First Theorem 159
  • Application 160
  • Source Code 160

Random Numbers 161

Four Ways to Solve a Problem 163

  • Intuitive Approach for Business Analysts with Great Intuitive Abilities 164
  • Monte Carlo Simulations Approach for Software Engineers 165
  • Statistical Modeling Approach for Statisticians 165
  • Big Data Approach for Computer Scientists 165

Causation Versus Correlation 165

How Do You Detect Causes? 166

Life Cycle of Data Science Projects 168

Predictive Modeling Mistakes 171

Logistic-Related Regressions 172

  • Interactions Between Variables 172
  • First Order Approximation 172
  • Second Order Approximation 174
  • Regression with Excel 175

Experimental Design 176

  • Interesting Metrics 176
  • Segmenting the Patient Population 176
  • Customized Treatments 177

Analytics as a Service and APIs 178

  • How It Works 179
  • Example of Implementation 179
  • Source Code for Keyword Correlation API 180

Miscellaneous Topics 183

  • Preserving Scores When Data Sets Change 183
  • Optimizing Web Crawlers 184
  • Hash Joins 186
  • Simple Source Code to Simulate Clusters 186

New Synthetic Variance for Hadoop and Big Data 187

  • Introduction to Hadoop/MapReduce 187
  • Synthetic Metrics 188
  • Hadoop, Numerical, and Statistical Stability 189
  • The Abstract Concept of Variance 189
  • A New Big Data Theorem 191
  • Transformation-Invariant Metrics 192
  • Implementation: Communications Versus Computational Costs 193
  • Final Comments 193

Summary 193

Chapter 6 - Data Science Application Case Studies 195

Stock Market 195

  • Pattern to Boost Return by 500 Percent 195
  • Optimizing Statistical Trading Strategies 197
  • Stock Trading API: Statistical Model 200
  • Stock Trading API: Implementation 202
  • Stock Market Simulations 203
  • Some Mathematics 205
  • New Trends 208

Encryption 209

  • Data Science Application: Steganography 209
  • Solid E‑Mail Encryption 212
  • Captcha Hack 214

Fraud Detection 216

  • Click Fraud 216
  • Continuous Click Scores Versus Binary Fraud/Non-Fraud 218
  • Mathematical Model and Benchmarking 219
  • Bias Due to Bogus Conversions 220
  • A Few Misconceptions 221
  • Statistical Challenges 221
  • Click Scoring to Optimize Keyword Bids 222
  • Automated, Fast Feature Selection with Combinatorial Optimization 224
  • Predictive Power of a Feature and Cross-Validation 225
  • Association Rules to Detect Collusion and Botnets 228
  • Extreme Value Theory for Pattern Detection 229

Digital Analytics 230

  • Online Advertising: Formula for Reach and Frequency 231
  • E‑Mail Marketing: Boosting Performance by 300 Percent 231
  • Optimize Keyword Advertising Campaigns in 7 Days 232
  • Automated News Feed Optimization 234
  • Competitive Intelligence with Bit.ly 234
  • Measuring Return on Twitter Hashtags 237
  • Improving Google Search with Three Fixes 240
  • Improving Relevancy Algorithms 242
  • Ad Rotation Problem 244

Miscellaneous 245

  • Better Sales Forecasts with Simpler Models 245
  • Better Detection of Healthcare Fraud 247
  • Attribution Modeling 248
  • Forecasting Meteorite Hits 248
  • Data Collection at Trailhead Parking Lots 252
  • Other Applications of Data Science 253

Summary 253

Chapter 7 - Launching Your New Data Science Career 255

Job Interview Questions 255

  • Questions About Your Experience 255
  • Technical Questions 257
  • General Questions 258
  • Questions About Data Science Projects 260

Testing Your Own Visual and Analytic Thinking 263

  • Detecting Patterns with the Naked Eye 263
  • Identifying Aberrations 266
  • Misleading Time Series and Random Walks 266

From Statistician to Data Scientist 268

  • Data Scientists Are Also Statistical Practitioners 268
  • Who Should Teach Statistics to Data Scientists? 269
  • Hiring Issues 269
  • Data Scientists Work Closely with Data Architects 270
  • Who Should Be Involved in Strategic Thinking? 270
  • Two Types of Statisticians 271
  • Using Big Data Versus Sampling 272

Taxonomy of a Data Scientist 273

  • Data Science’s Most Popular Skill Mixes 273
  • Top Data Scientists on LinkedIn 276

400 Data Scientist Job Titles 279

Salary Surveys 281

  • Salary Breakdown by Skill and Location 281
  • Create Your Own Salary Survey 285

Summary 285

Chapter 8 - Data Science Resources 287

Professional Resources 287

  • Data Sets 288
  • Books 288
  • Conferences and Organizations 290
  • Websites 291
  • Definitions 292

Career-Building Resources 295

  • Companies Employing Data Scientists 296
  • Sample Data Science Job Ads 297
  • Sample Resumes 297

Summary 298

Index 299

Comment

Comment by RAGHAVENDRACHARI on Wednesday

How can I download this book? If a PDF is available, could you please help me download it, or send it to my email, raghavendrachari08@outlook.com?

Comment by Sam David on June 11, 2016 at 8:19am

It is frustrating that none of the links for either Data Science 1.0 or 2.0 work. There is no way to download the first version, since it is difficult to find.

Please send me the correct links and oblige.

Thank You. Have a nice day.

Regards,

SD

Comment by varun kesherwani on February 27, 2016 at 2:25pm

How can we get books?

Comment by ARMEL DJANGONE on January 24, 2016 at 12:21pm

Just ordered it...looking forward to receiving it.

Comment by Flavio Bossolan on January 18, 2016 at 2:44am

Hey Vincent, you mentioned that this book is available in PDF for DSC members. I couldn't find the link to download it. Can you share it with me please? Thank you very much!

Comment by Thanh Vu on March 2, 2015 at 8:39pm

Dear Dr. Vincent Granville,

I got this book. It is a really helpful book on data science. I love it.

Regards,

Thanh Vu

Comment by Hassine Saidane on January 27, 2015 at 12:33pm

The book looks comprehensive. I would have liked to see chapters on most of the predictive tools (data mining), as well as chapters on classical topics like data preparation, transformation, Exploratory Data Analysis, etc., and also on traditional BI examples such as visualization, reporting, and dashboards, all adapted to the case of big data. Specific big data algorithms such as Deep Learning would be a great addition.

Comment by Bob Vargas on January 18, 2015 at 9:43am

Lucky me! I actually bought this book in June 2014, then learned about this website when I was reading page 85. Now I can devote my time to this valuable website in addition to the resourceful book.

Comment by Arturo Olvera on December 1, 2014 at 12:34pm

Are there any plans to have it in different languages? Spanish for example... How can I contribute?

Comment by kathirVel on October 30, 2014 at 12:18am

Just ordered this book through Amazon. Eagerly waiting to read it.
