15 Books every Data Scientist Should Read

With all this talk of terabytes and petabytes of digital information zipping around the world at the speed of light, it’s sometimes easy to forget about the humble book!

After all, pretty much everything you could ever practically need to know is probably available on a blog, Google Hangout or SlideShare presentation somewhere.

But to many of us, books are special – and whether you are so attached to the feel of turning paper pages between your fingers that you would never contemplate living without them, or you have found that switching to eBooks has opened a whole new world of conveniently available literature – they still have a big part to play in our lives.

Books keep you focussed – on paper or on a screen, you probably won’t get distracted by a pop-up ad or by an interesting-looking link in the sidebar to a video of a dog falling over.

So here’s a rundown of 15 books which I think every data scientist should have on their shelf. Some are technical and will only be of interest to programmers or analysts, while others will appeal to anyone interested in the wider implications of our Big Data society.

1. Overviews and theories – the ideas behind the Big Data revolution, mostly written for any audience regardless of technical ability.


The Human Face of Big Data

Created by Rick Smolan and Jennifer Erwitt


Rather than a formulaic textbook, this book talks the reader through the ideas and applications of Big Data through a series of essays and photographs. It pays particular attention to humanizing the story – showing how the technologies being discussed are affecting the lives of real people around the world. The essays come from a range of authors noted for their thoughts on the impact of technology and data on society.


Big Data: A Revolution That Will Transform How We Live, Work, and Think

By Viktor Mayer-Schönberger and Kenneth Cukier


This book examines the social impact of the ever-growing amount of data we are collecting, storing and analyzing, and aims to give the reader a practical toolkit for surviving and thriving in a Big Data world.


Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die

By Eric Siegel


Referred to as “The Freakonomics of Big Data”, this book is written for any audience regardless of technical expertise and explores the many ways in which data analysis seems to be giving us the chance to predict, and therefore change, the future. Author Siegel is the founder and editor of the Predictive Analytics Times.

Pattern Recognition and Machine Learning

By Christopher Bishop


This book assumes no prior knowledge of the subject matter, but readers with some intermediate knowledge of mathematics, such as linear algebra and calculus, will find it easier going than those without. It explains and illustrates how Bayesian methods are being used to let computers make certain decisions more quickly and reliably than a human could.

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World

By Bruce Schneier


Every day we are being watched and recorded by governments and corporations hell-bent on collecting as much information about us as they can. But why? What do they want? And how can we make sure that the benefits we gain from living in an increasingly digitized and data-centred world outweigh the freedom and anonymity we are sacrificing? This book provides answers to these questions.


Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia

By Anthony M. Townsend


An examination of how the datafication of urban spaces and services is changing the way we live in cities, and of why what we are starting to see now – in cities such as Chicago; Zaragoza, Spain; and Milton Keynes, UK – is only the beginning.


2. Practical use – books which explain specific technical skills, not always suited to beginners.


Hadoop: The Definitive Guide

By Tom White


The elephant in the room that everyone is talking about. This practical guide to Hadoop is aimed at programmers and data scientists who want to get started using the Hadoop distributed Big Data framework for analytics and predictive modelling.


The Elements of Statistical Learning: Data Mining, Inference, and Prediction

By Trevor Hastie, Robert Tibshirani, Jerome Friedman


This is a great book which looks a little deeper into the science behind machine learning. You won’t need a maths degree, but it goes into some depth on the statistical theories and concepts behind machine learning and predictive algorithms.


MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems

By Donald Miner and Adam Shook


An overview, along with example code, of building MapReduce patterns for use in Big Data and analytical projects. The book was written with the aim of bringing all the disparate information on the subject together from the academic research papers, online communities and blogs where it has evolved.


Python for Data Analysis

By Wes McKinney


There are lots of free courses online which can teach you Python but, as mentioned in the intro, you sometimes just can’t beat a well-written and well-structured book. Python is one of the most popular programming languages for handling data and creating predictive algorithms, and this book explains in detail how to apply it to Big Data tasks.


Practical Data Science with R

By Nina Zumel and John Mount


Covers the basic principles, along with real-world case studies showing the many applications of R in statistical modelling and predictive analytics. It is not for total R beginners: the emphasis is on explaining how the language can be applied to creating algorithms for data analysis, rather than on teaching a beginner to code in R. Even so, most people with a basic understanding of computer programming principles should be able to follow it.

3. Miscellaneous – books covering the dark side of Big Data, hobbyist applications and specific applications.


Future Crimes

By Marc Goodman


If you have difficulty sleeping due to thoughts of burglars analyzing social media to determine the best time to break into your house, or hacking your baby monitor to spy on your family, you might want to give this one a miss. An examination of the many ways criminals are taking advantage of our always-connected society.


Internet of Things – Home Projects for Raspberry Pi, Arduino and BeagleBone Black

By Donald Norris


Fancy having a go at building your own IoT home lighting, security or environmental control system? This book will show you how to put together the hardware using cheap microcontrollers and off-the-shelf components, and explain the programming needed to make it all work.


Building Data Science Teams

By DJ Patil


Written by the US Chief Data Scientist and currently a free ebook download at Amazon, this book looks at the mix of skills business leaders need to harness to make the most of analytics in their organizations.


Visualize This: The FlowingData Guide to Design, Visualization, and Statistics

By Nathan Yau


Explains the principles of visual storytelling with Big Data: how to set goals, how to separate what you need to explain from what is just noise, and how to creatively express your results in a way that will get the attention of your intended audience.


Comments
  • Paul Ralph

    What is the picture for? I don't doubt that all of the pictured books are great ones (actually, I already own three of them), but only one of them is mentioned and linked in the article below it.

    Also, here is the correct link to "Practical Data Science with R"

  • Vincent Granville

    Hi Paul, I added the picture myself. It's a screen shot from Google Image search results for data science books.

  • Erika Canizales

    Hello, is there a bibliography in Spanish, and on the web? If the information is available, could you help me?