With all this talk of terabytes and petabytes of digital information zipping around the world at the speed of light, it’s sometimes easy to forget about the humble book!
After all, pretty much everything you could ever practically need to know is probably conveniently available on a blog, Google Hangout or SlideShare presentation somewhere.
But to many of us, books are special – and whether you are so attached to the feel of turning paper pages between your fingers that you would never contemplate living without them, or you have found that switching to eBooks has opened a whole new world of conveniently available literature – they still have a big part to play in our lives.
Books keep you focussed – on paper or on a screen, you probably won’t get distracted by a pop-up ad or an interesting-looking link to a video of a dog falling over that catches your eye in the sidebar.
So here’s a rundown of 15 books which I think every data scientist should have on their shelf. Some are technical and will only be of interest to programmers or analysts; others will appeal to anyone curious about the wider implications of our Big Data society.
1. Overviews and theories – the ideas behind the Big Data revolution, mostly written for any audience regardless of technical ability.
The Human Face of Big Data
By Rick Smolan and Jennifer Erwitt
Rather than a formulaic textbook, this book talks the reader through the ideas and applications of Big Data through a series of essays and photographs. It pays particular attention to humanizing the story – showing how the technologies being discussed are affecting the lives of real people around the world. The essays come from a range of authors noted for their thoughts on the impact of technology and data on society.
Big Data: A Revolution That Will Transform How We Live, Work and Think
By Viktor Mayer-Schönberger and Kenneth Cukier
This book aims to examine the social impact of the ever-growing amount of data we are collecting, storing and analyzing, and to provide the reader with a practical toolkit for surviving and thriving in a Big Data world.
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die
By Eric Siegel
Referred to as “The Freakonomics of Big Data”, this book is written for any audience regardless of technical expertise and explores the many ways in which data analysis seems to be giving us the chance to predict, and therefore change, the future. Author Siegel is the founder and editor of the Predictive Analytics Times.
Pattern Recognition and Machine Learning
By Christopher Bishop
This book assumes no prior knowledge of the subject matter, but readers with some intermediate knowledge of mathematics, such as linear algebra and calculus, will find it easier going than those without. It explains and illustrates the way data scientists are introducing Bayesian algorithms to enable computers to make decisions more quickly and reliably than any human ever could.
Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
By Bruce Schneier
Every day we are being watched and recorded by governments as well as corporations, all hell-bent on collecting as much information about us as they can. But why? What do they want? And how can we make sure that the benefits we gain from living in an increasingly digitized and data-centred world outweigh the freedom and anonymity we are sacrificing? This book provides answers to these questions.
Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia
By Anthony M. Townsend
An examination of how datafication of urban spaces and services is changing the way we live in cities, and how what we are seeing start to happen now – in cities such as Chicago; Zaragoza, Spain; and Milton Keynes, UK – is only the beginning.
2. Practical use – books which explain specific technical skills, not always suited to beginners.
Hadoop: The Definitive Guide
By Tom White
The elephant in the room that everyone is talking about. This practical guide to Hadoop is aimed at programmers and data scientists who want to get started using the Hadoop distributed Big Data framework for analytics and predictive modelling.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
By Trevor Hastie, Robert Tibshirani, Jerome Friedman
This is a great book which looks a little deeper into the science behind the theories. You won’t need a maths degree, but it goes into some depth on the statistical theories and concepts behind machine learning and predictive algorithms.
MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems
By Donald Miner and Adam Shook
An overview, along with example code, of building MapReduce patterns for use in Big Data and analytical projects. The book was written with the aim of bringing all the disparate information on the subject together from the academic research papers, online communities and blogs where it has evolved.
Python for Data Analysis
By Wes McKinney
There are lots of free courses online which can teach you Python, but as mentioned in the intro, you sometimes just can’t beat a well-written and well-structured book. Python is one of the most popular programming languages for handling data and creating predictive algorithms, and this book explains in detail how to apply it to Big Data tasks.
Practical Data Science with R
By Nina Zumel and John Mount
Covers the basic principles, along with real-world case studies showing the many applications of R in statistical modelling and predictive analytics. Not for total R beginners – the emphasis is on explaining how the language can be applied to creating algorithms for data analysis rather than on teaching a beginner to code in R – but most people with a basic understanding of computer programming principles should be able to follow it.
3. Miscellaneous – books covering the dark side of Big Data, hobbyist applications and specific applications.
Future Crimes
By Marc Goodman
If you have difficulty sleeping due to thoughts of burglars analyzing social media to determine the best time to break into your house, or hacking your baby monitor to spy on your family, you might want to give this one a miss. An examination of the many ways criminals are taking advantage of our always-connected society.
Internet of Things – Home Projects for Raspberry Pi, Arduino and BeagleBone Black
By Donald Norris
Fancy having a go at building your own IoT home lighting, security or environmental control system? This book will show you how to put together the hardware using cheap microcontrollers and off-the-shelf components, and explain the programming needed to make it all work.
Building Data Science Teams
By DJ Patil
Written by the US Chief Data Scientist and currently a free ebook download at Amazon, this book looks at the mix of skills business leaders need to harness to make the most of analytics in their organizations.
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
By Nathan Yau
Explains the principles of visual storytelling with Big Data: how to set goals regarding what you need to explain and what is just noise, and how to creatively express your results in a way that will get the attention of your intended audience.