[Book] Big Data - Principles and best practices of scalable realtime data systems

Big Data
Principles and best practices of scalable realtime data systems
Nathan Marz and Sam Ritchie

MEAP Began: January 2012
Softbound print: Summer 2012 | 425 pages 
ISBN: 9781617290343

Pre-Order options*
Order today and start reading Big Data today through MEAP                    
  MEAP + Ebook only - $39.99
  MEAP + Print book (includes Ebook) when available - $49.99
* For more information, please see the MEAP FAQs page.


Table of Contents
  1. A new paradigm for Big Data - FREE 
  2. Data model for Big Data - AVAILABLE 
  3. Data storage on the batch layer 
  4. MapReduce and batch processing 
  5. Batch processing with Cascading 
  6. Basics of the serving layer 
  7. Storm and the speed layer 
  8. Incremental batch processing 
  9. Layered architecture in-depth 
10. Piping the system together 
11. Future of NoSQL and Big Data processing 

Appendix A: Hadoop 
Appendix B: Thrift 
Appendix C: Storm


Services like social networks, web analytics, and intelligent e-commerce often need to manage data at a scale too big for a traditional database. Complexity increases with scale and demand, and handling big data is not as simple as just doubling down on your RDBMS or rolling out some trendy new technology. Fortunately, scalability and simplicity are not mutually exclusive—you just need to take a different approach. Big data systems use many machines working in parallel to store and process data, which introduces fundamental challenges unfamiliar to most developers.

Big Data teaches you to build these systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.

Big Data shows you how to build the back end for a real-time service called SuperWebAnalytics.com, our version of Google Analytics. As you read, you'll discover that many standard RDBMS practices become unwieldy with large-scale data. To handle the complexities of big data and distributed systems, you must drastically simplify your approach. This book introduces a general framework for thinking about big data, and then shows how to apply technologies like Hadoop, Thrift, and various NoSQL databases to build simple, robust, and efficient systems to handle it.


  • Introduction to the concepts and technologies of Big Data
  • Work with emerging tools like Hadoop, Cassandra, Thrift, and more
  • Build on the skills you've learned using traditional databases
  • Real-time processing of web-scale data

This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.


Nathan Marz is an engineer at Twitter. He was previously Lead Engineer at BackType, a marketing intelligence company that was acquired by Twitter in July 2011. He is the author of two major open source projects: Storm, a distributed realtime computation system, and Cascalog, a tool for processing data on Hadoop. He is a frequent speaker and writes a blog at nathanmarz.com.

Sam Ritchie is an engineer at Twitter who uses Cascalog and ElephantDB to process and analyze many terabytes of data in near real-time. He is also the lead developer on FORMA, an open-source deforestation monitoring system in use by a number of top research institutions. He is a committer on Cascalog, ElephantDB, Pallet, and a number of other open source Clojure projects.


This Early Access version of Big Data enables you to receive new chapters as they are being written. You can also interact with the authors to ask questions, provide feedback and errata, and help shape the final manuscript on the Author Online.

