Many folks believe that Hadoop is the original NoSQL database, since it was the first to become commercially available, in 2008. But Hadoop grew out of research papers that Google published a few years earlier about its proprietary infrastructure, among them the 2006 paper on Bigtable, Google's NoSQL store, on which Hadoop's HBase component is modeled. Bigtable was the inspiration for Column Oriented DBs (CODBs), though the current crop of offerings pushes far beyond its Google roots.
Unlike the row-based systems (Key Value and Document Oriented DBs), these are, as the name implies, oriented to storing data in columns. Where Document Oriented DBs excel at OLTP (online transaction processing), Column Oriented DBs excel at OLAP (online analytical processing).
Data are stored in cells grouped in columns as opposed to rows. Columns are grouped in Column Families, and each family can contain an essentially unlimited number of columns. Each storage block contains data from only one column.
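To make the cell, column, and Column Family layout concrete, here is a minimal sketch in Python. It is not the storage format of any particular CODB. The sample records, the family names (`name`, `pay`), and the helper `to_column_families` are all hypothetical and chosen only for illustration.

```python
# Hypothetical sample records, as a row-oriented store would hold them.
rows = [
    {"id": 1, "last_name": "Smith", "first_name": "Ann", "salary": 72000, "level": "X"},
    {"id": 2, "last_name": "Jones", "first_name": "Bob", "salary": 65000, "level": "Y"},
    {"id": 3, "last_name": "Smith", "first_name": "Cy", "salary": 81000, "level": "X"},
]

# Hypothetical grouping of columns into Column Families.
FAMILIES = {
    "name": ["last_name", "first_name"],
    "pay": ["salary", "level"],
}


def to_column_families(rows, families):
    """Pivot row records into {family: {column: {row_id: value}}}."""
    store = {fam: {col: {} for col in cols} for fam, cols in families.items()}
    for row in rows:
        for fam, cols in families.items():
            for col in cols:
                if col in row:  # a missing cell is simply not stored
                    store[fam][col][row["id"]] = row[col]
    return store


store = to_column_families(rows, FAMILIES)

# All values of one column now sit together, keyed by row id.
print(store["name"]["last_name"])  # {1: 'Smith', 2: 'Jones', 3: 'Smith'}
```

In this toy layout, each inner dictionary plays the role of a storage block that holds data from a single column.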
Data can be sparse, that is, not all cells need to be filled, and the cell-to-column organization allows for greater compression of data on disk because the values within a column are of the same type and often repeat. Compression reduces query time since fewer read operations are required.
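The sketch below illustrates both ideas on toy data. Run-length encoding is used here only as one simple example of a scheme that benefits from same-column values sitting together; the text above does not say which compression methods real CODBs use. The column contents and function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# A hypothetical "level" column: one column's values are stored contiguously,
# so repeated values form long runs.
level_column = ["X", "X", "X", "Y", "Y", "X", "Z", "Z", "Z", "Z"]
encoded = run_length_encode(level_column)
print(encoded)  # [('X', 3), ('Y', 2), ('X', 1), ('Z', 4)]

# A sparse column: only the rows that actually have a value are stored.
# Rows 1 through 6 have no phone number and take up no space here.
phone_column = {0: "555-0100", 7: "555-0199"}
print(phone_column.get(3))  # None
```

Ten stored values shrink to four pairs in this example, which is the sense in which fewer reads are needed to scan the column.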
Moreover, CODBs would be selected where queries are likely to look at similar data items on many different records. For example, "find all the people with the last name Smith" can be answered in a single operation. Other operations, like counting the number of matching records or performing math over a set of data (e.g., find the average salary of all employees at level X), can also be much faster. Since these data elements reside in single columns, they can be retrieved very quickly: the CODB might be able to retrieve a single data item from all records in a single operation, in contrast to row-based systems, where each row would need to be read and the data items extracted. Compared to an RDBMS, this speed increase can be in the range of 5X to 100X. Consequently, CODBs are the go-to solution for OLAP applications.
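The two example queries can be sketched on toy data as follows. This only shows which data each layout has to touch. It is not a benchmark and does not reproduce the 5X to 100X figure. The column names and values are hypothetical.

```python
from statistics import mean

# Column layout: each column is stored on its own (hypothetical data).
last_name = ["Smith", "Jones", "Smith", "Lee"]
level = ["X", "Y", "X", "X"]
salary = [72000, 65000, 81000, 69000]

# "Find all the people with the last name Smith": only one column is read.
smith_rows = [i for i, name in enumerate(last_name) if name == "Smith"]
print(smith_rows)  # [0, 2]

# "Average salary of all employees at level X": only two columns are read.
avg_salary_x = mean(s for s, lvl in zip(salary, level) if lvl == "X")
print(avg_salary_x)  # 74000

# Row layout for contrast: every full record is read, then one field is extracted.
rows = list(zip(last_name, level, salary))
smith_rows_from_rows = [i for i, row in enumerate(rows) if row[0] == "Smith"]
```

Both layouts give the same answer. The difference is that the row version must load every field of every record to inspect just one of them.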
Particular Opportunities and Project Characteristics
July 23, 2014
Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2014, all rights reserved.
About the author: Bill Vorhies is President & Chief Data Scientist of Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at:
This original blog can be viewed at:
All nine lessons can be downloaded as a White Paper at: