Summary: Unless you have special needs, document-oriented databases are your most likely default choice.
Second in popularity in the business world, behind key-value stores, are document-oriented databases (DODBs). Here an entire document is treated as a record. While these can accommodate completely unstructured text, they excel at semi-structured text, that is, text encoded in a known format such as XML, YAML, JSON, PDF, email, or even MS Office.
The hidden strength of DODBs is that they are a collection of key-value collections. That is, within a bucket similar to those of key-value stores, there is an additional level of key-value indexing that allows much more efficient queries. If you have several big data projects and none calls for a specialty database type such as graph, the DODB will likely be your go-to default.
Characteristics
Stored elements are called documents. The data model is a collection of documents, and each document is a collection of key-value pairs, which allows indexing within the bucket.
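To make the data model concrete, here is a minimal sketch in plain Python dictionaries. It is not any vendor's API; the bucket contents, the field names, and the helper `build_secondary_index` are all hypothetical and chosen only for illustration. It shows a bucket of documents and a secondary index built over one field inside those documents.

```python
from collections import defaultdict

# A "bucket" maps a document key to a document; each document is itself
# a collection of key-value pairs. All names and values are illustrative.
bucket = {
    "inv-001": {"customer": "Acme", "amount_due": 120.0, "city": "Austin"},
    "inv-002": {"customer": "Globex", "amount_due": 75.5, "city": "Boston"},
    "inv-003": {"customer": "Acme", "amount_due": 310.0},  # no "city" field
}


def build_secondary_index(bucket, field):
    """Map each value of `field` to the sorted keys of documents holding it."""
    index = defaultdict(list)
    for doc_key, doc in bucket.items():
        if field in doc:  # documents need not share the same fields
            index[doc[field]].append(doc_key)
    return {value: sorted(keys) for value, keys in index.items()}


customer_index = build_secondary_index(bucket, "customer")
acme_invoices = customer_index["Acme"]
print(acme_invoices)  # prints ['inv-001', 'inv-003']
```

With the index in place, finding every Acme invoice is a single lookup rather than a scan of the whole bucket, which is the efficiency gain the second level of key-value indexing provides.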
At the simplest level, the secondary index within the bucket can be easily inferred thanks to the built-in structure of semi-structured document types such as XML, JSON, or even common email formats or MS Word docs (documents with tagged elements). If you have copied a number of invoices into a bucket, the tagged elements make it possible to know that such-and-such a line is the address, another is the amount due, and so forth. In this mode DODBs are great for raw document searches such as patent search, litigation support, legal precedent search, search of scientific papers and experimental data, email compliance searches, or simply retrieving knowledge on a particular topic hidden among a forest of internally or externally prepared reports and documents.
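As a rough illustration of how tagged elements let the fields be inferred, the sketch below turns a small XML invoice into a key-value document using Python's standard `xml.etree.ElementTree` module. The invoice layout, the tag names, and the helper `document_from_xml` are assumptions made up for this example; real documents are usually nested and messier.

```python
import xml.etree.ElementTree as ET

# A hypothetical invoice; the tags themselves tell us what each value is.
INVOICE_XML = """
<invoice id="inv-001">
  <address>12 Main St, Austin</address>
  <amount_due>120.00</amount_due>
</invoice>
"""


def document_from_xml(xml_text):
    """Flatten a one-level XML record into a dict of field -> text."""
    root = ET.fromstring(xml_text.strip())
    doc = dict(root.attrib)  # attributes such as id become fields
    for child in root:
        doc[child.tag] = (child.text or "").strip()  # tag name becomes the key
    return doc


invoice = document_from_xml(INVOICE_XML)
print(invoice["address"])     # prints 12 Main St, Austin
print(invoice["amount_due"])  # prints 120.00
```

No schema was declared anywhere: the keys `address` and `amount_due` came straight from the document's own tags, which is what allows the index within the bucket to be inferred.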
However, the key-value pairs within the bucket (the secondary index) are much more powerful than that. They make it easy to add data sources to an existing logical grouping without changing a formal schema. That makes a DODB a great tool for combining data from many different, incompatible database sources, addressing the ‘Variety’ aspect of big data. And the additional level of indexing makes partial record updates efficient, so DODBs do well in OLTP (online transaction processing) applications.
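The two ideas in this paragraph, adding fields from a new source without a schema change and updating only part of a record, can be sketched with plain Python dictionaries. This is an illustration under assumed names (`upsert_fields`, the `cust-42` key, and the billing and CRM fields are all hypothetical), not how any particular product implements it.

```python
def upsert_fields(bucket, doc_key, fields):
    """Merge `fields` into the document at `doc_key`, creating it if absent."""
    doc = bucket.setdefault(doc_key, {})
    doc.update(fields)  # only the named fields change; the rest are untouched
    return doc


customers = {}

# Source A: a billing system contributes name and balance.
upsert_fields(customers, "cust-42", {"name": "Acme", "balance": 120.0})

# Source B: a CRM export with entirely different fields joins the same
# logical record; no schema had to be altered to accept them.
upsert_fields(customers, "cust-42", {"segment": "SMB", "twitter": "@acme"})

# Partial update: a payment arrives, so only the balance is rewritten.
upsert_fields(customers, "cust-42", {"balance": 95.0})

print(customers["cust-42"]["balance"])  # prints 95.0
print(sorted(customers["cust-42"]))     # prints ['balance', 'name', 'segment', 'twitter']
```

In a relational design, the second call would typically require adding columns or a new table first; here the document simply grows.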
Advantages
Disadvantages
Particular Opportunities and Project Characteristics
Representative Vendors (not a recommendation): MongoDB, Couchbase, RavenDB, MarkLogic Server, and many others.
July 23, 2014
Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2014, all rights reserved.
About the author: Bill Vorhies is President & Chief Data Scientist of Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at:
This original blog can be viewed at:
http://data-magnum.com/lesson-6-document-oriented-databases/
All nine lessons can be downloaded as a White Paper at:
http://data-magnum.com/resources/white-papers/
Posted 1 March 2021