**Summary:** *Graph databases are your go-to choice when the relationships among the data items are key.*

Up to about 1999, web search engines largely evaluated each web page as a standalone entity, ranking it on its content without regard to any other pages. Google changed that with PageRank, a graph-centered approach developed in the late 1990s by co-founders Larry Page and Sergey Brin. PageRank evaluates web pages in relationship to other pages, using the links between them. Users quickly recognized that ranking pages by their relationship to others produced much better results, and this may be the single factor that moved Google rapidly ahead of its competitors.

We are so used to thinking of databases as tables, or at least as buckets of information, that it can be a little challenging to wrap your head around the concepts of graph databases. That said, Graph DBs handle relationship-heavy queries that the other types of NoSQL DBs and RDBMSs struggle with. Making the effort to understand and use this type can offer big returns.

**Characteristics**

Graph DBs do not rely on classical indexes. Rather, each stored object is represented with “nodes” and “edges”. A node is a single record that has at least one, and potentially many, named properties. Edges define the relationships among nodes, and both the nodes and their relationships can carry properties. A node can have multiple edges, defining the many different kinds of relationships it has with other nodes. Both nodes and relationships (edges) can be addressed by key values.
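
The node-and-edge model above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular vendor stores data; the class names (`Node`, `Edge`, `PropertyGraph`), the edge labels, and the sample records are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    key: str                                        # nodes are addressable by key
    properties: dict = field(default_factory=dict)  # named properties


@dataclass
class Edge:
    key: str                                        # edges are addressable by key too
    source: str                                     # key of the node the edge starts at
    target: str                                     # key of the node the edge points to
    label: str                                      # the kind of relationship
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory graph: nodes, edges, and per-node edge lists."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}
        self.outgoing = {}  # node key -> keys of edges leaving that node

    def add_node(self, key, **properties):
        self.nodes[key] = Node(key, properties)
        self.outgoing.setdefault(key, [])

    def add_edge(self, key, source, target, label, **properties):
        self.edges[key] = Edge(key, source, target, label, properties)
        self.outgoing[source].append(key)

    def neighbors(self, key, label):
        """Keys of nodes reached from `key` over edges with the given label."""
        return [
            self.edges[edge_key].target
            for edge_key in self.outgoing[key]
            if self.edges[edge_key].label == label
        ]


graph = PropertyGraph()
graph.add_node("alice", kind="person", name="Alice")
graph.add_node("bob", kind="person", name="Bob")
graph.add_node("dune", kind="book", title="Dune")
graph.add_edge("e1", "alice", "bob", "FRIEND_OF", since=2012)
graph.add_edge("e2", "bob", "dune", "READ", rating=5)
```

Here `graph.neighbors("alice", "FRIEND_OF")` follows Alice's outgoing edges directly rather than searching a table, which is the point of storing relationships as first-class records.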

Search or query in Graph DBs is called “traversal”. A traversal starts at a specific node and explores its connections to other nodes along the relationships requested. A common example would be ‘what books are my friends reading that I haven’t yet read’. In this mode, Graph DBs are often associated with the ‘recommender’ engines widely used in social and ecommerce applications.
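
As a rough sketch of the ‘books my friends are reading’ traversal, assume two hypothetical relationship maps, `friends` and `has_read`, held in plain Python dictionaries. A real Graph DB would express this in its own query language; the data and function name here are invented for illustration.

```python
# Hypothetical sample data: who is friends with whom, and who has read what.
friends = {
    "me": ["ann", "raj"],
    "ann": ["me"],
    "raj": ["me"],
}
has_read = {
    "me": {"Dune"},
    "ann": {"Dune", "Emma"},
    "raj": {"Ulysses", "Emma"},
}


def books_friends_read(start):
    """Start at one person, hop to friends, then hop to their books."""
    mine = has_read.get(start, set())
    found = set()
    for friend in friends.get(start, []):
        # Keep only the books the starting person has not read yet.
        found |= has_read.get(friend, set()) - mine
    return sorted(found)


recommendations = books_friends_read("me")
print(recommendations)  # ['Emma', 'Ulysses']
```

The query never looks at people or books that are not connected to the starting node, so its cost depends on the size of the neighborhood rather than the size of the whole data set.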

As Graph DBs become more dense, a traversal may have to stop at the same node several times, which can slow the search. As a result, Graph DBs learn and index these common relationships to speed up search.
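
Two simple ways to avoid repeated work in a dense graph can be sketched as follows: a `seen` set so a traversal expands each node only once, and a cache so a frequently requested traversal is answered without walking the graph again. This is an assumption about the general idea only; how actual products learn and index common relationships is vendor-specific, and the `EDGES` data and `reachable` function are hypothetical.

```python
from collections import deque
from functools import lru_cache

# Hypothetical dense-ish graph: "d" is reachable two ways and links back to "a".
EDGES = {
    "a": ("b", "c"),
    "b": ("d",),
    "c": ("d",),
    "d": ("a",),
}

visits = []  # records every node actually expanded, for illustration


@lru_cache(maxsize=None)  # remembers results of traversals already performed
def reachable(start, max_depth):
    """Breadth-first traversal returning nodes within max_depth hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        visits.append(node)
        if depth == max_depth:
            continue
        for neighbor in EDGES.get(node, ()):
            if neighbor not in seen:  # skip nodes already queued or expanded
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return frozenset(seen - {start})


first = reachable("a", 2)   # walks the graph
second = reachable("a", 2)  # served from the cache, no new visits
```

After both calls `visits` holds four entries, one per node, even though "d" has two incoming paths and the same traversal was requested twice.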

**Advantages**

- Extremely fast for connected data. While an RDBMS can be made to emulate a graph database, the extensive use of joins would make the technique quite slow.
- Easy to query.
- Able to quickly handle complex queries involving multiple levels of related data.
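
To make the join comparison concrete, here is a small sketch of the same two-hop question answered two ways: as a nested-loop self-join over a table of rows, and as a walk over adjacency lists. The `FOLLOWS` rows and both function names are hypothetical, and real relational engines use indexes and query optimizers, so this shows the shape of the work rather than a benchmark.

```python
# Hypothetical relationship table: one (source, target) row per relationship.
FOLLOWS = [("ann", "bob"), ("bob", "cy"), ("bob", "dee"), ("cy", "eve")]


def two_hops_join(rows, start):
    """Relational style: self-join the table, scanning all rows for each hop."""
    return sorted(
        {t2 for s1, t1 in rows if s1 == start for s2, t2 in rows if s2 == t1}
    )


def two_hops_adjacency(rows, start):
    """Graph style: build adjacency lists once, then follow them from start."""
    adjacency = {}
    for source, target in rows:
        adjacency.setdefault(source, []).append(target)
    return sorted(
        {t2 for t1 in adjacency.get(start, []) for t2 in adjacency.get(t1, [])}
    )


via_join = two_hops_join(FOLLOWS, "ann")
via_graph = two_hops_adjacency(FOLLOWS, "ann")
```

Both calls return the same answer. The difference is that each extra level of depth adds another pass over the table in the join version, while the adjacency version only touches the nodes it actually reaches.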

**Disadvantages**

- Traditionally Graph DBs have been scaled vertically but not horizontally, because traversing nodes that live on different machines would dramatically slow the process. Vendors did not support distribution or sharding. This has made it difficult for Graph DBs to scale beyond a certain size. However, some vendors are now challenging this limitation.
- Requires a conceptual shift in thinking for developers, so expect some learning curve.
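
The sharding difficulty can be illustrated by counting how many relationships end up spanning two machines under a naive placement scheme. The modulo placement rule, the ring-shaped sample graph, and the function names below are assumptions chosen to show a worst case; real distributed graph stores use more careful partitioning.

```python
def shard_of(node_id, shard_count):
    """Hypothetical placement rule: assign a node to a shard by modulo."""
    return node_id % shard_count


def cross_shard_edges(edges, shard_count):
    """Count edges whose two endpoints live on different shards."""
    return sum(
        1
        for source, target in edges
        if shard_of(source, shard_count) != shard_of(target, shard_count)
    )


# Six nodes connected in a ring: 0-1, 1-2, 2-3, 3-4, 4-5, 5-0.
ring = [(i, (i + 1) % 6) for i in range(6)]

crossing = cross_shard_edges(ring, 2)
print(crossing)  # 6
```

With two shards every edge in this ring crosses a machine boundary, so every hop of a traversal would become a network call, whereas on a single machine (`shard_count` of 1) none would.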

**Particular Opportunities and Project Characteristics**

- Traditionally recommender engines (any ‘recommended for you’ rating) have been based on Graph DBs. Note that some recommenders are now also being built using Column Oriented DBs.
- Use where objects have both dynamic properties and dynamic relationships among objects.
- Applications that require very deep and complex joins in an RDBMS can be moved to Graph DBs, typically with speed increases greater than 100X.

Some sample use cases:

- Model and store 7 billion people objects and 3 billion non-people objects to provide an earth-view drill down from planet to sidewalk. (Neo4J)
- Tracking food sources from seed to table. (Objectivity, Inc.)
- Ad placement applications. (Objectivity, Inc.)
- Network management.
- Genealogy.
- Public transport links and road maps.

**Representative Vendors** (not a recommendation): Neo4J, Infinite Graph, InfoGrid, HyperGraphDB, AllegroGraph, BrightstarDB, and many others.

July 23, 2014

Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2014, all rights reserved.

About the author: Bill Vorhies is President & Chief Data Scientist of Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at:

This original blog can be viewed at:

http://data-magnum.com/lesson-8-graph-databases-including-object-dbs/

All nine lessons can be downloaded as a White Paper at:

http://data-magnum.com/resources/white-papers/
