We live in a data-driven world in which we are generating, storing and analyzing more information than ever before, and at an ever-increasing speed.
Traditional “relational” databases, which store information in neat tables of rows and columns, aren’t suited to the big, messy datasets harvested from video, audio and even social media data streams that today’s Big Data projects need.
This is where NoSQL databases come in handy. These newer database formats are designed to be far more scalable, with fewer limits on the types of data they can store and retrieve. Designed from the start to handle data distributed across thousands or millions of networked nodes, they are popular with those needing lightning-fast access to huge, messy datasets for real-time analysis.
So here’s my rundown, in no particular order, of some of the best-known and most widely used NoSQL databases available today, and the companies that provide them.
- Oracle NoSQL - Oracle’s relational database system was pioneering and is still widely used for many purposes in businesses of all sizes. Oracle applied its pragmatic business sensibilities when entering the market created by newer upstart competitors. The focus was on reliability and scalability, in order to persuade giants of industry and finance to put their trust in emerging Big Data technologies.
- DynamoDB (Amazon Web Services) - Amazon provides DynamoDB as part of its commercial Amazon Web Services package. Built specifically to handle fast, constantly growing volumes of data in all shapes and sizes, it is provided as a “managed” service and has proven particularly popular with business users needing access to reliable cloud-enabled infrastructure, real-time analytics and scalability.
- Apache HBase (MapR-DB) - As well as their own custom Hadoop installation, MapR Technologies provide their own NoSQL database, which is based on the open source Apache HBase database architecture. It is provided free of charge, in the hope that users will pay for support, or go on to use other paid-for MapR products.
- MongoDB (MongoLab) - MongoLab is a commercial, software-as-a-service (SaaS) implementation of MongoDB, one of the most popular open source NoSQL databases. MongoDB stores the datasets behind some of the biggest web businesses, such as eBay and Craigslist.
- Google Cloud Bigtable - Bigtable is Google’s proprietary database system, which powers many of its online services, including the web index used by Google Search, Google Maps and Gmail. This year Google made it available as a cloud-based service called Cloud Bigtable, part of the Google Cloud Platform. Google hopes that its track record in handling big data will mean businesses will be ready to put their confidence in the service.
- Redis (Pivotal) - Redis is another open source database, with commercial support and installations available through Pivotal’s Big Data Suite. As with other Big Data-as-a-Service (BDaaS) providers, the open source components are essentially provided for free, while the user “rents” storage and processing power from Pivotal’s cloud to carry out their analytics, and pays for any support they require.
- Couchbase - Couchbase is another product which is open source and essentially free to use, but is supported by a commercial entity of the same name offering support and custom installations. It is something of a “jack of all trades” in the NoSQL world, supporting document-based as well as key-value architectures.
- Aerospike - Aerospike has proven popular among companies serving up analytical online advertising, including eBay, due to the high speed of its in-memory read and write functions. It is also optimized to work with data stored on modern solid-state drives. These aren’t that common yet in Big Data systems due to their price, but that’s likely to change in the future.
- Apache Cassandra (DataStax) - Cassandra is another Apache open source project, which is supplied in a commercial version, along with support and other BDaaS offerings, by DataStax. It was originally developed at Facebook before it was open sourced, and it now powers the database behind Netflix’s streaming service.
- HBase - Part of the Hadoop ecosystem, HBase offers real-time data processing capabilities and can be scaled linearly by adding nodes to the setup.
- Neo4j - A graph database offering full ACID compliance. Clustering, replication, caching, online backup, advanced monitoring and high availability are available under a commercial license.
- MarkLogic - A strong platform which not only offers the regular features of platforms of this kind, but also lets users store vast numbers of RDF triples for queries, to provide richer, deeper insight into data in ways not possible with other NoSQL or relational models.
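
Several of the entries above mention different data models: key-value, document and RDF triples. To make the difference a little more concrete, here is a minimal Python sketch using nothing but in-memory dictionaries, lists and sets. It does not use any real database client, and every name in it (kv_store, documents, find, triples, match) is hypothetical and chosen purely for illustration.

```python
# Toy in-memory stand-ins for three NoSQL data models.
# No real database is involved; all names here are hypothetical.

# 1. Key-value: an opaque value looked up by a single key.
kv_store = {}
kv_store["session:42"] = "user=alice;cart=3"

# 2. Document: nested, schema-free records that can differ in shape.
documents = [
    {"_id": 1, "name": "Alice", "tags": ["video", "audio"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Leeds"}},
]


def find(docs, field, value):
    """Return the documents whose top-level field equals value."""
    return [d for d in docs if d.get(field) == value]


# 3. Triples: (subject, predicate, object) facts, as in RDF.
triples = {
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("alice", "likes", "video"),
}


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(kv_store["session:42"])
print(find(documents, "name", "Bob"))
print(match(triples, p="follows"))
```

The key-value lookup needs the exact key, the document query filters on a field that only some records may have, and the triple query matches a pattern across relationships. Real products add persistence, indexing and distribution across nodes on top of ideas like these.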
I hope that was useful. Which other ones would you add to the list?