We have collected the best resources from the net comparing the top data warehouse solutions, such as Vertica, Aster Data, Greenplum, Netezza, Teradata, HANA and HBase.

Comparison of MPP Data Warehouse Platforms

The industry is moving towards open, commodity solutions. Traditional database servers, such as IBM DB2, Oracle Exadata and Microsoft SQL Server, license proprietary software but run on commodity hardware, although the nature of SMP architecture typically favors a few large, expensive servers. The biggest MPP data warehouse vendors all have proprietary software as well, despite the fact that Netezza and Vertica were based on the open source PostgreSQL database. Teradata and Netezza even implement custom hardware, which drives up the price. Hadoop, by contrast, has open sourced the software component, leading to a vibrant ecosystem of tools and applications. And with built-in redundancy, it’s easy to deploy on cheap commodity servers.

Price/Performance of HANA, Exadata, Teradata, and Greenplum

Database      $/TB
HANA          $200,000
Exadata X3    $66,000
Teradata      $66,000
Greenplum     $30,000

Of these numbers, the one that may be furthest off is the HANA number. This is odd since I work for SAP… but I just could not find a good number, so I picked a big number to see how the model came out. If you have a better figure for any of these, please leave a comment and I’ll adjust.
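To make the per-terabyte figures easier to compare, here is a minimal sketch that scales them to a given warehouse size. The prices are copied from the table above; the function name and the 10 TB size in the usage note are illustrative assumptions, and the calculation ignores anything beyond a flat price per terabyte.

```python
# Price per terabyte in US dollars, copied from the table in the text.
PRICE_PER_TB = {
    "HANA": 200_000,
    "Exadata X3": 66_000,
    "Teradata": 66_000,
    "Greenplum": 30_000,
}


def total_cost(size_tb):
    """Return the flat cost of each platform for a warehouse of size_tb terabytes.

    Assumption: cost scales linearly with size, which real pricing rarely does.
    """
    return {name: price * size_tb for name, price in PRICE_PER_TB.items()}


def cheapest(size_tb):
    """Return the (platform, cost) pair with the lowest flat cost."""
    costs = total_cost(size_tb)
    name = min(costs, key=costs.get)
    return name, costs[name]
```

For a hypothetical 10 TB warehouse, `total_cost(10)` gives a dictionary of totals per platform, and `cheapest(10)` picks the lowest one. Since the HANA figure is admittedly a guess, any result involving it should be read with the same caution.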

Vertica vs Aster Data vs Greenplum vs Netezza vs Teradata

Vertica

Pros

1. Cheap - Cheapest of all three appliances we own.

2. Columnar - Applies some of the new generation practices and provides a full columnar structure.

3. Compression - Highly compressed database that automatically chooses how to compress data.

Cons

1. The product is not as mature as other appliances, with no GUI tools and very limited workload management options.

2. Depends heavily on projections to deliver performance for different scenarios. With multiple analysis viewpoints, the number of projections for a table increases (they are literally maintained as multiple copies of the same data), nullifying the benefits of compression.

Teradata

Pros

1. Established and arguably the best of the data warehouse appliances.

2. Performance degradation is minimal even with an exponential rise in the number of users and workload (best for thousands of users).

Cons

1. Costly for small organizations.

2. Some developers find it difficult to understand the system-level functionality of Teradata, but when used properly, Teradata will beat anyone on performance.

Netezza

Pros

1. Good for mid-sized organizations or departments in terms of cost.

2. Good performance with full table scans and a small number of users.

Cons

1. Performance decreases as the number of users and concurrent queries increases.

2. Still lacks a mature toolset for operating the database.

Should I Choose Netezza, Teradata or Something Else?

Concurrency is a by-product of performance. Concurrency is the number of database queries running simultaneously at any given time. In this context, the word “query” includes searching, adding, updating or deleting data. Netezza will never physically run more than 48 queries at a time. It can support up to 2,000 active read-only queries at once, but at most 48 will be running, and the rest will be queued. The running query is interrupted so that other queries can use the CPU. This context switch prevents any one query from monopolizing the CPU and ensures that all queries get a fair share of CPU time. The limit of 48 keeps Netezza from wasting time switching between sessions. While there can be up to 2,000 concurrent read-only queries, Netezza has a limit of 64 active add, update or delete queries (anything that might change data). This usually isn’t an issue in a typical analytics environment, where data is moved in and out of Netezza as quickly as possible and the writers are typically ETL processes.
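The run-versus-queue behavior described above can be illustrated with a toy admission queue. This is only a sketch built on the two numbers from the text (48 running, 2,000 active); the class and method names are hypothetical, the "rejected" outcome past 2,000 is an assumption the text does not confirm, and the CPU time slicing among running queries is not modeled.

```python
from collections import deque

MAX_RUNNING = 48    # from the text: at most 48 queries physically run at once
MAX_ACTIVE = 2000   # from the text: up to 2,000 active read-only queries


class AdmissionQueue:
    """Toy model: accept up to max_active queries, run at most max_running."""

    def __init__(self, max_running=MAX_RUNNING, max_active=MAX_ACTIVE):
        self.max_running = max_running
        self.max_active = max_active
        self.running = set()
        self.waiting = deque()

    def submit(self, query_id):
        """Return 'running', 'queued', or 'rejected' for a new query."""
        if len(self.running) + len(self.waiting) >= self.max_active:
            # Assumption: the text does not say what happens past the limit.
            return "rejected"
        if len(self.running) < self.max_running:
            self.running.add(query_id)
            return "running"
        self.waiting.append(query_id)
        return "queued"

    def finish(self, query_id):
        """Mark a running query as done and promote the oldest waiting query."""
        self.running.remove(query_id)
        if self.waiting:
            self.running.add(self.waiting.popleft())
```

Submitting 2,000 queries to a fresh `AdmissionQueue()` leaves 48 running and the remainder waiting; each call to `finish` frees a slot for the query that has waited longest. A first-in, first-out queue is the simplest choice here, and a real workload manager would likely also weigh priorities.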

Teradata vs Netezza vs Hadoop

Each Teradata table designates a column as its primary index, and Teradata distributes the rows by hashing that key. This allows Teradata to master two extremes. Parallel processing can analyze petabytes of data, and if you use the primary index in the WHERE clause, Teradata applies the same hashing algorithm to find that data in about a second. To eliminate full table scans, Teradata uses secondary indexes, partitioning, columnar design, and in-memory enhancements for performance tuning, which arguably make it the most sophisticated data warehouse solution available. Netezza took a different approach. Instead of a primary index, Netezza has a distribution key, which also allows it to master the two extremes. But instead of indexing, Netezza uses a Zone Map and an FPGA card. The FPGA card sits on top of the disk, and before any blocks move, the Zone Map is checked so that blocks that cannot contain matching rows are skipped.
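Both ideas in this passage, hashing a key to place and find rows, and checking a zone map before reading blocks, can be sketched in a few lines. This is an illustration under stated assumptions, not either vendor's implementation: CRC32 stands in for the real hashing algorithm, the zone map is assumed to hold a per-block minimum and maximum for one column, every block is assumed non-empty, and all function names are hypothetical.

```python
import zlib


def node_for(key, num_nodes):
    """Map a key to a node with a deterministic hash (CRC32 as a stand-in)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_column, num_nodes):
    """Place each row on the node chosen by hashing its distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes


def lookup(nodes, key_column, key):
    """Point lookup: re-hash the key and search only the node that can hold it."""
    target = nodes[node_for(key, len(nodes))]
    return [row for row in target if row[key_column] == key]


def build_zone_map(blocks, column):
    """Record the (min, max) of one column for every non-empty block."""
    return [(min(r[column] for r in b), max(r[column] for r in b)) for b in blocks]


def scan_with_zone_map(blocks, zone_map, column, low, high):
    """Read only the blocks whose [min, max] range overlaps [low, high]."""
    hits, blocks_read = [], 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # block cannot contain a match, so it is never read
        blocks_read += 1
        hits.extend(r for r in block if low <= r[column] <= high)
    return hits, blocks_read
```

With rows spread over several nodes by `distribute`, `lookup` touches a single node because the same hash that placed the row also locates it. `scan_with_zone_map` returns the matching rows together with the number of blocks it actually read, which shows how much work the zone map avoided; the benefit depends on the data being roughly ordered on the filtered column.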

Will Hadoop Challenge Greenplum and Netezza?

One place is in the data warehouse market… This view says Hadoop replaces the DBMS for data warehouses. But the very mature BI/DW market requires a high level of operational integrity and Hadoop is not there yet… it is advancing rapidly as an enterprise platform and I believe it will get there… but it will be 3-4 years. This is the thinking I provided here that leads me to draw the picture in Figure.

Wondering About Netezza… and A Teradata Prediction Comes True…

I also have a suspicion that the Netezza architecture, with its execution engine split across two different processors, is just hard to engineer. I cannot think of another reason features come so slowly there. Why, for example, is there no columnar support? Greenplum built it on the same Postgres base with less than a handful of engineers in a year. Teradata now offers columnar tables as well.

These concerns… combined with some previous notes on Netezza add up as follows:

  1. FPGAs no longer provide a performance advantage (per my link above)

  2. FPGAs limit the ability of the DBMS to use more cores (see here)

  3. FPGAs limit the ability of the DBMS to manage workload (see here… and especially the comments)

© 2019 Data Science Central ®
