Subscribe to DSC Newsletter

Using Amazon Redshift's Interleaved Sort Keys for 35x Faster Queries


Previously, we discussed the role of Amazon Redshift's sort keys and compared how both compound and interleaved keys work in theory. Throughout that post we used some dummy data and a set of Postgres queries in order to explore the Z-order curve and interleaved sorting without getting bogged down in implementation details. In this post, we will explore some of these implementation details, discuss a common tactic that can benefit from using compound and interleaved sort keys together, and run some benchmark queries against a data set with billions of rows. Throughout this post we will link to code that can be used to recreate our results.

The Zone Map: stv_blocklist

In Part I, we discussed how Redshift eschews the B-tree in favor of another secondary data structure called a zone map. The zone maps for each table, as well as additional metadata, is stored in the stv_blocklist system table. If we take a look at the definition of thestv_blocklist table, we can see it has a width of about 100B. This corresponds with our estimates from Part I regarding the small total size of this secondary data structure relative to the tables it describes.

Taking a look at stv_blocklist for a table with a compound key confirms what we've learned about zone maps for compound keys. Below we can see that the zone map forc_custkey, the leading column of the sort key for the orders_compound table, would prove helpful in pruning irrelevant blocks. However, the zone map for any of the other columns in the key would be useless.

The zone map for a compound sort key (code).

If we take a look at the zone map for orders_interleaved_4, which has an interleaved sort key, we can see that the zone maps for all of the sort key columns look pretty useful.

The zone map for an interleaved sort key (code).

However, it's not the minimum and maximum column values that we are seeing here for each block but instead the minimum and maximum sort key values. Continue reading here.

Views: 1529

Tags: AWS, Amazon, Curve, Interleaved, Redshift, Z-Order, business, intelligence, keys, sort


You need to be a member of Data Science Central to add comments!

Join Data Science Central

© 2021   TechTarget, Inc.   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service