

Amazon Redshift is a cloud data warehouse database that provides fast, consistent performance running complex analytical queries on huge datasets, scaling into petabytes and even exabytes with Amazon Redshift Spectrum.

Although Amazon Redshift has excellent query performance out of the box, with up to three times better price performance than other cloud data warehouses, you can further improve its performance by physically tuning the tables in a data model. You do so by sorting table rows and rearranging rows across a cluster’s nodes. In Amazon Redshift, you implement this by setting the sort key and distribution key table attributes. In the past, setting sort and distribution keys was an involved manual process: a skilled engineer had to analyze a cluster’s workload, then choose and implement the right keys for every table in the data model. More recently, Amazon Redshift Advisor provided suggestions, but these still had to be implemented manually. At AWS re:Invent 2020, Amazon Redshift announced a new feature to automate this process: automatic table optimization (ATO). ATO automatically monitors a cluster’s workload and table metadata, runs artificial intelligence algorithms over the observations, and implements sort and distribution keys online in the background, without requiring any manual intervention and without interrupting any running queries.

In this post, I explain what sort and distribution keys are and how they improve query performance. I also explain how ATO works and how to enable and disable it. Then I outline the steps to set up and run a test of ATO on the Cloud DW benchmark derived from TPC-H using a 30 TB dataset. Finally, I present the results of a test that shows ATO improved performance on this benchmark without requiring any manual tuning.

If a table’s distribution style is KEY, then a single column in the table can be set as the DISTKEY.

Sort keys determine how rows are physically sorted in a table. Having table rows sorted improves the performance of queries with range-bound filters. Amazon Redshift stores the minimum and maximum values of each of its data blocks in metadata. When a query filters on a column (or multiple columns), the execution engine can use this metadata to skip blocks that fall outside the filter’s range. For example, if a table has a sort key on the column created_date and a query has the filter WHERE created_date BETWEEN '' AND '', the execution engine can use the block metadata to identify which blocks don’t contain data for February 1 and 2. It can then skip over those blocks, reducing the amount of data read and the number of rows that must be materialized and processed, which improves query performance. Sort keys can be set on a single column in a table or on multiple columns (known as a compound sort key).
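Amazon Redshift keeps the minimum and maximum value of each data block in metadata and skips blocks whose range cannot satisfy a filter. The small Python simulation below illustrates that idea and is a toy model, not Redshift’s implementation. The block size of 4 rows, the helper names `build_blocks`, `blocks_to_scan`, and `scan_stats`, and all of the data are hypothetical. The first part compares a range filter on a date column stored in arrival order against the same column after sorting. The second part models a compound sort key on (region, day) and shows why filtering on the leading column prunes far more blocks than filtering only on the trailing column: the trailing column is sorted only within each leading value, so every block ends up with a wide min/max range. Because Redshift stores each column in its own blocks, the sketch builds block metadata per column.

```python
import random
from datetime import date, timedelta

BLOCK_SIZE = 4  # hypothetical rows per block; real Redshift blocks are 1 MB each

def build_blocks(values, block_size=BLOCK_SIZE):
    """Split one column's values into blocks and record each block's min and max."""
    return [
        {"min": min(chunk), "max": max(chunk), "rows": chunk}
        for chunk in (values[i:i + block_size] for i in range(0, len(values), block_size))
    ]

def blocks_to_scan(blocks, low, high):
    """Indexes of blocks whose [min, max] overlaps [low, high]; all others are skipped."""
    return [i for i, b in enumerate(blocks) if b["max"] >= low and b["min"] <= high]

def scan_stats(values, low, high, block_size=BLOCK_SIZE):
    """Return (blocks read, total blocks, matching rows) for a range filter."""
    blocks = build_blocks(values, block_size)
    to_read = blocks_to_scan(blocks, low, high)
    matches = [v for i in to_read for v in blocks[i]["rows"] if low <= v <= high]
    return len(to_read), len(blocks), len(matches)

# Single-column sort key: 40 made-up created_date values, one per day, shuffled.
first_day = date(2021, 1, 15)
arrival_order = [first_day + timedelta(days=d) for d in range(40)]
random.seed(7)
random.shuffle(arrival_order)
low, high = date(2021, 2, 1), date(2021, 2, 2)
print("unsorted:", scan_stats(arrival_order, low, high))
print("sorted:  ", scan_stats(sorted(arrival_order), low, high))

# Compound sort key (region, day): rows ordered by region first, then by day.
table = sorted((region, day) for region in range(16) for day in range(4))
regions = [region for region, _ in table]
days = [day for _, day in table]
print("filter on leading column region = 2:", scan_stats(regions, 2, 2))
print("filter on trailing column day = 3:  ", scan_stats(days, 3, 3))
```

In this toy data, the sorted date column needs to read 1 of 10 blocks to find both matching rows, and the leading-column filter reads 1 of 16 blocks, while the trailing-column filter must read all 16. Every query still returns the correct rows, because a block that contains a match always has a min/max range that overlaps the filter.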
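With a distribution style of KEY, rows are placed according to the values in the DISTKEY column, so rows that share a value land on the same slice. The sketch below models that placement and is an illustration only. `zlib.crc32` is a stand-in hash, the cluster of four slices is hypothetical, and the helper names and the orders/customers tables are made up. Redshift’s real hash function and slice layout are not modeled.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # hypothetical number of slices across all nodes in the cluster

def slice_for(distkey_value, num_slices=NUM_SLICES):
    """Map a DISTKEY value to a slice with a deterministic hash.

    zlib.crc32 is only a stand-in; Redshift's internal hash function is not modeled.
    """
    return zlib.crc32(str(distkey_value).encode("utf-8")) % num_slices

def distribute(rows, distkey, num_slices=NUM_SLICES):
    """Group rows by the slice that their DISTKEY value hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[distkey], num_slices)].append(row)
    return slices

# Made-up orders and customers tables, both with DISTKEY customer_id.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(20)]
customers = [{"customer_id": c, "name": f"customer-{c}"} for c in range(5)]
order_slices = distribute(orders, "customer_id")
customer_slices = distribute(customers, "customer_id")

# Same key and same hash, so each order lives on the same slice as its customer,
# and a join on customer_id can run slice by slice without moving rows.
for s in sorted(order_slices):
    local_ids = {c["customer_id"] for c in customer_slices.get(s, [])}
    assert all(o["customer_id"] in local_ids for o in order_slices[s])
    print(f"slice {s}: {len(order_slices[s])} orders, customers {sorted(local_ids)}")
```

The design point the model captures is co-location: when two tables that are frequently joined share the same DISTKEY, matching rows already sit on the same slice, so the join needs no redistribution step in this simplified picture.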
