
dc.contributor.author: Wu, Ziniu
dc.contributor.author: Marcus, Ryan
dc.contributor.author: Liu, Zhengchun
dc.contributor.author: Negi, Parimarjan
dc.contributor.author: Nathan, Vikram
dc.contributor.author: Pfeil, Pascal
dc.contributor.author: Saxena, Gaurav
dc.contributor.author: Rahman, Mohammad
dc.contributor.author: Narayanaswamy, Balakrishnan
dc.contributor.author: Kraska, Tim
dc.date.accessioned: 2024-07-23T20:14:08Z
dc.date.available: 2024-07-23T20:14:08Z
dc.date.issued: 2024-06-09
dc.identifier.isbn: 979-8-4007-0422-2
dc.identifier.uri: https://hdl.handle.net/1721.1/155774
dc.description.abstract: Query performance (e.g., execution time) prediction is a critical component of modern DBMSes. As a pioneering cloud data warehouse, Amazon Redshift relies on accurate execution time predictions for many downstream tasks, ranging from high-level optimizations, such as automatically creating materialized views, to low-level tasks on the critical path of query execution, such as admission, scheduling, and execution resource control. Unfortunately, many existing execution time prediction techniques, including those used in Redshift, suffer from cold start issues and inaccurate estimation, and are not robust against workload/data changes. In this paper, we propose a novel hierarchical execution time predictor: the Stage predictor. The Stage predictor is designed to leverage the unique characteristics and challenges faced by Redshift. It consists of three model states: an execution time cache, a lightweight local model optimized for a specific DB instance with uncertainty measurement, and a complex global model that is transferable across all instances in Redshift. We design a systematic approach to using these models that best leverages optimality (cache), instance optimization (local model), and transferable knowledge about Redshift (global model). Experimentally, we show that the Stage predictor makes more accurate and robust predictions while maintaining a practical inference latency and memory overhead. Overall, the Stage predictor can reduce the average query execution latency by 20% on the evaluated instances compared to the prior query performance predictor in Redshift. [en_US]
dc.publisher: ACM|Companion of the 2024 International Conference on Management of Data [en_US]
dc.relation.isversionof: 10.1145/3626246.3653391 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: Stage: Query Execution Time Prediction in Amazon Redshift [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Wu, Ziniu, Marcus, Ryan, Liu, Zhengchun, Negi, Parimarjan, Nathan, Vikram et al. 2024. "Stage: Query Execution Time Prediction in Amazon Redshift."
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2024-07-01T07:54:29Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-07-01T07:54:30Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
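
The abstract above describes a three-level hierarchy: an execution time cache, an instance-local model gated by an uncertainty estimate, and a transferable global model as the fallback. The sketch below illustrates that decision flow in Python. It is a minimal illustration under assumptions, not Redshift's implementation: the class name StagePredictorSketch, the query-fingerprint keying, the uncertainty_threshold value, and the callable model interfaces are all hypothetical.

# Minimal, illustrative sketch of the hierarchical prediction flow described in
# the abstract. All names, thresholds, and model interfaces here are assumptions
# for illustration, not Redshift internals.
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class StagePredictorSketch:
    # Hypothetical components; in practice these would be trained models.
    cache: Dict[str, float] = field(default_factory=dict)              # query fingerprint -> observed time
    local_model: Optional[Callable[[str], Tuple[float, float]]] = None  # returns (prediction, uncertainty)
    global_model: Optional[Callable[[str], float]] = None               # returns a point prediction
    uncertainty_threshold: float = 0.3                                   # assumed cutoff, not from the paper

    def predict(self, query_fingerprint: str) -> float:
        # 1) Exact hit: reuse the previously observed execution time.
        if query_fingerprint in self.cache:
            return self.cache[query_fingerprint]
        # 2) Instance-optimized local model, used only when it is confident.
        if self.local_model is not None:
            prediction, uncertainty = self.local_model(query_fingerprint)
            if uncertainty <= self.uncertainty_threshold:
                return prediction
        # 3) Transferable global model handles cold starts and uncertain cases.
        return self.global_model(query_fingerprint)

    def observe(self, query_fingerprint: str, execution_time: float) -> None:
        # Feed observed runtimes back so repeated queries are answered from the cache.
        self.cache[query_fingerprint] = execution_time

A caller would construct the sketch with a local model that returns a prediction together with an uncertainty score and a global model that returns a point prediction, call predict() before running a query, and call observe() once the query finishes. The uncertainty gate lets the instance-local model serve workloads it has effectively seen before while deferring cold-start and shifted workloads to the shared global model, mirroring the division of roles described in the abstract.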

