dc.contributor.author: Negi, Parimarjan
dc.contributor.author: Marcus, Ryan
dc.contributor.author: Kipf, Andreas
dc.contributor.author: Mao, Hongzi
dc.contributor.author: Tatbul, Nesime
dc.contributor.author: Kraska, Tim
dc.contributor.author: Alizadeh, Mohammad
dc.date.accessioned: 2022-05-25T15:58:01Z
dc.date.available: 2022-05-25T15:58:01Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/1721.1/142720
dc.description.abstract: Recently there has been significant interest in using machine learning to improve the accuracy of cardinality estimation. This work has focused on improving average estimation error, but not all estimates matter equally for downstream tasks like query optimization. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, for learning cardinality estimation models. Flow-Loss approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain "plan graph", in which different paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark (CEB), which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the plan costs and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, models trained with both loss functions perform well. However, the Q-Error-trained model degrades significantly when evaluated on slightly different queries (e.g., similar but unseen query templates), while the Flow-Loss-trained model generalizes better to such situations, achieving 4-8× better 99th percentile runtimes on unseen templates with the same model architecture and training data. (A brief sketch of the Q-Error metric follows this record.)
dc.language.iso: en
dc.publisher: VLDB Endowment
dc.relation.isversionof: 10.14778/3476249.3476259
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source: ACM
dc.title: Flow-loss: learning cardinality estimates that matter
dc.type: Article
dc.identifier.citation: Negi, Parimarjan, Marcus, Ryan, Kipf, Andreas, Mao, Hongzi, Tatbul, Nesime et al. 2021. "Flow-loss: learning cardinality estimates that matter." Proceedings of the VLDB Endowment, 14 (11).
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal: Proceedings of the VLDB Endowment
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/ConferencePaper
eprint.status: http://purl.org/eprint/status/NonPeerReviewed
dc.date.updated: 2022-05-25T15:52:30Z
dspace.orderedauthors: Negi, P; Marcus, R; Kipf, A; Mao, H; Tatbul, N; Kraska, T; Alizadeh, M
dspace.date.submission: 2022-05-25T15:52:32Z
mit.journal.volume: 14
mit.journal.issue: 11
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed
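
For context on the comparison drawn in the abstract, the snippet below sketches Q-Error, the standard accuracy metric that Flow-Loss is evaluated against. This is an illustrative sketch only: the function name and example values are ours, not taken from the paper or the CEB release.

```python
# Minimal sketch of Q-Error, the accuracy-based loss the abstract
# contrasts with Flow-Loss. Hypothetical code, not from the paper.

def q_error(estimated: float, true: float, eps: float = 1.0) -> float:
    """Q-Error = max(est/true, true/est).

    A symmetric, multiplicative error that is always >= 1 (1 means a
    perfect estimate). It scores every sub-plan estimate equally, which
    is exactly what Flow-Loss departs from: Flow-Loss weights an
    estimate's error by its effect on the optimizer's choice of plan.
    """
    est, tru = max(estimated, eps), max(true, eps)
    return max(est / tru, tru / est)


if __name__ == "__main__":
    # Over- and under-estimation by the same factor give the same
    # Q-Error, yet may matter very differently to plan selection.
    print(q_error(400.0, 100.0))  # 4.0
    print(q_error(25.0, 100.0))   # 4.0
```

In the paper's terms, two sub-plan estimates with identical Q-Error can lead to very different plan costs depending on where they sit in the plan graph; that gap between estimation accuracy and plan quality is what Flow-Loss is designed to close.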

