dc.contributor.author: Negi, Parimarjan
dc.contributor.author: Marcus, Ryan
dc.contributor.author: Kipf, Andreas
dc.contributor.author: Mao, Hongzi
dc.contributor.author: Tatbul, Nesime
dc.contributor.author: Kraska, Tim
dc.contributor.author: Alizadeh, Mohammad
dc.date.accessioned: 2022-05-25T15:58:01Z
dc.date.available: 2022-05-25T15:58:01Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/1721.1/142720
dc.description.abstract: Recently there has been significant interest in using machine learning to improve the accuracy of cardinality estimation. This work has focused on improving average estimation error, but not all estimates matter equally for downstream tasks like query optimization. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, for learning cardinality estimation models. Flow-Loss approximates the optimizer's cost model and search algorithm with analytical functions, which it uses to optimize explicitly for better query plans. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain "plan graph", in which different paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark (CEB), which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the plan costs and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, models trained with both loss functions perform well. However, the Q-Error-trained model degrades significantly when evaluated on slightly different queries (e.g., similar but unseen query templates), while the Flow-Loss-trained model generalizes better to such situations, achieving 4-8× better 99th percentile runtimes on unseen templates with the same model architecture and training data. (A brief sketch of the Q-Error metric follows this record.)
dc.language.iso: en
dc.publisher: VLDB Endowment
dc.relation.isversionof: 10.14778/3476249.3476259
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source: ACM
dc.title: Flow-loss: learning cardinality estimates that matter
dc.type: Article
dc.identifier.citation: Negi, Parimarjan, Marcus, Ryan, Kipf, Andreas, Mao, Hongzi, Tatbul, Nesime et al. 2021. "Flow-loss: learning cardinality estimates that matter." Proceedings of the VLDB Endowment, 14 (11).
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal: Proceedings of the VLDB Endowment
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/ConferencePaper
eprint.status: http://purl.org/eprint/status/NonPeerReviewed
dc.date.updated: 2022-05-25T15:52:30Z
dspace.orderedauthors: Negi, P; Marcus, R; Kipf, A; Mao, H; Tatbul, N; Kraska, T; Alizadeh, M
dspace.date.submission: 2022-05-25T15:52:32Z
mit.journal.volume: 14
mit.journal.issue: 11
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed
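
For context on the comparison drawn in the abstract, the snippet below sketches Q-Error, the standard accuracy metric that Flow-Loss is evaluated against. This is an illustrative sketch only: the function name and example values are ours, not taken from the paper or the CEB release.

```python
# Minimal sketch of Q-Error, the accuracy-based loss the abstract
# contrasts with Flow-Loss. Hypothetical code, not from the paper.

def q_error(estimated: float, true: float, eps: float = 1.0) -> float:
    """Q-Error = max(est/true, true/est).

    A symmetric, multiplicative error that is always >= 1 (1 means a
    perfect estimate). It scores every sub-plan estimate equally, which
    is exactly what Flow-Loss departs from: Flow-Loss weights an
    estimate's error by its effect on the optimizer's choice of plan.
    """
    est, tru = max(estimated, eps), max(true, eps)
    return max(est / tru, tru / est)


if __name__ == "__main__":
    # Over- and under-estimation by the same factor give the same
    # Q-Error, yet may matter very differently to plan selection.
    print(q_error(400.0, 100.0))  # 4.0
    print(q_error(25.0, 100.0))   # 4.0
```

In the paper's terms, two sub-plan estimates with identical Q-Error can lead to very different plan costs depending on where they sit in the plan graph; that gap between estimation accuracy and plan quality is what Flow-Loss is designed to close.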

