FactorJoin: A New Cardinality Estimation Framework for Join Queries

Wu, Ziniu; Negi, Parimarjan; Alizadeh, Mohammad; Kraska, Tim; Madden, Samuel

dc.contributor.author	Wu, Ziniu
dc.contributor.author	Negi, Parimarjan
dc.contributor.author	Alizadeh, Mohammad
dc.contributor.author	Kraska, Tim
dc.contributor.author	Madden, Samuel
dc.date.accessioned	2023-06-02T14:02:09Z
dc.date.available	2023-06-02T14:02:09Z
dc.date.issued	2023-05-30
dc.identifier.issn	2836-6573
dc.identifier.uri	https://hdl.handle.net/1721.1/150846
dc.description.abstract	Cardinality estimation is one of the most fundamental and challenging problems in query optimization. Neither classical nor learning-based methods yield satisfactory performance when estimating the cardinality of join queries. They either rely on simplifying assumptions, leading to ineffective cardinality estimates, or build large models to understand the complicated data distributions, leading to long planning times and a lack of generalizability across queries. In this paper, we propose a new framework, FactorJoin, for estimating join queries. FactorJoin combines the idea behind the classical join-histogram method, to efficiently handle joins, with learning-based methods, to accurately capture attribute correlation. Specifically, FactorJoin scans every table in a DB and builds single-table conditional distributions during an offline preparation phase. When a join query arrives, FactorJoin translates it into a factor graph model over the learned distributions to effectively and efficiently estimate its cardinality. Unlike existing learning-based methods, FactorJoin does not need to de-normalize joins upfront or require executed query workloads to train the model. Since it relies only on single-table statistics, FactorJoin has a small space overhead and is extremely easy to train and maintain. In our evaluation, FactorJoin can produce more effective estimates than the previous state-of-the-art learning-based methods, with 40x less estimation latency, 100x smaller model size, and 100x faster training speed at comparable or better accuracy. In addition, FactorJoin can estimate 10,000 sub-plan queries within one second to optimize the query plan, which is very close to the traditional cardinality estimators in commercial DBMSs.	en_US
dc.publisher	ACM	en_US
dc.relation.isversionof	https://doi.org/10.1145/3588721	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	FactorJoin: A New Cardinality Estimation Framework for Join Queries	en_US
dc.type	Article	en_US
dc.identifier.citation	Wu, Ziniu, Negi, Parimarjan, Alizadeh, Mohammad, Kraska, Tim and Madden, Samuel. 2023. "FactorJoin: A New Cardinality Estimation Framework for Join Queries." Proceedings of the ACM on Management of Data, 1 (1).
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal	Proceedings of the ACM on Management of Data	en_US
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2023-06-01T07:47:14Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2023-06-01T07:47:15Z
mit.journal.volume	1	en_US
mit.journal.issue	1	en_US
mit.license	PUBLISHER_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 3588721.pdf
Size: 1.499Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles