Correlation maps: A compressed access method for exploiting soft functional dependencies

Kimura, Hideaki; Huo, George; Rasin, Alexander; Madden, Samuel; Zdonik, Stanley B.

dc.contributor.author	Kimura, Hideaki
dc.contributor.author	Huo, George
dc.contributor.author	Rasin, Alexander
dc.contributor.author	Zdonik, Stanley B.
dc.contributor.author	Madden, Samuel R.
dc.date.accessioned	2014-09-26T13:17:21Z
dc.date.available	2014-09-26T13:17:21Z
dc.date.issued	2009-08
dc.identifier.issn	21508097
dc.identifier.uri	http://hdl.handle.net/1721.1/90382
dc.description.abstract	In relational query processing, there are generally two choices for access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Many analytical workloads do not benefit from the availability of unclustered indexes; the cost of random disk I/O becomes prohibitive for all but the most selective queries. It has been observed that a secondary index on an unclustered attribute can perform well under certain conditions if the unclustered attribute is correlated with a clustered index attribute [4]. The clustered index will co-locate values and the correlation will localize access through the unclustered attribute to a subset of the pages. In this paper, we show that in a real application (SDSS) and a widely used benchmark (TPC-H), there exist many cases of attribute correlation that can be exploited to accelerate queries. We also discuss a tool that can automatically suggest useful pairs of correlated attributes. It does so using an analytical cost model that we developed, which is novel in its awareness of the effects of clustering and correlation. Furthermore, we propose a data structure called a Correlation Map (CM) that expresses the mapping between the correlated attributes, acting much like a secondary index. The paper also discusses how bucketing on the domains of both attributes in the correlated attribute pair can dramatically reduce the size of the CM, making it potentially orders of magnitude smaller than a secondary B+Tree index. This reduction in size allows us to create a large number of CMs that improve performance for a wide range of queries. The small size also reduces maintenance costs, as we demonstrate experimentally.	en_US
dc.language.iso	en_US
dc.publisher	Association for Computing Machinery (ACM)	en_US
dc.relation.isversionof	http://dx.doi.org/10.14778/1687627.1687765	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Other repository	en_US
dc.title	Correlation maps: A compressed access method for exploiting soft functional dependencies	en_US
dc.type	Article	en_US
dc.identifier.citation	Hideaki Kimura, George Huo, Alexander Rasin, Samuel Madden, and Stanley B. Zdonik. 2009. Correlation maps: a compressed access method for exploiting soft functional dependencies. Proc. VLDB Endow. 2, 1 (August 2009), 1222-1233.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.mitauthor	Madden, Samuel R.	en_US
dc.relation.journal	Proceedings of the VLDB Endowment	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dspace.orderedauthors	Kimura, Hideaki; Huo, George; Rasin, Alexander; Madden, Samuel; Zdonik, Stanley B.	en_US
dc.identifier.orcid	https://orcid.org/0000-0002-7470-3265
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete
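
Illustrative sketch (not part of the original record): the abstract describes a Correlation Map as a mapping from an unclustered attribute to the clustered attribute values it co-occurs with, with bucketing on both domains to keep the map small. The Python below is a minimal sketch of that idea under stated assumptions: both attributes are integers, buckets have fixed width, and the names build_correlation_map, lookup, u_width and c_width are hypothetical rather than taken from the paper. The synthetic rows are made up for illustration only.

```python
from collections import defaultdict


def build_correlation_map(rows, u_width, c_width):
    """Map each bucket of the unclustered attribute to the set of
    clustered-attribute buckets it co-occurs with.

    rows: iterable of (unclustered_value, clustered_value) integer pairs.
    u_width, c_width: assumed fixed bucket widths for the two domains.
    """
    cm = defaultdict(set)
    for u, c in rows:
        cm[u // u_width].add(c // c_width)
    return cm


def lookup(cm, u_value, u_width, c_width):
    """Return half-open clustered-key ranges [lo, hi) that may hold rows
    matching u_value. Bucketing admits false positives, so a caller
    would still re-check the predicate on every row it scans."""
    buckets = sorted(cm.get(u_value // u_width, ()))
    return [(b * c_width, (b + 1) * c_width) for b in buckets]


# Synthetic, softly correlated data: u stays within a few units of c.
rows = [(c + c % 5, c) for c in range(1000)]
cm = build_correlation_map(rows, u_width=10, c_width=50)

# Only two narrow clustered ranges need scanning, not the whole table.
print(lookup(cm, 503, u_width=10, c_width=50))
```

Wider buckets shrink the map but widen the clustered ranges that must be scanned, which is the size versus precision trade-off the abstract alludes to.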

Files in this item

Name: Madden_Correlation maps.pdf
Size: 343.0Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
