Correlation indices : a new access method to exploit correlated attributes
Author(s)
Huo, George (George J.)
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Samuel Madden.
Abstract
In relational query processing, one generally chooses between two classes of access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Online analytical processing (OLAP) workloads often do not benefit from the availability of unclustered indices; the cost of random disk I/O becomes prohibitive for all but the most selective queries. Unfortunately, this means that data warehouses and other OLAP systems frequently perform sequential scans, unless they can satisfy nearly all of the queries posed to them by a single clustered index [7], or unless they have available specialized data structures - like bitmap indices, materialized views, or cubes - to answer queries directly. This thesis presents a new index data structure called a correlation index (CI) that enables OLAP databases to answer a wider range of queries from a single clustered index or sorted file. The CI exploits correlations between the key attribute of a clustered index and other unclustered attributes in the table. In order to predict when CIs will exhibit wins over alternative access methods, the thesis describes an analytical cost model that is suitable for integration with existing query optimizers. An implementation compares CI performance against sequential scans and unclustered B+Tree indices in the popular Berkeley DB [22] library. Experimental results over three different data sets validate the accuracy of the cost model and establish numerous cases where CIs accelerate lookup times by 5 to 20 times over both unclustered B+Trees and sequential scans. The strong experimental results suggest that CIs offer practical and substantial benefits in a variety of useful query scenarios.
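The abstract does not spell out the internal layout of a CI, so the following is only a minimal illustrative sketch of the general idea: if an unclustered attribute is correlated with the clustered key, a small map from each unclustered value to the clustered-key values it co-occurs with lets a lookup touch only a few contiguous regions of the sorted file. The class name `CorrelationIndexSketch`, its methods, and the in-memory list standing in for the clustered file are all hypothetical and are not taken from the thesis.

```python
import bisect
from collections import defaultdict


class CorrelationIndexSketch:
    """Hypothetical illustration only; not the thesis's actual data structure."""

    def __init__(self, rows, key_attr, other_attr):
        # A list sorted by key_attr stands in for the clustered index / sorted file.
        self.rows = sorted(rows, key=lambda r: r[key_attr])
        self.keys = [r[key_attr] for r in self.rows]
        self.other_attr = other_attr
        # Assumed mapping: unclustered value -> set of clustered-key values
        # that appear together with it somewhere in the table.
        self.mapping = defaultdict(set)
        for r in self.rows:
            self.mapping[r[other_attr]].add(r[key_attr])

    def lookup(self, value):
        """Return rows whose unclustered attribute equals `value`."""
        result = []
        for k in sorted(self.mapping.get(value, ())):
            # Locate the contiguous run of rows with clustered key k.
            lo = bisect.bisect_left(self.keys, k)
            hi = bisect.bisect_right(self.keys, k)
            # Scan only that run, filtering out non-matching rows.
            result.extend(r for r in self.rows[lo:hi] if r[self.other_attr] == value)
        return result


# Toy data: "city" is strongly correlated with the clustered key "state".
rows = [
    {"state": "MA", "city": "Boston"},
    {"state": "IL", "city": "Springfield"},
    {"state": "MA", "city": "Springfield"},
    {"state": "IL", "city": "Chicago"},
]
ci = CorrelationIndexSketch(rows, key_attr="state", other_attr="city")
matches = ci.lookup("Springfield")
```

In this toy setting, the map stays small when each unclustered value co-occurs with only a few clustered-key values, which is the kind of correlation the abstract says a CI exploits. When a value is spread across many keys, the lookup degenerates toward a full scan, which is presumably why the thesis pairs the structure with a cost model.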
Description
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 69-71).
Date issued
2007
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.