
dc.contributor.advisor: Samuel Madden. [en_US]
dc.contributor.author: Huo, George (George J.) [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2008-05-19T16:03:12Z
dc.date.available: 2008-05-19T16:03:12Z
dc.date.copyright: 2007 [en_US]
dc.date.issued: 2007 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/41632
dc.description: Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. [en_US]
dc.description: Includes bibliographical references (p. 69-71). [en_US]
dc.description.abstract: In relational query processing, one generally chooses between two classes of access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Online analytical processing (OLAP) workloads often do not benefit from the availability of unclustered indices; the cost of random disk I/O becomes prohibitive for all but the most selective queries. Unfortunately, this means that data warehouses and other OLAP systems frequently perform sequential scans, unless they can satisfy nearly all of the queries posed to them by a single clustered index [7], or unless they have available specialized data structures - like bitmap indices, materialized views, or cubes - to answer queries directly. This thesis presents a new index data structure called a correlation index (CI) that enables OLAP databases to answer a wider range of queries from a single clustered index or sorted file. The CI exploits correlations between the key attribute of a clustered index and other unclustered attributes in the table. In order to predict when CIs will exhibit wins over alternative access methods, the thesis describes an analytical cost model that is suitable for integration with existing query optimizers. An implementation compares CI performance against sequential scans and unclustered B+Tree indices in the popular Berkeley DB [22] library. Experimental results over three different data sets validate the accuracy of the cost model and establish numerous cases where CIs accelerate lookup times by 5 to 20 times over both unclustered B+Trees and sequential scans. The strong experimental results suggest that CIs offer practical and substantial benefits in a variety of useful query scenarios. [en_US]
dc.description.statementofresponsibility: by George Huo. [en_US]
dc.format.extent: 71 p. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Correlation indices : a new access method to exploit correlated attributes [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M.Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.identifier.oclc: 216929211 [en_US]

