
dc.contributor.author: Shanbhag, Anil
dc.contributor.author: Jindal, Alekh
dc.contributor.author: Madden, Samuel
dc.contributor.author: Quiane, Jorge
dc.contributor.author: Elmore, Aaron J.
dc.date.accessioned: 2021-11-09T13:28:47Z
dc.date.available: 2021-11-09T13:28:47Z
dc.date.issued: 2017-09-24
dc.identifier.uri: https://hdl.handle.net/1721.1/137858
dc.description.abstract: © 2017 Association for Computing Machinery. Data partitioning is crucial to improving query performance and several workload-based partitioning techniques have been proposed in database literature. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload a priori. Static workload-based data partitioning techniques are therefore not suitable for such settings. In this paper, we propose Amoeba, a distributed storage system that uses adaptive multi-attribute data partitioning to efficiently support ad-hoc as well as recurring queries. Amoeba requires zero set-up and tuning effort, allowing analysts to get the benefits of partitioning without requiring an upfront query workload. The key idea is to build and maintain a partitioning tree on top of the dataset. The partitioning tree allows us to answer queries with predicates by reading a subset of the data. The initial partitioning tree is created without requiring an upfront query workload and Amoeba adapts it over time by incrementally modifying subtrees based on user queries using repartitioning. A prototype of Amoeba running on top of Apache Spark improves query performance by up to 7x over full scans and up to 2x over range-based partitioning techniques on TPC-H as well as a real-world workload. [en_US]
dc.language.iso: en
dc.publisher: ACM [en_US]
dc.relation.isversionof: 10.1145/3127479.3131613 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: website [en_US]
dc.title: A robust partitioning scheme for ad-hoc query workloads [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Shanbhag, Anil, Jindal, Alekh, Madden, Samuel, Quiane, Jorge and Elmore, Aaron J. 2017. "A robust partitioning scheme for ad-hoc query workloads."
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2019-06-18T13:56:03Z
dspace.date.submission: 2019-06-18T13:56:05Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

