An adaptive partitioning scheme for ad-hoc and time-varying database analytics
Author(s)
Shanbhag, Anil (Anil Atmanand)
DownloadFull printable version (769.9Kb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Samuel Madden.
Terms of use
Metadata
Show full item recordAbstract
Data partitioning significantly improves query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset, often focusing on finding the best partitioning for a particular query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload. Furthermore, workloads change over time as businesses evolve or as analysts gain better understanding of their data. Static workload-based data partitioning techniques are therefore not suitable for such settings. In this thesis, we present Amoeba, an adaptive distributed storage system for data skipping. It does not require an upfront query workload and adapts the data partitioning according to the queries posed by users over time. We present the data structures, partitioning algorithms, and an efficient implementation on top of Apache Spark and HDFS. Our experimental results show that the Amoeba storage system provides improved query performance for ad-hoc workloads, adapts to changes in the query workloads, and converges to a steady state in case of recurring workloads. On a real world workload, Amoeba reduces the total workload runtime by 1.8x compared to Spark with data partitioned and 3.4x compared to unmodified Spark.
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 57-59).
Date issued
2016Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.