Optimizing Partitioning for Efficient Parallel Reads
Author(s)
Sragow, John
Advisor
Madden, Samuel
Abstract
Modern database management systems spend a significant portion of query execution time scanning data, so minimizing scanning latency is critical to maintaining high performance. To this end, databases are partitioned into blocks so that queries can skip irrelevant tuples and avoid scanning the entire database. When this partitioning is optimized to minimize the number of blocks accessed by each query, smaller queries that access very few blocks fail to fully utilize the available I/O bandwidth because they cannot take advantage of parallel reads. However, reducing the block size to increase the number of blocks accessed by smaller queries slows down larger queries by increasing the number of I/Os they must perform. We propose a novel partitioning scheme that shuffles the row groups of blocks accessed by smaller queries, so that those queries can read fewer tuples from each of multiple blocks in parallel without increasing the I/O cost of larger queries. Our experiments show that, with larger block sizes, this technique lets smaller queries scan up to twice as fast as they would under a standard partitioning, without significantly slowing down larger queries.
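To make the trade-off described in the abstract concrete, here is a toy model in Python. It is not the thesis's algorithm. The names (standard_layout, shuffled_layout, scan_cost), the uniform round-robin shuffle of row groups within fixed windows of blocks, and the cost model (distinct blocks are read in parallel, while row groups within one block are read one after another) are all illustrative assumptions.

```python
from collections import defaultdict


def standard_layout(num_blocks, groups_per_block):
    """Map each row group to a block; block b holds the contiguous
    row groups b*G .. b*G + G - 1 (G = groups_per_block)."""
    return {g: g // groups_per_block for g in range(num_blocks * groups_per_block)}


def shuffled_layout(num_blocks, groups_per_block, span):
    """Hypothetical shuffle: within each window of `span` blocks, deal the
    window's row groups round-robin so consecutive row groups land in
    different blocks. Every block still holds exactly G row groups."""
    if num_blocks % span != 0:
        raise ValueError("num_blocks must be a multiple of span")
    window_size = span * groups_per_block
    layout = {}
    for g in range(num_blocks * groups_per_block):
        window, offset = divmod(g, window_size)
        layout[g] = window * span + offset % span
    return layout


def scan_cost(layout, row_groups):
    """Toy cost model (assumption): blocks are read concurrently, row groups
    within a block sequentially. Returns (blocks_touched, sequential_reads)."""
    per_block = defaultdict(int)
    for g in row_groups:
        per_block[layout[g]] += 1
    return len(per_block), max(per_block.values(), default=0)


if __name__ == "__main__":
    std = standard_layout(num_blocks=8, groups_per_block=4)
    shf = shuffled_layout(num_blocks=8, groups_per_block=4, span=4)
    small, large = range(4), range(16)
    print("small:", scan_cost(std, small), scan_cost(shf, small))
    print("large:", scan_cost(std, large), scan_cost(shf, large))
```

With 8 blocks of 4 row groups, a small query over row groups 0 to 3 needs 4 sequential reads from a single block under the standard layout, but only 1 read from each of 4 blocks under the shuffled one, while a larger scan of row groups 0 to 15 touches the same 4 blocks either way. A realistic scheme would, as the abstract describes, shuffle only the blocks that smaller queries actually access rather than every window uniformly.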
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology