dc.contributor.advisor | Kraska, Tim | |
dc.contributor.author | Cen, Lujing | |
dc.date.accessioned | 2022-01-14T14:55:23Z | |
dc.date.available | 2022-01-14T14:55:23Z | |
dc.date.issued | 2021-06 | |
dc.date.submitted | 2021-06-17T20:12:59.977Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/139184 | |
dc.description.abstract | As the demand for data outpaces diminishing improvements in the hardware used to store and query them, we must find intelligent ways to increase database performance on existing systems. This project integrates learned encodings into SageDB, a database capable of accelerating queries by analyzing and adapting to different workloads. Encodings improve query performance through lossless compression, which reduces I/O time during scans. Different encoding types exhibit different characteristics depending on the properties of the underlying data and the hardware on which queries are executed. We implement a variety of common encodings in SageDB and propose a learning-based approach that selects the optimal encoding for a given data block by combining block-level statistics with sampling. In addition, we demonstrate how to leverage properties of encoded data, along with vectorized processing units in modern CPUs, to execute queries more efficiently without decoding every value. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright MIT | |
dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Learned Encodings in SageDB | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |