dc.contributor.advisor | Tim Kraska. | en_US |
dc.contributor.author | Vaidya, Kapil Eknath. | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2021-05-24T20:24:01Z | |
dc.date.available | 2021-05-24T20:24:01Z | |
dc.date.copyright | 2021 | en_US |
dc.date.issued | 2021 | en_US |
dc.identifier.uri | https://hdl.handle.net/1721.1/130792 | |
dc.description | Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021 | en_US |
dc.description | Cataloged from the official PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (pages 55-59). | en_US |
dc.description.abstract | Sorting is one of the most fundamental algorithms in Computer Science and a common operation in databases, not just for sorting query results but also as part of joins (e.g., sort-merge join) or indexing. In this work, we introduce a new type of distribution sort that leverages a learned model of the empirical CDF of the data. Our algorithm uses a model to efficiently get an approximation of the scaled empirical CDF for each record key and map it to the corresponding position in the output array. We then apply a deterministic sorting algorithm that works well on nearly-sorted arrays (e.g., Insertion Sort) to establish a totally sorted order. We compared this algorithm against common sorting approaches and measured its performance for up to 1 billion normally-distributed double-precision keys. The results show that, over several real-world datasets, our approach yields up to a 3.38x performance improvement over C++ STL sort, which is an optimized Quicksort hybrid, a 1.49x improvement over sequential Radix Sort, a 1.31x improvement over IS⁴o [2], and a 5.54x improvement over a C++ implementation of Timsort, which is the default sorting function for Java and Python. While these results are very encouraging, duplicates have a particularly negative impact on the sorting performance of Learned Sort, as we show in our experiments. | en_US |
dc.description.statementofresponsibility | by Kapil Eknath Vaidya. | en_US |
dc.format.extent | 59 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | The case for a Learned sorting algorithm | en_US |
dc.type | Thesis | en_US |
dc.description.degree | S.M. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.identifier.oclc | 1252064657 | en_US |
dc.description.collection | S.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
dspace.imported | 2021-05-24T20:24:01Z | en_US |
mit.thesis.degree | Master | en_US |
mit.thesis.department | EECS | en_US |
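
The abstract describes two phases: a model of the empirical CDF maps each key to an approximate output position, and a sort suited to nearly-sorted input then finishes the job. The Python sketch below illustrates that idea only and is not the thesis implementation. It assumes a deliberately simple stand-in model (a sorted random sample queried with binary search) and resolves position collisions with per-position buckets; the names `train_cdf_model`, `predict_cdf`, and `learned_sort` are hypothetical.

```python
import bisect
import random


def train_cdf_model(keys, sample_size=1000, seed=0):
    """Assumed stand-in for the learned model: a sorted random sample."""
    rng = random.Random(seed)
    return sorted(rng.sample(keys, min(sample_size, len(keys))))


def predict_cdf(sample, key):
    """Approximate the empirical CDF of `key` as a value in [0, 1]."""
    return bisect.bisect_right(sample, key) / len(sample)


def insertion_sort(a):
    """In-place insertion sort; cheap when `a` is already nearly sorted."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a


def learned_sort(keys):
    """Place keys by predicted CDF position, then repair with insertion sort."""
    n = len(keys)
    if n < 2:
        return list(keys)
    sample = train_cdf_model(keys)
    # One bucket per output position; keys that collide share a bucket.
    buckets = [[] for _ in range(n)]
    for k in keys:
        pos = min(n - 1, int(predict_cdf(sample, k) * n))
        buckets[pos].append(k)
    # The model is monotone, so only keys within a bucket can be out of order.
    nearly_sorted = [k for bucket in buckets for k in bucket]
    return insertion_sort(nearly_sorted)


if __name__ == "__main__":
    data = [random.gauss(0.0, 1.0) for _ in range(10_000)]
    assert learned_sort(data) == sorted(data)
```

Because many duplicate keys receive the same predicted position, they pile into one bucket in this sketch, which hints at why the abstract singles out duplicates as a difficult case.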