| dc.contributor.advisor | Kraska, Tim | |
| dc.contributor.author | Yang, Adela | |
| dc.date.accessioned | 2022-01-14T14:55:00Z | |
| dc.date.available | 2022-01-14T14:55:00Z | |
| dc.date.issued | 2021-06 | |
| dc.date.submitted | 2021-06-17T20:15:00.690Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/139178 | |
| dc.description.abstract | Looking up strings in in-memory database indexes involves different considerations than looking up integer keys. Because strings vary in size, inserting them into indexes efficiently requires accounting for properties specific to strings. We investigate learning alternative schemes for encoding and inserting strings into index structures such as the adaptive radix tree (ART), and their impact on memory and lookup performance. In this thesis, we examine three properties of string datasets and perform three experiments aimed at taking advantage of them. While a character-frequency-based encoding increased throughput on a theoretical read-heavy workload, it did not preserve lexicographical order and is unlikely to be useful in most workloads. Meanwhile, the experiments that did preserve lexicographical order did not demonstrate space or throughput improvements. We suggest improvements to these approaches for further experimentation. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright MIT | |
| dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Analysis of Encoding Schemes for String Indexing | |
| dc.type | Thesis | |
| dc.description.degree | M.Eng. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |