Analysis of Encoding Schemes for String Indexing
Author(s)
Yang, Adela
Advisor
Kraska, Tim
Abstract
Looking up strings in in-memory database indexes involves different considerations than looking up integer keys. Because strings vary in length, an efficient scheme for inserting them into an index should account for properties specific to strings. We investigate learning alternative schemes for encoding strings and inserting them into index structures such as the adaptive radix tree (ART), along with their impact on memory usage and lookup performance. In this thesis, we examine three properties of string datasets and perform three experiments designed to take advantage of them. A character-frequency-based encoding increased throughput on a theoretical read-heavy workload, but it did not preserve lexicographical order, so it is unlikely to be useful in most workloads (for example, those that rely on ordered range scans). Meanwhile, the experiments that did preserve lexicographical order did not demonstrate improvements in space or throughput. We suggest improvements to these approaches for further experimentation.
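The abstract does not say which character-frequency-based encoding was used. As a hypothetical illustration only, the sketch below builds a Huffman code (a standard frequency-based, prefix-free code, swapped in here as an assumption) over a small made-up key set and shows why such an encoding shortens keys yet breaks lexicographical order. The helper names `build_huffman_code`, `encode`, and `decode` and the sample keys are illustrative and do not come from the thesis.

```python
import heapq
from collections import Counter


def build_huffman_code(keys):
    """Hypothetical helper: map each character to a prefix-free bit string.

    Characters that occur more often across the keys receive shorter codes.
    """
    freq = Counter(ch for key in keys for ch in key)
    if not freq:
        return {}
    # Heap entries are (weight, tiebreak, {char: partial_code}); the integer
    # tiebreak keeps heapq from ever comparing the dicts.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Only one distinct character: give it a 1-bit code.
        _, _, table = heap[0]
        return {ch: "0" for ch in table}
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix 0 on the left, 1 on the right.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(key, code):
    """Hypothetical helper: concatenate the bit codes of each character."""
    return "".join(code[ch] for ch in key)


def decode(bits, code):
    """Hypothetical helper: invert encode(); works because the code is prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


# Illustrative keys (not thesis data): 'a' dominates, so it gets a 1-bit code.
keys = ["aaaaa", "aaaab", "aaaaz"]
code = build_huffman_code(keys)
print(code)  # {'b': '00', 'z': '01', 'a': '1'}
print([encode(k, code) for k in keys])  # ['11111', '111100', '111101']
print(sorted(keys, key=lambda k: encode(k, code)))  # ['aaaab', 'aaaaz', 'aaaaa']
```

Each key shrinks from 40 bits (8-bit characters) to 5 or 6 bits, which is the kind of saving that could make lookups cheaper. However, sorting by the encoded bits places `"aaaaa"` last even though it sorts first as a string, so an index built over these encoded keys could not answer ordered range scans directly. An order-preserving code would have to keep codes for smaller characters ordered before codes for larger ones, which a plain Huffman construction does not guarantee.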
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology