dc.contributor.advisor: Kraska, Tim
dc.contributor.author: Yang, Adela
dc.date.accessioned: 2022-01-14T14:55:00Z
dc.date.available: 2022-01-14T14:55:00Z
dc.date.issued: 2021-06
dc.date.submitted: 2021-06-17T20:15:00.690Z
dc.identifier.uri: https://hdl.handle.net/1721.1/139178
dc.description.abstract: Looking up strings in in-memory database indexes raises different considerations from looking up integer keys. Because strings vary in size, inserting them into indexes efficiently should account for properties specific to strings. We investigate learning alternate schemes for encoding and inserting strings into index structures such as the adaptive radix tree (ART), and the impact of these schemes on memory and lookup performance. In this thesis, we examine three different properties of string datasets and perform three experiments aimed at taking advantage of these properties. While a character-frequency-based encoding successfully increased throughput on a theoretical read-heavy workload, it did not preserve lexicographical order and is unlikely to be useful in most workloads. Meanwhile, the experiments that did preserve lexicographical order did not demonstrate space or throughput improvements. We suggest improvements to these approaches for further experimentation.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Analysis of Encoding Schemes for String Indexing
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

