DSpace@MIT

Learned String Index Structures for In-Memory Databases

Author(s)
Spector, Benjamin
Advisor
Kraska, Tim
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Within the field of machine learning for systems, learning-based methods have brought a new perspective to indexing by reframing it as a cumulative distribution function (CDF) modeling problem. Although the field is young, it has already yielded many opportunities and efficiencies. However, most work in this area has focused on efficiently indexing numerical keys, because the additional challenges posed by strings have prevented the effective application of these techniques to string domains. We hypothesize that the machine learning approaches that have made significant strides in scalar indexing in recent years can also be adapted effectively to strings. First, we introduce the RadixStringSpline (RSS), a learned index structure for efficiently indexing strings. RSS is a tree of learned radix splines, each indexing a fixed number of bytes. RSS achieves better performance than other structures by first using the minimal string prefix that sufficiently distinguishes the data, followed by a contextual learned model to predict its location. Additionally, the bounded-error nature of RSS accelerates the last-mile search and enables a memory-efficient hash-table lookup accelerator. Second, we benchmark RSS against existing algorithms on several real-world string datasets and study its performance in depth. RSS approaches or exceeds the performance of traditional string indexes while using up to 300× less memory, suggesting that this line of research may be promising for future memory-intensive database applications.
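
The general idea of treating indexing as CDF modeling with a bounded error can be illustrated with a minimal sketch. This is not the RadixStringSpline structure from the thesis: it assumes a single linear model over the first 8 bytes of each key in place of a tree of radix splines, and the names `prefix_to_int` and `BoundedErrorIndex` are hypothetical. It shows only how a recorded maximum prediction error limits the last-mile search to a small window.

```python
import bisect


def prefix_to_int(key: bytes, width: int = 8) -> int:
    """Map the first `width` bytes of a key to an integer (zero-padded).

    The mapping is order-preserving (non-decreasing) for byte strings.
    """
    return int.from_bytes(key[:width].ljust(width, b"\x00"), "big")


class BoundedErrorIndex:
    """Illustrative learned index: a linear CDF model plus an error bound.

    Assumes a non-empty collection of byte-string keys.
    """

    def __init__(self, keys):
        self.keys = sorted(set(keys))
        xs = [prefix_to_int(k) for k in self.keys]
        n = len(xs)
        self.lo = xs[0]
        span = xs[-1] - xs[0]
        # Linear approximation of the CDF, scaled to positions 0..n-1.
        self.slope = (n - 1) / span if span > 0 else 0.0
        # Largest gap between predicted and true position over all keys.
        self.max_err = max(abs(self._predict(x) - i) for i, x in enumerate(xs))

    def _predict(self, x: int) -> int:
        p = int(self.slope * (x - self.lo))
        return min(max(p, 0), len(self.keys) - 1)

    def lookup(self, key: bytes):
        """Return the position of `key` in sorted order, or None if absent."""
        p = self._predict(prefix_to_int(key))
        left = max(0, p - self.max_err)
        right = min(len(self.keys), p + self.max_err + 1)
        # Last-mile search restricted to the error window around the guess.
        i = bisect.bisect_left(self.keys, key, left, right)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


keys = [b"apple", b"apricot", b"banana", b"blueberry", b"cherry", b"date"]
index = BoundedErrorIndex(keys)
positions = [index.lookup(k) for k in sorted(keys)]
```

Because the error bound is computed with the same prediction function that lookups use, every stored key is guaranteed to fall inside its search window. A poorly fitting model only widens the window; it does not cause misses.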
Date issued
2022-05
URI
https://hdl.handle.net/1721.1/144902
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.