DSpace@MIT, MIT Libraries


Machine Learning for Out of Distribution Database Workloads

Author(s)
Negi, Parimarjan
Download: Thesis PDF (10.71Mb)
Advisor
Alizadeh, Mohammad
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
DBMS query optimizers rely on heuristics to make decisions, such as simplifying assumptions in cardinality estimation or cost-model assumptions for predicting query latencies. With the rise of cloud-first DBMS architectures, it is now possible to collect massive amounts of data on executed queries. This makes it possible to improve DBMS heuristics using models trained on this execution history. In particular, such models can be specialized to particular workloads, so it may be possible to do much better than average by learning patterns, for example that some joins are always unexpectedly slow or that some tables are always much larger than expected. This can be very beneficial for performance. However, deploying ML systems in the real world has a catch: it is hard to avoid Out of Distribution (OoD) scenarios in real workloads. ML models often fail in surprising ways in OoD scenarios, and this is an active area of research in the broader ML community. In this thesis, we introduce several such OoD scenarios in the context of database workloads and show that ML models can easily fail catastrophically in such cases. These scenarios range from new query patterns, such as a new column or a new join, to execution-time variance across different hardware and system loads. In each case, we use database-specific knowledge to develop techniques that yield ML models with more reliable and robust performance in OoD settings.
Date issued
2024-02
URI
https://hdl.handle.net/1721.1/153835
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
