Anomaly Detection in Database Operating System
Author(s)
Xia, Brian
Advisor
Stonebraker, Michael
Abstract
Database Operating System (DBOS) is a new operating system (OS) framework that replaces the traditional file-based system with a high-performance database management system (DBMS). This design choice addresses the needs of a rapidly evolving software and hardware landscape that a traditional, mainstream OS cannot meet. However, DBOS is a relatively new project under active development, and some secondary capabilities are still missing. In particular, its provenance capture system has not yet been explored for real-time anomaly detection. To that end, Nectar Network (NN) was developed on top of DBOS as a public web application that generates real-world traffic and provenance data. In this thesis, I present a machine learning (ML) model that labels anomalous provenance data captured by NN, in the form of HTTP logs, in real time. The model consists of two components: tokenization and classification. In the tokenization step, Byte-level Byte Pair Encoding (BBPE) breaks the input bytes down into token bytes that carry semantic meaning. In the classification step, a Convolutional Neural Network (CNN) takes the token bytes as input and outputs the predicted probability that the input is anomalous. The model achieved strong performance, with an F1 score of 0.99951. Importantly, this work serves as a proof of concept for future efforts to build real-time security analysis features on top of DBOS.
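The two-stage pipeline described above (BBPE tokenization followed by CNN classification) can be illustrated with a minimal, dependency-free sketch. This is not the thesis implementation: the abstract does not specify the vocabulary size, merge count, embedding size, filter count, or kernel width, so every hyperparameter here is an assumption, and the helper names `train_bbpe`, `tokenize`, and `TinyTextCNN` are hypothetical. The CNN weights are random and untrained, so its output is only a well-formed probability, not a meaningful anomaly score. The three HTTP log lines are invented for illustration and are not NN data. Production byte-level BPE tokenizers also add regex pre-tokenization and rank-based merging, which this sketch omits.

```python
import math
import random
from collections import Counter

def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def train_bbpe(corpus, num_merges):
    """Hypothetical helper: learn byte-level BPE merges from a list of log lines."""
    # Start from raw UTF-8 bytes: the base vocabulary is all 256 byte values,
    # so no input can ever be out of vocabulary.
    seqs = [[bytes([b]) for b in line.encode("utf-8")] for line in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges

def tokenize(text, merges):
    """Hypothetical helper: apply learned merges, in training order, to a new line."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq

class TinyTextCNN:
    """Hypothetical, untrained 1-D CNN: embed, convolve, ReLU, max-pool, sigmoid."""

    def __init__(self, vocab, dim=8, filters=4, width=3, seed=0):
        rng = random.Random(seed)  # assumed hyperparameters; random weights
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.dim, self.width = dim, width
        self.emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in vocab]
        self.kernels = [[[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(width)]
                        for _ in range(filters)]
        self.out_w = [rng.gauss(0, 0.1) for _ in range(filters)]
        self.out_b = 0.0

    def predict_proba(self, tokens):
        x = [self.emb[self.index[t]] for t in tokens]
        # Zero-pad very short inputs so at least one convolution window exists.
        x += [[0.0] * self.dim] * max(0, self.width - len(x))
        pooled = []
        for k in self.kernels:
            acts = []
            for p in range(len(x) - self.width + 1):
                s = sum(k[j][d] * x[p + j][d]
                        for j in range(self.width) for d in range(self.dim))
                acts.append(max(0.0, s))  # ReLU
            pooled.append(max(acts))  # global max-pool over positions
        z = sum(w * h for w, h in zip(self.out_w, pooled)) + self.out_b
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid gives P(anomalous)

# Usage with invented log lines (not data from the thesis).
logs = [
    "GET /index.html HTTP/1.1 200",
    "GET /about.html HTTP/1.1 200",
    "GET /login.php?id=1' OR '1'='1 HTTP/1.1 500",
]
merges = train_bbpe(logs, num_merges=20)
vocab = [bytes([b]) for b in range(256)] + [a + b for a, b in merges]
model = TinyTextCNN(vocab)
p = model.predict_proba(tokenize(logs[2], merges))
```

Because the base vocabulary is the full byte range, any log line, including malformed or binary payloads, tokenizes losslessly, and merges only shorten the sequence the CNN must scan. Global max-pooling lets a single suspicious token window dominate the score regardless of where it appears in the request.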
Date issued
2022-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology