Integrating Gradient Boosting and Generative Models:
Hybrid Approach to Address Class Imbalance and
Evaluation Gaps in Real-World Systems

Lau, Mary

Author(s)

Lau, Mary

DownloadThesis PDF (2.060Mb)

Advisor

Gupta, Amar

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

Anomaly detection remains a persistent challenge in machine learning due to the extreme class imbalance, high cost of false negatives, and the need to regulate false positives in realworld settings at scale. This thesis introduces Tail-end FPR Max Recall, a business-aware evaluation framework designed for such constrained environments. Using this framework, we benchmark LightGBM—a gradient boosting method known for its computational efficiency and predictive accuracy—on an imbalanced dataset, comparing its performance against standard academic evaluation criteria. Our results demonstrate that Tail-end FPR Max Recall fills critical gaps left by standard academic criteria, providing a more realistic assessment of model performance that aims to maximize recall while enforcing a false positive rate budget. Beyond benchmarking, we propose two strategies that incorporate deep learning methods to augment the already strong performance of gradient boosting: (1) using generative models to produce synthetic minority-class samples that outperform traditional oversampling techniques, and (2) using neural embeddings to improve feature representation for anomaly detection. Together, these contributions offer a methodology for evaluating and improving anomaly detection pipelines in domains where rare, high-impact events must be detected while meeting strict operational demands.

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/162707

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses