This thesis addresses the challenge of detecting and fixing the errors of a machine learning (ML) model, a task known as model debugging.
Current ML models, especially overparametrized deep neural networks (DNNs) trained on crowd-sourced data, easily latch onto spurious signals, underperform for small subgroups, and can be derailed by errors in training labels. Consequently, the ability to detect and fix a model’s mistakes prior to deployment is crucial.
Explainable machine learning approaches, particularly post hoc explanations, have emerged as the de facto ML model debugging tools. A plethora of such methods currently exists, yet it is unclear whether they are effective.
In the first part of this thesis, we introduce a framework to categorize model bugs that can arise as part of the standard supervised learning pipeline. Equipped with this categorization, we assess whether several post hoc model explanation approaches are effective for detecting and fixing the categories of bugs proposed in the framework. We show that current approaches struggle to detect a model's reliance on spurious signals, cannot identify training inputs with wrong labels, and provide no direct avenue for fixing model errors. In addition, we demonstrate that practitioners struggle to use these tools to debug ML models in practice.
With the limitations of current approaches established, in the second part of the thesis we present new tools for model debugging. First, we introduce an approach termed model guiding, which uses an audit set (a small dataset that has been carefully annotated by a task expert) to update a pre-trained ML model's parameters. We formulate the update as a bilevel optimization problem that requires the updated model to match the expert's predictions and feature annotations on the audit set. Model guiding can be used to identify and correct mislabelled examples, and we show that it can also remove a model's reliance on spurious training signals.
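As a rough, illustrative sketch only (the thesis's own notation and formulation are given in the corresponding chapter), a bilevel update of this kind can be written with an audit set $\mathcal{A}=\{(x_i, y_i, m_i)\}$ of inputs, expert labels, and feature annotations, per-example training weights $w$, a loss $\ell$, and an explanation map $e_\theta$; all of these symbols are assumptions introduced here for exposition:

\[
\min_{w}\; \sum_{(x_i,\, y_i,\, m_i)\in\mathcal{A}} \Big[\, \ell\big(f_{\theta^{*}(w)}(x_i),\, y_i\big) \;+\; \lambda \,\big\lVert e_{\theta^{*}(w)}(x_i) - m_i \big\rVert^{2} \,\Big]
\quad\text{s.t.}\quad
\theta^{*}(w) \in \arg\min_{\theta}\; \sum_{j} w_j\, \ell\big(f_\theta(x_j),\, y_j\big),
\]

where the outer problem asks the updated model to match the expert's labels and feature annotations on the audit set, and the inner problem retrains on the original (possibly noisy) data under the weights $w$; training examples whose weights are driven toward zero are natural candidates for mislabelled or spuriously correlated points.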
The second debugging tool we introduce uses the influence function of an estimator to identify training points whose labels have a large effect on an ML model's disparity metric, such as group calibration.
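To illustrate the underlying mechanics (a sketch in standard influence-function notation, not necessarily the thesis's exact estimator), the effect of perturbing the label of training point $z_i$ to $\tilde z_i$ on a differentiable disparity metric $D$ evaluated at the trained parameters $\hat\theta$ can be approximated as

\[
\mathcal{I}(z_i) \;\approx\; -\,\nabla_\theta D(\hat\theta)^{\top} H_{\hat\theta}^{-1} \Big( \nabla_\theta \ell(\tilde z_i, \hat\theta) - \nabla_\theta \ell(z_i, \hat\theta) \Big),
\qquad
H_{\hat\theta} \;=\; \frac{1}{n}\sum_{j=1}^{n} \nabla_\theta^{2}\, \ell(z_j, \hat\theta),
\]

where $\ell$ is the training loss and $H_{\hat\theta}$ its empirical Hessian; in practice a smooth surrogate of group calibration would stand in for $D$. Training points with large-magnitude scores are those whose labels most move the disparity metric and are therefore surfaced for inspection.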
Taken together, this thesis makes advances towards better debugging tools for machine learning models.