DSpace@MIT


Self-Training and Calibration for Learning with Limited Data

Author(s)
Liu, Emma J.
Advisor
Wornell, Gregory W.
Sattigeri, Prasanna
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Semi-supervised learning methods such as self-training can leverage widely available unlabeled data, whereas many successful supervised learning methods use only labeled data. One step of self-training uses a trained model to create pseudo-labels for unlabeled data and then selects some of those samples to add to the labeled dataset. One way to make this selection is to pick the samples for which the model has high confidence. However, many models are not well calibrated, meaning that their confidence scores do not necessarily align with the expected distribution in the dataset. Selecting samples by confidence score may therefore add more incorrectly labeled samples to the training dataset than expected. This thesis explores how adding a recalibration step to self-training, which adjusts the confidence scores before they are used to select samples, can improve the results of self-training. Experiments on natural language processing data show that combining self-training with calibration improves accuracy when the initial self-training accuracy is not too high and the amount of labeled data initially used is not too small.
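The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which recalibration method the thesis uses, so temperature scaling is assumed here purely for illustration. The function names, the toy logits, and the 0.9 threshold are all hypothetical and are not taken from the thesis.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels):
    """Assumed recalibration step: grid-search a temperature on held-out labeled data."""
    candidates = [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))

def select_pseudo_labels(logit_rows, threshold, temperature=1.0):
    """Return (index, pseudo_label, confidence) for samples at or above the threshold."""
    selected = []
    for i, row in enumerate(logit_rows):
        probs = softmax(row, temperature)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence), confidence))
    return selected

# Toy held-out data: the model is very confident on every sample but wrong on one,
# so it is overconfident and the fitted temperature should exceed 1.
val_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
val_labels = [0, 1, 1, 0]
temperature = fit_temperature(val_logits, val_labels)

# Toy unlabeled data: compare selection with raw and recalibrated confidences.
unlabeled_logits = [[3.0, 0.0], [0.5, 0.0], [0.0, 10.0]]
raw = select_pseudo_labels(unlabeled_logits, threshold=0.9)
calibrated = select_pseudo_labels(unlabeled_logits, threshold=0.9,
                                  temperature=temperature)
print("fitted temperature:", temperature)
print("selected without recalibration:", [i for i, _, _ in raw])
print("selected with recalibration:", [i for i, _, _ in calibrated])
```

In this toy setting the recalibrated confidences are lower, so fewer samples pass the same threshold. A full self-training loop would retrain the model on the enlarged labeled set and repeat; that loop is omitted here.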
Date issued
2022-05
URI
https://hdl.handle.net/1721.1/144511
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses