IDE: A System for Iterative Mislabel Detection

Author(s)
Deng, Yuhao; Deng, Qiyan; Chai, Chengliang; Cao, Lei; Tang, Nan; Fan, Ju; Wang, Jiayi; Yuan, Ye; Wang, Guoren; ...
Terms of use
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
While machine learning techniques, especially deep neural networks, have shown remarkable success in various applications, their performance is adversely affected by label errors in training data. Acquiring high-quality annotated data is both costly and time-consuming in real-world scenarios, requiring extensive human annotation and verification. Consequently, many industry-applied models are trained over data containing substantial noise, significantly degrading the performance of these models. To address this critical issue, we demonstrate IDE, a novel system that iteratively detects mislabeled instances and repairs the wrong labels. Specifically, IDE leverages the early loss observation and influence-based verification to iteratively identify mislabeled instances. When the mislabeled instances are obtained in each iteration, IDE will repair their labels to enhance detection accuracy for subsequent iterations. The framework automatically determines the termination point when the early loss is no longer effective. For uncertain instances, it generates pseudo labels to train a binary classification model, leveraging the model's generalization ability to make the final decision. With a real-life scenario, we demonstrate that IDE produces high-quality training data by effective mislabel detection and repair.
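The detect-and-repair loop described in the abstract can be illustrated with a toy sketch. This is not the IDE implementation: it assumes a 1-D binary logistic regression trained for only a few epochs as the "early" model, a fixed loss threshold as the detection rule, and label flipping as the repair step. It omits the influence-based verification and the pseudo-label classifier for uncertain instances, and it stops as soon as no instance exceeds the threshold, which is only a stand-in for the paper's termination criterion. All function names and parameters (`train_early`, `per_instance_loss`, `iterative_repair`, `loss_threshold`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_early(xs, ys, epochs=5, lr=0.5):
    """Fit 1-D logistic regression for a few epochs only (the 'early' phase)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def per_instance_loss(w, b, xs, ys):
    """Cross-entropy loss of each instance under the early model."""
    losses = []
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return losses


def iterative_repair(xs, ys, loss_threshold=1.0, max_iters=10):
    """Repeat: train early, flag high-loss instances, flip their labels.

    Assumption: stopping when nothing is flagged stands in for the
    paper's 'early loss is no longer effective' criterion.
    """
    labels = list(ys)
    repair_rounds = 0
    for _ in range(max_iters):
        w, b = train_early(xs, labels)
        losses = per_instance_loss(w, b, xs, labels)
        suspects = [i for i, loss in enumerate(losses) if loss > loss_threshold]
        if not suspects:
            break
        for i in suspects:
            labels[i] = 1 - labels[i]  # binary repair: flip the label
        repair_rounds += 1
    return labels, repair_rounds


# Toy data: class 0 on the negative axis, class 1 on the positive axis.
magnitudes = [1.0 + 0.1 * k for k in range(20)]
xs = [-m for m in magnitudes] + magnitudes
true_labels = [0] * 20 + [1] * 20

# Inject four label errors (hypothetical noise pattern).
noisy = list(true_labels)
for i in (5, 15, 25, 35):
    noisy[i] = 1 - noisy[i]

repaired, rounds = iterative_repair(xs, noisy)
print("labels still wrong:", sum(r != t for r, t in zip(repaired, true_labels)))
print("repair rounds:", rounds)
```

On this cleanly separable toy data, a single repair round should be enough, because the early model already assigns high loss to exactly the flipped instances. Real data would need the verification and uncertainty handling that the abstract describes.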
Description
SIGMOD-Companion ’24, June 09–15, 2024, Santiago, Chile
Date issued
2024-06-09
URI
https://hdl.handle.net/1721.1/155776
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
ACM; Companion of the 2024 International Conference on Management of Data
Citation
Deng, Yuhao, Deng, Qiyan, Chai, Chengliang, Cao, Lei, Tang, Nan et al. 2024. "IDE: A System for Iterative Mislabel Detection."
Version: Final published version
ISBN
979-8-4007-0422-2

Collections
  • MIT Open Access Articles
