dc.contributor.author: Deng, Yuhao
dc.contributor.author: Deng, Qiyan
dc.contributor.author: Chai, Chengliang
dc.contributor.author: Cao, Lei
dc.contributor.author: Tang, Nan
dc.contributor.author: Fan, Ju
dc.contributor.author: Wang, Jiayi
dc.contributor.author: Yuan, Ye
dc.contributor.author: Wang, Guoren
dc.date.accessioned: 2024-07-23T20:22:02Z
dc.date.available: 2024-07-23T20:22:02Z
dc.date.issued: 2024-06-09
dc.identifier.isbn: 979-8-4007-0422-2
dc.identifier.uri: https://hdl.handle.net/1721.1/155776
dc.description: SIGMOD-Companion ’24, June 09–15, 2024, Santiago, AA, Chile [en_US]
dc.description.abstract: While machine learning techniques, especially deep neural networks, have shown remarkable success in various applications, their performance is adversely affected by label errors in training data. Acquiring high-quality annotated data is both costly and time-consuming in real-world scenarios, requiring extensive human annotation and verification. Consequently, many industry-applied models are trained over data containing substantial noise, significantly degrading the performance of these models. To address this critical issue, we demonstrate IDE, a novel system that iteratively detects mislabeled instances and repairs the wrong labels. Specifically, IDE leverages the early loss observation and influence-based verification to iteratively identify mislabeled instances. When the mislabeled instances are obtained in each iteration, IDE will repair their labels to enhance detection accuracy for subsequent iterations. The framework automatically determines the termination point when the early loss is no longer effective. For uncertain instances, it generates pseudo labels to train a binary classification model, leveraging the model's generalization ability to make the final decision. With a real-life scenario, we demonstrate that IDE produces high-quality training data by effective mislabel detection and repair. [en_US]
dc.publisher: ACM|Companion of the 2024 International Conference on Management of Data [en_US]
dc.relation.isversionof: 10.1145/3626246.3654737 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: IDE: A System for Iterative Mislabel Detection [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Deng, Yuhao, Deng, Qiyan, Chai, Chengliang, Cao, Lei, Tang, Nan et al. 2024. "IDE: A System for Iterative Mislabel Detection."
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2024-07-01T07:55:03Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-07-01T07:55:03Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

