Show simple item record

dc.contributor.author: Mahdavi, Mohammad
dc.contributor.author: Abedjan, Ziawasch
dc.contributor.author: Castro Fernandez, Raul
dc.contributor.author: Madden, Samuel
dc.contributor.author: Ouzzani, Mourad
dc.contributor.author: Stonebraker, Michael
dc.contributor.author: Tang, Nan
dc.date.accessioned: 2021-11-05T15:22:54Z
dc.date.available: 2021-11-05T15:22:54Z
dc.date.issued: 2019
dc.identifier.uri: https://hdl.handle.net/1721.1/137524
dc.description.abstract: © 2019 Association for Computing Machinery. Detecting erroneous values is a key step in data cleaning. Error detection algorithms usually require a user to provide input configurations in the form of rules or statistical parameters. However, providing a complete, yet correct, set of configurations for each new dataset is not trivial, as the user has to know about both the dataset and the error detection algorithms upfront. In this paper, we present Raha, a new configuration-free error detection system. By generating a limited number of configurations for error detection algorithms that cover various types of data errors, we can generate an expressive feature vector for each tuple value. Leveraging these feature vectors, we propose a novel sampling and classification scheme that effectively chooses the most representative values for training. Furthermore, our system can exploit historical data to filter out irrelevant error detection algorithms and configurations. In our experiments, Raha outperforms the state-of-the-art error detection techniques with no more than 20 labeled tuples on each dataset. [en_US]
dc.language.iso: en
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: 10.1145/3299869.3324956 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Raha: A Configuration-Free Error Detection System [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Mahdavi, Mohammad, Abedjan, Ziawasch, Castro Fernandez, Raul, Madden, Samuel, Ouzzani, Mourad et al. 2019. "Raha: A Configuration-Free Error Detection System." Proceedings of the ACM SIGMOD International Conference on Management of Data.
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Proceedings of the ACM SIGMOD International Conference on Management of Data [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-01-29T18:48:59Z
dspace.orderedauthors: Mahdavi, M; Abedjan, Z; Castro Fernandez, R; Madden, S; Ouzzani, M; Stonebraker, M; Tang, N [en_US]
dspace.date.submission: 2021-01-29T18:49:04Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
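
The abstract in this record describes building a feature vector for each data value from the outputs of many error detection configurations. The sketch below illustrates that one idea only and is not taken from the Raha code base: the function names (`outlier_strategy`, `pattern_strategy`, `feature_vectors`), the two detector families, the thresholds, and the sample column are all assumptions chosen for illustration. The sampling and classification steps mentioned in the abstract are omitted.

```python
import re
from statistics import mean, pstdev


def outlier_strategy(threshold):
    """Hypothetical detector: flag numeric values that lie more than
    `threshold` population standard deviations from the column mean."""
    def detect(column):
        nums = []
        for value in column:
            try:
                nums.append(float(value))
            except ValueError:
                nums.append(None)  # non-numeric cells are not judged here
        valid = [n for n in nums if n is not None]
        if len(valid) < 2:
            return [0] * len(column)
        mu, sigma = mean(valid), pstdev(valid)
        if sigma == 0:
            return [0] * len(column)
        return [int(n is not None and abs(n - mu) > threshold * sigma)
                for n in nums]
    return detect


def pattern_strategy(regex):
    """Hypothetical detector: flag values that do not fully match `regex`."""
    compiled = re.compile(regex)

    def detect(column):
        return [int(compiled.fullmatch(value) is None) for value in column]
    return detect


def feature_vectors(column, strategies):
    """Run every configured detector over the column and return, per value,
    the tuple of 0/1 outputs (one entry per detector configuration)."""
    outputs = [strategy(column) for strategy in strategies]
    return [tuple(out[i] for out in outputs) for i in range(len(column))]


# Illustrative data and configurations (assumed, not from the paper).
column = ["10", "12", "11", "13", "950", "n/a"]
strategies = [
    outlier_strategy(1.5),
    outlier_strategy(3.0),
    pattern_strategy(r"\d+"),
    pattern_strategy(r"\d{2}"),
]
vectors = feature_vectors(column, strategies)
for value, vector in zip(column, vectors):
    print(value, vector)
```

No single configuration here is treated as correct. Values that share the same vector can be grouped, so that a small number of user labels could, in principle, be propagated to similar values, which is the role the abstract assigns to the sampling and classification scheme.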


