Automated Data Slicing for Model Validation: A Big data - AI Integration Approach

Chung, Yeounoh; Kraska, Tim; Polyzotis, Neoklis; Tae, Kihyun; Whang, Steven Euijong

Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/132271.2

dc.contributor.author	Chung, Yeounoh
dc.contributor.author	Kraska, Tim
dc.contributor.author	Polyzotis, Neoklis
dc.contributor.author	Tae, Kihyun
dc.contributor.author	Whang, Steven Euijong
dc.date.accessioned	2021-09-20T18:21:36Z
dc.date.available	2021-09-20T18:21:36Z
dc.identifier.uri	https://hdl.handle.net/1721.1/132271
dc.description.abstract	© 1989-2012 IEEE. As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools are still primitive when it comes to helping users trace model performance problems all the way to the data. We focus on the particular problem of slicing data to identify subsets of the validation data where the model performs poorly. This is an important problem in model validation because the overall model performance can fail to reflect that of the smaller subsets, and slicing allows users to analyze the model performance at a more granular level. Unlike general techniques (e.g., clustering) that can find arbitrary slices, our goal is to find interpretable slices (which are easier to act on than arbitrary subsets) that are problematic and large. We propose Slice Finder, which is an interactive framework for identifying such slices using statistical techniques. Applications include diagnosing model fairness and fraud detection, where identifying slices that are interpretable to humans is crucial. This research is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research.	en_US
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	10.1109/TKDE.2019.2916074	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	arXiv	en_US
dc.title	Automated Data Slicing for Model Validation: A Big data - AI Integration Approach	en_US
dc.type	Article	en_US
dc.relation.journal	IEEE Transactions on Knowledge and Data Engineering	en_US
dc.eprint.version	Original manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2021-01-11T15:00:58Z
dspace.orderedauthors	Chung, Y; Kraska, T; Polyzotis, N; Tae, K; Whang, SE	en_US
dspace.date.submission	2021-01-11T15:01:01Z
mit.journal.volume	32	en_US
mit.journal.issue	12	en_US
mit.license	OPEN_ACCESS_POLICY
mit.metadata.status	Authority Work and Publication Information Needed

Files in this item

Name: 1807.06068.pdf
Size: 869.7 KB
Format: PDF
Description: Submitted version

This item appears in the following Collection(s)

MIT Open Access Articles

Version	Item	Date	Summary
2	1721.1/132271.2	2022-07-15T20:17:42Z	Metadata changed: Verified or entered author name and department authority metadata.
1	1721.1/132271*	2021-09-20T18:21:36Z

*Selected version
