Evaluating discrete choice prediction models when the evaluation data is corrupted: analytic results and bias corrections for the area under the ROC

Stein, Roger M.

dc.contributor.author	Stein, Roger Mark
dc.date.accessioned	2017-02-16T20:43:00Z
dc.date.available	2017-02-16T20:43:00Z
dc.date.issued	2015-09
dc.identifier.issn	1384-5810
dc.identifier.issn	1573-756X
dc.identifier.uri	http://hdl.handle.net/1721.1/106979
dc.description.abstract	There has been a growing recognition that issues of data quality, which are routine in practice, can materially affect the assessment of learned model performance. In this paper, we develop some analytic results that are useful in sizing the biases associated with tests of discriminatory model power when these are performed using corrupt (“noisy”) data. As it is sometimes unavoidable to test models with data that are known to be corrupt, we also provide some guidance on interpreting results of such tests. In some cases, with appropriate knowledge of the corruption mechanism, the true values of the performance statistics such as the area under the ROC curve may be recovered (in expectation), even when the underlying data have been corrupted. We also provide estimators of the standard errors of such recovered performance statistics. An analysis of the estimators reveals interesting behavior including the observation that “noisy” data does not “cancel out” across models even when the same corrupt data set is used to test multiple candidate models. Because our results are analytic, they may be applied in a broad range of settings and this can be done without the need for simulation.	en_US
dc.publisher	Springer US	en_US
dc.relation.isversionof	http://dx.doi.org/10.1007/s10618-015-0437-7	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	Springer US	en_US
dc.title	Evaluating discrete choice prediction models when the evaluation data is corrupted: analytic results and bias corrections for the area under the ROC	en_US
dc.type	Article	en_US
dc.identifier.citation	Stein, Roger M. “Evaluating Discrete Choice Prediction Models When the Evaluation Data Is Corrupted: Analytic Results and Bias Corrections for the Area under the ROC.” Data Mining and Knowledge Discovery 30.4 (2016): 763–796.	en_US
dc.contributor.department	Sloan School of Management	en_US
dc.contributor.mitauthor	Stein, Roger Mark
dc.relation.journal	Data Mining and Knowledge Discovery	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2016-06-30T12:07:54Z
dc.language.rfc3066	en
dc.rights.holder	The Author(s)
dspace.orderedauthors	Stein, Roger M.	en_US
dspace.embargo.terms	N	en
mit.license	PUBLISHER_POLICY	en_US

Files in this item

Name: 10618_2015_437_ReferencePDF.pdf
Size: 325.2Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
