dc.contributor.author | Bertsimas, Dimitris
dc.contributor.author | Orfanoudaki, Agni
dc.contributor.author | Pawlowski, Colin
dc.date.accessioned | 2021-09-20T17:41:04Z
dc.date.available | 2021-09-20T17:41:04Z
dc.date.issued | 2020-11-10
dc.identifier.uri | https://hdl.handle.net/1721.1/131956
dc.description.abstract | Missing data is a common problem in longitudinal datasets, which include multiple instances of the same individual observed at different points in time. We introduce a new approach, MedImpute, for imputing missing clinical covariates in multivariate panel data. This approach integrates patient-specific information into an optimization formulation that can be adapted to different imputation algorithms. We present the formulation for a K-nearest neighbors model and derive a corresponding scalable first-order method, med.knn. Our algorithm provides imputations for datasets with both continuous and categorical features and with observations occurring at arbitrary points in time. In computational experiments on three real-world clinical datasets, we test its performance on imputation and downstream predictive tasks, varying the percentage of missing data, the number of observations per patient, and the mechanism of missing data. The proposed method improves both imputation accuracy and downstream predictive performance relative to the best of the benchmark imputation methods considered. We show that this advantage is consistently present in both longitudinal and electronic health record datasets, and in both binary classification and regression settings. In computational experiments on synthetic data, we test the scalability of this algorithm on large datasets and show that an efficient method for hyperparameter tuning scales to datasets with tens of thousands of observations and hundreds of covariates while maintaining high imputation accuracy. | en_US
dc.publisher | Springer US | en_US
dc.relation.isversionof | https://doi.org/10.1007/s10994-020-05923-2 | en_US
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US
dc.source | Springer US | en_US
dc.title | Imputation of clinical covariates in time series | en_US
dc.type | Article | en_US
dc.contributor.department | Massachusetts Institute of Technology. Operations Research Center
dc.eprint.version | Author's final manuscript | en_US
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US
dc.date.updated | 2021-01-26T04:41:13Z
dc.language.rfc3066 | en
dc.rights.holder | The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature
dspace.embargo.terms | Y
dspace.date.submission | 2021-01-26T04:41:13Z
mit.license | OPEN_ACCESS_POLICY
mit.metadata.status | Authority Work and Publication Information Needed

