Representation learning in multi-dimensional clinical timeseries for risk and event prediction

Ghassemi, Marzyeh

dc.contributor.advisor	Peter Szolovits.	en_US
dc.contributor.author	Ghassemi, Marzyeh	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2017-12-05T16:25:47Z
dc.date.available	2017-12-05T16:25:47Z
dc.date.copyright	2017	en_US
dc.date.issued	2017	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/112389
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.	en_US
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Cataloged from student-submitted PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 99-108).	en_US
dc.description.abstract	There are major practical and technical barriers to understanding human health, and therefore a need for methods that thrive on large, complex, noisy data. In this work, we present machine learning methods that distill large amounts of heterogeneous health data into latent state representations. These representations are then used to estimate risks of poor outcomes, and response to intervention in multivariate physiological signals. We evaluate the reduced latent representations by 1) establishing their predictive value in important clinical tasks and 2) showing that the latent space representations themselves provide useful insight into underlying systems. In particular, we focus on case studies that can provide evidence-based risk assessment and forecasting in settings with guidelines that have not traditionally been data-driven. In this thesis we evaluate several methods to create patient representations, and use these features to predict important outcomes. Representation learning can be thought of as a form of phenotype discovery, where we attempt to discover spaces in the new representation that are markers of important events. We argue that these latent representations are useful markers when they 1) create better prediction results on outcomes of interest, and 2) do not duplicate features that are currently known bio-markers. We present four case studies of learning representations, and evaluate the representations on real predictive tasks. First, we create forward-facing prediction models using baseline clinical features, and those from a Latent Dirichlet Allocation (LDA) model trained with clinical progress notes. We then evaluate the per-patient latent state membership to predict mortality in an intensive care setting as time moves forward. 
Second, we use non-parametric Multi-task Gaussian Process (MTGP) hyper-parameters as latent features to estimate correlations within and between signals in sparse, heterogeneous time series data. We evaluate the hyper-parameters for forecasting missing signals in traumatic brain injury patients, and predicting mortality in intensive care unit patients. Third, we train switching-state autoregressive models (SSAMs) to model the underlying states that emit patient vital signs over time. We evaluate the time-specific latent state distributions as features to predict vasopressor onset and weaning in intensive care unit patients. Finally, we use statistical and symbolic features extracted from wearable ambulatory accelerometers (ACC) mounted to the neck to classify patient pathology, and stratify patients' risk of voice misuse. We evaluate the utility of both statistically generated features and symbolic representations of glottal pulses towards patient classification.	en_US
dc.description.statementofresponsibility	by Marzyeh Ghassemi.	en_US
dc.format.extent	114 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Representation learning in multi-dimensional clinical timeseries for risk and event prediction	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	1012611423	en_US

Files in this item

Name: 1012611423-MIT.pdf
Size: 1.926Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses