A data-driven approach to predict daily risk of Clostridium difficile infection at two large academic health centers
Author(s)
Makar, Maggie; Oh, Jeeheh; Fusco, Christopher; Marchesani, Joseph; McCaffrey, Robert; Rao, Krishna; Ryan, Erin E; Washer, Laraine; West, Lauren R; Young, Vincent B; Guttag, John; Hooper, David C; Shenoy, Erica S; Wiens, Jenna
<jats:title>Abstract</jats:title>
<jats:sec>
<jats:title>Background</jats:title>
<jats:p>An estimated 293,300 healthcare-associated cases of Clostridium difficile infection (CDI) occur annually in the United States. Prior research on risk-prediction models for CDI has focused on a small number of risk factors with the goal of developing a model that works well across hospitals. We hypothesize that risk factors are, in part, hospital-specific. We applied a generalizable machine learning approach to discovering, or “learning”, hospital-specific risk-stratification models using electronic health record (EHR) data collected during the course of patient care from the Massachusetts General Hospital (MGH) and the University of Michigan Health System (UM).</jats:p>
</jats:sec>
<jats:sec>
<jats:title>Methods</jats:title>
<jats:p>We used EHR data from 115,958 adult inpatient admissions from 2012–2014 (MGH) and 258,050 adult inpatient admissions from 2010–2016 (UM) (Fig 1). We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 2,964 and 4,739 features in the MGH and UM models, respectively. We trained the models with L2-regularized logistic regression and measured their discriminative performance on a year of held-out data from each hospital.</jats:p>
</jats:sec>
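As a rough illustration of the Methods above, the following sketch fits an L2-regularized logistic regression on earlier years and scores it by AUROC on a held-out final year. It is not the authors' pipeline: the synthetic data, the three toy features, the function names (`fit_l2_logistic`, `predict_risk`, `auroc`, `make_synthetic_admissions`), and the hyperparameters `lam`, `lr`, and `epochs` are all assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l2_logistic(X, y, lam=0.01, lr=0.5, epochs=300):
    """Batch gradient descent on mean log-loss + (lam/2)*||w||^2; bias is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # lam * wj is the gradient of the L2 penalty term.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_risk(w, b, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else (0.5 if p == q else 0.0) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic_admissions(n=600, seed=0):
    # Hypothetical stand-in for EHR data: (admission year, features, CDI label).
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        year = rng.choice([2012, 2013, 2014])
        x = [float(rng.random() < 0.3), float(rng.random() < 0.5), rng.random()]
        label = 1 if rng.random() < sigmoid(-2.0 + 2.0 * x[0] + 1.0 * x[2]) else 0
        rows.append((year, x, label))
    return rows

rows = make_synthetic_admissions()
# Temporal split: train on earlier years, hold out the final year.
train = [(x, y) for yr, x, y in rows if yr < 2014]
test = [(x, y) for yr, x, y in rows if yr == 2014]
w, b = fit_l2_logistic([x for x, _ in train], [y for _, y in train])
test_auc = auroc([y for _, y in test], predict_risk(w, b, [x for x, _ in test]))
print(f"held-out AUROC: {test_auc:.2f}")
```

Splitting by year rather than at random mirrors the "year of held-out data" in the abstract and avoids scoring the model on admissions that predate its training data. With thousands of features, a sparse, vectorized solver would be a more realistic choice than this pure-Python loop.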
<jats:sec>
<jats:title>Results</jats:title>
<jats:p>The MGH and UM models achieved AUROCs of 0.74 (CI: 0.73–0.75) and 0.77 (CI: 0.75–0.80), respectively. The relative importance of risk factors varied significantly across hospitals. In particular, in-hospital locations appeared in the set of top risk factors at one hospital and in the set of protective factors at the other. On average, both models were able to predict CDI five days in advance of clinical diagnosis (Fig 2).</jats:p>
</jats:sec>
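The abstract reports a confidence interval for each AUROC but does not say how the intervals were computed. One common option, shown here purely as an assumption, is a percentile bootstrap over the held-out set. The function names `auroc` and `bootstrap_auroc_ci`, the parameters `n_boot`, `alpha`, and `seed`, and the synthetic labels and scores are all hypothetical.

```python
import random

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else (0.5 if p == q else 0.0) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, not the paper's)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both classes to bootstrap AUROC")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]

# Synthetic held-out labels and risk scores; positives tend to score higher.
rng = random.Random(1)
y_demo = [1 if rng.random() < 0.2 else 0 for _ in range(200)]
s_demo = [min(1.0, max(0.0, rng.gauss(0.6 if y else 0.4, 0.2))) for y in y_demo]

point = auroc(y_demo, s_demo)
ci_lo, ci_hi = bootstrap_auroc_ci(y_demo, s_demo, n_boot=500)
print(f"AUROC {point:.2f} (CI: {ci_lo:.2f} to {ci_hi:.2f})")
```

This sketch resamples individual rows. If each row were a patient-day rather than an admission, resampling whole admissions would probably be more appropriate, since days within one stay are correlated.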
<jats:sec>
<jats:title>Conclusion</jats:title>
<jats:p>We used EHR data to generate a daily estimate of the risk of CDI for each inpatient hospitalization. We applied a generalizable data-driven approach to existing data from two large institutions with different patient populations and different data formats and content. In contrast to approaches that focus on learning models that apply generally across hospitals, our proposed approach yields risk-stratification models tailored to an institution’s EHR system and patient population. In turn, these hospital-specific models could allow for earlier and more accurate identification of high-risk patients.</jats:p>
</jats:sec>
<jats:sec>
<jats:title>Disclosures</jats:title>
<jats:p>All authors: No reported disclosures.</jats:p>
</jats:sec>
Date issued
2017
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Open Forum Infectious Diseases
Publisher
Oxford University Press (OUP)