Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts

Liao, Katherine P.; Ananthakrishnan, Ashwin N.; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana K.; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N.; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y.; Karlson, Elizabeth W.; Cai, Tianxi

dc.contributor.author	Liao, Katherine P.
dc.contributor.author	Ananthakrishnan, Ashwin N.
dc.contributor.author	Kumar, Vishesh
dc.contributor.author	Xia, Zongqi
dc.contributor.author	Cagan, Andrew
dc.contributor.author	Gainer, Vivian S.
dc.contributor.author	Goryachev, Sergey
dc.contributor.author	Chen, Pei
dc.contributor.author	Savova, Guergana K.
dc.contributor.author	Agniel, Denis
dc.contributor.author	Churchill, Susanne
dc.contributor.author	Lee, Jaeyoung
dc.contributor.author	Murphy, Shawn N.
dc.contributor.author	Plenge, Robert M.
dc.contributor.author	Szolovits, Peter
dc.contributor.author	Kohane, Isaac
dc.contributor.author	Shaw, Stanley Y.
dc.contributor.author	Karlson, Elizabeth W.
dc.contributor.author	Cai, Tianxi
dc.date.accessioned	2015-11-10T16:23:06Z
dc.date.available	2015-11-10T16:23:06Z
dc.date.issued	2015-08
dc.date.submitted	2014-09
dc.identifier.issn	1932-6203
dc.identifier.uri	http://hdl.handle.net/1721.1/99879
dc.description.abstract	Background: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses that allow the study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. Methods and Results: We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g., ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation (IBD and DM) sets. The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. Conclusions: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP to the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.	en_US
dc.description.sponsorship	National Institutes of Health (U.S.). Informatics for Integrating Biology and the Bedside Project (U54LM008748)	en_US
dc.language.iso	en_US
dc.publisher	Public Library of Science	en_US
dc.relation.isversionof	http://dx.doi.org/10.1371/journal.pone.0136651	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Public Library of Science	en_US
dc.title	Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts	en_US
dc.type	Article	en_US
dc.identifier.citation	Liao, Katherine P., Ashwin N. Ananthakrishnan, Vishesh Kumar, Zongqi Xia, Andrew Cagan, Vivian S. Gainer, Sergey Goryachev, et al. “Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts.” Edited by Giorgos Bamias. PLOS ONE 10, no. 8 (August 24, 2015): e0136651.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.mitauthor	Szolovits, Peter	en_US
dc.relation.journal	PLOS ONE	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Liao, Katherine P.; Ananthakrishnan, Ashwin N.; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana K.; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N.; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y.; Karlson, Elizabeth W.; Cai, Tianxi	en_US
dc.identifier.orcid	https://orcid.org/0000-0001-8411-6403
mit.license	PUBLISHER_CC	en_US
mit.metadata.status	Complete

Files in this item

Name: Liao-2015-Methods to Develop a.pdf
Size: 527.0Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
