High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)

Zhang, Yichi; Cai, Tianrun; Yu, Sheng; Cho, Kelly; Hong, Chuan; Sun, Jiehuan; Huang, Jie; Ho, Yuk-Lam; Ananthakrishnan, Ashwin N; Xia, Zongqi; Shaw, Stanley Y; Gainer, Vivian; Castro, Victor; Link, Nicholas; Honerlaw, Jacqueline; Huang, Sicong; Gagnon, David; Karlson, Elizabeth W; Plenge, Robert M; Szolovits, Peter; Savova, Guergana; Churchill, Susanne; O’Donnell, Christopher; Murphy, Shawn N; Gaziano, J Michael; Kohane, Isaac; Cai, Tianxi; Liao, Katherine P

dc.contributor.author	Zhang, Yichi
dc.contributor.author	Cai, Tianrun
dc.contributor.author	Yu, Sheng
dc.contributor.author	Cho, Kelly
dc.contributor.author	Hong, Chuan
dc.contributor.author	Sun, Jiehuan
dc.contributor.author	Huang, Jie
dc.contributor.author	Ho, Yuk-Lam
dc.contributor.author	Ananthakrishnan, Ashwin N
dc.contributor.author	Xia, Zongqi
dc.contributor.author	Shaw, Stanley Y
dc.contributor.author	Gainer, Vivian
dc.contributor.author	Castro, Victor
dc.contributor.author	Link, Nicholas
dc.contributor.author	Honerlaw, Jacqueline
dc.contributor.author	Huang, Sicong
dc.contributor.author	Gagnon, David
dc.contributor.author	Karlson, Elizabeth W
dc.contributor.author	Plenge, Robert M
dc.contributor.author	Szolovits, Peter
dc.contributor.author	Savova, Guergana
dc.contributor.author	Churchill, Susanne
dc.contributor.author	O’Donnell, Christopher
dc.contributor.author	Murphy, Shawn N
dc.contributor.author	Gaziano, J Michael
dc.contributor.author	Kohane, Isaac
dc.contributor.author	Cai, Tianxi
dc.contributor.author	Liao, Katherine P
dc.date.accessioned	2021-10-27T20:09:22Z
dc.date.available	2021-10-27T20:09:22Z
dc.date.issued	2019
dc.identifier.uri	https://hdl.handle.net/1721.1/134827
dc.description.abstract	© 2019, The Author(s), under exclusive licence to Springer Nature Limited. Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1–2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
dc.language.iso	en
dc.publisher	Springer Science and Business Media LLC
dc.relation.isversionof	10.1038/S41596-019-0227-6
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
dc.source	PMC
dc.title	High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)
dc.type	Article
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal	Nature Protocols
dc.eprint.version	Author's final manuscript
dc.type.uri	http://purl.org/eprint/type/JournalArticle
eprint.status	http://purl.org/eprint/status/PeerReviewed
dc.date.updated	2021-03-26T16:44:36Z
dspace.orderedauthors	Zhang, Y; Cai, T; Yu, S; Cho, K; Hong, C; Sun, J; Huang, J; Ho, Y-L; Ananthakrishnan, AN; Xia, Z; Shaw, SY; Gainer, V; Castro, V; Link, N; Honerlaw, J; Huang, S; Gagnon, D; Karlson, EW; Plenge, RM; Szolovits, P; Savova, G; Churchill, S; O’Donnell, C; Murphy, SN; Gaziano, JM; Kohane, I; Cai, T; Liao, KP
dspace.date.submission	2021-03-26T16:44:37Z
mit.journal.volume	14
mit.journal.issue	12
mit.license	PUBLISHER_POLICY
mit.metadata.status	Authority Work and Publication Information Needed

Files in this item

Name: nihms-1594913.pdf
Size: 949.9Kb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles