dc.contributor.author: Zhang, Yichi
dc.contributor.author: Cai, Tianrun
dc.contributor.author: Yu, Sheng
dc.contributor.author: Cho, Kelly
dc.contributor.author: Hong, Chuan
dc.contributor.author: Sun, Jiehuan
dc.contributor.author: Huang, Jie
dc.contributor.author: Ho, Yuk-Lam
dc.contributor.author: Ananthakrishnan, Ashwin N
dc.contributor.author: Xia, Zongqi
dc.contributor.author: Shaw, Stanley Y
dc.contributor.author: Gainer, Vivian
dc.contributor.author: Castro, Victor
dc.contributor.author: Link, Nicholas
dc.contributor.author: Honerlaw, Jacqueline
dc.contributor.author: Huang, Sicong
dc.contributor.author: Gagnon, David
dc.contributor.author: Karlson, Elizabeth W
dc.contributor.author: Plenge, Robert M
dc.contributor.author: Szolovits, Peter
dc.contributor.author: Savova, Guergana
dc.contributor.author: Churchill, Susanne
dc.contributor.author: O’Donnell, Christopher
dc.contributor.author: Murphy, Shawn N
dc.contributor.author: Gaziano, J Michael
dc.contributor.author: Kohane, Isaac
dc.contributor.author: Cai, Tianxi
dc.contributor.author: Liao, Katherine P
dc.date.accessioned: 2021-10-27T20:09:22Z
dc.date.available: 2021-10-27T20:09:22Z
dc.date.issued: 2019
dc.identifier.uri: https://hdl.handle.net/1721.1/134827
dc.description.abstract: © 2019, The Author(s), under exclusive licence to Springer Nature Limited. Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1–2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
dc.language.iso: en
dc.publisher: Springer Science and Business Media LLC
dc.relation.isversionof: 10.1038/S41596-019-0227-6
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
dc.source: PMC
dc.title: High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)
dc.type: Article
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal: Nature Protocols
dc.eprint.version: Author's final manuscript
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2021-03-26T16:44:36Z
dspace.orderedauthors: Zhang, Y; Cai, T; Yu, S; Cho, K; Hong, C; Sun, J; Huang, J; Ho, Y-L; Ananthakrishnan, AN; Xia, Z; Shaw, SY; Gainer, V; Castro, V; Link, N; Honerlaw, J; Huang, S; Gagnon, D; Karlson, EW; Plenge, RM; Szolovits, P; Savova, G; Churchill, S; O’Donnell, C; Murphy, SN; Gaziano, JM; Kohane, I; Cai, T; Liao, KP
dspace.date.submission: 2021-03-26T16:44:37Z
mit.journal.volume: 14
mit.journal.issue: 12
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed

