DSpace@MIT (MIT Libraries)


High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)

Author(s)
Zhang, Yichi; Cai, Tianrun; Yu, Sheng; Cho, Kelly; Hong, Chuan; Sun, Jiehuan; Huang, Jie; Ho, Yuk-Lam; Ananthakrishnan, Ashwin N; Xia, Zongqi; Shaw, Stanley Y; Gainer, Vivian; Castro, Victor; Link, Nicholas; Honerlaw, Jacqueline; Huang, Sicong; Gagnon, David; Karlson, Elizabeth W; Plenge, Robert M; Szolovits, Peter; Savova, Guergana; Churchill, Susanne; O’Donnell, Christopher; Murphy, Shawn N; Gaziano, J Michael; Kohane, Isaac; Cai, Tianxi; Liao, Katherine P
Download
Accepted version (949.9Kb)
Terms of use
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
© 2019, The Author(s), under exclusive licence to Springer Nature Limited.

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1–2 days if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
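
The three final products named in the abstract (a trained algorithm, a per-patient probability, and a yes/no classification) can be illustrated with a small sketch. This is not the PheCAP implementation: it assumes an L2-penalized logistic regression as a stand-in for the unspecified "machine learning approaches", uses synthetic data, and every name in it (the two features, `fit_logistic`, the 0.5 threshold, the 100 chart-reviewed patients) is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2=0.01, lr=0.5, epochs=2000):
    """Fit an L2-penalized logistic regression by batch gradient descent.

    Illustrative stand-in only; the abstract does not specify the model.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
    return w, b


def predict_proba(X, w, b):
    """Probability of the phenotype for each patient."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def classify(probs, threshold=0.5):
    """Yes/no phenotype classification; the threshold is an arbitrary assumption."""
    return [p >= threshold for p in probs]


# Synthetic cohort (assumption): two hypothetical features per patient,
# log(1 + diagnosis-code count) and log(1 + NLP mention count).
rng = random.Random(0)
patients, truth = [], []
for _ in range(500):
    is_case = rng.random() < 0.3
    code_count = rng.randint(2, 10) if is_case else rng.randint(0, 3)
    nlp_count = rng.randint(3, 12) if is_case else rng.randint(0, 4)
    patients.append([math.log1p(code_count), math.log1p(nlp_count)])
    truth.append(1 if is_case else 0)

# Only a small subset carries gold-standard labels, mimicking chart review.
labeled = rng.sample(range(500), 100)
w, b = fit_logistic([patients[i] for i in labeled], [truth[i] for i in labeled])

# Products: the algorithm (w, b), a probability for every patient, a yes/no call.
probs = predict_proba(patients, w, b)
labels = classify(probs)
```

The model is trained only on the labeled subset and then applied to the whole cohort, which mirrors the abstract's distinction between the costly chart-review labels and the phenotype probabilities produced for all patients.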
Date issued
2019
URI
https://hdl.handle.net/1721.1/134827
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Nature Protocols
Publisher
Springer Science and Business Media LLC

Collections
  • MIT Open Access Articles