dc.contributor.author | Ananthakrishnan, Ashwin N. | |
dc.contributor.author | Cai, Tianxi | |
dc.contributor.author | Savova, Guergana | |
dc.contributor.author | Cheng, Su-Chun | |
dc.contributor.author | Chen, Pei | |
dc.contributor.author | Perez, Raul Guzman | |
dc.contributor.author | Gainer, Vivian S. | |
dc.contributor.author | Murphy, Shawn N. | |
dc.contributor.author | Szolovits, Peter | |
dc.contributor.author | Xia, Zongqi | |
dc.contributor.author | Shaw, Stanley | |
dc.contributor.author | Churchill, Susanne | |
dc.contributor.author | Karlson, Elizabeth W. | |
dc.contributor.author | Kohane, Isaac | |
dc.contributor.author | Plenge, Robert M. | |
dc.contributor.author | Liao, Katherine P. | |
dc.date.accessioned | 2014-10-10T18:18:27Z | |
dc.date.available | 2014-10-10T18:18:27Z | |
dc.date.issued | 2013-06 | |
dc.identifier.issn | 1078-0998 | |
dc.identifier.issn | 1536-4844 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/90903 | |
dc.description | available in PMC 2014 June 01 | en_US |
dc.description.abstract | Background:
Previous studies identifying patients with inflammatory bowel disease using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record–based model for classification of inflammatory bowel disease leveraging the combination of codified data and information from clinical text notes using natural language processing.
Methods:
Using the electronic medical records of 2 large academic centers, we created data marts for Crohn’s disease (CD) and ulcerative colitis (UC) comprising patients with ≥1 International Classification of Diseases, 9th edition, code for each disease. We used codified (i.e., International Classification of Diseases, 9th edition codes, electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
Results:
We confirmed 399 CD cases (67%) in the CD training set and 378 UC cases (63%) in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve for CD 0.95; UC 0.94) than models using only disease International Classification of Diseases, 9th edition codes (area under the curve 0.89 for CD; 0.86 for UC). Addition of natural language processing narrative terms to our final model resulted in classification of 6% to 12% more subjects with the same accuracy.
Conclusions:
Inclusion of narrative concepts identified using natural language processing improves the accuracy of electronic medical records case definition for CD and UC while simultaneously identifying more subjects compared with models using codified data alone. | en_US |
dc.description.sponsorship | National Institutes of Health (U.S.) (NIH U54-LM008748) | en_US |
dc.description.sponsorship | American Gastroenterological Association | en_US |
dc.description.sponsorship | National Institutes of Health (U.S.) (NIH K08 AR060257) | en_US |
dc.description.sponsorship | Beth Israel Deaconess Medical Center (Katherine Swan Ginsburg Fund) | en_US |
dc.description.sponsorship | National Institutes of Health (U.S.) (NIH R01-AR056768) | en_US |
dc.description.sponsorship | Burroughs Wellcome Fund (Career Award for Medical Scientists) | en_US |
dc.description.sponsorship | National Institutes of Health (U.S.) (NIH U01-GM092691) | en_US |
dc.description.sponsorship | National Institutes of Health (U.S.) (NIH R01-AR059648) | en_US |
dc.language.iso | en_US | |
dc.publisher | Lippincott Williams & Wilkins | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1097/MIB.0b013e31828133fd | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | PMC | en_US |
dc.title | Improving Case Definition of Crohnʼs Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Ananthakrishnan, Ashwin N., Tianxi Cai, Guergana Savova, Su-Chun Cheng, Pei Chen, Raul Guzman Perez, Vivian S. Gainer, et al. “Improving Case Definition of Crohnʼs Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing.” Inflammatory Bowel Diseases 19, no. 7 (2013): 1411–1420. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.contributor.mitauthor | Szolovits, Peter | en_US |
dc.relation.journal | Inflammatory Bowel Diseases | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dspace.orderedauthors | Ananthakrishnan, Ashwin N.; Cai, Tianxi; Savova, Guergana; Cheng, Su-Chun; Chen, Pei; Perez, Raul Guzman; Gainer, Vivian S.; Murphy, Shawn N.; Szolovits, Peter; Xia, Zongqi; Shaw, Stanley; Churchill, Susanne; Karlson, Elizabeth W.; Kohane, Isaac; Plenge, Robert M.; Liao, Katherine P. | en_US |
dc.identifier.orcid | https://orcid.org/0000-0001-8411-6403 | |
mit.license | OPEN_ACCESS_POLICY | en_US |
mit.metadata.status | Complete | |