Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer

Hoogendoorn, Mark; Szolovits, Peter; Moons, Leon M.G.; Numans, Mattijs E.

dc.contributor.author	Hoogendoorn, Mark
dc.contributor.author	Moons, Leon M.G.
dc.contributor.author	Numans, Mattijs E.
dc.contributor.author	Szolovits, Peter
dc.date.accessioned	2017-09-07T16:06:17Z
dc.date.available	2017-09-07T16:06:17Z
dc.date.issued	2016-03
dc.date.submitted	2015-11
dc.identifier.issn	0933-3657
dc.identifier.uri	http://hdl.handle.net/1721.1/111149
dc.description.abstract	Objective: Machine learning techniques can be used to extract predictive models for diseases from electronic medical records (EMRs). However, the nature of EMRs makes it difficult to apply off-the-shelf machine learning techniques while still exploiting the rich content of the EMRs. In this paper, we explore the usage of a range of natural language processing (NLP) techniques to extract valuable predictors from uncoded consultation notes and study whether they can help to improve predictive performance. Methods: We study a number of existing techniques for the extraction of predictors from the consultation notes, namely a bag-of-words-based approach and topic modeling. In addition, we develop a dedicated technique to match the uncoded consultation notes with a medical ontology. We apply these techniques as an extension to an existing pipeline to extract predictors from EMRs. We evaluate them in the context of predictive modeling for colorectal cancer (CRC), a disease known to be difficult to diagnose before performing an endoscopy. Results: Our results show that we are able to extract useful information from the consultation notes. The predictive performance of the ontology-based extraction method moves significantly beyond the benchmark of age and gender alone (area under the receiver operating characteristic curve (AUC) of 0.870 versus 0.831). We also observe more accurate predictive models by adding features derived from processing the consultation notes compared to solely using coded data (AUC of 0.896 versus 0.882), although the difference is not significant. The features extracted from the notes are shown to be as predictive as the coded data of the consultations (i.e., there is no significant difference in performance). Conclusion: It is possible to extract useful predictors from uncoded consultation notes that improve predictive performance. Techniques linking text to concepts in medical ontologies to derive these predictors are shown to perform best for predicting CRC in our EMR dataset.	en_US
dc.description.sponsorship	National Institutes of Health (U.S.) (Grant R01-EB017205)	en_US
dc.description.sponsorship	National Institutes of Health (U.S.) (Grant 154HG007963)	en_US
dc.language.iso	en_US
dc.publisher	Elsevier	en_US
dc.relation.isversionof	http://dx.doi.org/10.1016/j.artmed.2016.03.003	en_US
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs License	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	en_US
dc.source	PMC	en_US
dc.title	Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer	en_US
dc.type	Article	en_US
dc.identifier.citation	Hoogendoorn, Mark et al. “Utilizing Uncoded Consultation Notes from Electronic Medical Records for Predictive Modeling of Colorectal Cancer.” Artificial Intelligence in Medicine 69 (May 2016): 53–61 © 2016 Elsevier B.V.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.mitauthor	Szolovits, Peter
dc.relation.journal	Artificial Intelligence in Medicine	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Hoogendoorn, Mark; Szolovits, Peter; Moons, Leon M.G.; Numans, Mattijs E.	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0001-8411-6403
mit.license	PUBLISHER_CC	en_US

Files in this item

Name: Szolovits_Utilizing uncoded.pdf
Size: 507.3Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
