Show simple item record

dc.contributor.author: Ong, Charlene Jennifer
dc.contributor.author: Orfanoudaki, Agni
dc.contributor.author: Zhang, Rebecca
dc.contributor.author: Caprasse, Francois Pierre M.
dc.contributor.author: Bertsimas, Dimitris J
dc.date.accessioned: 2021-03-08T18:31:21Z
dc.date.available: 2021-03-08T18:31:21Z
dc.date.issued: 2020-06
dc.date.submitted: 2019-11
dc.identifier.issn: 1932-6203
dc.identifier.uri: https://hdl.handle.net/1721.1/130098
dc.description.abstract: Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could substantially improve stroke identification in large datasets, triage of critical clinical reports, and quality improvement efforts. In this study, we developed and report a comprehensive framework assessing the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods for determining the presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging radiology reports from 17,864 patients at two large academic medical centers. We used standard techniques to featurize the unstructured text and developed neurovascular-specific GloVe word embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods across our three tasks (AUCs of 0.96, 0.98, and 0.93, respectively). Simpler NLP approaches (Bag of Words) performed best with interpretable algorithms (Logistic Regression) for identifying ischemic stroke (AUC 0.95), MCA location (AUC 0.96), and acuity (AUC 0.90). Similarly, GloVe and Recurrent Neural Networks (AUC 0.92, 0.89, 0.93) generalized better in our external test set than Bag of Words and Logistic Regression (AUC 0.89, 0.86, 0.80) for stroke presence, location, and acuity, respectively. Our study demonstrates a comprehensive assessment of NLP techniques for unstructured radiographic text. Our findings suggest that NLP/ML methods can be used to discriminate stroke features from large data cohorts for both clinical and research-related investigations.
dc.language.iso: en
dc.publisher: Public Library of Science (PLoS)
dc.relation.isversionof: 10.1371/journal.pone.0234908
dc.rights: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
dc.rights.uri: https://creativecommons.org/publicdomain/zero/1.0/
dc.source: PLoS
dc.title: Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports
dc.type: Article
dc.identifier.citation: Ong, Charlene Jennifer et al. "Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports." PLoS ONE, 15, 6 (June 2020): e0234908 © 2020 The Author(s)
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center
dc.contributor.department: Sloan School of Management
dc.relation.journal: PLoS ONE
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2021-02-05T16:41:31Z
dspace.orderedauthors: Ong, CJ; Orfanoudaki, A; Zhang, R; Caprasse, FPM; Hutch, M; Ma, L; Fard, D; Balogun, O; Miller, MI; Minnig, M; Saglam, H; Prescott, B; Greer, DM; Smirnakis, S; Bertsimas, D
dspace.date.submission: 2021-02-05T16:41:40Z
mit.journal.volume: 15
mit.journal.issue: 6
mit.license: PUBLISHER_CC
mit.metadata.status: Complete
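The abstract above notes that the study's strongest interpretable baseline paired Bag-of-Words features with Logistic Regression for binary classification of report text. A minimal sketch of that style of pipeline, using scikit-learn; the report snippets and labels below are invented placeholders for illustration, not data from the study:

```python
# Sketch of a Bag-of-Words + Logistic Regression text classifier for
# binary "stroke present" labeling, in the spirit of the paper's simpler
# baseline. Snippets and labels are toy examples, not study data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "acute infarct in the left MCA territory",
    "no evidence of acute intracranial abnormality",
    "restricted diffusion consistent with acute ischemic stroke",
    "unremarkable noncontrast head CT",
]
labels = [1, 0, 1, 0]  # 1 = stroke present, 0 = absent (toy labels)

# CountVectorizer builds the Bag-of-Words term-count matrix;
# LogisticRegression fits an interpretable linear classifier on it.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(reports, labels)

# Probability of stroke presence for a new, unseen snippet.
prob = clf.predict_proba(["findings suggestive of acute infarct"])[0, 1]
```

In practice the study's evaluation used AUC on held-out expert-labeled reports, so `predict_proba` scores (rather than hard labels) would be the quantity fed to the ROC analysis.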

