Machine Learning Methods to Extract Documentation of Breast Cancer Symptoms From Electronic Health Records
Author(s)
Forsyth, Alexander W.; Barzilay, Regina; Hughes, Kevin S.; Lui, Dickson; Lorenz, Karl A.; Enzinger, Andrea; Tulsky, James A.; Lindvall, Charlotta
Abstract
© 2018 American Academy of Hospice and Palliative Medicine
Context: Clinicians document cancer patients’ symptoms in free-text format within electronic health record visit notes. Although symptoms are critically important to quality of life and often herald clinical status changes, computational methods to assess the trajectory of symptoms over time are woefully underdeveloped.
Objectives: To create machine learning algorithms capable of extracting patient-reported symptoms from free-text electronic health record notes.
Methods: The data set included 103,564 sentences obtained from the electronic clinical notes of 2,695 breast cancer patients receiving paclitaxel-containing chemotherapy at two academic cancer centers between May 1996 and May 2015. We manually annotated 10,000 sentences and trained a conditional random field model to predict words indicating an active symptom (positive label), absence of a symptom (negative label), or no symptom at all (neutral label). Sentences labeled by a human coder were divided into training, validation, and test data sets. Final model performance was determined on a held-out 20% test set that was not used in model development or tuning.
Results: The final model achieved precision of 0.82, 0.86, and 0.99 and recall of 0.56, 0.69, and 1.00 for positive, negative, and neutral symptom labels, respectively. The most common positive symptoms were pain, fatigue, and nausea. Machine-based labeling of 103,564 sentences took two minutes.
Conclusion: We demonstrate the potential of machine learning to gather, track, and analyze symptoms experienced by cancer patients during chemotherapy. Although our initial model requires further optimization to improve performance, additional model building may yield machine learning methods suitable for deployment in routine clinical care, quality improvement, and research applications.
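The per-label precision and recall reported in the abstract can be illustrated with a minimal sketch. It assumes word-level scoring, that is, each word carries one gold label and one predicted label from the set positive, negative, neutral. The function name `per_label_scores` and the toy label sequences are hypothetical and are not taken from the study's code or data. The F1 value it also returns is a standard derived metric that the abstract does not report.

```python
from collections import Counter

# The three word-level labels described in the abstract.
LABELS = ("positive", "negative", "neutral")


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for two equal-length label sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    gold_count = Counter(gold)        # how often each label truly occurs
    pred_count = Counter(predicted)   # how often each label was predicted
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)

    scores = {}
    for label in LABELS:
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / gold_count[label] if gold_count[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy example (invented labels, not study data): one active symptom is missed
# and labeled neutral, which lowers recall for the positive label.
gold = ["neutral", "positive", "neutral", "negative", "positive", "neutral"]
pred = ["neutral", "positive", "neutral", "negative", "neutral", "neutral"]
scores = per_label_scores(gold, pred)

for label in LABELS:
    p, r, f = scores[label]
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the positive label has perfect precision but only half the recall, the same pattern (precision above recall) that the abstract reports for positive and negative symptom labels.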
Date issued
2018-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
Journal of Pain and Symptom Management
Publisher
Elsevier BV
ISSN
0885-3924