Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes

Research and Teaching Output of the MIT Community

dc.contributor.advisor Isaac S. Kohane. en_US
dc.contributor.author Turchin, Alexander en_US
dc.contributor.other Harvard University--MIT Division of Health Sciences and Technology. en_US
dc.date.accessioned 2006-06-19T17:39:09Z
dc.date.available 2006-06-19T17:39:09Z
dc.date.copyright 2005 en_US
dc.date.issued 2005 en_US
dc.description Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2005. en_US
dc.description Includes bibliographical references (p. 31-33). en_US
dc.description.abstract Background: A number of important applications in medicine and biomedical research, including quality-of-care surveillance and identification of prospective study subjects, require identification of large cohorts of patients with specific clinical characteristics. Conventional techniques currently in use are either labor-intensive or imprecise, while natural language processing-based applications are relatively slow and expensive. Specific Aims: In this thesis we describe the design and formal evaluation of PACT, a suite of rapid, accurate, and easily portable software tools for identifying patients with specific clinical characteristics through analysis of the text of physician notes in the electronic medical record. Methods: The PACT algorithm is based on sentence-level semantic analysis. The major steps involve identifying, in the sentences of the physician notes, word tags (e.g. the name of the disease, or medications used exclusively to treat the disease) specific to the clinical characteristic of interest. Sentences that contain both word tags and negative qualifiers (e.g. "rule out diabetes") are excluded from consideration. PACT can also identify quantitative (e.g. blood pressure, height, weight) and semi-quantitative (e.g. compliance with medical treatment) clinical characteristics. PACT performance was evaluated against blinded manual chart review (the "gold standard") and against currently used computational methods (analysis of billing data). Results: Evaluation of PACT demonstrated it to be rapid and highly accurate. PACT processed 6.5 × 10⁵ to 8.8 × 10⁵ notes/hour (1.0 to 1.4 GB of text/hour). en_US
dc.description.abstract (cont) When compared to the gold standard of manual chart review, PACT sensitivity ranged (depending on the patient characteristic being extracted from the notes) from 74 to 100%, and specificity from 86 to 100%. The κ statistic for agreement between PACT and manual chart review ranged from 0.67 to 1.0 and in most cases exceeded 0.75, indicating excellent agreement. PACT accuracy substantially exceeded that of currently used techniques (billing data analysis). Finally, the index of patient non-compliance with physician recommendations computed by PACT was shown to correlate with the frequency of annual Emergency Department visits: patients in the highest quartile for the index of non-compliance had 50% as many annual visits as the patients in the lowest quartile. Conclusion: PACT is a rapid, precise, and easily portable suite of software tools for extracting focused clinical information from free-text clinical documents. It compares favorably with the computational techniques currently available for the purpose (where such techniques exist). It represents an important advance in the field, and we plan to develop this concept further to improve its performance and functionality. en_US
dc.description.statementofresponsibility by Alexander Turchin. en_US
dc.format.extent 33 p. en_US
dc.format.extent 1929512 bytes
dc.format.extent 1928597 bytes
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.language.iso eng en_US
dc.publisher Massachusetts Institute of Technology en_US
dc.rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. en_US
dc.subject Harvard University--MIT Division of Health Sciences and Technology. en_US
dc.title Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes en_US
dc.type Thesis en_US
dc.description.degree S.M. en_US
dc.contributor.department Harvard University--MIT Division of Health Sciences and Technology. en_US
dc.identifier.oclc 62172055 en_US
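
The screening step described in the Methods portion of the abstract (find word tags in each sentence, then set aside sentences that also carry a negative qualifier) can be illustrated with a minimal Python sketch. This is not the PACT implementation: the tag list, the qualifier list, the naive sentence splitter, and all function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical word tags for one clinical characteristic (diabetes):
# the disease name plus medications assumed to be specific to it.
WORD_TAGS = ("diabetes", "metformin", "insulin")

# Hypothetical negative qualifiers; the record gives only "rule out" as an example.
NEGATIVE_QUALIFIERS = ("rule out", "no evidence of", "negative for")


def split_sentences(note):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note)
    return [p.strip() for p in parts if p.strip()]


def contains_phrase(sentence, phrase):
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def note_is_positive(note):
    """Return True if any sentence has a word tag and no negative qualifier."""
    for sentence in split_sentences(note):
        has_tag = any(contains_phrase(sentence, t) for t in WORD_TAGS)
        negated = any(contains_phrase(sentence, q) for q in NEGATIVE_QUALIFIERS)
        if has_tag and not negated:
            return True
    return False
```

Because the check runs sentence by sentence, a negated mention in one sentence does not suppress an affirmative mention elsewhere in the same note. For example, `note_is_positive("Rule out diabetes. Continues insulin at home.")` is true under these assumed lists, while a note containing only "Rule out diabetes." is not flagged.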

Files in this item

Name Size Format Description
62172055.pdf 1.840Mb PDF Preview, non-printable (open to all)
62172055-MIT.pdf 1.839Mb PDF Full printable version (MIT only)
