Performance and error analysis of three part of speech taggers on health texts

Zeng, Qing; Curtis, Dorothy

Other Contributors

Networks & Mobile Systems

Advisor

John Guttag

Terms of use

Creative Commons Attribution 3.0 Unported http://creativecommons.org/licenses/by/3.0/

Abstract

Natural language processing (NLP) techniques are increasingly being developed and used in a variety of biomedical domains. Part-of-speech (POS) tagging is a critical step in many NLP applications. We are currently developing an NLP tool for text simplification, and as part of this effort we set out to evaluate several POS taggers. We selected 120 sentences (2375 tokens) from a corpus of six types of diabetes-related health texts and asked human reviewers to tag each word in these sentences to create a "Gold Standard." We then tested each of three POS taggers against the "Gold Standard." One tagger (dTagger) had been trained on health texts, and the other two (MaxEnt and Curran & Clark) had been trained on general news articles. We analyzed the errors and placed them into five categories: systematic, close, subtle, difficult source, and other. The three taggers have relatively similar rates of success: dTagger, MaxEnt, and Curran & Clark had 87%, 89%, and 90% agreement with the gold standard, respectively. These rates are lower than the published rates for these taggers, probably because we tested them on a corpus that differs significantly from their training corpora. The taggers made different errors: dTagger, which had been trained on a set of medical texts (MedPost), made fewer errors on medical terms than MaxEnt and Curran & Clark. The latter two taggers performed better on non-medical terms, and we found the difference between their performance and that of dTagger to be statistically significant. Our findings suggest that the three POS taggers have similar correct tagging rates, though they differ in the types of errors they make. For the task of text simplification, we are inclined to perform additional training of the Curran & Clark tagger with the MedPost corpus, because the fine-grained tagging provided by this tool and the correct recognition of medical terms are equally important.
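The agreement figures reported in the abstract are token-level comparisons between a tagger's output and the gold standard. The following is a minimal sketch of how such a rate could be computed. The function names, the toy sentence, and its tags are illustrative assumptions and are not taken from the report or its data.

```python
from typing import List, Sequence, Tuple


def agreement_rate(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tags")
    if not gold:
        raise ValueError("cannot compute agreement on an empty sequence")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


def disagreements(
    tokens: Sequence[str], gold: Sequence[str], predicted: Sequence[str]
) -> List[Tuple[str, str, str]]:
    """Return (token, gold_tag, predicted_tag) for every mismatched token."""
    return [(t, g, p) for t, g, p in zip(tokens, gold, predicted) if g != p]


# Hypothetical toy data (not from the report's corpus).
tokens = ["Insulin", "lowers", "blood", "glucose", "."]
gold = ["NN", "VBZ", "NN", "NN", "."]
tagger_output = ["NNP", "VBZ", "NN", "NN", "."]

rate = agreement_rate(gold, tagger_output)
errors = disagreements(tokens, gold, tagger_output)

print(f"agreement: {rate:.0%}")  # agreement: 80%
print(errors)                    # [('Insulin', 'NN', 'NNP')]
```

The list of mismatches is the kind of input a reviewer would then sort by hand into error categories such as those named in the abstract; that categorization step is manual and is not modeled here.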

Date issued

2010-02-25

URI

http://hdl.handle.net/1721.1/51833

Series/Report no.

MIT-CSAIL-TR-2010-012

Collections

CSAIL Technical Reports (July 1, 2003 - present)
