dc.contributor.advisor | Peter Szolovits and Ozlem Uzuner. | en_US |
dc.contributor.author | Tafvizi, Arya (Tafvizi Zavareh) | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2013-02-13T21:23:53Z | |
dc.date.available | 2013-02-13T21:23:53Z | |
dc.date.copyright | 2011 | en_US |
dc.date.issued | 2011 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/76815 | |
dc.description | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. | en_US |
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (p. 73-74). | en_US |
dc.description.abstract | In this thesis, I describe our effort to build an extended and specialized Named Entity Recognizer (NER) to detect instances of Protected Health Information (PHI) in electronic medical records (a de-identifier). The de-identifier was built by creating a comprehensive set of features formed by combining features from the most successful named entity recognizers and de-identifiers and using them in an SVM classifier. We show that the benefit from having an inclusive set of features outweighs the harm from the very large dimensionality of the resulting classification problem. We also show that our classifier does not over-fit the training data. We test whether this approach is more effective than using the NERs separately and combining the results using a committee voting procedure. Finally, we show that our system achieves a precision of up to 1.00, a recall of up to 0.97, and an f-measure of up to 0.98 on a variety of corpora. | en_US |
dc.description.statementofresponsibility | by Arya Tafvizi. | en_US |
dc.format.extent | 74 p. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | A de-identifier for electronic medical records based on a heterogeneous feature set | en_US |
dc.title.alternative | De-identifier for electronic medical records | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M.Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 825555398 | en_US |