dc.contributor.advisor | Peter Szolovits and Ozlem Uzuner. | en_US
dc.contributor.author | Tafvizi, Arya (Tafvizi Zavareh) | en_US
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2013-02-13T21:23:53Z
dc.date.available | 2013-02-13T21:23:53Z
dc.date.copyright | 2011 | en_US
dc.date.issued | 2011 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/76815
dc.description | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. | en_US
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US
dc.description | Cataloged from PDF version of thesis. | en_US
dc.description | Includes bibliographical references (p. 73-74). | en_US
dc.description.abstract | In this thesis, I describe our effort to build an extended and specialized Named Entity Recognizer (NER) to detect instances of Protected Health Information (PHI) in electronic medical records (a de-identifier). The de-identifier was built by creating a comprehensive set of features formed by combining features from the most successful named entity recognizers and de-identifiers and using them in an SVM classifier. We show that the benefit from having an inclusive set of features outweighs the harm from the very large dimensionality of the resulting classification problem. We also show that our classifier does not over-fit the training data. We test whether this approach is more effective than using the NERs separately and combining the results using a committee voting procedure. Finally, we show that our system achieves a precision of up to 1.00, a recall of up to 0.97, and an f-measure of up to 0.98 on a variety of corpora. | en_US
dc.description.statementofresponsibility | by Arya Tafvizi. | en_US
dc.format.extent | 74 p. | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | A de-identifier for electronic medical records based on a heterogeneous feature set | en_US
dc.title.alternative | De-identifier for electronic medical records | en_US
dc.type | Thesis | en_US
dc.description.degree | M.Eng. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc | 825555398 | en_US

