Computer-assisted de-identification of free-text nursing notes

dc.contributor.advisor Roger G. Mark. en_US
dc.contributor.author Douglass, Margaret, 1981- en_US
dc.contributor.other Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. en_US
dc.date.accessioned 2006-07-13T15:13:32Z
dc.date.available 2006-07-13T15:13:32Z
dc.date.copyright 2005 en_US
dc.date.issued 2005 en_US
dc.identifier.uri http://hdl.handle.net/1721.1/33299
dc.description Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. en_US
dc.description Includes bibliographical references (leaves 67-70). en_US
dc.description.abstract Medical researchers are legally required to protect patients' privacy by removing personally identifiable information from medical records before sharing the data with other researchers. Different computer-assisted methods are evaluated for removing and replacing protected health information (PHI) from free-text nursing notes collected in the hospital intensive care unit. A semi-automated method was developed to allow clinicians to highlight PHI on the screen of a tablet PC and to compare and combine the selections of different experts reading the same notes. Expert adjudication demonstrated that inter-human variability was high, with few false positives and many false negatives. A preliminary automated de-identification algorithm generated few false negatives but many false positives. A second automated algorithm was developed using the successful portions of the first algorithm and incorporating other heuristic methods to improve overall performance. A large de-identified collection of nursing notes was re-identified with realistic surrogate (but unprotected) dates, serial numbers, names, and phrases to form a "gold standard" reference database of over 2600 notes (approximately 340,000 words) with over 1800 labeled instances of PHI. This gold standard database of nursing notes and the Java source code used to evaluate algorithm performance will be made freely available on the Physionet web site in order to facilitate the development and validation of future de-identification algorithms. en_US
dc.description.provenance Made available in DSpace on 2006-07-13T15:13:32Z (GMT). No. of bitstreams: 2 62279367.pdf: 3923649 bytes, checksum: 5402aba5d3f1c2061055baf03dc98203 (MD5) 62279367-MIT.pdf: 3926254 bytes, checksum: 323a30667b0c84a6b26281024316f442 (MD5) Previous issue date: 2005 en
dc.description.statementofresponsibility by Margaret Douglass. en_US
dc.format.extent 70 leaves en_US
dc.format.extent 3923649 bytes
dc.format.extent 3926254 bytes
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.language.iso eng en_US
dc.publisher Massachusetts Institute of Technology en_US
dc.rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. en_US
dc.rights.uri http://dspace.mit.edu/handle/1721.1/7582
dc.subject Electrical Engineering and Computer Science. en_US
dc.title Computer-assisted de-identification of free-text nursing notes en_US
dc.type Thesis en_US
dc.description.degree M.Eng. en_US
dc.contributor.department Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. en_US
dc.identifier.oclc 62279367 en_US

Files in this item

Files Size Format
Preview, non-printable (open to all) 3.923Mb application/pdf
Full printable version (MIT only) 3.926Mb application/pdf
