dc.contributor.author: Cassa, Christopher A.
dc.contributor.author: Wieland, Shannon Christine
dc.contributor.author: Mandl, Kenneth D.
dc.date.accessioned: 2010-10-06T20:07:40Z
dc.date.available: 2010-10-06T20:07:40Z
dc.date.issued: 2008-08
dc.date.submitted: 2008-04
dc.identifier.issn: 1476-072X
dc.identifier.uri: http://hdl.handle.net/1721.1/58920
dc.description.abstract: Background: Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology. One approach to preserving the privacy of individual-level addresses in a data set is to de-identify the data using a non-deterministic blurring algorithm that shifts the geocoded values. We investigate a vulnerability in this approach which enables an adversary to re-identify individuals using multiple anonymized versions of the original data set. If several such versions are available, each can be used to incrementally refine estimates of the original geocoded location. Results: We produce multiple anonymized data sets using a single set of addresses and then progressively average the anonymized results related to each address, characterizing the steep decline in distance from the re-identified point to the original location (and the reduction in privacy). With ten anonymized copies of an original data set, we find a substantial decrease in average distance from 0.7 km to 0.2 km between the estimated, re-identified address and the original address. With fifty anonymized copies of an original data set, we find a decrease in average distance from 0.7 km to 0.1 km. Conclusion: We demonstrate that multiple versions of the same data, each anonymized by non-deterministic Gaussian skew, can be used to ascertain original geographic locations. We explore solutions to this problem that include infrastructure to support the safe disclosure of anonymized medical data to prevent inference or re-identification of original address data, and the use of a Markov-process based algorithm to mitigate this risk. [en_US]
dc.description.sponsorship: National Institutes of Health (U.S.). National Library of Medicine (1 R01 LM007677) [en_US]
dc.publisher: BioMed Central Ltd [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1186/1476-072X-7-45 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by/2.0 [en_US]
dc.source: BioMed Central Ltd [en_US]
dc.title: Re-identification of home addresses from spatial locations anonymized by Gaussian skew [en_US]
dc.type: Article [en_US]
dc.identifier.citation: International Journal of Health Geographics. 2008 Aug 12;7(1):45 [en_US]
dc.contributor.department: Harvard University--MIT Division of Health Sciences and Technology [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Civil and Environmental Engineering [en_US]
dc.contributor.mitauthor: Cassa, Christopher A.
dc.contributor.mitauthor: Wieland, Shannon Christine
dc.contributor.mitauthor: Mandl, Kenneth D.
dc.relation.journal: International Journal of Health Geographics [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2010-09-03T16:14:13Z
dc.language.rfc3066: en
dc.rights.holder: Cassa et al.; licensee BioMed Central Ltd.
dspace.orderedauthors: Cassa, Christopher A; Wieland, Shannon C; Mandl, Kenneth D [en]
mit.license: PUBLISHER_CC [en_US]
mit.metadata.status: Complete
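
The averaging attack summarized in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code or data: the points are synthetic, coordinates are treated as planar kilometres, each anonymized copy adds independent Gaussian noise to both axes, and the per-axis standard deviation (sigma_km = 0.56) is a value picked here so that a single copy is about 0.7 km from the true location on average. The function names are hypothetical.

```python
import math
import random

def gaussian_skew(point, sigma_km, rng):
    """Return a copy of (x, y) shifted by independent Gaussian noise on each axis."""
    x, y = point
    return (x + rng.gauss(0.0, sigma_km), y + rng.gauss(0.0, sigma_km))

def mean_reidentification_error(points, n_copies, sigma_km, rng):
    """Average n_copies skewed versions of each point and return the mean
    distance (km) between the averaged estimate and the true location."""
    total = 0.0
    for x, y in points:
        copies = [gaussian_skew((x, y), sigma_km, rng) for _ in range(n_copies)]
        est_x = sum(c[0] for c in copies) / n_copies
        est_y = sum(c[1] for c in copies) / n_copies
        total += math.hypot(est_x - x, est_y - y)
    return total / len(points)

rng = random.Random(0)  # fixed seed so runs are repeatable
# Synthetic "addresses" on a 20 km x 20 km planar grid (not real data).
points = [(rng.uniform(0, 20), rng.uniform(0, 20)) for _ in range(2000)]
sigma_km = 0.56  # assumed per-axis noise level, not taken from the paper

errors = {k: mean_reidentification_error(points, k, sigma_km, rng)
          for k in (1, 10, 50)}
for k, err in errors.items():
    print(f"{k:>2} copies: mean error {err:.2f} km")
```

Because the noise in each copy is independent, the error of the averaged estimate shrinks roughly in proportion to 1/sqrt(k) for k copies, which is in line with the drop from 0.7 km to about 0.2 km (ten copies) and 0.1 km (fifty copies) reported in the abstract.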

