dc.contributor.author: Avraam, Demetris
dc.contributor.author: Wilson, Rebecca
dc.contributor.author: Butters, Oliver
dc.contributor.author: Burton, Thomas
dc.contributor.author: Nicolaides, Christos
dc.contributor.author: Jones, Elinor
dc.contributor.author: Boyd, Andy
dc.contributor.author: Burton, Paul
dc.date.accessioned: 2021-09-20T17:41:13Z
dc.date.available: 2021-09-20T17:41:13Z
dc.date.issued: 2021-01-07
dc.identifier.uri: https://hdl.handle.net/1721.1/131977
dc.description.abstract: Data visualizations are a valuable tool used during both statistical analysis and the interpretation of results, as they graphically reveal useful information about the structure, properties and relationships between variables, which may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants, the sharing and publication of individual-level records is controlled by data protection laws and ethico-legal norms. Thus, as data visualizations – such as graphs and plots – may be linked to other released information and used to identify study participants and their personal attributes, their creation is often prohibited by the terms of data use. These restrictions are enforced to reduce the risk of breaching data subject confidentiality; however, they prevent analysts from displaying useful descriptive plots for their research features and findings. Here we propose the use of anonymization techniques to generate privacy-preserving visualizations that retain the statistical properties of the underlying data while still adhering to strict data disclosure rules. We demonstrate the use of (i) the well-known k-anonymization process, which preserves privacy by reducing the granularity of the data using suppression and generalization, (ii) a novel deterministic approach that replaces each individual-level observation with the centroid of its k nearest neighbours, and (iii) a probabilistic procedure that perturbs individual attributes with the addition of stochastic noise. We apply the proposed methods to generate privacy-preserving data visualizations for exploratory data analysis and inferential regression plot diagnostics, and we discuss their strengths and limitations. [en_US]
dc.publisher: Springer Berlin Heidelberg [en_US]
dc.relation.isversionof: https://doi.org/10.1140/epjds/s13688-020-00257-4 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Springer Berlin Heidelberg [en_US]
dc.title: Privacy preserving data visualizations [en_US]
dc.type: Article [en_US]
dc.identifier.citation: EPJ Data Science. 2021 Jan 07;10(1):2 [en_US]
dc.contributor.department: Sloan School of Management
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2021-01-10T04:14:46Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.embargo.terms: N
dspace.date.submission: 2021-01-10T04:14:46Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed
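
The abstract names two masking ideas that are easy to illustrate: replacing each observation with the centroid of its nearest neighbours, and perturbing values with random noise. The following is a minimal sketch of both, not the authors' implementation. It assumes 2-D points, Euclidean distance, that each point counts as one of its own k neighbours, and Gaussian noise scaled to a fraction of the sample standard deviation; the function names `knn_centroids` and `add_noise` and the `fraction` parameter are hypothetical.

```python
import math
import random


def knn_centroids(points, k):
    """Replace each 2-D point with the centroid of its k nearest points.

    Assumption: the point itself is included among the k (distance 0).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    masked = []
    for p in points:
        # Sort all points by Euclidean distance to p and keep the closest k.
        nearest = sorted(points, key=lambda q: math.dist(p, q))[:k]
        cx = sum(q[0] for q in nearest) / k
        cy = sum(q[1] for q in nearest) / k
        masked.append((cx, cy))
    return masked


def add_noise(values, fraction, seed=None):
    """Add zero-mean Gaussian noise to each value.

    Assumption: the noise s.d. is `fraction` times the sample s.d. of `values`.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    rng = random.Random(seed)
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [v + rng.gauss(0.0, fraction * sd) for v in values]
```

In a plotting workflow, the masked coordinates from `knn_centroids` or `add_noise` would be passed to the scatter plot in place of the raw records. Larger `k` or `fraction` values trade fidelity for stronger masking.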