Privacy preserving data visualizations

Avraam, Demetris; Wilson, Rebecca; Butters, Oliver; Burton, Thomas; Nicolaides, Christos; Jones, Elinor; Boyd, Andy; Burton, Paul

dc.contributor.author	Avraam, Demetris
dc.contributor.author	Wilson, Rebecca
dc.contributor.author	Butters, Oliver
dc.contributor.author	Burton, Thomas
dc.contributor.author	Nicolaides, Christos
dc.contributor.author	Jones, Elinor
dc.contributor.author	Boyd, Andy
dc.contributor.author	Burton, Paul
dc.date.accessioned	2021-09-20T17:41:13Z
dc.date.available	2021-09-20T17:41:13Z
dc.date.issued	2021-01-07
dc.identifier.uri	https://hdl.handle.net/1721.1/131977
dc.description.abstract	Data visualizations are a valuable tool during both statistical analysis and the interpretation of results, as they graphically reveal useful information about the structure, properties and relationships between variables that may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants, the sharing and publication of individual-level records is controlled by data protection laws and ethico-legal norms. Because data visualizations such as graphs and plots may be linked to other released information and used to identify study participants and their personal attributes, their creation is often prohibited by the terms of data use. These restrictions are enforced to reduce the risk of breaching data subject confidentiality; however, they prevent analysts from displaying useful descriptive plots of their research features and findings. Here we propose the use of anonymization techniques to generate privacy-preserving visualizations that retain the statistical properties of the underlying data while still adhering to strict data disclosure rules. We demonstrate the use of (i) the well-known k-anonymization process, which preserves privacy by reducing the granularity of the data using suppression and generalization, (ii) a novel deterministic approach that replaces each individual-level observation with the centroid of its k nearest neighbours, and (iii) a probabilistic procedure that perturbs individual attributes by adding random noise. We apply the proposed methods to generate privacy-preserving data visualizations for exploratory data analysis and inferential regression plot diagnostics, and we discuss their strengths and limitations.	en_US
dc.publisher	Springer Berlin Heidelberg	en_US
dc.relation.isversionof	https://doi.org/10.1140/epjds/s13688-020-00257-4	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Springer Berlin Heidelberg	en_US
dc.title	Privacy preserving data visualizations	en_US
dc.type	Article	en_US
dc.identifier.citation	EPJ Data Science. 2021 Jan 07;10(1):2	en_US
dc.contributor.department	Sloan School of Management
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2021-01-10T04:14:46Z
dc.language.rfc3066	en
dc.rights.holder	The Author(s)
dspace.embargo.terms	N
dspace.date.submission	2021-01-10T04:14:46Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed

Files in this item

Name: 13688_2020_Article_257.pdf
Size: 7.136Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
