DSpace@MIT

Protecting Genomic Data Privacy with Probabilistic Modeling

Author(s)
Simmons, Sean; Berger Leighton, Bonnie; Sahinalp, Cenk
Terms of use
Creative Commons Attribution NonCommercial License 4.0 https://creativecommons.org/licenses/by-nc/4.0/
Abstract
As genetic sequencing becomes less expensive and datasets linking genetic data and medical records (e.g., biobanks) become larger and more common, data privacy and computational challenges must be addressed in order to realize the benefits of these datasets. One way to alleviate these issues is to use already-computed summary statistics (e.g., slopes and standard errors from a regression model of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges concerning access to these data could be bypassed. In this paper we explore the possibility of using summary statistics from simple linear models of phenotype on genotype to make inferences about more complex phenotypes (those that are derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope for linear regressions when combining phenotypes. The derived equations are validated via simulation and tested on a real dataset exploring the genetics of fatty acids.
Keywords
privacy; biobank; genetics; genome-wide association study; single nucleotide variant; computational challenges; data security; phenotypes
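The idea of combining per-phenotype summary statistics can be illustrated with the simplest case, a phenotype defined as the sum of two others. This is a minimal sketch, not the paper's formulas: it relies only on the standard fact that ordinary least-squares estimates are linear in the response, so the slope and intercept for the summed phenotype equal the sums of the individual slopes and intercepts. The function name `simple_ols`, the simulated genotype coding (0, 1, or 2 copies of an allele), and the effect sizes are assumptions made for illustration. The standard error of the combined slope is deliberately left out, since it depends on how the two phenotypes covary and is not covered by this sketch.

```python
import random

def simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line y ~ x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Simulated data (hypothetical): genotype is an allele count in {0, 1, 2}.
random.seed(0)
n = 1000
genotype = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
pheno_a = [0.5 * g + random.gauss(0, 1) for g in genotype]
pheno_b = [-0.2 * g + random.gauss(0, 1) for g in genotype]
pheno_sum = [a + b for a, b in zip(pheno_a, pheno_b)]

# Summary statistics that two groups might share.
slope_a, intercept_a = simple_ols(genotype, pheno_a)
slope_b, intercept_b = simple_ols(genotype, pheno_b)

# Direct fit on the combined phenotype, which needs individual-level data.
slope_sum, intercept_sum = simple_ols(genotype, pheno_sum)

# The combined estimates are recoverable from the shared statistics alone.
print(abs(slope_sum - (slope_a + slope_b)) < 1e-9)
print(abs(intercept_sum - (intercept_a + intercept_b)) < 1e-9)
```

In this sketch the direct fit serves only as a check; a group holding nothing but the shared slopes and intercepts could compute the combined values without touching individual-level records.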
Date issued
2019
URI
https://hdl.handle.net/1721.1/123095
Department
Massachusetts Institute of Technology. Department of Mathematics; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
Biocomputing 2019
Publisher
World Scientific
Citation
Gasdaska, Angela et al. "Leveraging summary statistics to make inferences about complex phenotypes in large biobanks." Biocomputing 2019, January 2019, Kohala Coast, Hawaii, USA, World Scientific, 2018 © 2018 The Authors
Version: Final published version
ISBN
978-981-3279-81-0

Collections
  • MIT Open Access Articles