dc.contributor.author: Simmons, Sean
dc.contributor.author: Berger Leighton, Bonnie
dc.contributor.author: Sahinalp, Cenk
dc.date.accessioned: 2019-11-26T20:52:22Z
dc.date.available: 2019-11-26T20:52:22Z
dc.date.issued: 2019
dc.identifier.isbn: 978-981-3279-81-0
dc.identifier.uri: https://hdl.handle.net/1721.1/123095
dc.description.abstract: As genetic sequencing becomes less expensive and data sets linking genetic data and medical records (e.g., Biobanks) become larger and more common, issues of data privacy and computational challenges become more necessary to address in order to realize the benefits of these datasets. One possibility for alleviating these issues is through the use of already-computed summary statistics (e.g., slopes and standard errors from a regression model of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges concerning the access of these data could be bypassed. In this paper we explore the possibility of using summary statistics from simple linear models of phenotype on genotype in order to make inferences about more complex phenotypes (those that are derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope for linear regressions when combining phenotypes. Derived equations are validated via simulation and tested on a real data set exploring the genetics of fatty acids. Keywords: privacy; biobank; genetics; genome-wide association study; single nucleotide variant; computational challenges; data security; phenotypes [en_US]
dc.language.iso: en
dc.publisher: World Scientific [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1142/9789813279827_0037 [en_US]
dc.rights: Creative Commons Attribution NonCommercial License 4.0 [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/ [en_US]
dc.source: World Scientific [en_US]
dc.title: Protecting Genomic Data Privacy with Probabilistic Modeling [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Gasdaska, Angela et al. "Leveraging summary statistics to make inferences about complex phenotypes in large biobanks." Biocomputing 2019, January 2019, Kohala Coast, Hawaii, USA, World Scientific, 2018 © 2018 The Authors [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.relation.journal: Biocomputing 2019 [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2019-11-07T18:58:59Z
dspace.date.submission: 2019-11-07T18:59:04Z

