dc.contributor.advisor: Berger, Bonnie
dc.contributor.author: Sadhuka, Shuvom
dc.date.accessioned: 2024-05-24T17:59:39Z
dc.date.available: 2024-05-24T17:59:39Z
dc.date.issued: 2024-02
dc.date.submitted: 2024-02-21T17:10:18.745Z
dc.identifier.uri: https://hdl.handle.net/1721.1/155055
dc.description.abstract: Gene expression data provides molecular insights into the functional impact of genetic variation, for example through expression quantitative trait loci (eQTL). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression profiles could be matched to genotype profiles of the same individuals in another dataset, known as a linking attack. Prior work demonstrating such a risk could analyze only a fraction of eQTLs that are independent of each other due to restrictive model assumptions, leaving the full extent of this risk incompletely understood. To address this challenge, we introduce discriminative sequence model (DSM), a novel probabilistic framework for predicting a sequence of genotypes based on gene expression data. By modeling the joint distribution over all variants in a genomic region, DSM enables an accurate assessment of the power of linking attacks that leverage all known eQTLs with necessary calibration for linkage disequilibrium and redundant predictive signals. We demonstrate improved linking accuracy of DSM compared to two existing approaches on a range of real datasets including up to 22K individuals, suggesting that DSM helps uncover a substantial additional risk overlooked by previous studies. Our work provides a unified framework for assessing the privacy risks of sharing diverse omics datasets beyond transcriptomics.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: A More Holistic Analysis of Privacy Risks in Transcriptomic Datasets
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

