My sister's keeper?: genomic research and the identifiability of siblings

Cassa, Christopher A; Schmidt, Brian; Kohane, Isaac S; Mandl, Kenneth D

Author(s)

Cassa, Christopher A.; Schmidt, Brian; Kohane, Isaac; Mandl, Kenneth D.

Terms of use

Creative Commons Attribution http://creativecommons.org/licenses/by/2.0/

Abstract

Background: Genomic sequencing of SNPs is increasingly prevalent, though the amount of familial information these data contain has not been quantified.

Methods: We provide a framework for measuring the risk to siblings of a patient's SNP genotype disclosure, and demonstrate that sibling SNP genotypes can be inferred with substantial accuracy.

Results: Extending this inference technique, we determine that a very low number of matches at commonly varying SNPs is sufficient to confirm sib-ship, demonstrating that published sequence data can reliably be used to derive sibling identities. Using HapMap trio data, at SNPs where one child is homozygotic major and the minor allele frequency is ≤ 0.20 (N = 452684, 65.1%), we achieve 91.9% inference accuracy for sibling genotypes.

Conclusion: These findings demonstrate that substantial discrimination and privacy risks arise from use of inferred familial genomic data.

Date issued

2008-07

URI

http://hdl.handle.net/1721.1/52440

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Civil and Environmental Engineering

Journal

BMC Medical Genomics

Publisher

BioMed Central Ltd.

Citation

Cassa, Christopher et al. “My sister's keeper?: genomic research and the identifiability of siblings.” BMC Medical Genomics 1.1 (2008): 32.

Version: Final published version

ISSN

1755-8794

Collections

MIT Open Access Articles

DSpace@MIT