Fast genotyping of known SNPs through approximate k-mer matching

Shajii, Ariya; Yorukoglu, Deniz; Yu, Yun William; Berger, Bonnie

dc.contributor.author	Shajii, Ariya
dc.contributor.author	Yorukoglu, Deniz
dc.contributor.author	Yu, Yun William
dc.contributor.author	Berger Leighton, Bonnie
dc.date.accessioned	2018-05-17T19:13:46Z
dc.date.available	2018-05-17T19:13:46Z
dc.date.issued	2016-08
dc.identifier.issn	1367-4803
dc.identifier.issn	1460-2059
dc.identifier.uri	http://hdl.handle.net/1721.1/115481
dc.description.abstract	Motivation: As the volume of next-generation sequencing (NGS) data increases, faster algorithms become necessary. Although speeding up individual components of a sequence analysis pipeline (e.g. read mapping) can reduce the computational cost of analysis, such approaches do not take full advantage of the particulars of a given problem. One problem of great interest, genotyping a known set of variants (e.g. dbSNP or Affymetrix SNPs), is important for characterization of known genetic traits and causative disease variants within an individual, as well as the initial stage of many ancestral and population genomic pipelines (e.g. GWAS). Results: We introduce lightweight assignment of variant alleles (LAVA), an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while using as low as ∼5 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays. Availability and Implementation: LAVA software is available at http://lava.csail.mit.edu.	en_US
dc.publisher	Oxford University Press (OUP)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1093/BIOINFORMATICS/BTW460	en_US
dc.rights	Creative Commons Attribution-NonCommercial 4.0 International	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	en_US
dc.source	Oxford University Press	en_US
dc.title	Fast genotyping of known SNPs through approximate k-mer matching	en_US
dc.type	Article	en_US
dc.identifier.citation	Shajii, Ariya et al. “Fast Genotyping of Known SNPs through Approximate k-Mer Matching.” Bioinformatics 32, 17 (September 2016): i538–i544 © 2016 The Authors	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Mathematics	en_US
dc.contributor.mitauthor	Yorukoglu, Deniz
dc.contributor.mitauthor	Yu, Yun William
dc.contributor.mitauthor	Berger Leighton, Bonnie
dc.relation.journal	Bioinformatics	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2018-05-16T15:37:39Z
dspace.orderedauthors	Shajii, Ariya; Yorukoglu, Deniz; Yu, Yun William; Berger, Bonnie	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0003-2315-0768
dc.identifier.orcid	https://orcid.org/0000-0002-8275-9576
dc.identifier.orcid	https://orcid.org/0000-0002-2724-7228
mit.license	PUBLISHER_CC	en_US

Files in this item

Name: btw460.pdf
Size: 2.267Mb
Format: PDF


This item appears in the following Collection(s)

MIT Open Access Articles
