dc.contributor.author: McKenna, Aaron
dc.contributor.author: Hanna, Matthew
dc.contributor.author: Sivachenko, Andrey
dc.contributor.author: Cibulskis, Kristian
dc.contributor.author: Kernytsky, Andrew
dc.contributor.author: Garimella, Kiran
dc.contributor.author: Altshuler, David
dc.contributor.author: Gabriel, Stacey B.
dc.contributor.author: Daly, Mark J.
dc.contributor.author: DePristo, Mark A.
dc.contributor.author: Banks, Eric, 1976-
dc.date.accessioned: 2014-07-17T14:42:01Z
dc.date.available: 2014-07-17T14:42:01Z
dc.date.issued: 2010-07
dc.identifier.issn: 1088-9051
dc.identifier.uri: http://hdl.handle.net/1721.1/88421
dc.description.abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas. [en_US]
dc.description.sponsorship: National Human Genome Research Institute (U.S.) (Large Scale Sequencing and Analysis of Genomes grant (54 HG003067)) [en_US]
dc.description.sponsorship: National Human Genome Research Institute (U.S.) (Joint SNP and CNV calling in 1000 Genomes sequence data grant (U01 HG005208)) [en_US]
dc.language.iso: en_US
dc.publisher: Cold Spring Harbor Laboratory Press [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1101/gr.107524.110 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/3.0/ [en_US]
dc.source: Cold Spring Harbor Laboratory Press [en_US]
dc.title: The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data [en_US]
dc.type: Article [en_US]
dc.identifier.citation: McKenna, A., M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, et al. “The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data.” Genome Research 20, no. 9 (September 1, 2010): 1297–1303. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Biology [en_US]
dc.contributor.mitauthor: Altshuler, David [en_US]
dc.relation.journal: Genome Research [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.orderedauthors: McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; DePristo, M. A. [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-7250-4107
mit.license: PUBLISHER_CC [en_US]
mit.metadata.status: Complete