Show simple item record

dc.contributor.author: Shajii, Ariya
dc.contributor.author: Numanagic, Ibrahim
dc.contributor.author: Whelan, Christopher
dc.contributor.author: Berger Leighton, Bonnie
dc.date.accessioned: 2019-11-14T19:12:22Z
dc.date.available: 2019-11-14T19:12:22Z
dc.date.issued: 2018-08
dc.identifier.issn: 2405-4712
dc.identifier.uri: https://hdl.handle.net/1721.1/122938
dc.description.abstract: Sequencing technologies are capturing longer-range genomic information at lower error rates, enabling alignment to genomic regions that are inaccessible with short reads. However, many methods are unable to align reads to much of the genome, recognized as important in disease, and thus report erroneous results in downstream analyses. We introduce EMA, a novel two-tiered statistical binning model for barcoded read alignment, that first probabilistically maps reads to potentially multiple “read clouds” and then within clouds by newly exploiting the non-uniform read densities characteristic of barcoded read sequencing. EMA substantially improves downstream accuracy over existing methods, including phasing and genotyping on 10x data, with fewer false variant calls in nearly half the time. EMA effectively resolves particularly challenging alignments in genomic regions that contain nearby homologous elements, uncovering variants in the pharmacogenomically important CYP2D region, and clinically important genes C4 (schizophrenia) and AMY1A (obesity), which go undetected by existing methods. Our work provides a framework for future generation sequencing. Researchers are applying barcoded read sequencing to capture longer-range information in the genome at low error rates. We introduce a two-tiered statistical binning model, named EMA, which probabilistically assigns reads to “clouds” and then optimizes read assignments within clouds based on read densities. Unlike previous approaches, our efficient method enables alignment to highly homologous regions of the genome important in disease and substantially improves downstream genotyping and haplotyping. Our method also uncovers rare variants in clinically important genes. Keywords: third-generation sequencing; read mapping; barcoded short-reads; linked-reads [en_US]
dc.description.sponsorship: National Institutes of Health (U.S.) (Grant GM108348) [en_US]
dc.language.iso: en
dc.publisher: Elsevier BV [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1016/j.cels.2018.07.005 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs License [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [en_US]
dc.source: Elsevier [en_US]
dc.title: Statistical Binning for Barcoded Reads Improves Downstream Analyses [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Shajii, Ariya et al. "Statistical Binning for Barcoded Reads Improves Downstream Analyses." Cell Systems 7, 2 (2018): 219-226 © 2018 The Author(s) [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.relation.journal: Cell Systems [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2019-11-07T19:02:19Z
dspace.date.submission: 2019-11-07T19:02:23Z
mit.journal.volume: 7 [en_US]
mit.journal.issue: 2 [en_US]

