
dc.contributor.author: Berger, Emily
dc.contributor.author: Yorukoglu, Deniz
dc.contributor.author: Zhang, Lillian
dc.contributor.author: Nyquist, Sarah Kate
dc.contributor.author: Shalek, Alexander K
dc.contributor.author: Kellis, Manolis
dc.contributor.author: Numanagic, Ibrahim
dc.contributor.author: Berger Leighton, Bonnie
dc.date.accessioned: 2022-07-20T16:42:14Z
dc.date.available: 2021-09-20T18:21:31Z
dc.date.available: 2022-07-20T16:42:14Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/1721.1/132259.2
dc.description.abstract: © 2020, The Author(s). Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus even enabling using reads that cover only individual variants. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases. [en_US]
dc.language.iso: en
dc.publisher: Springer Science and Business Media LLC [en_US]
dc.relation.isversionof: 10.1038/s41467-020-18320-z [en_US]
dc.rights: Creative Commons Attribution 4.0 International license [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Nature [en_US]
dc.title: Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets [en_US]
dc.type: Article [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemistry [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: Nature Communications [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2021-01-07T13:54:09Z
dspace.orderedauthors: Berger, E; Yorukoglu, D; Zhang, L; Nyquist, SK; Shalek, AK; Kellis, M; Numanagić, I; Berger, B [en_US]
dspace.date.submission: 2021-01-07T13:54:15Z
mit.journal.volume: 11 [en_US]
mit.journal.issue: 1 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Publication Information Needed [en_US]

