dc.contributor.author | Berger, Emily | |
dc.contributor.author | Yorukoglu, Deniz | |
dc.contributor.author | Zhang, Lillian | |
dc.contributor.author | Nyquist, Sarah Kate | |
dc.contributor.author | Shalek, Alexander K | |
dc.contributor.author | Kellis, Manolis | |
dc.contributor.author | Numanagić, Ibrahim | |
dc.contributor.author | Berger Leighton, Bonnie | |
dc.date.accessioned | 2022-07-20T16:42:14Z | |
dc.date.available | 2021-09-20T18:21:31Z | |
dc.date.available | 2022-07-20T16:42:14Z | |
dc.date.issued | 2020 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/132259.2 | |
dc.description.abstract | Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, so that even reads covering only a single variant can be used. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, grows substantially as the amount of diverse data increases. | en_US |
dc.language.iso | en | |
dc.publisher | Springer Science and Business Media LLC | en_US |
dc.relation.isversionof | 10.1038/s41467-020-18320-z | en_US |
dc.rights | Creative Commons Attribution 4.0 International license | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
dc.source | Nature | en_US |
dc.title | Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets | en_US |
dc.type | Article | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Mathematics | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Chemistry | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.relation.journal | Nature Communications | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2021-01-07T13:54:09Z | |
dspace.orderedauthors | Berger, E; Yorukoglu, D; Zhang, L; Nyquist, SK; Shalek, AK; Kellis, M; Numanagić, I; Berger, B | en_US |
dspace.date.submission | 2021-01-07T13:54:15Z | |
mit.journal.volume | 11 | en_US |
mit.journal.issue | 1 | en_US |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Publication Information Needed | en_US |