Scalable methods for storage, processing and analysis of sequencing datasets

Author(s)

Yorukoglu, Deniz

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Bonnie Berger.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Massive amounts of next-generation sequencing (NGS) reads generated by sequencing machines around the world have revolutionized biotechnology, enabling wide-scale disease and variation studies and personalized medicine, and helping us understand our evolutionary history. However, the amount of sequencing data generated every day increases at an exponential rate, posing an imminent need for smart algorithmic solutions to handle massive sequencing datasets and efficiently extract the useful knowledge within them. This thesis consists of four research contributions on these two fronts. First, we present a computational framework that leverages the redundancy within large genomic datasets to perform faster read mapping while improving sensitivity. Second, we describe a lossy compression method for quality scores within sequencing datasets that strikingly improves the downstream accuracy of genotyping. Third, we introduce a Bayesian framework for accurate diploid and polyploid haplotype reconstruction of an individual genome using NGS datasets. Lastly, we extend this haplotype reconstruction framework to high-throughput transcriptome sequencing datasets.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 179-189).

Date issued

2017

URI

http://hdl.handle.net/1721.1/108991

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses