
dc.contributor.advisor | Bonnie Berger. | en_US
dc.contributor.author | Yorukoglu, Deniz | en_US
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2017-05-11T19:59:29Z
dc.date.available | 2017-05-11T19:59:29Z
dc.date.copyright | 2017 | en_US
dc.date.issued | 2017 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/108991
dc.description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. | en_US
dc.description | Cataloged from PDF version of thesis. | en_US
dc.description | Includes bibliographical references (pages 179-189). | en_US
dc.description.abstract | Massive amounts of next-generation sequencing (NGS) reads generated by sequencing machines around the world have revolutionized biotechnology, enabling wide-scale disease and variation studies and personalized medicine, and helping us understand our evolutionary history. However, the amount of sequencing data generated every day increases at an exponential rate, creating an urgent need for smart algorithmic solutions that handle massive sequencing datasets and efficiently extract the useful knowledge within them. This thesis consists of four research contributions on these two fronts. First, we present a computational framework that leverages the redundancy within large genomic datasets to perform faster read mapping while improving sensitivity. Second, we describe a lossy compression method for quality scores within sequencing datasets that strikingly improves downstream genotyping accuracy. Third, we introduce a Bayesian framework for accurate diploid and polyploid haplotype reconstruction of an individual genome using NGS datasets. Lastly, we extend this haplotype reconstruction framework to high-throughput transcriptome sequencing datasets. | en_US
dc.description.statementofresponsibility | by Deniz Yorukoglu. | en_US
dc.format.extent | 189 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Scalable methods for storage, processing and analysis of sequencing datasets | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph. D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc | 986521809 | en_US

