DSpace@MIT (MIT Libraries)

Scalable methods for storage, processing and analysis of sequencing datasets

Author(s)
Yorukoglu, Deniz
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Bonnie Berger.
Terms of use
MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Massive amounts of next-generation sequencing (NGS) reads generated by sequencing machines around the world have revolutionized biotechnology, enabling wide-scale disease and variation studies and personalized medicine, and helping us understand our evolutionary history. However, the amount of sequencing data generated every day increases at an exponential rate, posing an imminent need for smart algorithmic solutions to handle massive sequencing datasets and efficiently extract the useful knowledge within them. This thesis consists of four research contributions on these two fronts. First, we present a computational framework that leverages the redundancy within large genomic datasets to perform faster read mapping while improving sensitivity. Second, we describe a lossy compression method for quality scores within sequencing datasets that strikingly improves downstream genotyping accuracy. Third, we introduce a Bayesian framework for accurate diploid and polyploid haplotype reconstruction of an individual genome using NGS datasets. Lastly, we extend this haplotype reconstruction framework to high-throughput transcriptome sequencing datasets.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (pages 179-189).
 
Date issued
2017
URI
http://hdl.handle.net/1721.1/108991
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.