Development of new tools and applications for high-throughput sequencing of microbiomes in environmental or clinical samples
Author(s): Blackburn, Matthew Christopher
Massachusetts Institute of Technology. Dept. of Chemical Engineering.
Eric J. Alm.
Novel sequencing technologies are rapidly advancing studies of microbial community structure and diversity. Sequencing platforms like the Illumina Genome Analyzer II (GAII) and the Applied Biosystems SOLiD enable experiments that were previously too expensive or time-consuming by providing a very large number of short reads at a significantly lower cost per base pair (bp) than conventional longer-read systems like the Roche-454 GS FLX pyrosequencing instrument. Short-read platforms, however, are not readily amenable to some applications like metagenomics and metatranscriptomics, and therefore pyrosequencing remains the dominant sequencing technique in these fields. The primary reason short-read technologies have not been used for metagenomic analyses is the difficulty of confidently assigning phylogeny or putative gene function to short sequences. In an effort to overcome this limitation, a strategy was developed for preparing libraries from sheared genomic DNA with tunable size distributions using solid phase reversible immobilization (SPRI). This size selection captures DNA fragments of the necessary length to enable the generation of overlapping reads when sequenced from both ends. The lower-quality ends of mated reads were then used to produce a high-quality consensus sequence in the region of overlap. The fraction of composite reads that could be assigned to a taxon was similar to that from 454-FLX, despite the slightly shorter average read length of the composite Illumina reads. This technique successfully demonstrates a practical and economical alternative to 454-FLX for metagenomics. In addition, a scalable, fully automated process for creating sequence-ready, barcoded libraries of 16S rDNA for microbial diversity studies was developed for the Illumina platform. 
This process will enable sequencing of hundreds of environmental samples on a single Illumina flowcell, greatly decreasing the cost per sample while providing thousands of short reads for microbial ecology studies. The incorporation of error-correcting, short DNA "barcodes" (also called tags or indexes) during polymerase chain reaction (PCR) amplification of the 16S sequence facilitates sample multiplexing. This process also utilizes the SPRI method to replace column-based reaction clean-ups, enabling the library preparation procedure to be performed almost entirely by a robotic liquid handling workstation. Finally, two unique PCR primer systems (primer-clipping and primer-skipping) were engineered to increase the informative read length of 16S sequence by either cutting the known universal tract out of the final product to be sequenced, or by omitting sequencing of the universal regions using specially crafted primers designed to be compatible with Illumina platform conditions. By applying both the overlapping-read technique and multiplexed 16S library preparation workflow, a streamlined approach for efficient gene and species discovery has been assembled to accommodate new metagenomic applications for the Illumina sequencing platform.
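The overlapping-read idea in the abstract (two mated reads whose lower-quality ends overlap, combined into one composite read with a consensus in the overlap) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the rule for scoring disagreeing bases (keep the higher-quality base and subtract the two qualities) are all assumptions chosen for illustration.

```python
# Illustrative sketch only; names, thresholds, and the quality-combination
# rule are assumptions, not the method described in the thesis.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual,
               min_overlap=10, max_mismatch_frac=0.1):
    """Merge a mated read pair into one composite read.

    fwd/rev are the reads as sequenced (rev comes from the opposite strand);
    fwd_qual/rev_qual are lists of integer quality scores.
    Returns (sequence, qualities), or None if no acceptable overlap is found.
    """
    # Put the reverse read on the same strand and orientation as the forward read.
    rev_rc = reverse_complement(rev)
    rev_q = rev_qual[::-1]

    # Try the longest candidate overlap first: suffix of fwd vs prefix of rev_rc.
    for olap in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        a = fwd[-olap:]
        b = rev_rc[:olap]
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        if mismatches > max_mismatch_frac * olap:
            continue

        start = len(fwd) - olap
        seq = list(fwd[:start])
        qual = list(fwd_qual[:start])
        for i in range(olap):
            qa, qb = fwd_qual[start + i], rev_q[i]
            if a[i] == b[i]:
                # Agreement: keep the base with the better of the two scores.
                seq.append(a[i])
                qual.append(max(qa, qb))
            elif qa >= qb:
                # Disagreement: trust the higher-quality base, with reduced confidence.
                seq.append(a[i])
                qual.append(qa - qb)
            else:
                seq.append(b[i])
                qual.append(qb - qa)
        seq.extend(rev_rc[olap:])
        qual.extend(rev_q[olap:])
        return "".join(seq), qual
    return None


if __name__ == "__main__":
    # Hypothetical 40 bp fragment read 28 bp from each end (16 bp overlap).
    fragment = "ACGTTGCAAGGCTTACGATCGGATCCATGCAAGTCGTAGC"
    read_qual = [30] * 20 + [10] * 8  # quality drops toward the 3' end
    fwd = fragment[:28]
    rev = reverse_complement(fragment[12:])
    merged = merge_pair(fwd, read_qual, rev, read_qual)
    print(merged[0] == fragment)
```

In this toy example each read's low-quality tail is covered by the high-quality start of its mate, which is why the composite read can be longer than either input and still carry high scores across the overlap. A real pipeline would also need to handle adapters, reads that extend past the fragment end, and a principled quality model.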
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 61-65).
Department: Massachusetts Institute of Technology. Dept. of Chemical Engineering.
Massachusetts Institute of Technology