Design methods for sensitive and comprehensive microbial surveillance
Author(s): Metsky, Hayden C.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Pardis C. Sabeti.
We are surrounded by a vast and dynamic microbial world. Effective surveillance tools can benefit medicine and public health, including infectious disease diagnostics, proactive pathogen detection and characterization, and microbiome studies. New genomic technologies are transforming microbial surveillance, but face challenges stemming from low concentrations in collected samples and extensive, ever-changing diversity. In this thesis, we first demonstrate a need for stronger surveillance through mapping the spread of Zika virus during the 2015-16 epidemic. We generate 110 Zika virus genomes from across the Americas, forming the largest and most diverse Zika virus dataset at the time. We perform a Bayesian phylogenetic analysis of Zika's spread and discover that it circulated undetected in multiple regions for many months. Two reasons are that Zika virus is present in samples at ultra-low abundance and was, during its rapid spread, an obscure pathogen.
Motivated by this, we develop computational approaches that enable sensitive, comprehensive surveillance. We present CATCH, an algorithm that enhances enrichment of highly diverse whole genomes for more sensitive sequencing. CATCH designs scalable capture probe sets that are comprehensive, to a well-defined extent, against known sequence diversity. We use CATCH to design probes targeting whole genomes of the 356 viral species known to infect humans, including their vast subspecies diversity. Applied to 30 patient and environmental samples, we show that these probes improve hypothesis-free detection of viral infections and considerably enhance genome assembly. Academic labs, research hospitals, and government public health institutes are using CATCH to help detect and characterize microbes.
We also present ADAPT, a system for end-to-end sequence design of nucleic acid diagnostic assays. We develop algorithms to comprehensively consider known diversity and enforce high taxon-specificity, even under relaxed criteria arising with RNA binding. Focusing on CRISPR-Cas13 detection, we perform high-throughput screening of crRNA-target pairs and develop a model, applied to our dataset, that predicts detection activity; using this, ADAPT's designs have high predicted activity. Along with CATCH, ADAPT advances microbial surveillance by leveraging and progressing with the extensive, ever-changing landscape of microbial genome diversity.
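As an illustration of the probe-design problem summarized above (a small probe set that still covers known sequence diversity), the following is a minimal sketch that frames it as a greedy set cover. It is not taken from the thesis and should not be read as CATCH's implementation: the function names (`hamming`, `coverage`, `design_probes`), the Hamming-distance tolerance used as a stand-in for hybridization, and the toy sequences are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def coverage(probe, targets, max_mismatches):
    """Return the set of (target_index, position) pairs this probe covers.

    Assumption: a probe "covers" a window of a target if the two differ
    in at most `max_mismatches` positions. Real hybridization models are
    more involved; this is only a stand-in.
    """
    k = len(probe)
    covered = set()
    for t_idx, seq in enumerate(targets):
        for start in range(len(seq) - k + 1):
            if hamming(probe, seq[start:start + k]) <= max_mismatches:
                covered.update((t_idx, p) for p in range(start, start + k))
    return covered


def design_probes(targets, probe_len, max_mismatches=0):
    """Greedily choose probes until every coverable position is covered."""
    # Candidate probes: every window of length probe_len in any target.
    candidates = sorted({seq[i:i + probe_len]
                         for seq in targets
                         for i in range(len(seq) - probe_len + 1)})
    cover = {p: coverage(p, targets, max_mismatches) for p in candidates}
    uncovered = set().union(*cover.values())
    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered positions.
        best = max(candidates, key=lambda p: len(cover[p] & uncovered))
        chosen.append(best)
        uncovered -= cover[best]
    return chosen


# Toy input: two near-identical "genomes" that differ in the last base.
targets = ["ACGTACGT", "ACGTACGA"]
exact = design_probes(targets, probe_len=4, max_mismatches=0)
tolerant = design_probes(targets, probe_len=4, max_mismatches=1)
print(len(exact), len(tolerant))
```

In this toy case, allowing one mismatch lets a single probe cover both sequences, whereas exact matching needs two. That is the kind of trade-off between probe count and tolerated divergence that a design method of this sort has to manage.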
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 169-203).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.