Mash Screen: high-throughput sequence containment estimation for genome discovery
Author(s)
Ondov, Brian D.; Starrett, Gabriel J.; Sappington, Anna; Kostic, Aleksandra; Koren, Sergey; Buck, Christopher B.; Phillippy, Adam M.
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
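The distinction the abstract draws between resemblance and containment can be illustrated with a minimal sketch. It uses exact k-mer sets rather than MinHash sketches, and it is not the Mash Screen implementation; the function names, toy sequences, and k-mer size are assumptions chosen only for illustration. Resemblance is taken here as the Jaccard index of the two k-mer sets, and containment as the fraction of the genome's k-mers that also occur in the metagenome.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Resemblance: shared k-mers divided by all k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query, target):
    """Containment: fraction of the query's k-mers found in the target."""
    return len(query & target) / len(query) if query else 0.0


# Toy data (hypothetical): the "metagenome" holds the genome plus other sequence.
genome = "ACGTACGGTCA"
metagenome = "TTGACCA" + genome + "GGATCCTAGGCATTA"

k = 4  # illustrative k-mer size, far smaller than is used in practice
g = kmers(genome, k)
m = kmers(metagenome, k)

print("resemblance:", jaccard(g, m))
print("containment:", containment(g, m))
```

In this toy case the genome is fully present in the metagenome, so its containment is 1.0, while the resemblance is pulled below 1.0 by the unrelated sequence. The larger the metagenome grows relative to the genome, the smaller the resemblance becomes, even though the genome is still entirely present.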
Date issued
2019-11-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Genome Biology
Publisher
BioMed Central
Citation
Ondov, Brian D. et al. "Mash Screen: high-throughput sequence containment estimation for genome discovery." Genome Biology 20 (Nov. 2019): 232. https://doi.org/10.1186/s13059-019-1841-x © 2019 Author(s)
Version: Final published version
ISSN
1474-760X