MIT Libraries homeMIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit

Author(s)
Preheim, Sarah Pacocha; Perrotta, Allison Rose; Martin-Platero, Antonio M.; Gupta, Anika; Alm, Eric J.
Thumbnail
DownloadAlm_Distribution-based with figs tables supp info.pdf (2.767Mb)
OPEN_ACCESS_POLICY

Open Access Policy

Creative Commons Attribution-Noncommercial-Share Alike

Terms of use
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Metadata
Show full item record
Abstract
16S rRNA sequencing, commonly used to survey microbial communities, begins by grouping individual reads into operational taxonomic units (OTUs). There are two major challenges in calling OTUs: identifying bacterial population boundaries and differentiating true diversity from sequencing errors. Current approaches to identifying taxonomic groups or eliminating sequencing errors rely on sequence data alone, but both of these activities could be informed by the distribution of sequences across samples. Here, we show that using the distribution of sequences across samples can help identify population boundaries even in noisy sequence data. The logic underlying our approach is that bacteria in different populations will often be highly correlated in their abundance across different samples. Conversely, 16S rRNA sequences derived from the same population, whether slightly different copies in the same organism, variation of the 16S rRNA gene within a population, or sequences generated randomly in error, will have the same underlying distribution across sampled environments. We present a simple OTU-calling algorithm (distribution-based clustering) that uses both genetic distance and the distribution of sequences across samples and demonstrate that it is more accurate than other methods at grouping reads into OTUs in a mock community. Distribution-based clustering also performs well on environmental samples: it is sensitive enough to differentiate between OTUs that differ by a single base pair yet predicts fewer overall OTUs than most other methods. The program can decrease the total number of OTUs with redundant information and improve the power of many downstream analyses to describe biologically relevant trends.
Date issued
2013-08
URI
http://hdl.handle.net/1721.1/91469
Department
Massachusetts Institute of Technology. Department of Biological Engineering; Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
Journal
Applied and Environmental Microbiology
Publisher
American Society for Microbiology
Citation
Preheim, S. P., A. R. Perrotta, A. M. Martin-Platero, A. Gupta, and E. J. Alm. “Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit.” Applied and Environmental Microbiology 79, no. 21 (August 23, 2013): 6593–6603.
Version: Author's final manuscript
ISSN
0099-2240

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries homeMIT Libraries logo

Find us on

Twitter Facebook Instagram YouTube RSS

MIT Libraries navigation

SearchHours & locationsBorrow & requestResearch supportAbout us
PrivacyPermissionsAccessibility
MIT
Massachusetts Institute of Technology
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.