dc.contributor.author: Friedman, Jonathan
dc.contributor.author: Alm, Eric J.
dc.date.accessioned: 2013-01-09T21:54:30Z
dc.date.available: 2013-01-09T21:54:30Z
dc.date.issued: 2012-09
dc.date.submitted: 2011-09
dc.identifier.issn: 1553-734X
dc.identifier.issn: 1553-7358
dc.identifier.uri: http://hdl.handle.net/1721.1/76233
dc.description.abstract: High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity. [en_US]
dc.description.sponsorship: United States. Dept. of Energy (Contract DE-AC02-05CH11231) [en_US]
dc.language.iso: en_US
dc.publisher: Public Library of Science [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1371/journal.pcbi.1002687 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by/2.5/ [en_US]
dc.source: PLoS [en_US]
dc.title: Inferring Correlation Networks from Genomic Survey Data [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Friedman, Jonathan, and Eric J. Alm. “Inferring Correlation Networks from Genomic Survey Data.” Ed. Christian von Mering. PLoS Computational Biology 8.9 (2012): e1002687. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computational and Systems Biology Program [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Biological Engineering [en_US]
dc.contributor.mitauthor: Friedman, Jonathan
dc.contributor.mitauthor: Alm, Eric J.
dc.relation.journal: PLoS Computational Biology [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.orderedauthors: Friedman, Jonathan; Alm, Eric J. [en]
dc.identifier.orcid: https://orcid.org/0000-0001-8294-9364
dc.identifier.orcid: https://orcid.org/0000-0002-1801-1504
mit.license: PUBLISHER_CC [en_US]
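
Illustration of the compositional effect described in the abstract. The sketch below is not SparCC and is not taken from the article; it is a minimal simulation under stated assumptions. It draws statistically independent absolute abundances, converts each sample to relative fractions, and compares the Pearson correlation of two taxa before and after that normalization. The function names (`pearson`, `compare_correlations`), the log-normal abundance model, and the use of the number of evenly abundant taxa as a stand-in for community diversity are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def compare_correlations(n_taxa, n_samples=2000, seed=0):
    """Return (corr_absolute, corr_fraction) for taxa 0 and 1.

    Assumption: absolute abundances are independent log-normal draws,
    so any correlation between the fractions is an artifact of
    dividing each sample by its own total.
    """
    rng = random.Random(seed)
    absolute = [
        [rng.lognormvariate(0.0, 0.5) for _ in range(n_taxa)]
        for _ in range(n_samples)
    ]
    # Closure: each sample is rescaled so its components sum to 1.
    fractions = [[v / sum(row) for v in row] for row in absolute]

    corr_absolute = pearson([r[0] for r in absolute], [r[1] for r in absolute])
    corr_fraction = pearson([r[0] for r in fractions], [r[1] for r in fractions])
    return corr_absolute, corr_fraction


if __name__ == "__main__":
    for n_taxa in (3, 10, 50):
        corr_abs, corr_frac = compare_correlations(n_taxa)
        print(f"taxa={n_taxa:3d}  absolute={corr_abs:+.3f}  fractions={corr_frac:+.3f}")
```

In this toy setting the fractions of a community with few taxa show a clearly negative correlation even though the underlying abundances are independent, and the artifact shrinks as the number of taxa grows, which is consistent with the abstract's statement that low-diversity samples are the most affected.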