dc.contributor.author: Subramanian, Ayshwarya
dc.contributor.author: Alperovich, Mikhail
dc.contributor.author: Yang, Yiming
dc.contributor.author: Li, Bo
dc.date.accessioned: 2023-01-03T13:46:02Z
dc.date.available: 2023-01-03T13:46:02Z
dc.date.issued: 2022-12-27
dc.identifier.uri: https://hdl.handle.net/1721.1/146943
dc.description.abstract: Background: Quality control (QC) of cells, a critical first step in single-cell RNA sequencing data analysis, has largely relied on arbitrary, fixed, data-agnostic thresholds applied to QC metrics such as gene complexity and the fraction of reads mapping to mitochondrial genes. The few existing data-driven approaches perform QC at the level of samples or studies without accounting for biological variation. Results: We first demonstrate that QC metrics vary with both tissue and cell type across technologies, study conditions, and species. We then propose data-driven QC (ddqc), an unsupervised adaptive QC framework that performs flexible, data-driven QC at the level of cell types while retaining critical biological insights and improving power for downstream analysis. ddqc applies an adaptive threshold based on the median absolute deviation to four QC metrics (gene and UMI complexity, and the fractions of reads mapping to mitochondrial and ribosomal genes). ddqc retains over a third more cells than conventional data-agnostic QC filters. Finally, we show that ddqc recovers biologically meaningful trends in the gradation of gene complexity among cell types, which can help answer questions of biological interest, such as which cell types express the fewest and the most transcripts overall, and ribosomal transcripts specifically. Conclusions: ddqc retains cell types, such as metabolically active parenchymal cells and specialized cells like neutrophils, that are often lost by conventional QC. Taken together, our work proposes a revised paradigm for quality-filtering best practices, namely iterative QC, providing a data-driven QC framework compatible with observed biological diversity. [en_US]
dc.publisher: BioMed Central [en_US]
dc.relation.isversionof: https://doi.org/10.1186/s13059-022-02820-w [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: BioMed Central [en_US]
dc.title: Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Genome Biology. 2022 Dec 27;23(1):267 [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2023-01-01T04:52:49Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.date.submission: 2023-01-01T04:52:49Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
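The abstract describes the core ddqc step as an adaptive threshold based on the median absolute deviation (MAD), applied to four QC metrics within each cell type rather than across a whole sample. The Python sketch below illustrates that idea on toy data under stated assumptions; it is not the ddqc package's implementation. Assumed here: cluster labels are already computed upstream, the cutoff is K = 2 MADs, low gene and UMI counts and high mitochondrial and ribosomal fractions count as failing, and the names `adaptive_qc`, `mad`, `make_cells`, and `METRIC_DIRECTIONS` are hypothetical.

```python
from statistics import median

# Assumed cutoff: a cell fails a metric when it lies more than K median
# absolute deviations (MADs) from its cluster's median. K = 2.0 is illustrative.
K = 2.0

# Assumed filtering direction per QC metric (hypothetical metric names):
# "lower" fails cells below median - K*MAD, "upper" fails cells above median + K*MAD.
METRIC_DIRECTIONS = {
    "n_genes": "lower",   # gene complexity
    "n_counts": "lower",  # UMI complexity
    "pct_mito": "upper",  # % reads mapping to mitochondrial genes
    "pct_ribo": "upper",  # % reads mapping to ribosomal genes
}


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def adaptive_qc(cells, cluster_key="cluster", k=K):
    """Return one keep/drop flag per cell, thresholding each metric within its cluster.

    `cells` is a list of dicts holding a precomputed cluster label plus the
    four metrics named in METRIC_DIRECTIONS. A cell failing any metric is dropped.
    """
    # Group cell indices by cluster label (clustering is assumed done upstream).
    by_cluster = {}
    for i, cell in enumerate(cells):
        by_cluster.setdefault(cell[cluster_key], []).append(i)

    keep = [True] * len(cells)
    for indices in by_cluster.values():
        for metric, direction in METRIC_DIRECTIONS.items():
            # Thresholds come from this cluster's own distribution only.
            values = [cells[i][metric] for i in indices]
            center, spread = median(values), mad(values)
            for i in indices:
                value = cells[i][metric]
                if direction == "lower" and value < center - k * spread:
                    keep[i] = False
                elif direction == "upper" and value > center + k * spread:
                    keep[i] = False
    return keep


def make_cells(cluster, rows):
    """Build toy cell records from (n_genes, n_counts, pct_mito, pct_ribo) tuples."""
    keys = ("n_genes", "n_counts", "pct_mito", "pct_ribo")
    return [dict(zip(keys, row), cluster=cluster) for row in rows]


# Toy data: cluster A has inherently low complexity, cluster B high complexity.
cells = (
    make_cells("A", [(300, 600, 2, 10), (320, 640, 3, 11), (310, 620, 2.5, 10),
                     (330, 660, 3, 12), (100, 610, 2, 11)])
    + make_cells("B", [(2000, 5000, 4, 20), (2100, 5100, 4.5, 21), (2050, 5050, 5, 22),
                       (1990, 5020, 4, 20), (2020, 5080, 30, 21)])
)

print(adaptive_qc(cells))
# [True, True, True, True, False, True, True, True, True, False]
```

In this made-up data, a single fixed cutoff of, say, 500 genes would discard every cell in cluster A, whereas the per-cluster thresholds drop only one outlier per cluster: a low-complexity cell in A and a high-mitochondrial-fraction cell in B.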
