Quality score compression improves genotyping accuracy
Author(s)
Yu, Yun William; Yorukoglu, Deniz; Peng, Jian; Berger Leighton, Bonnie
Terms of use
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
To the Editor:
Most next-generation sequencing (NGS) quality scores are space intensive, redundant and often misleading. In this Correspondence, we recover quality information directly from sequence data using a compression tool named Quartz, rendering such scores redundant and yielding substantially better space and time efficiencies for storage and analysis. Quartz is designed to operate on NGS reads in FASTQ format, but it can be trivially modified to discard quality scores in other formats for which scores are paired with sequence information. Discarding 95% of quality scores resulted, counterintuitively, in improved SNP calling, implying that compression need not come at the expense of accuracy.
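The abstract's idea of discarding most quality scores from FASTQ reads can be sketched as follows. This is a minimal illustration only, not Quartz's method: the abstract does not say how Quartz decides which scores to keep, so the `keep` predicate is a hypothetical placeholder, as are the function names `parse_fastq` and `smooth_qualities`. The default replacement character `I` (Phred 40 in the Phred+33 encoding) is likewise an assumption.

```python
from typing import Callable, Iterable, Iterator, Tuple

# (header, sequence, quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from well-formed, four-line-per-read FASTQ input."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line carries no information here
        qual = next(it).rstrip("\n")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header, seq, qual


def smooth_qualities(
    qual: str,
    keep: Callable[[int], bool],
    default: str = "I",  # assumed placeholder score, Phred 40 in Phred+33
) -> str:
    """Replace every score whose position fails `keep` with `default`.

    `keep` is a hypothetical stand-in for whatever rule decides that a
    score is informative enough to retain.
    """
    return "".join(q if keep(i) else default for i, q in enumerate(qual))


if __name__ == "__main__":
    reads = ["@read1\n", "ACGT\n", "+\n", "!#5I\n"]
    for header, seq, qual in parse_fastq(reads):
        # Keep only the first score; discard (smooth) the rest.
        print(header, seq, smooth_qualities(qual, lambda i: i == 0))
```

Replacing discarded scores with one constant, rather than deleting them, keeps the file valid FASTQ and makes the quality lines highly repetitive, which is what lets a general-purpose compressor shrink them.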
Date issued
2015-03
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Mathematics
Journal
Nature Biotechnology
Publisher
Springer Nature
Citation
Yu, Y William, Deniz Yorukoglu, Jian Peng, and Bonnie Berger. “Quality Score Compression Improves Genotyping Accuracy.” Nature Biotechnology 33, no. 3 (March 6, 2015): 240–243.
Version: Author's final manuscript
ISSN
1087-0156
1546-1696