Field | Value | Language
dc.contributor.author | Rolfe, Philip Alexander |
dc.contributor.author | Gifford, David K. |
dc.date.accessioned | 2011-08-19T14:58:51Z |
dc.date.available | 2011-08-19T14:58:51Z |
dc.date.issued | 2011-07 |
dc.date.submitted | 2010-11 |
dc.identifier.issn | 1471-2105 |
dc.identifier.uri | http://hdl.handle.net/1721.1/65339 |
dc.description.abstract | Background: The advent of high-throughput sequencing has enabled sequencing based measurements of cellular function, with an individual measurement potentially consisting of more than 10^8 reads. While tools are available for aligning sets of reads to genomes and interpreting the results, fewer tools have been developed to address the storage and retrieval requirements of large collections of aligned datasets. We present ReadDB, a network accessible column store database system for aligned high-throughput read datasets. Results: ReadDB stores collections of aligned read positions and provides a client interface to support visualization and analysis. ReadDB is implemented as a network server that responds to queries on genomic intervals in an experiment with either the set of contained reads or a histogram based interval summary. Tests on datasets ranging from 10^5 to 10^8 reads demonstrate that ReadDB performance is generally within a factor of two of local-storage based methods and often three to five times better than other network-based methods. Conclusions: ReadDB is a high-performance foundation for ChIP-Seq and RNA-Seq analysis. The client-server model provides convenient access to compute cluster nodes or desktop visualization software without requiring a shared network filesystem or large amounts of local storage. The client code provides a simple interface for fast data access to visualization or analysis. ReadDB provides a new way to store genome-aligned reads for use in applications where read sequence and alignment mismatches are not needed. | en_US
dc.description.sponsorship | National Institutes of Health (U.S.) (grant 5TL1EB008540) | en_US
dc.description.sponsorship | National Institutes of Health (U.S.) (grant 5R01HG002668) | en_US
dc.description.sponsorship | National Institutes of Health (U.S.) (grant P01-NS055923-01) | en_US
dc.description.sponsorship | National Institutes of Health (U.S.) (grant 1-UL1-RR024920) | en_US
dc.publisher | BioMed Central Ltd | en_US
dc.relation.isversionof | http://dx.doi.org/10.1186/1471-2105-12-278 | en_US
dc.rights | Creative Commons Attribution | en_US
dc.rights.uri | http://creativecommons.org/licenses/by/2.0 | en_US
dc.source | BioMed Central Ltd | en_US
dc.title | ReadDB Provides Efficient Storage for Mapped Short Reads | en_US
dc.type | Article | en_US
dc.identifier.citation | BMC Bioinformatics. 2011 Jul 07;12(1):278 | en_US
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US
dc.contributor.mitauthor | Rolfe, Philip Alexander |
dc.contributor.mitauthor | Gifford, David K. |
dc.relation.journal | BMC Bioinformatics | en_US
dc.eprint.version | Final published version | en_US
dc.identifier.pmid | 21736741 |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US
dc.date.updated | 2011-07-25T16:01:45Z |
dc.language.rfc3066 | en |
dc.rights.holder | Rolfe et al.; licensee BioMed Central Ltd. |
dspace.orderedauthors | Rolfe, P Alexander; Gifford, David K | en
dc.identifier.orcid | https://orcid.org/0000-0003-1709-4034 |
mit.license | PUBLISHER_CC | en_US
mit.metadata.status | Complete |
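The abstract describes ReadDB as a column store of aligned read positions that answers a query on a genomic interval with either the reads it contains or a histogram summary of that interval. The Python sketch below illustrates that query pattern under stated assumptions only: the class name ReadColumnStore, its methods reads and histogram, the (chromosome, position, strand) input tuples, and the in-memory layout are all hypothetical and are not ReadDB's actual client API, wire protocol, or on-disk format. Per chromosome it keeps two parallel columns (start positions and strands) sorted by position, so an interval query reduces to two binary searches.

```python
from array import array
from bisect import bisect_left
from collections import defaultdict


class ReadColumnStore:
    """Hypothetical in-memory sketch, not ReadDB's real implementation.

    Each chromosome holds two parallel columns sorted by position:
    read start positions and strands ('+' or '-'). As in the abstract,
    read sequences and alignment mismatches are not stored.
    """

    def __init__(self, reads):
        # reads: iterable of (chrom, position, strand) tuples (assumed format)
        grouped = defaultdict(list)
        for chrom, pos, strand in reads:
            grouped[chrom].append((pos, strand))
        self.positions = {}
        self.strands = {}
        for chrom, rows in grouped.items():
            rows.sort()
            self.positions[chrom] = array("q", (p for p, _ in rows))
            self.strands[chrom] = "".join(s for _, s in rows)

    def _bounds(self, chrom, start, end):
        # Two binary searches give the index range of reads with start <= pos < end.
        cols = self.positions.get(chrom, array("q"))
        return cols, bisect_left(cols, start), bisect_left(cols, end)

    def reads(self, chrom, start, end):
        """Return (position, strand) for every read in the half-open interval."""
        cols, lo, hi = self._bounds(chrom, start, end)
        strands = self.strands.get(chrom, "")
        return [(cols[i], strands[i]) for i in range(lo, hi)]

    def histogram(self, chrom, start, end, bin_size):
        """Summarize [start, end) as read counts per fixed-width bin."""
        cols, lo, hi = self._bounds(chrom, start, end)
        nbins = (end - start + bin_size - 1) // bin_size
        counts = [0] * nbins
        for i in range(lo, hi):
            counts[(cols[i] - start) // bin_size] += 1
        return counts


store = ReadColumnStore([
    ("chr1", 100, "+"), ("chr1", 105, "-"), ("chr1", 250, "+"),
    ("chr1", 90, "+"), ("chr2", 100, "-"),
])
print(store.reads("chr1", 100, 200))          # [(100, '+'), (105, '-')]
print(store.histogram("chr1", 0, 300, 100))   # [1, 2, 1]
```

Keeping positions in a contiguous sorted column means a query touches only the index range between the two bisection points, and the histogram form returns a fixed number of counts regardless of how many reads fall in the interval, which is the kind of compact summary a visualization client would request.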