
dc.contributor.author: Daniels, Noah M.
dc.contributor.author: Gallant, Andrew
dc.contributor.author: Peng, Jian
dc.contributor.author: Cowen, Lenore J.
dc.contributor.author: Baym, Michael Hartmann
dc.contributor.author: Berger Leighton, Bonnie
dc.date.accessioned: 2016-08-26T18:03:39Z
dc.date.available: 2016-08-26T18:03:39Z
dc.date.issued: 2013-06
dc.identifier.issn: 1367-4803
dc.identifier.issn: 1460-2059
dc.identifier.uri: http://hdl.handle.net/1721.1/104045
dc.description.abstract: Motivation: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Acceleration of programs in the popular PSI/DELTA-BLAST family of tools will not only speed-up homology search directly but also the huge collection of other current programs that primarily interact with large protein databases via precisely these tools. Results: We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate with all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is that we introduce a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP’s runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants, which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed-up many tasks, such as protein structure prediction and orthology mapping, which rely heavily on homology search. [en_US]
dc.description.sponsorship: Simons Foundation [en_US]
dc.description.sponsorship: National Institutes of Health (U.S.) (NIH grant (R01GM080330)) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (NSF MSPRF grant) [en_US]
dc.language.iso: en_US
dc.publisher: Oxford University Press [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1093/bioinformatics/btt214 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial 3.0 Unported licence [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/3.0/ [en_US]
dc.source: Oxford University Press [en_US]
dc.title: Compressive genomics for protein databases [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Daniels, N. M., A. Gallant, J. Peng, L. J. Cowen, M. Baym, and B. Berger. “Compressive Genomics for Protein Databases.” Bioinformatics 29, no. 13 (June 21, 2013): i283–i290. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.contributor.mitauthor: Peng, Jian [en_US]
dc.contributor.mitauthor: Baym, Michael Hartmann [en_US]
dc.contributor.mitauthor: Berger Leighton, Bonnie [en_US]
dc.relation.journal: Bioinformatics [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0003-1303-5598
dc.identifier.orcid: https://orcid.org/0000-0002-2724-7228
mit.license: PUBLISHER_CC [en_US]
mit.metadata.status: Complete

