
dc.contributor.author: Yu, Yun William
dc.contributor.author: Daniels, Noah
dc.contributor.author: Danko, David C.
dc.contributor.author: Berger Leighton, Bonnie
dc.date.accessioned: 2016-08-30T20:53:34Z
dc.date.available: 2016-08-30T20:53:34Z
dc.date.issued: 2015-08
dc.date.submitted: 2015-06
dc.identifier.issn: 24054712
dc.identifier.uri: http://hdl.handle.net/1721.1/104078
dc.description.abstract: Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset’s entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150× speedup), metagenomics (MICA, 3.5× speedup of DIAMOND [3,700× BLASTX]), and protein structure search (esFragBag, 10× speedup of FragBag). Our framework can be used to achieve “compressive omics,” and the general theory can be readily applied to data science problems outside of biology (source code: http://gems.csail.mit.edu). [en_US]
dc.description.sponsorship: Hertz Foundation (Fellowship) [en_US]
dc.description.sponsorship: National Institutes of Health (U.S.) (NIH grant GM108348) [en_US]
dc.language.iso: en_US
dc.publisher: Elsevier [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1016/j.cels.2015.08.004 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs License [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [en_US]
dc.source: PMC [en_US]
dc.title: Entropy-Scaling Search of Massive Biological Data [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Yu, Y. William, Noah M. Daniels, David Christian Danko, and Bonnie Berger. “Entropy-Scaling Search of Massive Biological Data.” Cell Systems 1, no. 2 (August 2015): 130–140. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.contributor.mitauthor: Yu, Yun William [en_US]
dc.contributor.mitauthor: Daniels, Noah [en_US]
dc.contributor.mitauthor: Danko, David C. [en_US]
dc.contributor.mitauthor: Berger Leighton, Bonnie [en_US]
dc.relation.journal: Cell Systems [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-8275-9576
dc.identifier.orcid: https://orcid.org/0000-0002-9538-825X
dc.identifier.orcid: https://orcid.org/0000-0002-2724-7228
mit.license: PUBLISHER_CC [en_US]

