
dc.contributor.author: Agarwal, Sameer
dc.contributor.author: Iyer, Anand P.
dc.contributor.author: Panda, Aurojit
dc.contributor.author: Mozafari, Barzan
dc.contributor.author: Stoica, Ion
dc.contributor.author: Madden, Samuel R.
dc.date.accessioned: 2014-09-26T13:11:32Z
dc.date.available: 2014-09-26T13:11:32Z
dc.date.issued: 2012-08
dc.identifier.issn: 21508097
dc.identifier.uri: http://hdl.handle.net/1721.1/90381
dc.description.abstract: In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees, and can scale to petabytes of data and thousands of machines in a fault-tolerant manner. Our experiments using the TPC-H benchmark and on an anonymized real-world video content distribution workload from Conviva Inc. show that BlinkDB can execute a wide range of queries up to 150x faster than Hive on MapReduce and 10--150x faster than Shark (Hive on Spark) over tens of terabytes of data stored across 100 machines, all with an error of 2--10%. [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158) [en_US]
dc.description.sponsorship: QUALCOMM Inc. [en_US]
dc.description.sponsorship: Amazon.com (Firm) [en_US]
dc.description.sponsorship: Google (Firm) [en_US]
dc.description.sponsorship: SAP Corporation [en_US]
dc.description.sponsorship: Blue Goji [en_US]
dc.description.sponsorship: Cisco Systems, Inc. [en_US]
dc.description.sponsorship: Cloudera, Inc. [en_US]
dc.description.sponsorship: Ericsson, Inc. [en_US]
dc.description.sponsorship: General Electric Company [en_US]
dc.description.sponsorship: Hewlett-Packard Company [en_US]
dc.description.sponsorship: Intel Corporation [en_US]
dc.description.sponsorship: MarkLogic Corporation [en_US]
dc.description.sponsorship: Microsoft Corporation [en_US]
dc.description.sponsorship: NetApp [en_US]
dc.description.sponsorship: Oracle Corporation [en_US]
dc.description.sponsorship: Splunk Inc. [en_US]
dc.description.sponsorship: VMware, Inc. [en_US]
dc.description.sponsorship: United States. Defense Advanced Research Projects Agency (Contract FA8650-11-C-7136) [en_US]
dc.language.iso: en_US
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.14778/2367502.2367533 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Other univ. web domain [en_US]
dc.title: Blink and it's done: Interactive queries on very large data [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Sameer Agarwal, Anand P. Iyer, Aurojit Panda, Samuel Madden, Barzan Mozafari, and Ion Stoica. 2012. Blink and it's done: interactive queries on very large data. Proc. VLDB Endow. 5, 12 (August 2012), 1902-1905. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Mozafari, Barzan [en_US]
dc.contributor.mitauthor: Madden, Samuel R. [en_US]
dc.relation.journal: Proceedings of the VLDB Endowment [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Agarwal, Sameer; Iyer, Anand P.; Panda, Aurojit; Madden, Samuel; Mozafari, Barzan; Stoica, Ion [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-7470-3265
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete

