MacroBase: Prioritizing Attention in Fast Data
Author(s)
Abuzaid, Firas; Bailis, Peter; Ding, Jialin; Gan, Edward; Madden, Samuel; Narayanan, Deepak; Rong, Kexin; Suri, Sahaana; ...
Download: Accepted version (1.056Mb)
Abstract
© 2018 Association for Computing Machinery. As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation (i.e., feature selection) and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles.
Date issued
2018
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
ACM Transactions on Database Systems
Publisher
Association for Computing Machinery (ACM)