
dc.contributor.author | Vartak, Manasi
dc.contributor.author | Trindade, Joana M. F. da
dc.contributor.author | Madden, Samuel R
dc.contributor.author | Zaharia, Matei A
dc.date.accessioned | 2019-06-18T18:10:52Z
dc.date.available | 2019-06-18T18:10:52Z
dc.date.issued | 2018-06
dc.identifier.isbn | 9781450361613
dc.identifier.uri | https://hdl.handle.net/1721.1/121346
dc.description.abstract | Model diagnosis is the process of analyzing machine learning (ML) model performance to identify where the model works well and where it doesn’t. It is a key part of the modeling process and helps ML developers iteratively improve model accuracy. Often, model diagnosis is performed by analyzing different datasets or intermediates associated with the model such as the input data and hidden representations learned by the model (e.g., [4, 24, 39]). The bottleneck in fast model diagnosis is the creation and storage of model intermediates. Storing these intermediates requires tens to hundreds of GB of storage whereas re-running the model for each diagnostic query slows down model diagnosis. To address this bottleneck, we propose a system called MISTIQUE that can work with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis. For each diagnostic query, MISTIQUE intelligently chooses whether to re-run the model or read a previously stored intermediate. For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint including quantization, summarization, and data deduplication. We evaluate our techniques on a range of real-world ML models in scikit-learn and Tensorflow. We demonstrate that our optimizations reduce storage by up to 110X for traditional ML pipelines and up to 6X for deep neural networks. Furthermore, by using MISTIQUE, we can speed up diagnostic queries on traditional ML pipelines by up to 390X and 210X on deep neural networks. | en_US
dc.description.sponsorship | Facebook PhD Fellowship | en_US
dc.description.sponsorship | Alfred P. Sloan Foundation. University Centers for Exemplary Mentoring (UCEM) fellowship | en_US
dc.language.iso | en
dc.publisher | Association for Computing Machinery (ACM) | en_US
dc.relation.isversionof | 10.1145/3183713.3196934 | en_US
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US
dc.source | other univ website | en_US
dc.title | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | en_US
dc.type | Article | en_US
dc.identifier.citation | Manasi Vartak, Joana M. F. da Trindade, Samuel Madden, and Matei Zaharia. 2018. MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis. In SIGMOD’18: 2018 International Conference on Management of Data, June 10–15, 2018, Houston, TX, USA. ACM, New York, NY, USA, 16 pages. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US
dc.relation.journal | SIGMOD’18: 2018 International Conference on Management of Data | en_US
dc.eprint.version | Author's final manuscript | en_US
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US
dc.date.updated | 2019-06-18T17:11:09Z
dspace.date.submission | 2019-06-18T17:11:10Z
mit.journal.volume | 2018 | en_US

