dc.contributor.author | Vartak, Manasi | |
dc.contributor.author | Trindade, Joana M. F. da | |
dc.contributor.author | Madden, Samuel R | |
dc.contributor.author | Zaharia, Matei A | |
dc.date.accessioned | 2019-06-18T18:10:52Z | |
dc.date.available | 2019-06-18T18:10:52Z | |
dc.date.issued | 2018-06 | |
dc.identifier.isbn | 9781450361613 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/121346 | |
dc.description.abstract | Model diagnosis is the process of analyzing machine learning (ML) model performance to identify where the model works well and where it doesn’t. It is a key part of the modeling process and helps ML developers iteratively improve model accuracy. Often, model diagnosis is performed by analyzing different datasets or intermediates associated with the model, such as the input data and hidden representations learned by the model (e.g., [4, 24, 39]). The bottleneck in fast model diagnosis is the creation and storage of model intermediates. Storing these intermediates requires tens to hundreds of GB of storage, whereas re-running the model for each diagnostic query slows down model diagnosis. To address this bottleneck, we propose a system called MISTIQUE that can work with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis. For each diagnostic query, MISTIQUE intelligently chooses whether to re-run the model or read a previously stored intermediate. For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint, including quantization, summarization, and data deduplication. We evaluate our techniques on a range of real-world ML models in scikit-learn and TensorFlow. We demonstrate that our optimizations reduce storage by up to 110X for traditional ML pipelines and up to 6X for deep neural networks. Furthermore, by using MISTIQUE, we can speed up diagnostic queries by up to 390X on traditional ML pipelines and by up to 210X on deep neural networks. | en_US |
dc.description.sponsorship | Facebook PhD Fellowship | en_US |
dc.description.sponsorship | Alfred P. Sloan Foundation. University Centers for Exemplary Mentoring (UCEM) fellowship | en_US |
dc.language.iso | en | |
dc.publisher | Association for Computing Machinery (ACM) | en_US |
dc.relation.isversionof | 10.1145/3183713.3196934 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | other univ website | en_US |
dc.title | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Manasi Vartak, Joana M. F. da Trindade, Samuel Madden, and Matei Zaharia. 2018. MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis. In SIGMOD’18: 2018 International Conference on Management of Data, June 10–15, 2018, Houston, TX, USA. ACM, New York, NY, USA, 16 pages. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.relation.journal | SIGMOD’18: 2018 International Conference on Management of Data | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2019-06-18T17:11:09Z | |
dspace.date.submission | 2019-06-18T17:11:10Z | |
mit.journal.volume | 2018 | en_US |