MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis
Author(s): Vartak, Manasi; Trindade, Joana M. F. da; Madden, Samuel R; Zaharia, Matei A
Model diagnosis is the process of analyzing machine learning (ML) model performance to identify where the model works well and where it doesn’t. It is a key part of the modeling process and helps ML developers iteratively improve model accuracy. Often, model diagnosis is performed by analyzing different datasets or intermediates associated with the model, such as the input data and hidden representations learned by the model (e.g., [4, 24, 39]). The bottleneck in fast model diagnosis is the creation and storage of model intermediates. Storing these intermediates requires tens to hundreds of GB of storage, whereas re-running the model for each diagnostic query slows down model diagnosis. To address this bottleneck, we propose a system called MISTIQUE that can work with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis. For each diagnostic query, MISTIQUE intelligently chooses whether to re-run the model or read a previously stored intermediate. For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint, including quantization, summarization, and data deduplication. We evaluate our techniques on a range of real-world ML models in scikit-learn and TensorFlow. We demonstrate that our optimizations reduce storage by up to 110X for traditional ML pipelines and up to 6X for deep neural networks. Furthermore, by using MISTIQUE, we can speed up diagnostic queries on traditional ML pipelines by up to 390X and 210X on deep neural networks.
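As an illustration of the quantization idea mentioned in the abstract, the following is a minimal sketch of uniform 8-bit quantization applied to a list of floating-point activations. It is not MISTIQUE's implementation; the function names `quantize_uint8` and `dequantize` and the choice of a linear min/max scheme are assumptions made only for this example.

```python
# Hypothetical sketch: uniform 8-bit quantization of an intermediate.
# This is an illustrative scheme, not the one used by MISTIQUE.

def quantize_uint8(values):
    """Map floats to one byte each using a linear min/max scale."""
    lo, hi = min(values), max(values)
    # Avoid a zero scale when all values are identical.
    scale = (hi - lo) / 255 or 1.0
    codes = bytes(min(255, round((v - lo) / scale)) for v in values)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the stored byte codes."""
    return [lo + c * scale for c in codes]


activations = [0.0, 0.13, 0.5, 0.87, 1.0]
codes, lo, scale = quantize_uint8(activations)
restored = dequantize(codes, lo, scale)
```

Each value is stored in one byte instead of a 4- or 8-byte float, at the cost of a reconstruction error of at most half a quantization step (`scale / 2`). Whether that loss is acceptable depends on the diagnostic query being run.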
Department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
SIGMOD’18: 2018 International Conference on Management of Data
Association for Computing Machinery (ACM)
Manasi Vartak, Joana M. F. da Trindade, Samuel Madden, and Matei Zaharia. 2018. MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis. In SIGMOD’18: 2018 International Conference on Management of Data, June 10–15, 2018, Houston, TX, USA. ACM, New York, NY, USA, 16 pages.
Author's final manuscript