
dc.contributor.author: Shang, Zeyuan
dc.contributor.author: Zgraggen, Emanuel
dc.contributor.author: Buratti, Benedetto
dc.contributor.author: Kossmann, Ferdinand
dc.contributor.author: Eichmann, Philipp
dc.contributor.author: Chung, Yeounoh
dc.contributor.author: Binnig, Carsten
dc.contributor.author: Upfal, Eli
dc.contributor.author: Kraska, Tim
dc.date.accessioned: 2022-07-18T20:54:12Z
dc.date.available: 2021-09-20T18:21:37Z
dc.date.available: 2022-07-18T20:54:12Z
dc.date.issued: 2019
dc.identifier.uri: https://hdl.handle.net/1721.1/132275.2
dc.description.abstract: © 2019 Association for Computing Machinery. Statistical knowledge and domain expertise are key to extracting actionable insights out of data, yet such skills rarely coexist. In Machine Learning, high-quality results are only attainable via mindful data preprocessing, hyperparameter tuning and model selection. Domain experts are often overwhelmed by such complexity, de facto inhibiting a wider adoption of ML techniques in other fields. Existing libraries that claim to solve this problem still require well-trained practitioners. Those frameworks involve heavy data preparation steps and are often too slow for interactive feedback from the user, severely limiting the scope of such systems. In this paper we present Alpine Meadow, a first Interactive Automated Machine Learning tool. What makes our system unique is not only the focus on interactivity, but also the combined systemic and algorithmic design approach; on one hand we leverage ideas from query optimization, on the other we devise novel selection and pruning strategies combining cost-based Multi-Armed Bandits and Bayesian Optimization. We evaluate our system on over 300 datasets and compare against other AutoML tools, including the current NIPS winner, as well as expert solutions. Not only is Alpine Meadow able to significantly outperform the other AutoML systems while, in contrast to the other systems, providing interactive latencies, but it also outperforms expert solutions in 80% of the cases over data sets we have never seen before. [en_US]
dc.language.iso: en
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: 10.1145/3299869.3319863 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Other repository [en_US]
dc.title: Democratizing Data Science through Interactive Curation of ML Pipelines [en_US]
dc.type: Article [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: Proceedings of the ACM SIGMOD International Conference on Management of Data [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-01-11T15:14:35Z
dspace.orderedauthors: Shang, Z; Zgraggen, E; Buratti, B; Kossmann, F; Eichmann, P; Chung, Y; Binnig, C; Upfal, E; Kraska, T [en_US]
dspace.date.submission: 2021-01-11T15:14:40Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Publication Information Needed [en_US]

