Show simple item record

dc.contributor.advisorTim Kraska.en_US
dc.contributor.authorShang, Zeyuan.en_US
dc.contributor.otherMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2020-11-03T20:29:18Z
dc.date.available2020-11-03T20:29:18Z
dc.date.copyright2020en_US
dc.date.issued2020en_US
dc.identifier.urihttps://hdl.handle.net/1721.1/128304
dc.descriptionThis electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.en_US
dc.descriptionThesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2020en_US
dc.descriptionCataloged from student-submitted PDF version of thesis.en_US
dc.descriptionIncludes bibliographical references (pages 63-66).en_US
dc.description.abstractStatistical knowledge and domain expertise are key to extract actionable insights out of data, yet such skills rarely coexist together. In Machine Learning, high-quality results are only attainable via mindful data preprocessing, hyperparameter tuning and model selection. Domain experts are often overwhelmed by such complexity, de-facto inhibiting a wider adoption of ML techniques in other fields. Existing libraries that claim to solve this problem, still require well-trained practitioners. Those frameworks involve heavy data preparation steps and are often too slow for interactive feedback from the user, severely limiting the scope of such systems. In this work we present Alpine Meadow, a first Interactive Automated Machine Learning tool. What makes the system unique is not only the focus on interactivity, but also the combined systemic and algorithmic design approach; on one hand we leverage ideas from query optimization, on the other we devise novel selection and pruning strategies combining cost-based Multi-Armed Bandits and Bayesian Optimization. We evaluate the system on over 300 datasets and compare against other AutoML tools, including the current NIPS winner, as well as expert solutions. Not only is Alpine Meadow able to significantly outperform the other AutoML systems while -- in contrast to the other systems -- providing interactive latencies, but also outperforms in 80% of the cases expert solutions over data sets we have never seen before.en_US
dc.description.statementofresponsibilityby Zeyuan Shang.en_US
dc.format.extent66 pagesen_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsMIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleDemocratizing data science through interactive curation of ML pipelinesen_US
dc.typeThesisen_US
dc.description.degreeS.M.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Scienceen_US
dc.identifier.oclc1202001398en_US
dc.description.collectionS.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Scienceen_US
dspace.imported2020-11-03T20:29:16Zen_US
mit.thesis.degreeMasteren_US
mit.thesis.departmentEECSen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record