DSpace@MIT

Mining Software Artifacts for use in Automated Machine Learning

Author(s)
Cambronero Sánchez, José Pablo
Download: Thesis PDF (5.926 MB)
Advisor
Rinard, Martin C.
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Successfully implementing classical supervised machine learning pipelines requires that users have software engineering, machine learning, and domain expertise. Machine learning libraries have helped with the first two of these by providing modular implementations of popular algorithms. However, implementing a pipeline remains an iterative, tedious, and data-dependent task, as users must experiment with different pipeline designs. To make pipeline development accessible to non-experts and more efficient for experts, automated techniques can search for high-performing pipelines with little user intervention. The collection of techniques and systems that automate this task is commonly termed automated machine learning (AutoML).

Inspired by the success of software mining in areas such as code search, program synthesis, and program repair, I investigate the hypothesis that information mined from software artifacts can be used to build AutoML systems, improve interactions with them, and address use cases they currently miss. In particular, I present three systems that make use of software artifacts: AL, AMS, and Janus. AL mines dynamic execution traces from a collection of programs that implement machine learning pipelines and uses these traces to learn to produce new pipelines. AMS mines documentation and program examples to automatically generate a search space for an AutoML tool, starting from a user-chosen set of API components. Janus mines pipeline transformations from a collection of machine learning pipelines; these transformations can be used to improve an input pipeline while producing a nearby variant of it.

Together, these systems and their experimental results show that mining software artifacts can simplify AutoML systems, make them easier to customize, and apply them to novel use cases.
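The abstract describes Janus only at a high level. To illustrate the general idea of mining transformations from existing pipelines and replaying them to obtain nearby variants, here is a minimal Python sketch under simplifying assumptions that do not come from the thesis: each pipeline is modeled as a flat tuple of component names, edits are extracted with the standard library's `difflib.SequenceMatcher` (a swapped-in technique, not necessarily what Janus uses), and a "nearby" variant is one produced by applying a single mined rule. The functions `mine_rewrite_rules`, `apply_rule`, and `nearby_variants`, as well as the toy corpus, are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical representation (not from the thesis): a pipeline is a flat
# tuple of component names, e.g. ("SimpleImputer", "LogisticRegression").


def mine_rewrite_rules(pairs):
    """Count rewrite rules (old_window -> new_window) observed between
    (original, revised) pipeline pairs using difflib's edit opcodes."""
    rules = Counter()
    for before, after in pairs:
        before, after = tuple(before), tuple(after)
        matcher = SequenceMatcher(a=before, b=after, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            old, new = before[i1:i2], after[j1:j2]
            if tag == "insert":
                # An insertion has an empty `old` window, so anchor it on a
                # neighboring component to make the rule replayable.
                if i1 > 0:
                    old, new = before[i1 - 1:i1], before[i1 - 1:i1] + new
                elif before:
                    old, new = before[:1], new + before[:1]
                else:
                    continue  # empty original pipeline: nothing to anchor on
            rules[(old, new)] += 1
    return rules


def apply_rule(pipeline, old, new):
    """Replace the first occurrence of the window `old` with `new`.
    Returns None if `old` does not occur in `pipeline`."""
    n = len(old)
    for i in range(len(pipeline) - n + 1):
        if pipeline[i:i + n] == old:
            return pipeline[:i] + new + pipeline[i + n:]
    return None


def nearby_variants(pipeline, rules, k=3):
    """Apply the most frequently mined rules one at a time, so each
    variant differs from the input by exactly one mined rule."""
    pipeline = tuple(pipeline)
    variants = []
    for (old, new), _count in rules.most_common():
        variant = apply_rule(pipeline, old, new)
        if variant is not None and variant != pipeline and variant not in variants:
            variants.append(variant)
            if len(variants) == k:
                break
    return variants


# Toy corpus of (original, revised) pipelines; the names are illustrative only.
corpus = [
    (("SimpleImputer", "LogisticRegression"),
     ("SimpleImputer", "StandardScaler", "LogisticRegression")),
    (("SimpleImputer", "SVC"),
     ("SimpleImputer", "StandardScaler", "SVC")),
    (("StandardScaler", "LogisticRegression"),
     ("StandardScaler", "RandomForestClassifier")),
]
rules = mine_rewrite_rules(corpus)
variants = nearby_variants(("SimpleImputer", "LogisticRegression"), rules)
```

On this toy corpus, the most frequent rule inserts `StandardScaler` after `SimpleImputer`, so the first variant is `("SimpleImputer", "StandardScaler", "LogisticRegression")`. Ranking rules by how often they were mined is only one simple heuristic. A system that aims to improve the input pipeline would also need to evaluate each variant (for example, by cross-validation) and keep only those that score better, a step this sketch omits.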
Date issued
2021-06
URI
https://hdl.handle.net/1721.1/139465
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.