Monkey: Platform-Agnostic Hybrid-Cloud Cluster Compute Orchestration Designed for AI/ML

Lamp, Avery

dc.contributor.advisor	Agrawal, Pulkit
dc.contributor.author	Lamp, Avery
dc.date.accessioned	2022-01-14T14:59:56Z
dc.date.available	2022-01-14T14:59:56Z
dc.date.issued	2021-06
dc.date.submitted	2021-06-17T20:13:33.943Z
dc.identifier.uri	https://hdl.handle.net/1721.1/139258
dc.description.abstract	As AI/ML research progresses, the amount of compute needed to train and evaluate state-of-the-art AI algorithms consistently increases. As compute needs grow, researchers spend time designing distributed systems to scalably train their latest models and optimize hyperparameters rather than focusing on their core research. We aim to build a fault-tolerant distributed system capable of cheaply and flexibly scheduling reproducible research training jobs on heterogeneous hybrid-cloud compute clusters, including local machines and provider-agnostic cloud machines. Our system targets ML researchers and has two main goals: minimizing costs (using preemptible/spot instances) and user-friendliness. The system aims to require minimal user setup and configuration, allowing researchers to quickly get started training models. The Monkey System includes a web console and visualization dashboard to track, evaluate, and compare the progress and results of multiple jobs.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Monkey: Platform-Agnostic Hybrid-Cloud Cluster Compute Orchestration Designed for AI/ML
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science

Files in this item

Name: Lamp-alamp-meng-eecs-2021-thes ...
Size: 890.0Kb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
