| dc.contributor.advisor | Szekely, Ariel | |
| dc.contributor.advisor | Kaashoek, M. Frans | |
| dc.contributor.author | Liu, Katie | |
| dc.date.accessioned | 2025-10-06T17:37:32Z | |
| dc.date.available | 2025-10-06T17:37:32Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-06-23T14:02:53.277Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162971 | |
| dc.description.abstract | Machine learning inference in multi-tenant cloud environments poses significant challenges for minimizing latency and resource contention, especially as models grow in size and complexity. This thesis addresses the cold-start overhead and scheduling inefficiencies of multi-tenant ML serving by integrating the RayServe distributed model-serving framework into σOS, a cloud operating system that unifies container and serverless paradigms. The thesis also proposes two model-aware schedulers within σOS that intelligently route inference requests to reduce the number of cold starts: Model Colocation, which prioritizes placing requests on machines where the required model is already loaded, and Centralized Model Registry, which tracks globally available models to inform scheduling decisions. These policies proactively reduce model load times by reusing cached models. Experimental results on language translation workloads in an 8-node cluster show that these schedulers achieve an approximately 50% reduction in average inference latency and eliminate roughly 4–5 cold starts per workload, compared to σOS’s default scheduler. Through this model-aware approach to scheduling, our work enables more efficient, scalable, and low-latency ML inference serving in multi-tenant cloud settings. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Enabling Efficient ML Inference in SigmaOS with Model-Aware Scheduling | |
| dc.type | Thesis | |
| dc.description.degree | M.Eng. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |