Show simple item record

dc.contributor.advisorSzekely, Ariel
dc.contributor.advisorKaashoek, M. Frans
dc.contributor.authorLiu, Katie
dc.date.accessioned2025-10-06T17:37:32Z
dc.date.available2025-10-06T17:37:32Z
dc.date.issued2025-05
dc.date.submitted2025-06-23T14:02:53.277Z
dc.identifier.urihttps://hdl.handle.net/1721.1/162971
dc.description.abstractMachine learning inference in multi-tenant cloud environments leads to significant challenges when it comes to minimizing latency and resource contention, especially as models grow in size and complexity. This thesis addresses the cold start overhead and scheduling inefficiencies of multi-tenant ML serving by integrating the RayServe distributed model-serving framework into σOS, a cloud operating system that unifies container and serverless paradigms. The thesis also proposes two model-aware schedulers within σOS that intelligently routes inference requests to reduce the number of cold starts: Model Colocation, which prioritizes placing requests on machines where the required model is already loaded, and Centralized Model Registry, which tracks globally available models to inform scheduling decisions. These policies proactively reduce model load times by reusing cached models. Experimental results on language translation workloads in an 8-node cluster show that these schedulers achieve a ≈ 50% reduction in average inference latency and eliminates roughly 4–5 cold starts per workload, compared to σOS’s default scheduler. Through this model-aware approach to scheduling, our work enables more efficient, scalable, and low-latency ML inference serving in multi-tenant cloud settings.
dc.publisherMassachusetts Institute of Technology
dc.rightsIn Copyright - Educational Use Permitted
dc.rightsCopyright retained by author(s)
dc.rights.urihttps://rightsstatements.org/page/InC-EDU/1.0/
dc.titleEnabling Efficient ML Inference in SigmaOS with Model-Aware Scheduling
dc.typeThesis
dc.description.degreeM.Eng.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degreeMaster
thesis.degree.nameMaster of Engineering in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record