MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Congestion Control in Machine Learning Clusters

Author(s)
Rajasekaran, Sudarsanan
Thumbnail
DownloadThesis PDF (2.080Mb)
Advisor
Ghobadi, Manya
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Metadata
Show full item record
Abstract
This paper argues that fair-sharing, the holy grail of congestion control algorithms for decades, is not necessarily a desirable property in Machine Learning (ML) training clusters. We demonstrate that for a specific combination of jobs, introducing unfairness improves the training time for all competing jobs. We call this specific combination of jobs compatible and define the compatibility criterion using a novel geometric abstraction. Our abstraction rolls time around a circle and rotates the communication phases of jobs to identify fully compatible jobs. Using this abstraction, we demonstrate up to 1.3× improvement in the average training iteration time of popular ML models. We advocate that resource management algorithms should take job compatibility on network links into account. We then propose three directions to ameliorate the impact of network congestion in ML training clusters: (i) an adaptively unfair congestion control scheme, (ii) priority queues on switches, and (iii) precise flow scheduling.
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156313
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.