
dc.contributor.author    Jeon, Byungsoo
dc.contributor.author    Wu, Mengdi
dc.contributor.author    Cao, Shiyi
dc.contributor.author    Kim, Sunghyun
dc.contributor.author    Park, Sunghyun
dc.contributor.author    Aggarwal, Neeraj
dc.contributor.author    Unger, Colin
dc.contributor.author    Arfeen, Daiyaan
dc.contributor.author    Liao, Peiyuan
dc.contributor.author    Miao, Xupeng
dc.contributor.author    Alizadeh, Mohammad
dc.contributor.author    Ganger, Gregory
dc.contributor.author    Chen, Tianqi
dc.contributor.author    Jia, Zhihao
dc.date.accessioned    2025-05-09T15:33:17Z
dc.date.available    2025-05-09T15:33:17Z
dc.date.issued    2025-02-03
dc.identifier.isbn    979-8-4007-0698-1
dc.identifier.uri    https://hdl.handle.net/1721.1/159248
dc.description    ASPLOS ’25, March 30–April 3, 2025, Rotterdam, Netherlands    en_US
dc.description.abstract    Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device (e.g. GPU). Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN computation for different micro-batches of training samples in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting in reduced memory requirement and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6×. GraphPipe also reduces the search time by 9-21× compared to PipeDream and Piper.    en_US
dc.publisher    ACM|Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1    en_US
dc.relation.isversionof    https://doi.org/10.1145/3669940.3707220    en_US
dc.rights    Creative Commons Attribution    en_US
dc.rights.uri    https://creativecommons.org/licenses/by/4.0/    en_US
dc.source    Association for Computing Machinery    en_US
dc.title    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism    en_US
dc.type    Article    en_US
dc.identifier.citation    Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, and Zhihao Jia. 2025. GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (ASPLOS '25). Association for Computing Machinery, New York, NY, USA, 557–571.    en_US
dc.contributor.department    Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory    en_US
dc.identifier.mitlicense    PUBLISHER_CC
dc.eprint.version    Final published version    en_US
dc.type.uri    http://purl.org/eprint/type/ConferencePaper    en_US
eprint.status    http://purl.org/eprint/status/NonPeerReviewed    en_US
dc.date.updated    2025-04-01T07:48:46Z
dc.language.rfc3066    en
dc.rights.holder    The author(s)
dspace.date.submission    2025-04-01T07:48:47Z
mit.license    PUBLISHER_CC
mit.metadata.status    Authority Work and Publication Information Needed    en_US
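
To make the idea in the abstract above concrete, the following is a minimal, self-contained Python sketch, not GraphPipe's actual code or API: the stage names (embed, vision_branch, text_branch, fusion, head) and the schedule_waves helper are hypothetical. It models a DNN as a directed acyclic graph of pipeline stages and groups mutually independent stages into "waves" that could execute concurrently on different GPUs, illustrating the model-parallel opportunity that graph pipeline parallelism preserves and that a strictly sequential pipeline discards.

    # Illustrative sketch of graph pipeline parallelism (GPP); hypothetical
    # stage graph, NOT GraphPipe's actual implementation.
    # Each stage lists the stages it depends on.
    stage_deps = {
        "embed":         [],
        "vision_branch": ["embed"],          # independent of text_branch
        "text_branch":   ["embed"],          # independent of vision_branch
        "fusion":        ["vision_branch", "text_branch"],
        "head":          ["fusion"],
    }

    def schedule_waves(deps):
        """Group stages into 'waves' by topological level; every stage in a
        wave has all predecessors finished, so the wave can run concurrently
        across devices."""
        remaining = dict(deps)
        done, waves = set(), []
        while remaining:
            ready = sorted(s for s, ds in remaining.items()
                           if all(d in done for d in ds))
            if not ready:
                raise ValueError("stage graph contains a cycle")
            waves.append(ready)
            done.update(ready)
            for s in ready:
                del remaining[s]
        return waves

    if __name__ == "__main__":
        for i, wave in enumerate(schedule_waves(stage_deps)):
            print(f"wave {i}: run concurrently -> {wave}")
        # A sequential pipeline would impose an arbitrary total order on
        # vision_branch and text_branch, serializing independent work.

Running this prints vision_branch and text_branch in the same wave; a sequential pipeline-parallel scheme would place them in separate, ordered stages even though no data dependency requires it.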

