Towards Zero Spawn Overhead: Work Stealing Without Deques

Handleman, Aaron; Singer, Kyle; Schardl, Tao B.; Lee, I-Ting Angelina

dc.contributor.author	Handleman, Aaron
dc.contributor.author	Singer, Kyle
dc.contributor.author	Schardl, Tao B.
dc.contributor.author	Lee, I-Ting Angelina
dc.date.accessioned	2025-08-13T15:43:28Z
dc.date.available	2025-08-13T15:43:28Z
dc.date.issued	2025-07-16
dc.identifier.isbn	979-8-4007-1258-6
dc.identifier.uri	https://hdl.handle.net/1721.1/162362
dc.description	SPAA ’25, July 28-August 1, 2025, Portland, OR, USA	en_US
dc.description.abstract	In a randomized work-stealing scheduler, parallel speedup depends on the spawn overhead, which workers pay to allow tasks to execute in parallel, and the steal overhead, which thieves pay to start executing new work. The importance of minimizing the spawn overhead in a randomized work-stealing scheduler was first formalized by Frigo et al. as the work-first principle [15], which states that one should minimize spawn overhead even at the expense of a larger steal overhead. Since then, many strategies have been proposed to reduce the spawn overhead, which is dominated by maintaining a per-worker double-ended queue, or deque, to keep track of available parallel work. In pursuit of zero spawn overhead, this work considers a strategy that eliminates the use of deques entirely, obviating the need for a worker to perform explicit bookkeeping or set up a deque to enable parallelism. To that end, we propose DLite, a compiler and runtime ABI (Application Binary Interface) that incurs near-zero spawn overhead, empirically measured to be about 6% compared to a regular function invocation. DLite pushes the tradeoffs advocated by the work-first principle to the extreme, decreasing the spawn overhead to almost nil at the expense of a high steal cost. Specifically, DLite employs a backtracking strategy: when a steal attempt occurs, the victim provides its current stack and base pointers to the thief, and the thief then reconstructs the necessary state to realize the parallel execution. We have implemented Cilk-DLite, which extends the OpenCilk platform [33] to implement DLite. When the application has ample parallelism, Cilk-DLite exhibits similar scalability to OpenCilk with much lower spawn overhead. When the application lacks parallelism, the high steal cost in Cilk-DLite can impede scalability due to slower work distribution. We also implemented variants of Cilk-DLite that make different design choices to evaluate the tradeoffs between spawn overhead and steal cost.	en_US
dc.publisher	ACM|37th ACM Symposium on Parallelism in Algorithms and Architectures	en_US
dc.relation.isversionof	https://doi.org/10.1145/3694906.3743349	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	Towards Zero Spawn Overhead: Work Stealing Without Deques	en_US
dc.type	Article	en_US
dc.identifier.citation	Aaron Handleman, Kyle Singer, Tao B. Schardl, and I-Ting Angelina Lee. 2025. Towards Zero Spawn Overhead: Work Stealing Without Deques. In Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '25). Association for Computing Machinery, New York, NY, USA, 75–88.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2025-08-01T07:55:46Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-08-01T07:55:46Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 3694906.3743349.pdf
Size: 734.4Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles