m3: Accurate Flow-Level Performance Estimation using Machine Learning

Li, Chenning; Nasr-Esfahany, Arash; Zhao, Kevin; Noorbakhsh, Kimia; Goyal, Prateesh; Alizadeh, Mohammad; Anderson, Thomas

dc.contributor.author	Li, Chenning
dc.contributor.author	Nasr-Esfahany, Arash
dc.contributor.author	Zhao, Kevin
dc.contributor.author	Noorbakhsh, Kimia
dc.contributor.author	Goyal, Prateesh
dc.contributor.author	Alizadeh, Mohammad
dc.contributor.author	Anderson, Thomas
dc.date.accessioned	2024-09-05T13:38:53Z
dc.date.available	2024-09-05T13:38:53Z
dc.date.issued	2024-08-04
dc.identifier.isbn	979-8-4007-0614-1
dc.identifier.uri	https://hdl.handle.net/1721.1/156674
dc.description	ACM SIGCOMM ’24, August 4–8, 2024, Sydney, NSW, Australia	en_US
dc.description.abstract	Data center network operators often need accurate estimates of aggregate network performance. Unfortunately, existing methods for estimating aggregate network statistics are either inaccurate or too slow to be practical at the data center scale. In this paper, we develop and evaluate a scale-free, fast, and accurate model for estimating data center network tail latency performance for a given workload, topology, and network configuration. First, we show that path-level simulations (simulations of traffic that intersects a given path) produce almost the same aggregate statistics as full network-wide packet-level simulations. We use a simple and fast flow-level fluid simulation in a novel way to capture and summarize essential elements of the path workload, including the effect of cross-traffic on flows on that path. We use this coarse simulation as input to a machine-learning model to predict path-level behavior, and run it on a sample of paths to produce accurate network-wide estimates. Our model generalizes over the choice of congestion control (CC) protocol, CC protocol parameters, and routing. Relative to Parsimon, a state-of-the-art system for rapidly estimating aggregate network tail latency, our approach is significantly faster (5.7×), more accurate (45.9% less error), and more robust.	en_US
dc.publisher	ACM|ACM SIGCOMM 2024 Conference	en_US
dc.relation.isversionof	https://doi.org/10.1145/3651890.3672243	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	m3: Accurate Flow-Level Performance Estimation using Machine Learning	en_US
dc.type	Article	en_US
dc.identifier.citation	Chenning Li, Arash Nasr-Esfahany, Kevin Zhao, Kimia Noorbakhsh, Prateesh Goyal, Mohammad Alizadeh, and Thomas E. Anderson. 2024. m3: Accurate Flow-Level Performance Estimation using Machine Learning. In Proceedings of the ACM SIGCOMM 2024 Conference (ACM SIGCOMM '24). Association for Computing Machinery, New York, NY, USA, 813–827.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2024-09-01T07:47:13Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2024-09-01T07:47:13Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: license_rdf
Size: 40 bytes
Format: application/rdf+xml

Name: 3651890.3672243.pdf
Size: 1.752 MB
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
