m3: Accurate Flow-Level Performance Estimation using Machine Learning
Author(s)
Li, Chenning; Nasr-Esfahany, Arash; Zhao, Kevin; Noorbakhsh, Kimia; Goyal, Prateesh; Alizadeh, Mohammad; Anderson, Thomas
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
Data center network operators often need accurate estimates of aggregate network performance. Unfortunately, existing methods for estimating aggregate network statistics are either inaccurate or too slow to be practical at the data center scale.
In this paper, we develop and evaluate a scale-free, fast, and accurate model for estimating data center network tail latency performance for a given workload, topology, and network configuration. First, we show that path-level simulations (simulations of traffic that intersects a given path) produce almost the same aggregate statistics as full network-wide packet-level simulations. We use a simple and fast flow-level fluid simulation in a novel way to capture and summarize essential elements of the path workload, including the effect of cross-traffic on flows on that path. We use this coarse simulation as input to a machine-learning model to predict path-level behavior, and run it on a sample of paths to produce accurate network-wide estimates. Our model generalizes over the choice of congestion control (CC) protocol, CC protocol parameters, and routing. Relative to Parsimon, a state-of-the-art system for rapidly estimating aggregate network tail latency, our approach is significantly faster (5.7×), more accurate (45.9% less error), and more robust.
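To make the idea of a flow-level fluid simulation concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single bottleneck link whose capacity is split equally among all active flows (processor sharing); the function name fluid_fct, its parameters, and the equal-share rule are assumptions introduced here for illustration only.

```python
def fluid_fct(flows, capacity):
    """Hypothetical single-link fluid simulation (illustrative only).

    flows: list of (arrival_time_s, size_bytes) tuples.
    capacity: link capacity in bytes per second.
    Returns flow completion times (finish - arrival), in input order.
    Assumption: active flows share the link equally at every instant.
    """
    order = sorted(range(len(flows)), key=lambda i: flows[i][0])
    remaining = {}              # active flow index -> bytes left to send
    fct = [None] * len(flows)
    now = 0.0
    nxt = 0                     # position in `order` of the next arrival

    while nxt < len(order) or remaining:
        # If the link is idle, jump ahead to the next arrival.
        if not remaining:
            now = max(now, flows[order[nxt]][0])
        # Admit every flow that has arrived by the current time.
        while nxt < len(order) and flows[order[nxt]][0] <= now:
            i = order[nxt]
            remaining[i] = float(flows[i][1])
            nxt += 1

        rate = capacity / len(remaining)           # equal share per flow
        i_min = min(remaining, key=remaining.get)  # next flow to finish
        finish_at = now + remaining[i_min] / rate
        next_arrival = flows[order[nxt]][0] if nxt < len(order) else float("inf")

        if next_arrival < finish_at:
            # An arrival changes the shares before any flow finishes.
            dt = next_arrival - now
            now = next_arrival
            for i in remaining:
                remaining[i] -= rate * dt
        else:
            # Drain all flows until the smallest one completes.
            dt = finish_at - now
            now = finish_at
            for i in remaining:
                remaining[i] -= rate * dt
            del remaining[i_min]
            fct[i_min] = now - flows[i_min][0]
    return fct


# Example: a 300-byte flow at t=0 and a 100-byte flow at t=1 on a 100 B/s link.
print(fluid_fct([(0.0, 300), (1.0, 100)], 100.0))
```

Dividing each completion time by the flow's ideal time on an empty link (size / capacity) gives a slowdown, and a high percentile of those slowdowns is one way to summarize tail latency. A sketch like this ignores queueing, packetization, and congestion-control dynamics, which is why the abstract describes using the coarse simulation only as input to a learned model.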
Description
ACM SIGCOMM ’24, August 4–8, 2024, Sydney, NSW, Australia
Date issued
2024-08-04
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
ACM|ACM SIGCOMM 2024 Conference
Citation
Chenning Li, Arash Nasr-Esfahany, Kevin Zhao, Kimia Noorbakhsh, Prateesh Goyal, Mohammad Alizadeh, and Thomas E. Anderson. 2024. m3: Accurate Flow-Level Performance Estimation using Machine Learning. In Proceedings of the ACM SIGCOMM 2024 Conference (ACM SIGCOMM '24). Association for Computing Machinery, New York, NY, USA, 813–827.
Version: Final published version
ISBN
979-8-4007-0614-1