Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks

Dai, Yan; Huang, Longbo

dc.contributor.author	Dai, Yan
dc.contributor.author	Huang, Longbo
dc.date.accessioned	2025-01-29T19:42:26Z
dc.date.available	2025-01-29T19:42:26Z
dc.date.issued	2024-12-13
dc.identifier.issn	2476-1249
dc.identifier.uri	https://hdl.handle.net/1721.1/158129
dc.description.abstract	Stochastic Network Optimization (SNO) concerns scheduling in stochastic queueing systems and has been widely studied in network theory. Classical SNO algorithms require network conditions to be stationary w.r.t. time, which fails to capture the non-stationary components in increasingly many real-world scenarios. Moreover, most existing algorithms in network optimization assume perfect knowledge of network conditions before decision, which again rules out applications where unpredictability in network conditions presents. Motivated by these issues, this paper considers Adversarial Network Optimization (ANO) under bandit feedback. Specifically, we consider the task of i) maximizing some unknown and time-varying utility function associated with scheduler's actions, where ii) the underlying network topology is a non-stationary multi-hop network whose conditions change arbitrarily with time, and iii) only bandit feedback (the effect of actually deployed actions) is revealed after decision-making. We propose the UMO2 algorithm, which does not require any pre-decision knowledge or counterfactual feedback, ensures network stability, and also matches the utility maximization performance of any "mildly varying" reference policy up to a polynomially decaying gap. To our knowledge, no previous algorithm can handle multi-hop networks or achieve utility maximization guarantees in ANO problems with bandit feedback, whereas ours is able to do both. Technically, our method builds upon a novel integration of online learning techniques into the Lyapunov drift-plus-penalty method. Specifically, we propose meticulous analytical techniques to jointly balance online learning and Lyapunov arguments, which is used to handle the complex inter-dependency among queues in multi-hop networks. To tackle the learning obstacles due to potentially unbounded queue sizes and negative queue differences, we design a new online linear optimization algorithm that automatically adapts to the unknown (potentially negative) loss magnitudes. Finally, we also propose a bandit convex optimization algorithm with novel queue-dependent learning rate scheduling that suites drastically varying queue lengths in utility maximization. Our new insights and techniques in online learning can also be of independent interest.	en_US
dc.publisher	ACM	en_US
dc.relation.isversionof	https://doi.org/10.1145/3700413	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks	en_US
dc.type	Article	en_US
dc.identifier.citation	Dai, Yan and Huang, Longbo. 2024. "Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks." Proceedings of the ACM on Measurement and Analysis of Computing Systems, 8 (3).
dc.contributor.department	Massachusetts Institute of Technology. Operations Research Center	en_US
dc.relation.journal	Proceedings of the ACM on Measurement and Analysis of Computing Systems	en_US
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2025-01-01T08:51:24Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-01-01T08:51:25Z
mit.journal.volume	8	en_US
mit.journal.issue	3	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name:: license_rdf
Size:: 40bytes
Format:: application/rdf+xml

View/Open

Name:: 3700413.pdf
Size:: 407.1Kb
Format:: PDF

View/Open

This item appears in the following Collection(s)

MIT Open Access Articles

Show simple item record