| dc.description.abstract | This thesis confronts the critical generalization gap of Deep Reinforcement Learning (DRL) that hinders its effective application to queueing network control, where policies often fail to perform robustly when deployed in unseen topologies and traffic conditions. We develop and analyze a suite of novel techniques that systematically embed structural domain knowledge and safety considerations to create more robust, efficient, and generalist learning agents. To improve generalization across a large class of queueing network control problems, we first introduce the Switch-Type Network (STN), a policy architecture that embeds the "switch-type" property common in classical control. This architectural prior improves sample efficiency and enables superior zero-shot generalization across varying network parameters. To address generalization across multi-hop networks, we then propose the Multi-Axis Graph Neural Network (MA-GNN), which augments the traditional inter-node message-passing operations of a GNN with a novel intra-node aggregation mechanism to capture complex, permutation-invariant dependencies between different traffic classes. This allows the MA-GNN to learn and output high-level control coefficients that remain effective on unseen network topologies. Recognizing the limitations of offline training, we shift to online adaptation and introduce an intervention-assisted DRL framework that guarantees stability in environments with unbounded state spaces. By partitioning the state space and ceding control to a provably stable policy in high-congestion regions, this framework prevents catastrophic learning failures; its stability is proven via Lyapunov analysis, and foundational policy gradient theorems are extended to support the interventional setting.
As a complementary case study in structured exploration, we also develop a Bayesian Hierarchical Bandit model and a Hierarchical Thompson Sampling (HTS) algorithm for the multi-band radio channel selection problem; by leveraging environmental correlations to guide exploration, HTS significantly reduces regret. Collectively, these contributions provide a comprehensive framework for creating DRL agents that are more robust and practical, demonstrating that embedding knowledge of policy structure, network topology, safety, and environmental correlations is a crucial step towards deploying autonomous agents in complex, real-world systems. | |