DSpace@MIT

  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Doctoral Theses
  • View Item

Generalizable Reinforcement Learning for Network Control

Author(s)
Wigmore, Jerrod
Thesis PDF (3.816 MB)
Advisor
Modiano, Eytan
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
This thesis confronts the critical generalization gap of Deep Reinforcement Learning (DRL) that hinders its effective application to queueing network control, where policies often fail to perform robustly in unseen topologies and traffic conditions upon deployment. We develop and analyze a suite of novel techniques that systematically embed structural domain knowledge and safety considerations to create more robust, efficient, and generalist learning agents. To improve generalization for a large class of queueing network control problems, we first introduce the Switch-Type Network (STN), a policy architecture that embeds the "switch-type" property common in classical control. This architectural prior improves sample efficiency and enables superior zero-shot generalization across varying network parameters. To address generalization across multi-hop networks, we then propose the Multi-Axis Graph Neural Network (MA-GNN), which augments the traditional inter-node message-passing operations of a GNN with a novel intra-node aggregation mechanism to capture complex, permutation-invariant dependencies between different traffic classes. This allows the MA-GNN to learn and output high-level control coefficients that are effective for unseen network topologies. Recognizing the limitations of offline training, we shift to online adaptation and introduce an intervention-assisted DRL framework that guarantees stability in environments with unbounded state spaces. By partitioning the state space and ceding control to a provably stable policy in high-congestion regions, this framework prevents catastrophic learning failures; its stability is proven via Lyapunov analysis, and foundational policy gradient theorems are extended to support the interventional setting.
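The intervention mechanism described in the abstract (partition the state space and cede control to a provably stable policy in high-congestion regions) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the MaxWeight-style fallback, the total-backlog `threshold`, and the `queues`/`service_rates` inputs are all assumptions chosen for concreteness.

```python
import numpy as np

def maxweight_action(queues, service_rates):
    """Stable fallback policy: serve the queue with the largest
    backlog-weighted service rate (a classical MaxWeight rule)."""
    return int(np.argmax(queues * service_rates))

def intervention_policy(queues, service_rates, drl_action, threshold):
    """Intervention-assisted control sketch: in the high-congestion
    region of the state space, override the learned action with the
    provably stable policy; otherwise let the DRL policy act."""
    if queues.sum() > threshold:  # high-congestion region
        return maxweight_action(queues, service_rates)
    return drl_action             # learned policy in the safe region
```

For example, with backlogs `[10, 2]` and a threshold of 5, control is ceded to the stable policy regardless of what the learned policy proposed; raising the threshold above the total backlog hands control back to the DRL action.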
As a complementary case study in structured exploration, we also develop a Bayesian Hierarchical Bandit model and a Hierarchical Thompson Sampling (HTS) algorithm for the multi-band radio channel selection problem, which leverages environmental correlations to guide exploration and significantly reduce regret. Collectively, these contributions provide a comprehensive framework for creating DRL agents that are more robust and practical, demonstrating that embedding knowledge of policy structure, network topology, safety, and environmental correlations is a crucial step towards deploying autonomous agents in complex, real-world systems.
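The hierarchical sampling idea above — exploit correlations by sampling a shared band-level belief first, then a channel-level reward around it — can be illustrated with a toy two-level Gaussian model. The Gaussian likelihood, the `band_mu`/`chan_offsets` parameterization, and the flat channel structure are hypothetical simplifications for illustration, not the HTS algorithm as specified in the thesis.

```python
import numpy as np

def hierarchical_thompson_sample(band_mu, band_sigma, chan_offsets, chan_sigma, rng):
    """Pick the (band, channel) arm whose two-level posterior sample is largest.

    Level 1 draws each band's shared mean (capturing within-band correlation);
    level 2 draws each channel's reward around its band's draw.
    """
    band_draws = rng.normal(band_mu, band_sigma)  # one draw per band
    samples = {
        (band, chan): rng.normal(band_draws[band] + offset, chan_sigma)
        for (band, chan), offset in chan_offsets.items()
    }
    return max(samples, key=samples.get)          # arm to play this round
```

Because exploration is guided at the band level first, one informative observation in a band shifts the posterior for every channel in that band, which is the mechanism by which regret is reduced relative to treating each channel independently.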
Date issued
2025-09
URI
https://hdl.handle.net/1721.1/165163
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.