DSpace@MIT

  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Doctoral Theses
  • View Item

Generalizable Reinforcement Learning for Network Control

Author(s)
Wigmore, Jerrod
Thesis PDF (3.816 MB)
Advisor
Modiano, Eytan
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
This thesis confronts the critical generalization gap of Deep Reinforcement Learning (DRL) that hinders its effective application to queueing network control, where policies often fail to perform robustly in unseen topologies and traffic conditions upon deployment. We develop and analyze a suite of novel techniques that systematically embed structural domain knowledge and safety considerations to create more robust, efficient, and generalist learning agents. To improve generalization for a large class of queueing network control problems, we first introduce the Switch-Type Network (STN), a policy architecture that embeds the "switch-type" property common in classical control. This architectural prior improves sample efficiency and enables superior zero-shot generalization across varying network parameters. To address generalization across multi-hop networks, we then propose the Multi-Axis Graph Neural Network (MA-GNN), which augments the traditional inter-node message-passing operations of a GNN with a novel intra-node aggregation mechanism to capture complex, permutation-invariant dependencies between different traffic classes. This allows the MA-GNN to learn and output high-level control coefficients that are effective for unseen network topologies. Recognizing the limitations of offline training, we shift to online adaptation and introduce an intervention-assisted DRL framework that guarantees stability in environments with unbounded state spaces. By partitioning the state space and ceding control to a provably stable policy in high-congestion regions, this framework prevents catastrophic learning failures; its stability is proven via Lyapunov analysis, and foundational policy gradient theorems are extended to support the interventional setting.
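The intervention mechanism described in the abstract (partition the state space and cede control to a provably stable policy in high-congestion regions) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the MaxWeight-style fallback, the total-backlog `threshold`, and the `queues`/`service_rates` inputs are all assumptions chosen for concreteness.

```python
import numpy as np

def maxweight_action(queues, service_rates):
    """Stable fallback policy: serve the queue with the largest
    backlog-weighted service rate (a classical MaxWeight rule)."""
    return int(np.argmax(queues * service_rates))

def intervention_policy(queues, service_rates, drl_action, threshold):
    """Intervention-assisted control sketch: in the high-congestion
    region of the state space, override the learned action with the
    provably stable policy; otherwise let the DRL policy act."""
    if queues.sum() > threshold:  # high-congestion region
        return maxweight_action(queues, service_rates)
    return drl_action             # learned policy in the safe region
```

For example, with backlogs `[10, 2]` and a threshold of 5, control is ceded to the stable policy regardless of what the learned policy proposed; raising the threshold above the total backlog hands control back to the DRL action.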
As a complementary case study in structured exploration, we also develop a Bayesian Hierarchical Bandit model and a Hierarchical Thompson Sampling (HTS) algorithm for the multi-band radio channel selection problem, which leverages environmental correlations to guide exploration and significantly reduce regret. Collectively, these contributions provide a comprehensive framework for creating DRL agents that are more robust and practical, demonstrating that embedding knowledge of policy structure, network topology, safety, and environmental correlations is a crucial step towards deploying autonomous agents in complex, real-world systems.
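The hierarchical sampling idea above — exploit correlations by sampling a shared band-level belief first, then a channel-level reward around it — can be illustrated with a toy two-level Gaussian model. The Gaussian likelihood, the `band_mu`/`chan_offsets` parameterization, and the flat channel structure are hypothetical simplifications for illustration, not the HTS algorithm as specified in the thesis.

```python
import numpy as np

def hierarchical_thompson_sample(band_mu, band_sigma, chan_offsets, chan_sigma, rng):
    """Pick the (band, channel) arm whose two-level posterior sample is largest.

    Level 1 draws each band's shared mean (capturing within-band correlation);
    level 2 draws each channel's reward around its band's draw.
    """
    band_draws = rng.normal(band_mu, band_sigma)  # one draw per band
    samples = {
        (band, chan): rng.normal(band_draws[band] + offset, chan_sigma)
        for (band, chan), offset in chan_offsets.items()
    }
    return max(samples, key=samples.get)          # arm to play this round
```

Because exploration is guided at the band level first, one informative observation in a band shifts the posterior for every channel in that band, which is the mechanism by which regret is reduced relative to treating each channel independently.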
Date issued
2025-09
URI
https://hdl.handle.net/1721.1/165163
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.