Exploring Memory in Reinforcement-Learned Agents for Smarter Lateral Movement
Author(s)
Johnson Schofield, Catherine
Advisors
Stephenson, William
Ross, Dennis
Abstract
Computer networks are the backbone of most organizations' technology infrastructure, yet they remain susceptible to many hidden vulnerabilities. One proactive approach to uncovering and mitigating threats is red teaming: red teams imitate real attackers to find and exploit vulnerabilities in a network. This practice reduces uncertainty about which parts of a network attackers could compromise. A central component of red teaming is lateral movement, in which a red team operator moves through a network by traversing workspaces on that network. Each step in the lateral movement process requires careful decision-making given the information gleaned so far, the consequences of past actions, and knowledge about workspaces on the network. The process is complex and typically requires years of experience for a red team operator to master.
Automating red teaming with machine learning, and specifically reinforcement learning (RL), could help secure a domain more efficiently and free operators to focus on higher-level decisions. However, unlike humans, traditional RL agents, whose policies condition only on the current observation, forget details from past experiences. This is a problem because remaining stealthy requires remembering the consequences of past actions. Adding a memory architecture allows the agent to remember these consequences and make better action choices in the lateral movement environment.
I propose several variants of Long Short-Term Memory (LSTM), transformer, and Hierarchical Chunk Attention Memory (HCAM) architectures that help agents remember past events inside a memory-enhanced lateral movement simulation. I compare the performance of a control agent, an RL agent with a linear neural network, to that of memory agents, RL agents with architectures capable of modeling dependencies on past events. I test the agents on a control environment that does not include a memory task and on a memory environment that does.
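To make the contrast concrete, below is a minimal sketch, in PyTorch, of the two kinds of policy network being compared: a feedforward (linear) control policy that sees only the current observation, and an LSTM policy that carries a hidden state across steps. This is an illustration of the general technique, not the thesis's actual implementation; the observation size, action count, hidden width, and episode length are hypothetical placeholders.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 32, 8, 64  # hypothetical sizes, not from the thesis

class LinearPolicy(nn.Module):
    """Control agent: maps only the current observation to action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(OBS_DIM, ACT_DIM)

    def forward(self, obs):
        return self.net(obs)  # no state carried between steps

class LSTMPolicy(nn.Module):
    """Memory agent: an LSTM hidden state summarizes past observations,
    e.g. the consequences of earlier lateral movement actions."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, ACT_DIM)

    def forward(self, obs, state=None):
        # obs: (batch, 1, OBS_DIM) for a single environment step
        out, state = self.lstm(obs, state)
        return self.head(out[:, -1]), state  # action logits + updated memory

# Rolling out one episode: the memory agent threads its hidden state
# through time, while the control agent decides from each step alone.
control, memory = LinearPolicy(), LSTMPolicy()
state = None
for _ in range(5):  # placeholder episode length
    obs = torch.randn(1, OBS_DIM)           # stand-in for a real observation
    a_ctrl = control(obs).argmax(-1)        # memoryless choice
    logits, state = memory(obs.unsqueeze(1), state)
    a_mem = logits.argmax(-1)               # choice informed by history
```

The key difference is the `state` tuple threaded through the loop: it is the only channel by which the memory agent's earlier observations can influence its later action choices.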
Agents with memory architectures outperform the control agent on the memory environment, to varying degrees. I show that agents with an LSTM outperform the control agent on the memory environment by about 25%, matching the control agent's performance on the control environment. While the HCAM and transformer agents do not perform as well as the LSTM agents, they still slightly outperform the control agent on the memory environment. They also show potential for performing well on more generic memory tasks.
Date issued
2024-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology