Exploring Memory in Reinforcement-Learned Agents for Smarter Lateral Movement
Author(s)
Johnson Schofield, Catherine
Advisors
Stephenson, William
Ross, Dennis
Abstract
Computer networks are the backbone of most organizations' technology infrastructure, yet they remain susceptible to many hidden vulnerabilities. One proactive approach to uncovering and mitigating threats is red teaming: red teams imitate real attackers to find and exploit vulnerabilities in a network. This practice reduces uncertainty about which parts of a network attackers could compromise. A central component of red teaming is lateral movement, in which a red team operator moves through a network by traversing workspaces on that network. Each step in the lateral movement process requires careful decision-making given the information gleaned so far, the consequences of past actions, and knowledge about workspaces on the network. The process is complex and typically requires years of experience for a red team operator to master.
Automating red teaming with machine learning, and specifically reinforcement learning (RL), could help secure a domain more efficiently and free operators to focus on higher-level decisions. However, unlike humans, traditional RL agents, whose policies condition only on the current observation, forget details from past experiences. This is a problem because remaining stealthy requires remembering the consequences of past actions. Adding a memory architecture allows the agent to remember these consequences and make better action choices in the lateral movement environment.
I propose several variants of Long Short-Term Memory (LSTM), transformer, and Hierarchical Chunk Attention Memory (HCAM) architectures that help agents remember past events inside a memory-enhanced lateral movement simulation. I compare the performance of a control agent, an RL agent with a linear neural network, to that of memory agents, RL agents with architectures capable of modeling dependencies on past events. I test the agents on a control environment that does not include a memory task and on a memory environment that does.
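To make the contrast concrete, below is a minimal sketch, in PyTorch, of the two kinds of policy network being compared: a feedforward (linear) control policy that sees only the current observation, and an LSTM policy that carries a hidden state across steps. This is an illustration of the general technique, not the thesis's actual implementation; the observation size, action count, hidden width, and episode length are hypothetical placeholders.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 32, 8, 64  # hypothetical sizes, not from the thesis

class LinearPolicy(nn.Module):
    """Control agent: maps only the current observation to action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(OBS_DIM, ACT_DIM)

    def forward(self, obs):
        return self.net(obs)  # no state carried between steps

class LSTMPolicy(nn.Module):
    """Memory agent: an LSTM hidden state summarizes past observations,
    e.g. the consequences of earlier lateral movement actions."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, ACT_DIM)

    def forward(self, obs, state=None):
        # obs: (batch, 1, OBS_DIM) for a single environment step
        out, state = self.lstm(obs, state)
        return self.head(out[:, -1]), state  # action logits + updated memory

# Rolling out one episode: the memory agent threads its hidden state
# through time, while the control agent decides from each step alone.
control, memory = LinearPolicy(), LSTMPolicy()
state = None
for _ in range(5):  # placeholder episode length
    obs = torch.randn(1, OBS_DIM)           # stand-in for a real observation
    a_ctrl = control(obs).argmax(-1)        # memoryless choice
    logits, state = memory(obs.unsqueeze(1), state)
    a_mem = logits.argmax(-1)               # choice informed by history
```

The key difference is the `state` tuple threaded through the loop: it is the only channel by which the memory agent's earlier observations can influence its later action choices.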
Agents with memory architectures outperform the control agent on the memory environment, to varying degrees. I show that agents with an LSTM outperform the control agent on the memory environment by about 25%, matching the control agent's performance on the control environment. While the HCAM and transformer agents do not perform as well as the LSTM agents, they still slightly outperform the control agent on the memory environment. They also show potential for performing well on more generic memory tasks.
Date issued
2024-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology