LLM-Directed Agent Models in Cyberspace

Author(s)

Laney, Samuel P.

Advisor

O'Reilly, Una-May

Vilas-Boas, Felipe

Yu, Chris

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Network penetration testing, a proactive method for identifying vulnerabilities in cyberspace, has long been the domain of human experts. However, rapid advancements in machine learning have opened up new possibilities for automating many of these tasks. This thesis explores the application of Large Language Models (LLMs) to automating penetration tests and Cyber Capture the Flag (CTF) challenges, bridging the gap between static tools and dynamic human intuition in cybersecurity. This work provides an evaluation framework for assessing the performance of LLMs in autonomously solving CTF challenges, with an emphasis on understanding the capabilities, limitations, and best prompting strategies for LLMs in this domain. Notably, this thesis presents an agent configuration that offers a 102% improvement in challenge completion on a database of PicoCTF challenges compared to the published baseline. By analyzing a variety of agent strategies, response formats, and historical action representations in the context of CTF challenges, this work aims to provide insights into the best practices and limitations of leveraging LLMs for cybersecurity tasks. Additionally, this work proposes a hierarchical architecture to guide an LLM-enabled agent in performing complex, multi-step penetration testing tasks with strategic foresight. This proof-of-concept approach shows success in entry-level challenges. While LLMs exhibit impressive capabilities, they are limited out of the box in their ability to solve complex, multi-step tasks requiring exploration, necessitating approaches such as those described in this work to improve performance in these areas.

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/156291

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses