LLM-Directed Agent Models in Cyberspace

Laney, Samuel P.

dc.contributor.advisor	O'Reilly, Una-May
dc.contributor.advisor	Vilas-Boas, Felipe
dc.contributor.advisor	Yu, Chris
dc.contributor.author	Laney, Samuel P.
dc.date.accessioned	2024-08-21T18:54:24Z
dc.date.available	2024-08-21T18:54:24Z
dc.date.issued	2024-05
dc.date.submitted	2024-07-10T12:59:43.623Z
dc.identifier.uri	https://hdl.handle.net/1721.1/156291
dc.description.abstract	Network penetration testing, a proactive method for identifying vulnerabilities in cyberspace, has long been the domain of human experts. However, rapid advancements in machine learning have opened up new possibilities for automating many of these tasks. This thesis explores the application of Large Language Models (LLMs) to automating penetration tests and Cyber Capture the Flag (CTF) challenges, bridging the gap between static tools and dynamic human intuition in cybersecurity. This work provides an evaluation framework for assessing the performance of LLMs in autonomously solving CTF challenges, with an emphasis on understanding the capabilities, limitations, and best prompting strategies for LLMs in this domain. Notably, this thesis presents an agent configuration that achieves a 102% improvement in challenge completion over the published baseline on a database of PicoCTF challenges. By analyzing a variety of agent strategies, response formats, and historical action representations in the context of CTF challenges, this work aims to provide insights into the best practices and limitations of leveraging LLMs for cybersecurity tasks. Additionally, this work proposes a hierarchical architecture to guide an LLM-enabled agent in performing complex, multi-step penetration testing tasks with strategic foresight. This proof-of-concept approach shows success on entry-level challenges. While LLMs exhibit impressive capabilities, out of the box they are limited in their ability to solve complex, multi-step tasks requiring exploration, necessitating approaches such as those described in this work to improve performance in these areas.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	LLM-Directed Agent Models in Cyberspace
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid	0000-0002-2577-3312
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Electrical Engineering and Computer Science

Files in this item

Name: laney-splaney-sm-eecs-2024-the ...
Size: 2.742Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses