Show simple item record

dc.contributor.advisor: O'Reilly, Una-May
dc.contributor.advisor: Vilas-Boas, Felipe
dc.contributor.advisor: Yu, Chris
dc.contributor.author: Laney, Samuel P.
dc.date.accessioned: 2024-08-21T18:54:24Z
dc.date.available: 2024-08-21T18:54:24Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-10T12:59:43.623Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156291
dc.description.abstract: Network penetration testing, a proactive method for identifying vulnerabilities in cyberspace, has long been the domain of human experts. However, rapid advancements in machine learning have opened up new possibilities for automating many of these tasks. This thesis aims to explore the application of Large Language Models (LLMs) for automating penetration tests and Cyber Capture the Flag (CTF) challenges, bridging the gap between static tools and dynamic human intuition in cybersecurity. This work provides an evaluation framework for assessing the performance of LLMs in autonomously solving CTF challenges, with an emphasis on understanding the capabilities, limitations, and best prompting strategies for LLMs in this domain. Notably, this thesis presents an agent configuration that offers a 102% improvement in challenge completion on a database of PicoCTF challenges compared to the published baseline. By analyzing a variety of agent strategies, response formats, and historical action representations in the context of CTF challenges, this work aims to provide insights into the best practices and limitations in leveraging LLMs for cybersecurity tasks. Additionally, this work proposes a hierarchical architecture to guide an LLM-enabled agent in performing complex, multi-step penetration testing tasks with strategic foresight. This proof-of-concept approach shows success in entry-level challenges. While LLMs exhibit impressive capabilities, they are limited out of the box in their ability to solve complex, multi-step tasks requiring exploration, necessitating approaches such as those described in this work to improve performance in these areas.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: LLM-Directed Agent Models in Cyberspace
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: 0000-0002-2577-3312
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

