DSpace@MIT
Generative Discovery via Reinforcement Learning

Author(s)
Hong, Zhang-Wei
Thesis PDF (5.058 MB)
Advisor
Agrawal, Pulkit
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Discovering new knowledge is crucial for technological advancement and mirrors how humans and animals learn new skills, often through trial and error. Ancient humans, for example, discovered fire by experimenting with different methods, and children learned to walk and use tools through repeated attempts and failures. In chemistry, scientists find new catalysts by testing various compositions. But how exactly do humans use trial and error to improve existing solutions (such as learning more efficient ways to walk or synthesizing novel compounds)? Can we design computational models that mimic or exceed human discovery? Such models could greatly accelerate progress in science and engineering, since they can automate or assist the work of scientists and engineers and discover new knowledge more efficiently (e.g., new compounds, streamlined robot controller design). Reinforcement learning (RL) is well suited to discovery tasks because it enables machines to learn through trial and error. My work overcomes the following major limitations of today’s RL algorithms and thereby advances their discovery potential.

Mitigating the bias of reward shaping. RL relies on reward signals from trial-and-error experience, but these signals can be sparse: they are provided only once a desired solution is found and are otherwise zero, so most trials offer little or no feedback. A common strategy to improve performance under sparse rewards is to provide additional hints (i.e., reward shaping) to guide RL algorithms. However, if these hints are inaccurate, they can steer the algorithm toward solutions worse than those found without them. I propose a new RL framework, compatible with any standard RL algorithm, that ensures training with hints finds better solutions instead of harming performance.

Learning with sub-optimal data. RL can learn not only from online interaction with the world but also from datasets of logged experience. For expensive or time-consuming tasks like material discovery or robot learning, offline RL can be preferable because it leverages existing data rather than requiring new interaction with the world. However, such datasets may contain mostly low-reward solutions, which limits an offline RL algorithm’s ability to find solutions better than those in the dataset (as we show later in this thesis). I introduce sample reweighting strategies that reweight the dataset so that current offline RL algorithms, trained on the weighted samples, discover solutions far better than those in the dataset, even when low-reward solutions predominate.

Safety via diversity. Standard RL algorithms aim to find a single “best” solution. Yet in many discovery problems, such as drug development, it is more valuable to generate multiple high-reward solutions with distinct properties (i.e., diversity) than to focus on only one. I study this problem in an emerging discovery task: red-teaming large language models (LLMs). In red-teaming, we want diverse prompts that trigger undesired outputs from a target language model. Current approaches use RL to train one LLM to red-team another, but they fall short on the diversity of generated prompts and often converge to a few prompts that consistently trigger undesired outputs. I propose rewarding the agent for maximizing the diversity of generated prompts, which also improves the success of those prompts at triggering undesired outputs from the target LLM.
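
The following toy sketch (Python, written for this summary and not taken from the thesis) illustrates the problem the first contribution addresses: in a tabular Q-learning setup with a sparse reward, naively adding an inaccurate hint to the reward can steer the learner away from the true goal. The environment, the hint, and all names here are hypothetical, and the thesis’s framework for making shaping safe is not reproduced here.

    # Illustrative sketch only: not the framework proposed in the thesis.
    # A 1-D chain where only reaching the last state gives a sparse reward.
    import random

    chain_length = 6          # states 0..5; state 5 gives the sparse reward
    actions = [-1, +1]        # move left or right

    def step(state, action, hint_weight):
        next_state = min(max(state + action, 0), chain_length - 1)
        sparse_reward = 1.0 if next_state == chain_length - 1 else 0.0
        # An inaccurate "hint" that prefers moving left (away from the goal):
        # naive shaping simply adds it to the reward, so a strong enough hint
        # can dominate the sparse signal and bias the learned policy.
        hint = hint_weight * (1.0 if action == -1 else 0.0)
        return next_state, sparse_reward + hint

    def q_learning(hint_weight, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
        q = {(s, a): 0.0 for s in range(chain_length) for a in actions}
        for _ in range(episodes):
            s = 0
            for _ in range(20):
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda b: q[(s, b)])
                s2, r = step(s, a, hint_weight)
                q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions) - q[(s, a)])
                s = s2
        return q

    # With hint_weight=0 the agent learns to move right toward the sparse reward;
    # with a large inaccurate hint it can converge to moving left instead.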
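
As a rough illustration of the second contribution’s idea, one simple instance of sample reweighting is to sample logged trajectories in proportion to an exponential of their return, so the rare high-return data in a mostly low-reward dataset is seen more often during offline RL training. This is a sketch under that assumption, not the thesis’s exact scheme; the dataset and function names are hypothetical.

    # Illustrative sketch only: return-based reweighting of an offline dataset.
    import math
    import random

    def return_weights(trajectory_returns, temperature=1.0):
        """Map each trajectory's return to a normalized sampling weight."""
        max_ret = max(trajectory_returns)  # subtract the max for numerical stability
        weights = [math.exp((r - max_ret) / temperature) for r in trajectory_returns]
        total = sum(weights)
        return [w / total for w in weights]

    def sample_batch(dataset, weights, batch_size=32):
        """Draw a training batch with probability proportional to the weights."""
        return random.choices(dataset, weights=weights, k=batch_size)

    # Toy offline dataset: mostly low-return trajectories, a few good ones.
    trajectories = [{"return": r} for r in [0.1] * 95 + [0.9] * 5]
    w = return_weights([t["return"] for t in trajectories], temperature=0.2)
    batch = sample_batch(trajectories, w)
    # The batch over-represents the rare high-return trajectories, which is the
    # kind of reweighting that can let an offline RL learner exceed the dataset average.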
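
For the third contribution, the sketch below shows one way a diversity bonus could be folded into a red-teaming reward: attack success plus a novelty term that penalizes prompts overlapping with ones already generated. The n-gram overlap measure and the attack_success stand-in are assumptions for illustration, not the objective used in the thesis.

    # Illustrative sketch only: reward = attack success + weighted novelty bonus.
    def ngrams(text, n=3):
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))}

    def novelty(prompt, previous_prompts, n=3):
        """1.0 for a prompt unlike anything generated so far, 0.0 for a near-duplicate."""
        if not previous_prompts:
            return 1.0
        grams = ngrams(prompt, n)
        overlaps = []
        for prev in previous_prompts:
            prev_grams = ngrams(prev, n)
            union = grams | prev_grams
            overlaps.append(len(grams & prev_grams) / len(union) if union else 0.0)
        return 1.0 - max(overlaps)

    def red_team_reward(prompt, previous_prompts, attack_success, diversity_weight=0.5):
        """attack_success is a hypothetical scorer that queries the target LLM."""
        return attack_success(prompt) + diversity_weight * novelty(prompt, previous_prompts)

    # An RL-trained attacker maximizing this reward is pushed toward prompts that
    # both trigger undesired outputs and differ from the prompts it has already found.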
Date issued
2025-02
URI
https://hdl.handle.net/1721.1/159135
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
