Future of Personalized, Aligned Language Models
| Metadata field | Value |
| --- | --- |
| dc.contributor.advisor | Agrawal, Pulkit |
| dc.contributor.author | Han, Seungwook |
| dc.date.accessioned | 2025-11-17T19:09:34Z |
| dc.date.available | 2025-11-17T19:09:34Z |
| dc.date.issued | 2025-05 |
| dc.date.submitted | 2025-08-14T19:31:54.764Z |
| dc.identifier.uri | https://hdl.handle.net/1721.1/163724 |
| dc.description.abstract | Aligning Large Language Models (LLMs) to cater to different human preferences, learn new skills, and unlearn harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are effective but impractical for LLM adaptation due to their high inference cost. On the other hand, using Reinforcement Learning (RL) for adaptation is computationally efficient but performs worse because of the optimization challenges of co-training the value function and the policy. We present a new framework for reward optimization, Value Augmented Sampling (VAS), that can maximize different reward functions using data sampled only from the initial, frozen LLM. VAS solves for the optimal reward-maximizing policy without co-training the policy and the value function, which makes the optimization stable. It outperforms established baselines such as PPO and DPO on standard benchmarks and achieves results comparable to Best-of-128 at lower inference cost. Unlike existing RL methods that require changing the weights of the LLM, VAS does not need access to the weights of the pre-trained LLM, so it can adapt even LLMs that are available only as APIs (e.g., ChatGPT). In addition, our algorithm unlocks the new capability of composing several rewards and controlling the extent of each one at deployment time. By bringing together stability, flexibility, and efficiency, we explore the future of aligned, personalized language models that can be adapted seamlessly to meet a wide spectrum of human preferences. |
| dc.publisher | Massachusetts Institute of Technology |
| dc.rights | In Copyright - Educational Use Permitted |
| dc.rights | Copyright retained by author(s) |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ |
| dc.title | Future of Personalized, Aligned Language Models |
| dc.type | Thesis |
| dc.description.degree | S.M. |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
| mit.thesis.degree | Master |
| thesis.degree.name | Master of Science in Electrical Engineering and Computer Science |
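
The abstract above describes guiding a frozen LLM with learned value estimates and composing several rewards at deployment time. The sketch below is an illustrative toy of that general idea, not the thesis's actual VAS implementation: the names `value_augmented_logits`, `sample_next_token`, the per-reward value arrays, and the `beta`/`weights` parameters are all assumptions introduced here, and real value estimates would come from trained value models rather than random numbers.

```python
# Minimal sketch (assumed interface, not the thesis code): shift a frozen base
# LM's next-token logits by weighted, learned value estimates, so the reward
# mix can be dialed at deployment time without touching the base model's weights.
import numpy as np


def value_augmented_logits(base_logits, value_estimates, weights, beta=1.0):
    """Combine frozen-LM logits with weighted per-token value estimates.

    base_logits:     (vocab,) next-token logits from the frozen base LM.
    value_estimates: list of (vocab,) arrays, one per reward; each entry
                     estimates the reward obtained if that token is emitted.
    weights:         per-reward weights, adjustable at deployment time.
    beta:            overall strength of the reward guidance.
    """
    combined_value = sum(w * v for w, v in zip(weights, value_estimates))
    return base_logits + beta * combined_value


def sample_next_token(base_logits, value_estimates, weights, beta=1.0, rng=None):
    """Sample one token from the value-augmented distribution."""
    rng = rng or np.random.default_rng()
    logits = value_augmented_logits(base_logits, value_estimates, weights, beta)
    probs = np.exp(logits - logits.max())   # softmax with max-shift for stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


# Toy usage: random numbers stand in for real model outputs.
vocab = 8
rng = np.random.default_rng(0)
base = rng.normal(size=vocab)           # frozen LM logits
helpfulness = rng.normal(size=vocab)    # hypothetical value head for reward 1
harmlessness = rng.normal(size=vocab)   # hypothetical value head for reward 2
tok = sample_next_token(base, [helpfulness, harmlessness],
                        weights=[0.7, 0.3], beta=2.0, rng=rng)
print("sampled token id:", tok)
```

Because the base logits are only read, never updated, this style of guidance works with models exposed purely through an inference API, and changing `weights` re-balances the composed rewards without any retraining.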
