dc.contributor.advisor: Katz, Boris
dc.contributor.author: Jens, Meagan
dc.date.accessioned: 2023-07-31T19:45:54Z
dc.date.available: 2023-07-31T19:45:54Z
dc.date.issued: 2023-06
dc.date.submitted: 2023-06-06T16:35:02.485Z
dc.identifier.uri: https://hdl.handle.net/1721.1/151519
dc.description.abstract: People leverage the compositional nature of their environment to generalize to new scenarios. For example, if you understand the meaning of the verb "to sing" and the adverb "loudly," then you can determine the meaning of the novel phrase "to sing loudly" from these known components. This process is known as generalization through systematic compositionality. Developing agents that can use systematic compositionality to generalize to new conditions has been a long-standing problem in AI. In response to this challenge, grounded benchmarks have been developed to evaluate an agent's ability to generalize in this way. However, current grounded benchmarks have key problems. First, they are ad hoc: they propose sets of tasks without any formalism, so it is difficult to determine whether those tasks exhaustively explore the space of possible generalizations. This lack of structure also makes it difficult to compare benchmarks concretely. Second, their environments are defined by a fixed set of rules and a small set of objects whose states can be changed. By strictly delineating the rules of these environments, existing benchmarks overlook a critical capability that agents will need in the real world: understanding and manipulating the rules themselves. Our approach to addressing these issues is twofold. First, we define a formalism for investigating generalization mathematically as a function of the environment architecture. Second, we use this formalism to create a novel type of generalization benchmark in which agents must learn to change the rules of their environments. To validate our environment, we then run both supervised learning and reinforcement learning models on a small subset of the benchmark tasks and pinpoint key conditions under which agents fail to generalize.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Baba is AI: A Grounded Benchmark for Compositional Generalization in Dynamic Rule Systems
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

