dc.contributor.advisor: Katz, Boris
dc.contributor.author: Jens, Meagan
dc.date.accessioned: 2023-07-31T19:45:54Z
dc.date.available: 2023-07-31T19:45:54Z
dc.date.issued: 2023-06
dc.date.submitted: 2023-06-06T16:35:02.485Z
dc.identifier.uri: https://hdl.handle.net/1721.1/151519
dc.description.abstract: People leverage the compositional nature of their environment to generalize to new scenarios. For example, if you understand the meaning of the verb "to sing" and the adverb "loudly," then you can determine the meaning of the novel phrase "to sing loudly" from these known components. This process is known as generalization through systematic compositionality. Developing agents that can use systematic compositionality to generalize to new conditions has been a long-standing problem in AI. In response to this challenge, grounded benchmarks have been developed to evaluate an agent's ability to generalize in this way. However, current grounded benchmarks have key problems. First, they are ad hoc: they propose sets of tasks without any formalism, so it is difficult to determine whether those tasks exhaustively explore the space of possible generalizations. This lack of structure also makes it difficult to compare benchmarks concretely. Second, their environments are defined by a fixed set of rules and a small set of objects whose states can be changed. By strictly delineating the rules of these environments, existing benchmarks overlook a critical capability that agents will need in the real world: understanding and manipulating the rules themselves. Our approach to addressing these issues is twofold. First, we define a formalism for investigating generalization mathematically as a function of the environment architecture. Second, we use this formalism to create a novel type of generalization benchmark in which agents must learn to change the rules of their environments. To validate our environment, we then run both supervised learning and reinforcement learning models on a small subset of the benchmark tasks and pinpoint key conditions under which agents fail to generalize.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Baba is AI: A Grounded Benchmark for Compositional Generalization in Dynamic Rule Systems
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

