dc.contributor.advisor | Katz, Boris | |
dc.contributor.author | Sleeper, Dylan | |
dc.date.accessioned | 2022-06-15T13:11:29Z | |
dc.date.available | 2022-06-15T13:11:29Z | |
dc.date.issued | 2022-02 | |
dc.date.submitted | 2022-02-22T18:32:10.921Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/143309 | |
dc.description.abstract | In this work, we collect a new human-annotated dataset called Grounded SCAN Human (gSCAN Human) as an extension of the original Grounded SCAN (gSCAN) dataset. The original gSCAN dataset was created to test various compositional generalizations by holding out certain examples at training time. At test time, models must zero-shot execute commands that require the agent to move in new directions, commands that contain novel combinations of objects and adjectives, and other such generalizations in different test sets called splits. However, gSCAN does not contain splits that test zero-shot generalizations to new sentence structures and a whole new vocabulary. The gSCAN Human dataset was created to test these generalizations: can a model trained using a simple grammar generalize to human annotations? We collected and verified a total of 1,391 human annotations across all of the gSCAN splits (excluding the test and dev split) and evaluated various models on each of the splits. We test the original gSCAN baseline with several modifications, including the baseline with a transformer replacing the encoder, and one with early multimodal fusion of the sentence encoding with the visual embedding. We also test a multimodal transformer similar to ViLBERT, which is the state of the art on the original gSCAN splits. We find that the models are somewhat robust to varying sentence structure and new vocabulary; however, the models are far less successful given a combination of the two, as evaluated by the human data. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright MIT | |
dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Grounded SCAN Human: A Benchmark for Zero-Shot Generalizations | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |