Show simple item record

dc.contributor.author: Berzak, Yevgeni
dc.contributor.author: Barbu, Andrei
dc.contributor.author: Harari, Daniel
dc.contributor.author: Katz, Boris
dc.contributor.author: Ullman, Shimon
dc.description.abstract: Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing sentences to be disambiguated in a unified fashion across the different ambiguity types. [en_US]
dc.description.sponsorship: This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. [en_US]
dc.publisher: Center for Brains, Minds and Machines (CBMM), arXiv [en_US]
dc.relation.ispartofseries: CBMM Memo Series;051
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.subject: Computer Language [en_US]
dc.subject: Language understanding [en_US]
dc.subject: Computer vision [en_US]
dc.title: Do You See What I Mean? Visual Resolution of Linguistic Ambiguities [en_US]
dc.type: Technical Report [en_US]
dc.type: Working Paper [en_US]
dc.identifier.citation: arXiv:1603.08079v1 [cs.CV] [en_US]
