dc.contributor.author | Yamada, Moyuru | |
dc.contributor.author | D'Amario, Vanessa | |
dc.contributor.author | Takemoto, Kentaro | |
dc.contributor.author | Boix, Xavier | |
dc.contributor.author | Sasaki, Tomotake | |
dc.date.accessioned | 2022-02-03T18:30:49Z | |
dc.date.available | 2022-02-03T18:30:49Z | |
dc.date.issued | 2022-02-03 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/139843 | |
dc.description.abstract | Transformer-based models achieve great performance on Visual Question Answering (VQA). However, when we evaluate them on systematic generalization, i.e., handling novel combinations of known concepts, their performance degrades. Neural Module Networks (NMNs) are a promising approach for systematic generalization that consists in composing modules, i.e., neural networks that tackle a sub-task. Inspired by Transformers and NMNs, we propose Transformer Module Network (TMN), a novel Transformer-based model for VQA that dynamically composes modules into a question-specific Transformer network. TMNs achieve state-of-the-art systematic generalization performance in three VQA datasets, namely, CLEVR-CoGenT, CLOSURE and GQA-SGL, in some cases improving more than 30% over standard Transformers. | en_US |
dc.description.sponsorship | This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. | en_US |
dc.publisher | Center for Brains, Minds and Machines (CBMM) | en_US |
dc.relation.ispartofseries | CBMM Memo;121 | |
dc.title | Transformer Module Networks for Systematic Generalization in Visual Question Answering | en_US |
dc.type | Article | en_US |
dc.type | Technical Report | en_US |
dc.type | Other | en_US |