dc.contributor.author | Yamada, Moyuru | |
dc.contributor.author | D'Amario, Vanessa | |
dc.contributor.author | Takemoto, Kentaro | |
dc.contributor.author | Boix, Xavier | |
dc.contributor.author | Sasaki, Tomotake | |
dc.date.accessioned | 2022-02-03T18:30:49Z | |
dc.date.available | 2022-02-03T18:30:49Z | |
dc.date.issued | 2022-02-03 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/139843 | |
dc.description.abstract | Transformer-based models achieve great performance on Visual Question Answering (VQA). However, when we evaluate them on systematic generalization, i.e., handling novel combinations of known concepts, their performance degrades. Neural Module Networks (NMNs) are a promising approach for systematic generalization that consists in composing modules, i.e., neural networks that tackle a sub-task. Inspired by Transformers and NMNs, we propose Transformer Module Network (TMN), a novel Transformer-based model for VQA that dynamically composes modules into a question-specific Transformer network. TMNs achieve state-of-the-art systematic generalization performance in three VQA datasets, namely, CLEVR-CoGenT, CLOSURE and GQA-SGL, in some cases improving more than 30% over standard Transformers. | en_US |
dc.description.sponsorship | This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. | en_US |
dc.publisher | Center for Brains, Minds and Machines (CBMM) | en_US |
dc.relation.ispartofseries | CBMM Memo;121 | |
dc.title | Transformer Module Networks for Systematic Generalization in Visual Question Answering | en_US |
dc.type | Article | en_US |
dc.type | Technical Report | en_US |
dc.type | Other | en_US |