
Transformer Module Networks for Systematic Generalization in Visual Question Answering

Author(s)
Yamada, Moyuru; D'Amario, Vanessa; Takemoto, Kentaro; Boix, Xavier; Sasaki, Tomotake
Download: CBMM-Memo-121.pdf (1.06 MB)
Abstract
Transformer-based models achieve great performance on Visual Question Answering (VQA). However, when we evaluate them on systematic generalization, i.e., handling novel combinations of known concepts, their performance degrades. Neural Module Networks (NMNs) are a promising approach for systematic generalization that consists of composing modules, i.e., neural networks that tackle a sub-task. Inspired by Transformers and NMNs, we propose Transformer Module Network (TMN), a novel Transformer-based model for VQA that dynamically composes modules into a question-specific Transformer network. TMNs achieve state-of-the-art systematic generalization performance on three VQA datasets, namely CLEVR-CoGenT, CLOSURE and GQA-SGL, in some cases improving more than 30% over standard Transformers.
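
The abstract's core idea, composing sub-task modules into a question-specific Transformer, can be sketched in a few lines of PyTorch. This is a hypothetical illustration, not the authors' implementation: the sub-task names, the example program, and the layer configuration are assumptions made only for the sketch.

import torch
import torch.nn as nn

# Hypothetical sub-tasks; the memo defines its own module set.
SUB_TASKS = ["filter_color", "filter_shape", "relate", "query"]
D_MODEL = 64

# One Transformer encoder layer per sub-task, with parameters shared across all questions.
modules = nn.ModuleDict({
    name: nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
    for name in SUB_TASKS
})

def compose(program):
    # Stack the modules named in the program into one question-specific network.
    return nn.Sequential(*[modules[name] for name in program])

# Hypothetical program for a question such as "What color is the cube left of the sphere?"
tmn = compose(["filter_shape", "relate", "query"])
visual_tokens = torch.randn(1, 10, D_MODEL)  # placeholder visual features
answer_features = tmn(visual_tokens)
print(answer_features.shape)  # torch.Size([1, 10, 64])

Under this reading, every question reuses the same module parameters and only their arrangement changes, so a novel combination of known sub-tasks corresponds to a new stack of already-trained layers, which is the property evaluated under systematic generalization.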
Date issued
2022-02-03
URI
https://hdl.handle.net/1721.1/139843
Publisher
Center for Brains, Minds and Machines (CBMM)
Series/Report no.
CBMM Memo;121

Collections
  • CBMM Memo Series
