MIT Libraries logoDSpace@MIT

Deep Attentional Modulation for Zero-shot Learning in Object Recognition

Author(s)
Singh, Aaditya
Download: Thesis PDF (2.835Mb)
Advisor
Katz, Boris
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
In the human brain, top-down attention plays a crucial role in the ability to recognize a seemingly infinite number of visual concepts using the same visual pathway. Even more impressively, humans can recognize objects from just a description (zero-shot) or a few examples (few-shot). Traditionally, artificial neural networks have struggled to reproduce this ability, with large performance drops in the zero- and few-shot domains caused by overfitting. Most methods focus on learning a good, fixed feature extractor, then tying those features to new classes using linear transformations, which are less prone to overfitting on few examples. At the opposite end of the spectrum from these simpler models are meta-learning techniques that fine-tune whole feature extractors to fit the few examples. While both of these approaches have shown reasonable success, we believe that a middle ground, taking into account inductive biases inspired by biological attention, can lead to improved performance. In this work, we study the use of top-down attentional modulation, already shown to be useful in visual question answering, in the domain of zero- and few-shot object recognition. We find that deep modulation can be critical in distinguishing unseen classes from previously seen classes in the zero-shot setting, and that it also provides gains in distinguishing between unseen classes in the few-shot domain. We hope that the insights brought to light in this work can contribute to the growing need for computer vision systems that generalize to novel concepts and new environments.
Date issued
2021-06
URI
https://hdl.handle.net/1721.1/139113
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.