Deep Attentional Modulation for Zero-shot Learning in Object Recognition
Author(s)
Singh, Aaditya
Thesis PDF (2.835 MB)
Advisor
Katz, Boris
Abstract
In the human brain, top-down attention plays a crucial role in the human ability to recognize seemingly infinite visual concepts using the same visual pathway. Even more impressive, humans can recognize objects from just a description (zero-shot) or a few examples (few-shot). Traditionally, artificial neural networks have struggled to reproduce this ability, with large performance drops in the zero- and few-shot domains caused by overfitting. Most methods focus on learning a good, fixed feature extractor, then tying those features to new classes using linear transformations, which are less prone to overfitting on few examples. On the opposite side of this spectrum of simpler models are meta-learning techniques that finetune whole feature extractors to fit the few examples. While both of these approaches have shown reasonable success, we believe that a middle ground, incorporating inductive biases inspired by biological attention, can lead to improved performance. In this work, we study the use of top-down attentional modulation, already shown to be useful in visual question answering, in the domain of zero- and few-shot object recognition. We find that deep modulation can be critical in distinguishing unseen classes from previously seen classes in the zero-shot setting, and also provides gains in distinguishing between unseen classes in the few-shot domain. We hope that the insights brought to light in this work can contribute to the growing need for computer vision systems that generalize to novel concepts and new environments.
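To make the idea of deep top-down modulation concrete, the following is a minimal sketch of feature-wise modulation in the style of FiLM (the mechanism the abstract alludes to from visual question answering). A class-description embedding predicts a per-channel scale and shift that is applied at every layer of a small feed-forward extractor. All names, dimensions, and the two-layer architecture are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def predict_modulation(class_emb, w_gamma, w_beta):
    """Map a class-description embedding to per-channel scale and shift
    (hypothetical linear predictors, one pair per layer)."""
    return class_emb @ w_gamma, class_emb @ w_beta

def modulated_layer(features, weights, gamma, beta):
    """One layer whose activations are modulated top-down, channel-wise."""
    h = relu(features @ weights)
    return gamma * h + beta

# Toy dimensions: 8-dim input, two 16-channel layers, 4-dim class embedding.
x = rng.normal(size=8)
class_emb = rng.normal(size=4)           # stands in for a class description
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 16))
g1, b1 = predict_modulation(class_emb, rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
g2, b2 = predict_modulation(class_emb, rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))

# "Deep" modulation: the class signal reshapes features at every layer,
# not only at a final linear classifier over fixed features.
h = modulated_layer(x, w1, g1, b1)
h = modulated_layer(h, w2, g2, b2)
print(h.shape)  # (16,)
```

The contrast with the two extremes in the abstract: a fixed extractor would drop `gamma`/`beta` and only learn a new linear readout, while meta-learning would update `w1` and `w2` themselves; here the backbone stays fixed and only the lightweight modulation carries the class-specific information.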
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology