Deep Attentional Modulation for Zero-shot Learning in Object Recognition

Singh, Aaditya

dc.contributor.advisor	Katz, Boris
dc.contributor.author	Singh, Aaditya
dc.date.accessioned	2022-01-14T14:50:46Z
dc.date.available	2022-01-14T14:50:46Z
dc.date.issued	2021-06
dc.date.submitted	2021-06-17T20:14:23.644Z
dc.identifier.uri	https://hdl.handle.net/1721.1/139113
dc.description.abstract	In the human brain, top-down attention plays a crucial role in the ability to recognize seemingly infinite visual concepts using the same visual pathway. Even more impressive, humans can recognize objects from just a description (zero-shot) or a few examples (few-shot). Traditionally, artificial neural networks have struggled to reproduce this ability, with large performance drops in the zero- and few-shot domains caused by overfitting. Most methods focus on learning a good, fixed feature extractor, then tying those features to new classes using linear transformations, which are less prone to overfitting on few examples. At the opposite end of the spectrum from these simpler models are meta-learning techniques that fine-tune whole feature extractors to fit the few examples. While both of these methods have shown reasonable success, we believe that a middle ground, taking into account inductive biases inspired by biological attention, can lead to improved performance. In this work, we study the use of top-down attentional modulation, already shown to be useful in visual question answering, in the domain of zero- and few-shot object recognition. We find that deep modulation can be critical in distinguishing unseen classes from previously seen classes in the zero-shot setting, and also provides gains in distinguishing between unseen classes in the few-shot domain. We hope that the insights brought to light in this work can contribute to the growing need for computer vision systems that generalize to novel concepts and new environments.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Deep Attentional Modulation for Zero-shot Learning in Object Recognition
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science

Files in this item

Name: Singh-aaditya-meng-eecs-2021-t ...
Size: 2.835Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses