Show simple item record

dc.contributor.advisorKatz, Boris
dc.contributor.authorSingh, Aaditya
dc.date.accessioned2022-01-14T14:50:46Z
dc.date.available2022-01-14T14:50:46Z
dc.date.issued2021-06
dc.date.submitted2021-06-17T20:14:23.644Z
dc.identifier.urihttps://hdl.handle.net/1721.1/139113
dc.description.abstractIn the human brain, top-down attention plays a crucial role in the human ability to recognize seemingly infinite visual concepts using the same visual pathway. Even more impressive, humans have the ability to recognize objects from just a description (zero-shot) or a few examples (few-shot). Traditionally, artificial neural networks have struggled at reproducing this ability, with large performance drops in the zero-and few-shot domains caused by overfitting. Most methods are focusing on learning a good, fixed feature extractor, then tying those features to new classes using linear transformations, which are less prone to overfitting on few examples. On the opposite side of this spectrum of simpler models are meta-learning techniques that finetune whole feature extractors to fit the few examples. While both of these methods have shown reasonable success, we believe that a middle ground, taking into account inductive biases inspired biological attention, can lead to improved performance. In this work, we study the use of top-down attentional modulation, already shown to be useful in visual question answering, in the domain of zero- and few-shot object recognition. We find that deep modulation can be critical in distinguishing unseen classes from previously seen classes in the zero-shot setting, and also provides gains in distinguishing between unseen classes in the few-shot domain. We hope that the insights brought to light in this work can contribute to growing need for computer vision systems that generalize to novel concepts and new environments.
dc.publisherMassachusetts Institute of Technology
dc.rightsIn Copyright - Educational Use Permitted
dc.rightsCopyright MIT
dc.rights.urihttp://rightsstatements.org/page/InC-EDU/1.0/
dc.titleDeep Attentional Modulation for Zero-shot Learning in Object Recognition
dc.typeThesis
dc.description.degreeM.Eng.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degreeMaster
thesis.degree.nameMaster of Engineering in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record