DSpace@MIT


Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations

Author(s)
Lee, Michael Jinsuk
Download: Thesis PDF (41.93 MB)
Advisor
DiCarlo, James J.
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
How humans learn to recognize new objects is an open problem. In this thesis, we consider one class of theories for how this is accomplished: humans re-represent incoming retinal images in a stable, multidimensional Euclidean space, and build linear decoders in this space for new object categories from image exemplars. In Part I, we empirically characterize human learning behavior over a battery of different learning subtasks, and find that humans rapidly learn new objects from a small number of examples. We then build neurally mechanistic, end-to-end models of object learning based on recent advances in image-computable models of ventral stream representations. We point to shortcomings of these models, including the fact that none of them matches the human ability to learn new objects from few examples. In Part II, we analyze this few-shot learning failure from a theoretical perspective, and show that a geometric property of image representations (variation in directions orthogonal to the one needed to linearly solve the task) slows learning. Given this observation, we motivate the hypothesis that current models of visual processing represent images along many more dimensions than humans do. In Part III, we identify (and remove) these hypothesized excess dimensions by developing the "perceptual alignment" method, in which we combine a classical approach from experimental psychology (inferring internal stimulus representations from measurements of human similarity judgements) with deep learning methods, creating new, lower-dimensional, image-computable representations that capture patterns of human similarity judgements. Finally, we show that models based on these new representations predict the ability of humans to few-shot learn across a variety of object domains. They also successfully predict the inability of humans to learn tasks based on representational dimensions that are present in baseline models but absent in perceptually aligned ones.
Taken together, this thesis shows that specific, neurally mechanistic models based on a simple theory of learning provide strong accounts of how humans rapidly learn new objects.
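The geometric claim in Part II (that variance along dimensions orthogonal to the task-relevant direction slows few-shot learning) can be illustrated with a toy simulation. The sketch below is not from the thesis: it uses a simple nearest-centroid learner on synthetic Gaussian data, where the class signal lives on a single dimension and every extra dimension carries only task-irrelevant unit variance. With the same number of training examples, a representation with many excess dimensions yields lower test accuracy than a low-dimensional one.

```python
import numpy as np

rng = np.random.default_rng(0)

def few_shot_accuracy(n_train, n_extra_dims, n_trials=200):
    """Average test accuracy of a nearest-centroid few-shot learner.

    The two class means differ only along dimension 0; the other
    n_extra_dims dimensions carry task-irrelevant unit-variance noise.
    """
    d = 1 + n_extra_dims
    mu = np.zeros((2, d))
    mu[0, 0], mu[1, 0] = -1.0, 1.0

    def sample(c, n):
        return mu[c] + rng.normal(size=(n, d))

    accs = []
    for _ in range(n_trials):
        # Few-shot training: estimate each class centroid from n_train examples.
        c0 = sample(0, n_train).mean(axis=0)
        c1 = sample(1, n_train).mean(axis=0)
        # Test: classify fresh samples by nearest estimated centroid.
        x = np.vstack([sample(0, 100), sample(1, 100)])
        y = np.repeat([0, 1], 100)
        pred = (np.linalg.norm(x - c1, axis=1)
                < np.linalg.norm(x - c0, axis=1)).astype(int)
        accs.append((pred == y).mean())
    return float(np.mean(accs))

low = few_shot_accuracy(n_train=2, n_extra_dims=5)     # few excess dimensions
high = few_shot_accuracy(n_train=2, n_extra_dims=500)  # many excess dimensions
print(low, high)  # the low-dimensional representation learns faster
```

The excess dimensions hurt because the estimated centroid difference (the learned decision direction) picks up sampling noise along every irrelevant axis, diluting the one informative direction; this is a simplified stand-in for the dimensionality argument, not the thesis's actual analysis.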
Date issued
2022-09
URI
https://hdl.handle.net/1721.1/147557
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
