DSpace@MIT


Rapid Visual Object Learning in Humans is Explainable by Low-Dimensional Image Representations

Author(s)
Lee, Michael Jinsuk
Download: Thesis PDF (41.93 MB)
Advisor
DiCarlo, James J.
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
How humans learn to recognize new objects is an open problem. In this thesis, we consider one class of theories for how this is accomplished: humans re-represent incoming retinal images in a stable, multidimensional Euclidean space, and build linear decoders in this space for new object categories from image exemplars. In Part I, we empirically characterize human learning behavior over a battery of different learning subtasks, and find that humans rapidly learn new objects from a small number of examples. We then build neurally mechanistic, end-to-end models of object learning based on recent advances in image-computable models of ventral stream representations. We point to shortcomings of these models, including the fact that none of them matches the human ability to learn new objects from few examples. In Part II, we analyze this few-shot learning failure from a theoretical perspective, and show that a geometric property of image representations (variation in directions orthogonal to the one needed to linearly solve the task) slows learning. Given this observation, we motivate the hypothesis that current models of visual processing represent images along many more dimensions than humans do. In Part III, we identify (and remove) these hypothesized excess dimensions by developing the "perceptual alignment" method, in which we combine a classical approach from experimental psychology (inferring internal stimulus representations from measurements of human similarity judgements) with deep learning methods, creating new, lower-dimensional, image-computable representations that capture patterns of human similarity judgements. Finally, we show that models based on these new representations predict the ability of humans to few-shot learn across a variety of object domains. They also successfully predict the inability of humans to learn tasks based on representational dimensions that are present in baseline models but absent in perceptually aligned ones.
Taken together, this thesis shows that specific, neurally mechanistic models based on a simple theory of learning provide strong accounts of how humans rapidly learn new objects.
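The geometric claim in Part II (that variance along dimensions orthogonal to the task-relevant direction slows few-shot learning) can be illustrated with a toy simulation. The sketch below is not from the thesis: it uses a simple nearest-centroid learner on synthetic Gaussian data, where the class signal lives on a single dimension and every extra dimension carries only task-irrelevant unit variance. With the same number of training examples, a representation with many excess dimensions yields lower test accuracy than a low-dimensional one.

```python
import numpy as np

rng = np.random.default_rng(0)

def few_shot_accuracy(n_train, n_extra_dims, n_trials=200):
    """Average test accuracy of a nearest-centroid few-shot learner.

    The two class means differ only along dimension 0; the other
    n_extra_dims dimensions carry task-irrelevant unit-variance noise.
    """
    d = 1 + n_extra_dims
    mu = np.zeros((2, d))
    mu[0, 0], mu[1, 0] = -1.0, 1.0

    def sample(c, n):
        return mu[c] + rng.normal(size=(n, d))

    accs = []
    for _ in range(n_trials):
        # Few-shot training: estimate each class centroid from n_train examples.
        c0 = sample(0, n_train).mean(axis=0)
        c1 = sample(1, n_train).mean(axis=0)
        # Test: classify fresh samples by nearest estimated centroid.
        x = np.vstack([sample(0, 100), sample(1, 100)])
        y = np.repeat([0, 1], 100)
        pred = (np.linalg.norm(x - c1, axis=1)
                < np.linalg.norm(x - c0, axis=1)).astype(int)
        accs.append((pred == y).mean())
    return float(np.mean(accs))

low = few_shot_accuracy(n_train=2, n_extra_dims=5)     # few excess dimensions
high = few_shot_accuracy(n_train=2, n_extra_dims=500)  # many excess dimensions
print(low, high)  # the low-dimensional representation learns faster
```

The excess dimensions hurt because the estimated centroid difference (the learned decision direction) picks up sampling noise along every irrelevant axis, diluting the one informative direction; this is a simplified stand-in for the dimensionality argument, not the thesis's actual analysis.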
Date issued
2022-09
URI
https://hdl.handle.net/1721.1/147557
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
