Now showing items 1-13 of 13

    • Can a biologically-plausible hierarchy e ectively replace face detection, alignment, and recognition pipelines? 

      Liao, Qianli; Leibo, Joel Z; Mroueh, Youssef; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-03-27)
      The standard approach to unconstrained face recognition in natural photographs is via a detection, alignment, recognition pipeline. While that approach has achieved impressive results, there are several reasons to be ...
    • Complexity of Representation and Inference in Compositional Models with Part Sharing 

      Yuille, Alan L.; Mottaghi, Roozbeh (Center for Brains, Minds and Machines (CBMM), arXiv, 2015-05-05)
      This paper performs a complexity analysis of a class of serial and parallel compositional models of multiple objects and shows that they enable efficient representation and rapid inference. Compositional models are generative ...
    • Do You See What I Mean? Visual Resolution of Linguistic Ambiguities 

      Berzak, Yevgeni; Barbu, Andrei; Harari, Daniel; Katz, Boris; Ullman, Shimon (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-06-10)
      Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a ...
    • Fast, invariant representation for human action in the visual system 

      Isik, Leyla; Tacchetti, Andrea; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-01-06)
      The ability to recognize the actions of others from visual input is essential to humans' daily lives. The neural computations underlying action recognition, however, are still poorly understood. We use magnetoencephalography ...
    • The Invariance Hypothesis Implies Domain-Specific Regions in Visual Cortex 

      Leibo, Joel Z; Liao, Qianli; Anselmi, Fabio; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), bioRxiv, 2015-04-26)
      Is visual cortex made up of general-purpose information processing machinery, or does it consist of a collection of specialized modules? If prior knowledge, acquired from learning a set of objects is only transferable to ...
    • Learning Real and Boolean Functions: When Is Deep Better Than Shallow 

      Mhaskar, Hrushikesh; Liao, Qianli; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-03-08)
      We describe computational tasks - especially in vision - that correspond to compositional/hierarchical functions. While the universal approximation property holds both for hierarchical and shallow networks, we prove that ...
    • Neural tuning size is a key factor underlying holistic face processing 

      Tan, Cheston; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-06-14)
      Faces are a class of visual stimuli with unique significance, for a variety of reasons. They are ubiquitous throughout the course of a person’s life, and face recognition is crucial for daily social interaction. Faces are ...
    • Representation Learning in Sensory Cortex: a theory 

      Anselmi, Fabio; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2014-11-14)
      We review and apply a computational theory of the feedforward path of the ventral stream in visual cortex based on the hypothesis that its main function is the encoding of invariant representations of images. A key ...
    • Robust Estimation of 3D Human Poses from a Single Image 

      Wang, Chunyu; Wang, Yizhou; Lin, Zhouchen; Yuille, Alan L.; Gao, Wen (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-06-10)
      Human pose estimation is a key step to action recognition. We propose a method of estimating 3D human poses from a single image, which works in conjunction with an existing 2D pose/joint detector. 3D pose estimation is ...
    • Seeing What You’re Told: Sentence-Guided Activity Recognition In Video 

      Siddharth, Narayanaswamy; Barbu, Andrei; Siskind, Jeffrey Mark (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-05-29)
      We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, ...
    • Unsupervised learning of clutter-resistant visual representations from natural videos 

      Liao, Qianli; Leibo, Joel Z; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2015-04-27)
      Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance e.g., position, scale, viewing angle [1, 2, 3]. Though the learning ...
    • View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation 

      Leibo, Joel Z.; Liao, Qianli; Freiwald, Winrich; Anselmi, Fabio; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-06-03)
      The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving ...
    • When Computer Vision Gazes at Cognition 

      Gao, Tao; Harari, Daniel; Tenenbaum, Joshua; Ullman, Shimon (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-12-12)
      Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third party objects that other people are looking at. While it has been shown that people can accurately ...