CBMM Memo Series
https://hdl.handle.net/1721.1/88531

On the Capability of Neural Networks to Generalize to Unseen Category-Pose Combinations (2020-07-17)
https://hdl.handle.net/1721.1/126262
Madan, Spandan; Henry, Timothy; Dozier, Jamell; Ho, Helen; Bhandari, Nishchal; Sasaki, Tomotake; Durand, Fredo; Pfister, Hanspeter; Boix, Xavier
Recognizing an object’s category and pose lies at the heart of visual understanding. Recent works suggest that deep neural networks (DNNs) often fail to generalize to category-pose combinations not seen during training. However, it is unclear when and how such generalization may be possible. Does the number of combinations seen during training impact generalization? Is it better to learn category and pose in separate networks, or in a single shared network? Furthermore, what are the neural mechanisms that drive the network’s generalization? In this paper, we answer these questions by analyzing state-of-the-art DNNs trained to recognize both object category and pose (position, scale, and 3D viewpoint) with quantitative control over the number of category-pose combinations seen during training. We also investigate the emergence of two types of specialized neurons that can explain generalization to unseen combinations—neurons selective to category and invariant to pose, and vice versa. We perform experiments on MNIST extended with position or scale, the iLab dataset with vehicles at different viewpoints, and a challenging new dataset for car model recognition and viewpoint estimation that we introduce in this paper, the Biased-Cars dataset. Our results demonstrate that as the number of combinations seen during training increases, networks generalize better to unseen category-pose combinations, facilitated by an increase in the selectivity and invariance of individual neurons. We find that learning category and pose in separate networks compared to a shared one leads to an increase in such selectivity and invariance, as separate networks are not forced to preserve information about both category and pose. This enables separate networks to significantly outperform shared ones at predicting unseen category-pose combinations.
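One way to make the notion of a "category-selective, pose-invariant" neuron concrete is a simple variance-based score over a category x pose grid of mean activations. This is an illustrative sketch only; the paper's exact selectivity and invariance measures may differ, and the function name and metric here are assumptions.

```python
import numpy as np

def selectivity_invariance(act):
    """Score in [0, 1] for one neuron from its mean activations.

    act: array of shape (n_categories, n_poses).
    High score = activation varies across categories (selective)
    but not across poses (invariant). Illustrative metric only.
    """
    cat_var = act.mean(axis=1).var()   # variation across categories
    pose_var = act.mean(axis=0).var()  # variation across poses
    return cat_var / (cat_var + pose_var + 1e-12)

# A neuron that fires for category 1 regardless of pose:
act = np.array([[0.1, 0.1, 0.1],
                [0.9, 0.9, 0.9]])
score = selectivity_invariance(act)  # close to 1: selective and invariant
```

Under this toy metric, the paper's finding would read as: separate networks produce more neurons with scores near 1 (for category in the category network, and for pose in the pose network) than a shared network does.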

Loss landscape: SGD can have a better view than GD (2020-07-01)
https://hdl.handle.net/1721.1/126041
Poggio, Tomaso; Cooper, Yaim
Consider a loss function L = ∑_{i=1}^n l_i^2 with l_i = f(x_i) − y_i, where f(x) is a deep feedforward network with R layers, no bias terms and scalar output. Assume the network is overparametrized, that is, d ≫ n, where d is the number of parameters and n is the number of data points. The networks are assumed to interpolate the training data (i.e., the minimum of L is zero). If GD converges, it will converge to a critical point of L, namely a solution of ∑_{i=1}^n l_i ∇l_i = 0. There are two kinds of critical points: those for which each term of the above sum vanishes individually, and those for which the expression only vanishes when all the terms are summed. The main claim in this note is that while GD can converge to both types of critical points, SGD can only converge to the first kind, which includes all global minima.
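The distinction between the two kinds of critical points can be seen in a minimal toy problem (an illustration only, not the memo's setting: this toy is underparametrized and does not interpolate). At the point below, the summed gradient ∑ l_i ∇l_i vanishes, but each per-sample term does not, so a single SGD step would move away while GD is stationary.

```python
# Toy model f(x) = w*x with data (1, 1) and (-1, 1),
# loss L = sum_i (w*x_i - y_i)^2.
def per_sample_grads(w, data):
    # Each term is 2 * l_i * dl_i/dw, with l_i = w*x_i - y_i and dl_i/dw = x_i.
    return [2 * (w * x - y) * x for x, y in data]

data = [(1.0, 1.0), (-1.0, 1.0)]
grads = per_sample_grads(0.0, data)  # per-sample gradients at w = 0
full_grad = sum(grads)               # 0.0: GD is stationary at w = 0
# grads == [-2.0, 2.0]: each SGD step is nonzero, so SGD escapes this point
```

This is a critical point of the second kind: the terms cancel only in the sum. At a critical point of the first kind (e.g., a global minimum of an interpolating network), every l_i = 0, so both GD and SGD are stationary.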

Biologically Inspired Mechanisms for Adversarial Robustness (2020-06-23)
https://hdl.handle.net/1721.1/125981
Vuyyuru Reddy, Manish; Banburski, Andrzej; Plant, Nishka; Poggio, Tomaso
A convolutional neural network strongly robust to adversarial perturbations at reasonable computational and performance cost has not yet been demonstrated. The primate visual ventral stream seems to be robust to small perturbations in visual stimuli but the underlying mechanisms that give rise to this robust perception are not understood. In this work, we investigate the role of two biologically plausible mechanisms in adversarial robustness. We demonstrate that the non-uniform sampling performed by the primate retina and the presence of multiple receptive fields with a range of receptive field sizes at each eccentricity improve the robustness of neural networks to small adversarial perturbations. We verify that these two mechanisms do not suffer from gradient obfuscation and study their contribution to adversarial robustness through ablation studies.
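A minimal 1D sketch of eccentricity-dependent sampling, under assumptions of my own (the paper's retinal sampling scheme is almost certainly more elaborate): sample positions are dense at the center of the image (the "fovea") and spaced geometrically farther apart toward the periphery.

```python
import numpy as np

def foveal_sample_positions(width, n_samples):
    """Pixel indices along one image row whose spacing grows with
    distance from the center. Illustrative sketch only."""
    half = n_samples // 2
    # eccentricities growing geometrically out to half the image width
    ecc = np.geomspace(1.0, width / 2, num=half)
    center = width / 2
    left = np.clip(center - ecc, 0, width - 1)
    right = np.clip(center + ecc, 0, width - 1)
    return np.unique(np.concatenate([left, right]).astype(int))

idx = foveal_sample_positions(224, 32)
# consecutive samples are ~1-2 px apart near the center,
# tens of pixels apart near the borders
```

The intuition for robustness: a small pixel-level perturbation in the periphery is mostly discarded by the sparse sampling, shrinking the attack surface available to the adversary.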

Hierarchically Local Tasks and Deep Convolutional Networks (2020-06-24)
https://hdl.handle.net/1721.1/125980
Deza, Arturo; Liao, Qianli; Banburski, Andrzej; Poggio, Tomaso
The main success stories of deep learning, starting with ImageNet, depend on convolutional networks, which on certain tasks perform significantly better than traditional shallow classifiers, such as support vector machines. Is there something special about deep convolutional networks that other learning machines do not possess? Recent results in approximation theory have shown that there is an exponential advantage of deep convolutional-like networks in approximating functions with hierarchical locality in their compositional structure. These mathematical results, however, do not say which tasks are expected to have input-output functions with hierarchical locality. Among all the possible hierarchically local tasks in vision, text and speech we explore a few of them experimentally by studying how they are affected by disrupting locality in the input images. We also discuss a taxonomy of tasks ranging from local, to hierarchically local, to global and make predictions about the type of networks required to perform efficiently on these different types of tasks.
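One simple way to disrupt locality in input images (an assumed procedure for illustration, not necessarily the paper's exact one) is to split each image into non-overlapping patches and permute them: the pixel statistics are preserved, but local spatial structure above the patch scale is destroyed.

```python
import numpy as np

def shuffle_patches(img, patch, rng):
    """Permute non-overlapping patch x patch tiles of a 2D image.
    Assumes patch divides both image dimensions."""
    h, w = img.shape
    tiles = [img[i:i + patch, j:j + patch]
             for i in range(0, h, patch) for j in range(0, w, patch)]
    order = rng.permutation(len(tiles))  # random reassignment of tiles
    out = np.empty_like(img)
    k = 0
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            out[i:i + patch, j:j + patch] = tiles[order[k]]
            k += 1
    return out

rng = np.random.default_rng(0)
img = np.arange(64, dtype=float).reshape(8, 8)
scrambled = shuffle_patches(img, 2, rng)
# same multiset of pixel values, different spatial arrangement
```

If a task's performance survives such scrambling, its input-output function presumably does not rely on hierarchical locality, and the approximation-theoretic advantage of deep convolutional networks would not apply to it.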