CBMM Memo Series
http://hdl.handle.net/1721.1/88531

Biologically-Plausible Learning Algorithms Can Scale to Large Datasets (2018-09-27)
http://hdl.handle.net/1721.1/118195
Xiao, Will; Chen, Honglin; Liao, Qianli; Poggio, Tomaso
The backpropagation (BP) algorithm is often thought to be biologically implausible in the brain. One of the main reasons is that BP requires symmetric weight matrices in the feedforward and feedback pathways. To address this “weight transport problem” (Grossberg, 1987), two more biologically plausible algorithms, proposed by Liao et al. (2016) and Lillicrap et al. (2016), relax BP’s weight symmetry requirements and demonstrate learning capabilities comparable to BP’s on small datasets. However, a recent study by Bartunov et al. (2018) evaluates variants of target-propagation (TP) and feedback alignment (FA) on the MNIST, CIFAR, and ImageNet datasets, and finds that although many of the proposed algorithms perform well on MNIST and CIFAR, they perform significantly worse than BP on ImageNet. Here, we additionally evaluate the sign-symmetry algorithm (Liao et al., 2016), which differs from both BP and FA in that the feedback and feedforward weights share signs but not magnitudes. We examine the performance of sign-symmetry and feedback alignment on the ImageNet and MS COCO datasets using different network architectures (ResNet-18 and AlexNet for ImageNet, RetinaNet for MS COCO). Surprisingly, networks trained with sign-symmetry can attain classification performance approaching that of BP-trained networks. These results complement the study by Bartunov et al. (2018) and establish a new benchmark for future biologically plausible learning algorithms on more difficult datasets and more complex architectures.
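The sign-symmetry error signal can be illustrated with a minimal numpy sketch. This is a toy one-hidden-layer regression network, not the paper's ResNet/AlexNet setup; the layer sizes, data, and learning rate here are invented for illustration. The only change from backpropagation is in the backward pass, where the transpose of the forward weights is replaced by the transpose of their signs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network (hypothetical sizes): x -> h = relu(W1 x) -> y = W2 h,
# trained toward a single random regression target.
n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(0.0, 0.5, (n_hid, n_in))
W2 = rng.normal(0.0, 0.5, (n_out, n_hid))
x = rng.normal(size=n_in)
target = rng.normal(size=n_out)

def loss():
    return float(np.mean((W2 @ np.maximum(W1 @ x, 0.0) - target) ** 2))

loss_before = loss()
lr = 0.01
for _ in range(200):
    h_pre = W1 @ x
    h = np.maximum(h_pre, 0.0)
    err = W2 @ h - target                      # dLoss/dy up to a constant factor

    # BP would propagate err through W2.T; sign-symmetry instead uses
    # sign(W2).T, so the feedback pathway shares signs but not magnitudes
    # with the feedforward weights.
    delta_h = (np.sign(W2).T @ err) * (h_pre > 0)

    W2 -= lr * np.outer(err, h)
    W1 -= lr * np.outer(delta_h, x)

loss_after = loss()
```

Feedback alignment would instead substitute a fixed random matrix for `np.sign(W2)`; the forward pass and weight updates are otherwise identical.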
Classical generalization bounds are surprisingly tight for Deep Networks (2018-07-11)
http://hdl.handle.net/1721.1/116911
Liao, Qianli; Miranda, Brando; Hidary, Jack; Poggio, Tomaso
Deep networks are usually trained and tested in a regime in which the training classification error is not a good predictor of the test error. Thus the consensus has been that generalization, defined as convergence of the empirical to the expected error, does not hold for deep networks. Here we show that, when normalized appropriately after training, deep networks trained on exponential-type losses show a good linear dependence of test loss on training loss. This observation, motivated by a previous theoretical analysis of overparameterization and overfitting, not only demonstrates the validity of classical generalization bounds for deep learning but suggests that they are tight. In addition, we show that the bound on the classification error given by the normalized cross-entropy loss is empirically rather tight on the datasets we studied.
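What makes post-training normalization meaningful can be checked numerically on a positively homogeneous network. The sketch below uses an untrained two-layer ReLU network with invented sizes (it is not the paper's experiment): dividing each layer by its Frobenius norm rescales all outputs by one constant, so the classifier's decisions are unchanged and losses become comparable across differently scaled networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# A two-layer ReLU network is positively homogeneous of degree 2 in its
# weights: scaling each layer by c > 0 scales outputs by c^2 and leaves
# the signs (hence the classification decisions) unchanged.
W1 = rng.normal(size=(32, 10))
W2 = rng.normal(size=(1, 32))
X = rng.normal(size=(100, 10))

def f(W1, W2, X):
    return (np.maximum(X @ W1.T, 0.0) @ W2.T).ravel()

out = f(W1, W2, X)

# Layer-wise normalization by the Frobenius norm.
W1n = W1 / np.linalg.norm(W1)
W2n = W2 / np.linalg.norm(W2)
out_n = f(W1n, W2n, X)

scale = np.linalg.norm(W1) * np.linalg.norm(W2)
assert np.allclose(out, out_n * scale)               # outputs rescaled only
assert np.array_equal(np.sign(out), np.sign(out_n))  # same decisions
```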
Theory IIIb: Generalization in Deep Networks (2018-06-29)
http://hdl.handle.net/1721.1/116692
Poggio, Tomaso; Liao, Qianli; Miranda, Brando; Burbanski, Andrzej; Hidary, Jack
A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent results by Srebro et al. provide a satisfying solution to the puzzle for linear networks used in binary classification. They prove that minimization of loss functions such as the logistic, the cross-entropy, and the exp-loss yields asymptotic, "slow" convergence to the maximum-margin solution for linearly separable datasets, independently of the initial conditions. Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss. The result holds for exponential-type losses but not for the square loss. In particular, we prove that the normalized weight matrix at each layer of a deep network converges to a minimum-norm solution (in the separable case). Our analysis of the dynamical system corresponding to gradient descent of a multilayer network suggests a simple criterion for predicting the generalization performance of different zero minimizers of the empirical loss.
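The linear case of this dynamic is easy to reproduce. The sketch below (toy data and hyperparameters are invented) runs plain gradient descent on the logistic loss for a linearly separable 2-D dataset: the loss has no finite minimizer, so the weight norm keeps growing slowly while the normalized direction stabilizes toward the maximum-margin separator:

```python
import numpy as np

# Linearly separable toy data in 2-D (antisymmetric pairs, so the
# separating hyperplane passes through the origin); labels in {-1, +1}.
X = np.array([[2., 1.], [3., 2.], [2., 3.],
              [-2., -1.], [-3., -2.], [-2., -3.]])
y = np.array([1., 1., 1., -1., -1., -1.])

def logistic_loss(w):
    return float(np.mean(np.log1p(np.exp(-y * (X @ w)))))

w = np.zeros(2)
lr = 0.1
snapshots = {}
for t in range(1, 20001):
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))     # per-sample loss slope
    w += lr * (y[:, None] * X * s[:, None]).mean(axis=0)
    if t in (1000, 20000):
        snapshots[t] = w.copy()

# ||w|| diverges "slowly" as the loss approaches zero, while the
# normalized weight vector w / ||w|| converges in direction.
direction = w / np.linalg.norm(w)
```

Normalizing the weights, as in the nonlinear result of the paper, is what turns this divergent trajectory into a well-defined limit.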
Deep Regression Forests for Age Estimation (2018-06-01)
http://hdl.handle.net/1721.1/115413
Shen, Wei; Guo, Yilu; Wang, Yan; Zhao, Kai; Wang, Bo; Yuille, Alan L.
Age estimation from facial images is typically cast as a nonlinear regression problem. The main challenge of this problem is that the facial feature space w.r.t. ages is inhomogeneous, due to the large variation in facial appearance across different persons of the same age and the non-stationary property of aging patterns. In this paper, we propose Deep Regression Forests (DRFs), an end-to-end model, for age estimation. DRFs connect the split nodes to a fully connected layer of a convolutional neural network (CNN) and deal with inhomogeneous data by jointly learning input-dependent data partitions at the split nodes and data abstractions at the leaf nodes. This joint learning follows an alternating strategy: first, fixing the leaf nodes, the split nodes as well as the CNN parameters are optimized by back-propagation; then, fixing the split nodes, the leaf nodes are optimized by iterating a step-size-free update rule derived from Variational Bounding. We verify the proposed DRFs on three standard age estimation benchmarks and achieve state-of-the-art results on all of them.
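The alternating strategy can be sketched for a single soft split node with two scalar leaves. This is a deliberate simplification: the split feature here is linear in a 1-D input rather than a CNN activation, and the leaf step uses the routing-weighted average that exactly minimizes squared loss, standing in for the paper's variational-bounding update of probabilistic leaves. All data and hyperparameters are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D regression: prediction is a routing-weighted mix of two leaf
# values, with routing probability p(x) = sigmoid(a*x + b).
x = rng.uniform(-2, 2, size=200)
y_true = np.where(x > 0.3, 1.5, -0.5) + 0.1 * rng.normal(size=200)

a, b = 1.0, 0.0                  # split parameters (a CNN feature in the paper)
leaves = np.array([0.0, 0.0])    # leaf predictions

def predict(a, b, leaves):
    p = 1.0 / (1.0 + np.exp(-(a * x + b)))   # probability of routing to leaf 1
    return (1 - p) * leaves[0] + p * leaves[1], p

for _ in range(100):
    # Step 1: fix the leaves, take a gradient step on the split parameters.
    pred, p = predict(a, b, leaves)
    err = pred - y_true
    dz = err * (leaves[1] - leaves[0]) * p * (1 - p)   # dLoss/d(a*x+b) per sample
    a -= 0.1 * np.mean(dz * x)
    b -= 0.1 * np.mean(dz)

    # Step 2: fix the split, step-size-free leaf update: each leaf becomes
    # the routing-weighted average of the targets (the exact minimizer under
    # squared loss; the paper's variational-bounding rule plays this role
    # for its probabilistic leaf distributions).
    _, p = predict(a, b, leaves)
    leaves[0] = np.sum((1 - p) * y_true) / np.sum(1 - p)
    leaves[1] = np.sum(p * y_true) / np.sum(p)

pred, _ = predict(a, b, leaves)
mse = float(np.mean((pred - y_true) ** 2))
```

The two phases never update the same parameters at once, which is what lets the leaf step be free of a learning rate while the split step remains ordinary back-propagation.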