CBMM Memo Series (https://hdl.handle.net/1721.1/88531), 2019-12-09

Double descent in the condition number
https://hdl.handle.net/1721.1/123108 (2019-12-04)
Poggio, Tomaso; Kur, Gil; Banburski, Andrzej
In solving a system of n linear equations in d variables, Ax = b, the condition number of the n x d matrix A measures how much errors in the data b affect the solution x. Bounds of this type are important in many inverse problems. An example is machine learning, where the key task is to estimate an underlying function from a set of measurements at random points in a high-dimensional space, and where low sensitivity to error in the data is a requirement for good predictive performance. Here we report the simple observation that when the columns of A are random vectors, the condition number of A is highest, that is, worst, when d = n, which is exactly when the inverse of A exists. An overdetermined system (n > d) and especially an underdetermined system (n < d), for which the pseudoinverse must be used instead of the inverse, typically have significantly better, that is, lower, condition numbers. Thus the condition number of A, plotted as a function of d, shows a double descent behavior with a peak at d = n.
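The observation is easy to reproduce numerically. The following is a minimal sketch (my own illustration, not code from the memo): for a fixed number of equations n, draw random Gaussian n x d matrices and measure the 2-norm condition number, the ratio of the largest to the smallest nonzero singular value, which is the relevant quantity when the pseudoinverse is used.

```python
import numpy as np

def mean_condition_number(n, d, trials=50):
    """Average 2-norm condition number of random Gaussian n x d matrices."""
    rng = np.random.default_rng(0)
    conds = []
    for _ in range(trials):
        A = rng.standard_normal((n, d))
        s = np.linalg.svd(A, compute_uv=False)  # min(n, d) singular values
        conds.append(s[0] / s[-1])
    return float(np.mean(conds))

n = 20
curve = {d: mean_condition_number(n, d) for d in (5, 10, 20, 30, 40)}
# curve peaks sharply at d = n (= 20) and descends on both sides,
# the double descent described in the abstract.
```

Averaging over trials smooths the heavy tail of the condition number near d = n, where the smallest singular value of a square Gaussian matrix is close to zero.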
Hippocampal Remapping as Hidden State Inference
https://hdl.handle.net/1721.1/122040 (2019-08-22)
Sanders, Honi; Wilson, Matthew A.; Gershman, Samuel J.
Cells in the hippocampus tuned to spatial location (place cells) typically change their tuning when an animal changes context, a phenomenon known as remapping. A fundamental challenge to understanding remapping is the fact that what counts as a “context change” has never been precisely defined. Furthermore, different remapping phenomena have been classified on the basis of how much the tuning changes after different types and degrees of context change, but the relationship between these variables is not clear. We address these ambiguities by formalizing remapping in terms of hidden state inference. According to this view, remapping does not directly reflect objective, observable properties of the environment, but rather subjective beliefs about the hidden state of the environment. We show how the hidden state framework can resolve a number of puzzles about the nature of remapping.
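The hidden state framing can be made concrete with a toy Bayesian computation. The sketch below is my own illustration, not the paper's model: the animal maintains a posterior over two hypothetical contexts that predict different sensory cues, and "remapping" corresponds to the posterior mass shifting from one context to the other as evidence accumulates.

```python
import numpy as np

def posterior_over_contexts(observations, means=(0.0, 3.0), sigma=1.0,
                            prior=(0.5, 0.5)):
    """Posterior over two hidden contexts given noisy scalar cues.

    Each context predicts Gaussian cues with a different mean; the
    belief is updated by adding Gaussian log-likelihoods.
    """
    log_post = np.log(np.asarray(prior, dtype=float))
    for x in observations:
        log_post += -0.5 * ((x - np.asarray(means)) / sigma) ** 2
    log_post -= log_post.max()          # for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Cues near 3.0 shift belief to context 1; under the hidden state view,
# this belief shift is what would trigger remapping.
p = posterior_over_contexts([2.8, 3.1, 2.9])
```

The point of the toy model matches the abstract's claim: the inferred context is a subjective belief driven by evidence, not a direct readout of any single observable property of the environment.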
Brain Signals Localization by Alternating Projections
https://hdl.handle.net/1721.1/122034 (2019-08-29)
Adler, Amir; Wax, Mati; Pantazis, Dimitrios
We present a novel solution to the problem of localization of brain signals. The solution is sequential and iterative, and is based on minimizing the least-squares (LS) criterion by the alternating projection (AP) algorithm, well known in the context of array signal processing. Unlike existing solutions belonging to the linearly constrained minimum variance (LCMV) and to the multiple-signal classification (MUSIC) families, the algorithm is applicable even in the case of a single sample and in the case of synchronous sources. The performance of the solution is demonstrated via simulations.
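The sequential and iterative structure of such a least-squares approach can be sketched as follows. This is a simplified illustration of the general alternating-minimization idea, not the authors' implementation: given a dictionary L whose columns are candidate lead-field vectors (one per grid location), sources are added one at a time and then cyclically re-estimated, each time minimizing the LS data fit with the other sources held fixed.

```python
import numpy as np

def _best_grid_point(L, b, fixed):
    """Grid index minimizing the LS residual, with `fixed` sources active."""
    best_g, best_err = 0, np.inf
    for g in range(L.shape[1]):
        A = L[:, fixed + [g]]
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        err = np.linalg.norm(b - A @ x)
        if err < best_err:
            best_g, best_err = g, err
    return best_g

def ap_localize(L, b, q, n_refine=3):
    # Sequential initialization: add one source at a time.
    active = []
    for _ in range(q):
        active.append(_best_grid_point(L, b, active))
    # Cyclic refinement: re-estimate each source with the others fixed.
    for _ in range(n_refine):
        for j in range(q):
            others = active[:j] + active[j + 1:]
            active[j] = _best_grid_point(L, b, others)
    return sorted(active)

rng = np.random.default_rng(1)
L = rng.standard_normal((16, 30))          # 16 sensors, 30 candidate locations
b = L[:, [4, 17]] @ np.array([1.0, -0.7])  # noiseless two-source measurement
est = ap_localize(L, b, q=2)               # should recover the support [4, 17]
```

Note how this matches the abstract's claims: the method operates on a single measurement vector b (a single sample), and nothing in the LS fit requires the two sources to be uncorrelated.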
Theoretical Issues in Deep Networks
https://hdl.handle.net/1721.1/122014 (2019-08-17)
Poggio, Tomaso; Banburski, Andrzej; Liao, Qianli
While deep learning is successful in a number of applications, it is not yet well understood theoretically. A theoretical characterization of deep networks should answer questions about their approximation power, the dynamics of optimization by gradient descent, and their good out-of-sample performance --- why the expected error does not suffer, despite the absence of explicit regularization, when the networks are overparametrized. We review our recent results towards this goal. In {\it approximation theory}, both shallow and deep networks are known to approximate any continuous function on a bounded domain, at a cost which is in general exponential (the number of parameters grows exponentially with the dimensionality of the function). However, we proved that for certain types of compositional functions, deep networks of the convolutional type (even without weight sharing) can have a linear dependence on dimensionality, unlike shallow networks. In characterizing {\it minimization} of the empirical exponential loss, we consider the gradient descent dynamics of the weight directions rather than of the weights themselves, since the relevant function underlying classification corresponds to the normalized network. The dynamics of the normalized weights implied by standard gradient descent turn out to be equivalent to the dynamics of the constrained problem of minimizing an exponential-type loss subject to a unit $L_2$ norm constraint. In particular, the dynamics of the typical, unconstrained gradient descent converge to the same critical points as the constrained problem. Thus there is {\it implicit regularization} in training deep networks under exponential-type loss functions with gradient descent. The critical points of the flow are hyperbolic minima (for any long but finite time) and minimum norm minimizers (e.g. maxima of the margin).
Though appropriately normalized networks can show a small generalization gap (the difference between empirical and expected loss) with respect to the exponential loss, even for finite $N$ (the number of training examples), they do not generalize in terms of the classification error, and bounds on the classification error for finite $N$ remain an open problem. Nevertheless, our results, together with other recent papers, characterize an implicit vanishing regularization by gradient descent which is likely to be a key prerequisite -- in terms of complexity control -- for the good performance of deep overparametrized ReLU classifiers.
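The implicit-regularization claim can be seen already in the simplest case of a linear classifier. The following is a toy illustration of my own, not an experiment from the memo: on linearly separable data, gradient descent on the exponential loss drives the weight norm to infinity, but the normalized direction w / ||w|| converges to the max-margin separator. With the two training points below, the max-margin direction is (1, 0) by symmetry.

```python
import numpy as np

# Two separable points on the x-axis; max-margin direction is (1, 0).
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])

w = np.array([0.1, 0.5])   # deliberately misaligned initialization
lr = 0.1
for _ in range(5000):
    margins = y * (X @ w)
    # Gradient of the empirical exponential loss sum_i exp(-y_i w.x_i)
    grad = -(y * np.exp(-margins)) @ X
    w -= lr * grad

direction = w / np.linalg.norm(w)
# ||w|| keeps growing (roughly logarithmically), while `direction`
# approaches the max-margin separator (1, 0).
```

This mirrors the abstract's statement for the normalized network: the unnormalized weights diverge under an exponential-type loss, so the meaningful object is the dynamics of the weight direction, which is implicitly biased toward margin maximization.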