
dc.contributor.advisor      Tomaso Poggio
dc.contributor.author       Little, Anna V.  en_US
dc.contributor.author       Maggioni, Mauro  en_US
dc.contributor.author       Rosasco, Lorenzo  en_US
dc.contributor.other        Center for Biological and Computational Learning (CBCL)  en_US
dc.date.accessioned         2012-09-10T18:00:08Z
dc.date.available           2012-09-10T18:00:08Z
dc.date.issued              2012-09-08
dc.identifier.uri           http://hdl.handle.net/1721.1/72597
dc.description.abstract     Large data sets are often modeled as noisy samples from probability distributions in R^D, with D large. It has often been observed that the support M of these distributions appears to be well approximated by low-dimensional sets, perhaps even by manifolds. We consider sets that are locally well approximated by k-dimensional planes, with k << D; k-dimensional manifolds isometrically embedded in R^D are a special case. Samples from these distributions are, furthermore, corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well suited for adaptation to the study of samples from such distributions, in order to extract quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e., the covariance matrices of the distribution restricted to a ball of radius r with a fixed center and varying r, and, under rather general geometric assumptions, we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this holds as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M.  en_US
dc.format.extent            59 p.  en_US
dc.relation.ispartofseries  MIT-CSAIL-TR-2012-029
dc.relation.ispartofseries  CBCL-310
dc.subject                  machine learning  en_US
dc.subject                  high dimensional data  en_US
dc.title                    Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature  en_US
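The abstract centers on multiscale covariance matrices: the covariance of the samples that fall in a ball of radius r around a fixed center, examined as r varies, whose square-rooted eigenvalues are the multiscale singular values of the title. A minimal pure-Python sketch of that computation follows, assuming a synthetic data set (a uniformly sampled 2-dimensional square embedded in R^6 and corrupted by Gaussian noise) and hypothetical helper names (`sample_noisy_plane`, `local_covariance`, `jacobi_eigenvalues`, `estimate_dimension`). The dimension rule at the end, which counts squared singular values above a fixed fraction `ratio` of the largest one, is a simplified stand-in chosen for illustration, not the estimator developed in the report.

```python
# Illustrative sketch under stated assumptions; all helper names are hypothetical.
import math
import random


def sample_noisy_plane(n, D, k, sigma, seed=0):
    """Assumed test data: uniform samples from [-1, 1]^k placed in the first
    k coordinates of R^D, then corrupted by D-dimensional Gaussian noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        clean = [rng.uniform(-1.0, 1.0) for _ in range(k)] + [0.0] * (D - k)
        points.append([c + rng.gauss(0.0, sigma) for c in clean])
    return points


def local_covariance(points, center, r):
    """Covariance of the samples inside the ball B(center, r); also returns
    how many samples the ball contains."""
    ball = [x for x in points if math.dist(x, center) <= r]
    if not ball:
        raise ValueError("no samples in ball of radius %g" % r)
    n, D = len(ball), len(center)
    mean = [sum(x[i] for x in ball) / n for i in range(D)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in ball) / n
            for j in range(D)] for i in range(D)]
    return cov, n


def jacobi_eigenvalues(a, tol=1e-14, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations,
    returned largest first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def estimate_dimension(eigenvalues, ratio=0.1):
    """Simplified heuristic (not the report's algorithm): count the squared
    singular values that are at least `ratio` times the largest one."""
    top = eigenvalues[0]
    return sum(1 for lam in eigenvalues if lam >= ratio * top)


if __name__ == "__main__":
    pts = sample_noisy_plane(n=2000, D=6, k=2, sigma=0.01)
    center = [0.0] * 6
    for r in (0.1, 0.25, 0.5, 1.0):
        cov, n_ball = local_covariance(pts, center, r)
        eig = jacobi_eigenvalues(cov)
        # Multiscale singular values are square roots of the covariance eigenvalues.
        sv = [math.sqrt(max(lam, 0.0)) for lam in eig]
        print(f"r={r:<5} n={n_ball:<5} dim={estimate_dimension(eig)} "
              + " ".join(f"{s:.4f}" for s in sv))
```

Because this toy set is flat, larger radii only add samples; on a curved set, the singular values beyond the k-th would also grow with r, which is the noise-versus-curvature trade-off named in the report's title. The fixed `ratio` threshold is an assumption made to keep the sketch short.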

