DSpace@MIT

Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature



dc.contributor.advisor Tomaso Poggio
dc.contributor.author Little, Anna V. en_US
dc.contributor.author Maggioni, Mauro en_US
dc.contributor.author Rosasco, Lorenzo en_US
dc.contributor.other Center for Biological and Computational Learning (CBCL) en_US
dc.date.accessioned 2012-09-10T18:00:08Z
dc.date.available 2012-09-10T18:00:08Z
dc.date.issued 2012-09-08
dc.identifier.uri http://hdl.handle.net/1721.1/72597
dc.description.abstract Large data sets are often modeled as being noisy samples from probability distributions in R^D, with D large. It has been noticed that oftentimes the support M of these probability distributions seems to be well-approximated by low-dimensional sets, perhaps even by manifolds. We shall consider sets that are locally well approximated by k-dimensional planes, with k << D, with k-dimensional manifolds isometrically embedded in R^D being a special case. Samples from this distribution are furthermore corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well-suited to be adapted to the study of samples from such probability distributions, in order to yield quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e., covariances corresponding to the distribution restricted to a ball of radius r, with a fixed center and varying r, and under rather general geometric assumptions we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this is true as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M. en_US
dc.format.extent 59 p. en_US
dc.relation.ispartofseries MIT-CSAIL-TR-2012-029
dc.relation.ispartofseries CBCL-310
dc.subject machine learning en_US
dc.subject high dimensional data en_US
dc.title Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature en_US
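
The abstract describes multiscale covariance matrices: empirical covariances of the samples that fall in a ball of radius r around a fixed center, computed for several values of r. The following is a minimal illustrative sketch of that idea, not the algorithm from the report. It assumes NumPy is available; the function names (`local_covariance_spectrum`, `largest_gap_dimension`), the synthetic data (a flat 2-dimensional patch in R^10 with Gaussian noise), and the largest-eigenvalue-gap rule for guessing the dimension are all hypothetical choices made here for illustration.

```python
import numpy as np


def local_covariance_spectrum(points, center, radius):
    """Eigenvalues (descending) of the empirical covariance of the samples
    lying in the ball of the given radius around `center`."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    dists = np.linalg.norm(points - center, axis=1)
    local = points[dists <= radius]
    if len(local) < 2:
        raise ValueError("need at least two samples in the ball")
    centered = local - local.mean(axis=0)
    cov = centered.T @ centered / len(local)
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    return np.linalg.eigvalsh(cov)[::-1]


def largest_gap_dimension(eigenvalues, eps=1e-12):
    """Hypothetical heuristic (an assumption of this sketch): place the
    dimension at the largest ratio between consecutive eigenvalues."""
    ratios = eigenvalues[:-1] / (eigenvalues[1:] + eps)
    return int(np.argmax(ratios)) + 1


# Synthetic data: a 2-dimensional square patch embedded in R^10,
# corrupted by 10-dimensional Gaussian noise of standard deviation 0.01.
rng = np.random.default_rng(0)
n, D, k = 2000, 10, 2
clean = np.zeros((n, D))
clean[:, :k] = rng.uniform(-1.0, 1.0, size=(n, k))
noisy = clean + 0.01 * rng.standard_normal((n, D))

# Fixed center, varying radius.
radii = [0.3, 0.5, 0.8]
center = np.zeros(D)
spectra = np.array(
    [local_covariance_spectrum(noisy, center, r) for r in radii]
)
estimates = [largest_gap_dimension(s) for s in spectra]

for r, s, d in zip(radii, spectra, estimates):
    print(f"r={r}: top eigenvalues {s[:3]}, estimated dimension {d}")
```

In this toy setting the two leading eigenvalues grow with r while the remaining ones stay near the noise variance, which is the kind of scale-dependent separation the abstract refers to. On curved sets the choice of scale matters more, and the report itself should be consulted for the actual estimator and its guarantees.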


Files in this item

Name Size Format Description
MIT-CSAIL-TR-2012 ... 1.820Mb PDF
