dc.contributor.author: Rus, Daniela
dc.contributor.author: Feldman, Dan
dc.contributor.author: Volkov, Mikhail
dc.date.accessioned: 2021-11-03T17:26:16Z
dc.date.available: 2021-11-03T17:26:16Z
dc.date.issued: 2016
dc.identifier.uri: https://hdl.handle.net/1721.1/137254
dc.description.abstract: © 2016 NIPS Foundation - All Rights Reserved. In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principal Component Analysis (PCA) of any n × d matrix, using one pass over the stream of its rows. Our solution uses coresets: a coreset is a scaled subset of the n rows that approximates their sum of squared distances to every k-dimensional affine subspace. An open theoretical problem has been to compute such a coreset whose size is independent of both n and d. An open practical problem has been to compute a non-trivial approximation to the PCA of very large but sparse databases, such as the Wikipedia document-term matrix, in reasonable time. We answer both of these questions affirmatively. Our main technical result is a new framework for deterministic coreset constructions based on a reduction to the problem of counting items in a stream. [en_US]
dc.language.iso: en
dc.relation.isversionof: https://papers.nips.cc/paper/6596-dimensionality-reduction-of-massive-sparse-datasets-using-coresets [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Neural Information Processing Systems (NIPS) [en_US]
dc.title: Dimensionality reduction of massive sparse datasets using coresets [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Rus, Daniela, Feldman, Dan and Volkov, Mikhail. 2016. "Dimensionality reduction of massive sparse datasets using coresets."
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2019-07-17T17:30:26Z
dspace.date.submission: 2019-07-17T17:30:27Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
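
The abstract in this record defines a coreset by the quantity it must preserve: the sum of squared distances from the rows to any k-dimensional affine subspace. The sketch below only illustrates that definition on a toy matrix; it is not the paper's construction. The function names, the per-row weights used to model the "scaled" subset, and the example data are all assumptions made for illustration. The toy coreset is the trivial exact one obtained by collapsing duplicate rows and weighting each distinct row by its multiplicity.

```python
def sq_dist_to_affine_subspace(row, origin, basis):
    """Squared distance from `row` to the affine subspace origin + span(basis).

    `basis` is assumed to be a list of orthonormal vectors (not checked).
    """
    diff = [r - o for r, o in zip(row, origin)]
    # Squared norm of diff minus the squared norm of its projection onto the span.
    total = sum(x * x for x in diff)
    for b in basis:
        proj = sum(x * y for x, y in zip(diff, b))
        total -= proj * proj
    return max(total, 0.0)  # guard against tiny negative rounding errors


def weighted_cost(rows, weights, origin, basis):
    """Weighted sum of squared distances from `rows` to the affine subspace."""
    return sum(
        w * sq_dist_to_affine_subspace(r, origin, basis)
        for r, w in zip(rows, weights)
    )


# Hypothetical toy data: 6 rows in 3 dimensions, with repeated rows.
full_rows = [
    [1.0, 0.0, 2.0], [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0], [0.0, 3.0, 1.0], [0.0, 3.0, 1.0],
    [4.0, 4.0, 0.0],
]
# Trivial exact "coreset": distinct rows weighted by their multiplicity.
core_rows = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [4.0, 4.0, 0.0]]
core_weights = [2.0, 3.0, 1.0]

# One query subspace with k = 1: the line through (0, 0, 1) along the x-axis.
origin = [0.0, 0.0, 1.0]
basis = [[1.0, 0.0, 0.0]]

full_cost = weighted_cost(full_rows, [1.0] * len(full_rows), origin, basis)
core_cost = weighted_cost(core_rows, core_weights, origin, basis)
print(full_cost, core_cost)
```

On this toy input the two costs coincide for every query subspace because the weighted subset is an exact summary. A non-trivial coreset would instead keep far fewer rows than there are distinct rows and match the full cost only up to a small relative error.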

