Dimensionality reduction for k-means clustering
Author(s)
Musco, Cameron N. (Cameron Nicholas)
DownloadFull printable version (1.044Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Nancy A. Lynch.
Terms of use
Metadata
Show full item recordAbstract
In this thesis we study dimensionality reduction techniques for approximate k-means clustering. Given a large dataset, we consider how to quickly compress to a smaller dataset (a sketch), such that solving the k-means clustering problem on the sketch will give an approximately optimal solution on the original dataset. First, we provide an exposition of technical results of [CEM+15], which show that provably accurate dimensionality reduction is possible using common techniques such as principal component analysis, random projection, and random sampling. We next present empirical evaluations of dimensionality reduction techniques to supplement our theoretical results. We show that our dimensionality reduction algorithms, along with heuristics based on these algorithms, indeed perform well in practice. Finally, we discuss possible extensions of our work to neurally plausible algorithms for clustering and dimensionality reduction. This thesis is based on joint work with Michael Cohen, Samuel Elder, Nancy Lynch, Christopher Musco, and Madalina Persu.
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 123-131).
Date issued
2015Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.