Dimensionality reduction for k-means clustering

Musco, Cameron N. (Cameron Nicholas)

Author(s)

Musco, Cameron N. (Cameron Nicholas)

DownloadFull printable version (1.044Mb)

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Nancy A. Lynch.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

In this thesis we study dimensionality reduction techniques for approximate k-means clustering. Given a large dataset, we consider how to quickly compress to a smaller dataset (a sketch), such that solving the k-means clustering problem on the sketch will give an approximately optimal solution on the original dataset. First, we provide an exposition of technical results of [CEM+15], which show that provably accurate dimensionality reduction is possible using common techniques such as principal component analysis, random projection, and random sampling. We next present empirical evaluations of dimensionality reduction techniques to supplement our theoretical results. We show that our dimensionality reduction algorithms, along with heuristics based on these algorithms, indeed perform well in practice. Finally, we discuss possible extensions of our work to neurally plausible algorithms for clustering and dimensionality reduction. This thesis is based on joint work with Michael Cohen, Samuel Elder, Nancy Lynch, Christopher Musco, and Madalina Persu.

Description

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 123-131).

Date issued

2015

URI

http://hdl.handle.net/1721.1/101473

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses