## Accelerated clustering through locality-sensitive hashing

##### Author(s)

Kishore, Shaunak

##### Other Contributors

Massachusetts Institute of Technology. Dept. of Mathematics.

##### Advisor

Jonathan A. Kelner.

##### Abstract

We obtain improved running times for two algorithms for clustering data: the expectation-maximization (EM) algorithm and Lloyd's algorithm. The EM algorithm is a heuristic for finding a mixture of $k$ normal distributions in $\mathbb{R}^d$ that maximizes the probability of drawing $n$ given data points. Lloyd's algorithm is a special case of this algorithm in which the covariance matrix of each normally-distributed component is required to be the identity. We consider versions of these algorithms where the number of mixture components is inferred by assuming a Dirichlet process as a generative model. The separation probability of this process, $\alpha$, is typically a small constant. We speed up each iteration of the EM algorithm from $O(nd^2k)$ to $O(ndk \log^3(k/\alpha) + nd^2)$ time and each iteration of Lloyd's algorithm from $O(ndk)$ to $O(nd(k/\alpha)^{0.39})$ time.
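For orientation, the $O(ndk)$ baseline cost quoted for Lloyd's algorithm can be seen in a minimal sketch of one textbook iteration with a fixed number of centers. This is not the thesis's accelerated method, and it does not include the Dirichlet-process inference of the number of components; the function names `squared_distance` and `lloyd_iteration` are illustrative assumptions, not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two d-dimensional points: O(d)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_iteration(
    points: List[Point], centers: List[Point]
) -> Tuple[List[int], List[List[float]]]:
    """One naive Lloyd iteration (hypothetical helper): assign, then re-average."""
    k = len(centers)
    d = len(centers[0])

    # Assignment step: n points x k centers x O(d) per distance = O(ndk).
    # This exhaustive nearest-center search is the part a faster
    # nearest-neighbor structure would aim to replace.
    labels = [
        min(range(k), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]

    # Update step: move each center to the mean of its assigned points, O(nd).
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]

    # A center with no assigned points is left where it was.
    new_centers = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(k)
    ]
    return labels, new_centers
```

Calling `lloyd_iteration(points, centers)` repeatedly until the labels stop changing gives the usual fixed-$k$ procedure; each call pays the full $n \cdot k$ distance computations in the assignment step, which is the per-iteration cost the abstract takes as its starting point.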

##### Description

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science; and, (S.B.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 18).

##### Date issued

2012

##### Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Department of Mathematics

##### Publisher

Massachusetts Institute of Technology

##### Keywords

Electrical Engineering and Computer Science; Mathematics.