Accelerated clustering through locality-sensitive hashing
Massachusetts Institute of Technology. Dept. of Mathematics.
Jonathan A. Kelner.
We obtain improved running times for two algorithms for clustering data: the expectation-maximization (EM) algorithm and Lloyd's algorithm. The EM algorithm is a heuristic for finding a mixture of $k$ normal distributions in $\mathbb{R}^d$ that maximizes the probability of drawing $n$ given data points. Lloyd's algorithm is a special case of this algorithm in which the covariance matrix of each normally distributed component is required to be the identity. We consider versions of these algorithms where the number of mixture components is inferred by assuming a Dirichlet process as a generative model. The separation probability of this process, $\alpha$, is typically a small constant. We speed up each iteration of the EM algorithm from $O(nd^2 k)$ to $O(ndk \log^3(k/\alpha) + nd^2)$ time and each iteration of Lloyd's algorithm from $O(ndk)$ to $O(nd(k/\alpha)^{0.39})$ time.
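For orientation, the O(ndk) baseline cost of one Lloyd iteration quoted in the abstract can be seen in a minimal brute-force sketch. This is not the thesis's accelerated method: it is the standard exhaustive version, written in plain Python, and the names `squared_distance` and `lloyd_iteration` are illustrative assumptions rather than anything taken from the thesis.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two d-dimensional points; O(d)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_iteration(points: List[Point], centers: List[Point]) -> List[List[float]]:
    """One brute-force Lloyd iteration (hypothetical helper, baseline only).

    Assigns each point to its nearest center, then moves each center to the
    mean of the points assigned to it.
    """
    k = len(centers)
    d = len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k

    # Assignment step: n points x k centers x O(d) per distance = O(ndk).
    for p in points:
        nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
        counts[nearest] += 1
        for i in range(d):
            sums[nearest][i] += p[i]

    # Update step: O(kd). An empty cluster keeps its previous center.
    new_centers = []
    for j in range(k):
        if counts[j] == 0:
            new_centers.append(list(centers[j]))
        else:
            new_centers.append([s / counts[j] for s in sums[j]])
    return new_centers
```

The exhaustive nearest-center search in the assignment step dominates the running time. Judging from the title, that search is presumably where locality-sensitive hashing enters, but the thesis itself should be consulted for the actual construction.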
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science; and, (S.B.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 18).
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Dept. of Mathematics.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science; Mathematics.