
dc.contributor.advisor     Indyk, Piotr
dc.contributor.author      Xu, Haike
dc.date.accessioned        2024-08-21T18:53:57Z
dc.date.available          2024-08-21T18:53:57Z
dc.date.issued             2024-05
dc.date.submitted          2024-07-10T13:00:03.632Z
dc.identifier.uri          https://hdl.handle.net/1721.1/156284
dc.description.abstract    Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its “slow preprocessing” version provably supports approximate nearest neighbor search queries with a constant approximation ratio and poly-logarithmic query time on data sets with bounded “intrinsic” dimension. For the other data structure variants studied, including DiskANN with “fast preprocessing”, HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a “reasonable” accuracy is linear in the instance size. For example, for DiskANN, we show that the query procedure can take at least 0.1n steps on instances of size n before it encounters any of the 5 nearest neighbors of the query.
dc.publisher               Massachusetts Institute of Technology
dc.rights                  Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights                  Copyright retained by author(s)
dc.rights.uri              https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title                   Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations
dc.type                    Thesis
dc.description.degree      S.M.
dc.contributor.department  Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree          Master
thesis.degree.name         Master of Science in Electrical Engineering and Computer Science
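The abstract measures query cost as the number of steps a graph search takes before it reaches one of the query's 5 nearest neighbors. The Python sketch below shows one way such a count could be taken for a generic greedy (best-first) beam search over a proximity graph. It is an illustrative assumption, not code from the thesis and not the exact query procedure of DiskANN, HNSW or NSG: the function name `greedy_search_steps`, the `beam_width` parameter, and the choice to count one "step" as one node expansion are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def greedy_search_steps(
    points: List[Point],
    graph: Dict[int, List[int]],
    start: int,
    query: Point,
    k: int = 5,
    beam_width: int = 10,
) -> int:
    """Hypothetical helper: count node expansions made by a greedy beam search
    before it expands one of the query's k exact nearest neighbors.

    Returns -1 if the search converges without ever expanding one of them.
    Counting a "step" as one node expansion is an assumption for illustration.
    """
    def d(i: int) -> float:
        return math.dist(points[i], query)

    # Ground truth by brute force: the k points closest to the query.
    true_knn = set(sorted(range(len(points)), key=d)[:k])

    beam = [start]   # up to beam_width closest nodes seen so far
    seen = {start}   # every node ever inserted into the beam
    visited = set()  # nodes whose out-neighbors have been scanned
    steps = 0
    while True:
        unvisited = [u for u in beam if u not in visited]
        if not unvisited:
            return -1  # search converged without reaching a true k-NN
        u = min(unvisited, key=d)  # expand the closest unvisited candidate
        if u in true_knn:
            return steps
        visited.add(u)
        steps += 1
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                beam.append(v)
        # Truncate the beam to the beam_width candidates closest to the query.
        beam = sorted(beam, key=d)[:beam_width]


# Usage: six points on a line joined as a path graph; the query sits at the far end.
pts = [(float(i),) for i in range(6)]
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(pts)] for i in range(len(pts))}
print(greedy_search_steps(pts, path, start=0, query=(5.0,), k=1))  # prints 5
```

On the path graph the search must walk past every intermediate point, so the step count grows linearly with the number of points. This is a toy analogue of the linear-time behavior the abstract reports, not one of the thesis's hard instances. The brute-force ground truth is used only for clarity and is practical only for small experiments.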

