A dimension reduction technique to preserve nearest neighbors on high dimensional data

Author(s)

Chachamis, Christos Nestor.

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Samuel Madden.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Dimension reduction techniques are widely used for tasks such as visualization and data pre-processing. In this project, we develop a new dimension reduction method aimed at the problem of Approximate Nearest Neighbor Search on high-dimensional data. It uses a deep neural network to reduce the data to a lower dimension while preserving nearest neighbors and local structure. We evaluate the performance of this network on several synthetic and real datasets, and we compare our method against other dimension reduction techniques, such as t-SNE. Our experimental results show that this method preserves the local structure sufficiently well in both the training and test data. In particular, we observe that most of the distances of the predicted nearest neighbors in the test data are within 10% of the distances of the actual nearest neighbors. Another advantage of our method is that it can be applied to new, unseen data without refitting the model from scratch.
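The "within 10%" criterion in the abstract can be illustrated with a minimal sketch. It reflects one plausible reading of that sentence, not the thesis's actual evaluation code: for each test query, the predicted nearest neighbor is found in the reduced space, and its distance to the query in the original space is compared with the distance to the true nearest neighbor. The function name `nn_distance_ratios`, the synthetic Gaussian data, and the random linear projection standing in for the trained neural network are all assumptions made for illustration.

```python
import numpy as np

def nn_distance_ratios(X_db, X_q, Z_db, Z_q):
    """Return, per query, d(query, predicted NN) / d(query, true NN).

    X_* are points in the original space, Z_* the same points after
    dimension reduction. Both distances in the ratio are measured in
    the original space, so every ratio is >= 1.
    """
    # Pairwise Euclidean distances, shape (n_queries, n_database).
    d_orig = np.linalg.norm(X_q[:, None, :] - X_db[None, :, :], axis=2)
    d_red = np.linalg.norm(Z_q[:, None, :] - Z_db[None, :, :], axis=2)
    true_nn = d_orig.argmin(axis=1)  # actual nearest neighbor
    pred_nn = d_red.argmin(axis=1)   # nearest neighbor in reduced space
    rows = np.arange(len(X_q))
    return d_orig[rows, pred_nn] / d_orig[rows, true_nn]

# Hypothetical data: 500 database points and 50 queries in 64 dimensions.
rng = np.random.default_rng(0)
X_db = rng.normal(size=(500, 64))
X_q = rng.normal(size=(50, 64))

# Stand-in reduction (assumption): a random linear map to 16 dimensions.
# The thesis uses a trained deep neural network here instead.
P = rng.normal(size=(64, 16)) / np.sqrt(16)
Z_db, Z_q = X_db @ P, X_q @ P

ratios = nn_distance_ratios(X_db, X_q, Z_db, Z_q)
within_10pct = np.mean(ratios <= 1.10)
print(f"fraction of queries within 10%: {within_10pct:.2f}")
```

Because the same function is applied to database points and to queries that the reduction was not fitted on, a sketch like this also reflects the abstract's point that the method can be applied to unseen data without refitting.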

Description

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020

Cataloged from the official PDF of thesis.

Includes bibliographical references (pages 71-72).

Date issued

2020

URI

https://hdl.handle.net/1721.1/127381

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses