Efficient Algorithms for Vector Similarities

Author(s)
Silwal, Sandeep B.
Advisor
Indyk, Piotr
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
A key cog in machine learning is the humble embedding: a vector representation of real-world objects such as text, images, graphs, or molecules whose geometric similarities capture intuitive notions of semantic similarity. It is thus common to curate massive datasets of embeddings by running inference with a machine learning model of choice. However, the sheer dataset size and high dimensionality are often the bottleneck in effectively leveraging and learning from these rich datasets. Inspired by this computational bottleneck in modern machine learning pipelines, we study the following question: "How can we efficiently compute on large-scale, high-dimensional data?" In this thesis, we focus on two aspects of this question. 1) Efficient local similarity computation: we give faster algorithms for individual similarity computations, such as calculating notions of similarity between collections of vectors, as well as dimensionality reduction techniques that preserve similarities. In addition to computational efficiency, other resource constraints such as space and privacy are also considered. 2) Efficient global similarity analysis: we study algorithms for analyzing global relationships between vectors encoded in similarity matrices. Our algorithms compute on similarity matrices, such as distance or kernel matrices, without ever initializing them, thus avoiding an infeasible quadratic time bottleneck. Overall, the main message of this thesis is that sublinear algorithm design principles are instrumental in designing scalable algorithms for big data.
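
To make the two themes above concrete, the following is a minimal illustrative sketch in Python (NumPy) of two classical techniques in the same spirit; they are standard textbook methods chosen for illustration, not necessarily the specific algorithms developed in the thesis, and all sizes and parameters are hypothetical. Part 1 uses a Johnson-Lindenstrauss random projection to reduce dimensionality while approximately preserving pairwise distances; part 2 uses random Fourier features (Rahimi and Recht) to approximate a Gaussian-kernel matrix-vector product K @ v without ever materializing the n x n kernel matrix K.

# Illustrative sketch only: classical similarity-preserving reduction and
# an implicit kernel mat-vec, with hypothetical sizes.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 512                      # number of embeddings and their dimension (hypothetical)
X = rng.standard_normal((n, d))       # stand-in for a dataset of embeddings

# 1) Similarity-preserving dimensionality reduction (Johnson-Lindenstrauss).
k = 64                                        # target dimension, k << d
G = rng.standard_normal((d, k)) / np.sqrt(k)  # random Gaussian projection
X_low = X @ G                                 # pairwise distances preserved up to (1 +/- eps) w.h.p.

# Sanity check on one pair: the projected distance is close to the original.
orig_dist = np.linalg.norm(X[0] - X[1])
proj_dist = np.linalg.norm(X_low[0] - X_low[1])

# 2) Implicit Gaussian-kernel mat-vec via random Fourier features.
sigma, D = 1.0, 256                           # kernel bandwidth and number of random features
W = rng.standard_normal((D, d)) / sigma
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)    # n x D feature matrix, so Z @ Z.T approximates K

v = rng.standard_normal(n)
Kv_approx = Z @ (Z.T @ v)                     # approximate K @ v; the n x n matrix K is never built

The approximate mat-vec costs O(ndD + nD) time and O(nD) space rather than the O(n^2 d) needed to form the kernel matrix explicitly, which illustrates, in a simplified form, the principle of computing on a similarity matrix without ever initializing it.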
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/156582
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
