DSpace@MIT

Learning Reconfigurable Vision Models

Author(s)
Gonzalez Ortiz, Jose Javier
Download: Thesis PDF (12.20 MB)
Advisor
Guttag, John V.
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Over the past decade, deep learning methods have emerged as the predominant approach in a wide variety of fields, such as computer vision, natural language processing, and speech recognition. However, these models are also notorious for their high computational costs and substantial data requirements. Furthermore, they often present significant challenges to non-technical users, who typically lack the expertise needed to tailor these models to their specific applications. In this thesis, we tackle these challenges by amortizing the cost of training across models with similar learning tasks. Instead of training multiple models independently, we propose learning a single, reconfigurable model that effectively captures the spectrum of underlying problems. Once trained, this model can be dynamically reconfigured at inference time, adapting its properties without incurring additional training costs. First, we introduce Scale-Space Hypernetworks, a method for learning a continuum of CNNs with varying efficiency characteristics. This enables us to characterize an entire accuracy-efficiency Pareto curve of models by training a single hypernetwork, dramatically reducing training costs. Then, we characterize a previously unidentified optimization problem in hypernetwork training and propose a revised hypernetwork formulation that leads to faster convergence and more stable training. Lastly, we present UniverSeg, an in-context learning method for universal biomedical image segmentation. Given a query image and an example set of image-label pairs that define a new segmentation task, it produces accurate segmentations without additional training, outperforming several related methods on unseen segmentation tasks. We empirically demonstrate the validity of our methods in real-world applications, focusing on computer vision and biomedical imaging, where we evaluate a wide array of tasks and datasets. Across all of these works we find that it is not only feasible to train reconfigurable models, but that in doing so we achieve substantial efficiency gains at both training and inference time.
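
The reconfigurable-model idea at the core of the abstract is a hypernetwork: a small network that emits the weights of a primary network, conditioned on an input parameter. The following is a minimal, hypothetical PyTorch sketch of that general mechanism, not the thesis's actual Scale-Space Hypernetworks implementation; the class, layer sizes, and the scalar efficiency parameter s are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvHypernetwork(nn.Module):
        """Map a scalar efficiency parameter s in [0, 1] to the weights of
        one conv layer, so a single trained module covers a continuum of
        CNNs. Illustrative sketch only, not the thesis's method."""
        def __init__(self, out_ch=16, in_ch=3, k=3, hidden=64):
            super().__init__()
            self.shape = (out_ch, in_ch, k, k)
            self.mlp = nn.Sequential(
                nn.Linear(1, hidden),
                nn.ReLU(),
                nn.Linear(hidden, out_ch * in_ch * k * k),
            )

        def forward(self, s, x):
            # Predict the conv weights for this value of s, then apply them.
            w = self.mlp(s.view(1, 1)).view(self.shape)
            return F.conv2d(x, w, padding=1)

    # Reconfiguration at inference time: sweep s with a frozen module,
    # no additional training.
    net = ConvHypernetwork()
    x = torch.randn(1, 3, 32, 32)
    for s in (0.25, 0.5, 1.0):
        print(s, net(torch.tensor([s]), x).shape)

Sampling s per training batch amortizes the whole accuracy-efficiency curve into a single training run; sweeping s at inference then recovers the individual operating points.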
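
UniverSeg's interface, as the abstract describes it, takes a query image plus a support set of image-label pairs that define the task and returns a segmentation with no gradient updates. The toy baseline below is only meant to make that interface concrete: it weights each support label map by its image's global similarity to the query. It is an assumption-laden stand-in, not the UniverSeg architecture.

    import torch
    import torch.nn.functional as F

    def toy_in_context_segment(query, support_images, support_labels):
        """Crude in-context baseline (not UniverSeg): average the support
        label maps, weighted by each support image's cosine similarity to
        the query. query: (1, C, H, W); supports: (N, C, H, W), (N, 1, H, W)."""
        sim = F.cosine_similarity(query.flatten(1), support_images.flatten(1), dim=1)
        w = torch.softmax(sim, dim=0).view(-1, 1, 1, 1)  # one weight per example
        return (w * support_labels).sum(dim=0, keepdim=True)  # (1, 1, H, W)

    # A new task is specified purely by its example set; nothing above
    # updates any parameters.
    query = torch.rand(1, 1, 64, 64)
    support_x = torch.rand(8, 1, 64, 64)
    support_y = (torch.rand(8, 1, 64, 64) > 0.5).float()
    print(toy_in_context_segment(query, support_x, support_y).shape)
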
Date issued
2024-02
URI
https://hdl.handle.net/1721.1/153839
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
