Learning Reconfigurable Vision Models
Author(s)
Gonzalez Ortiz, Jose Javier
Advisor
Guttag, John V.
Abstract
Over the past decade, deep learning methods have emerged as the predominant approach across a wide variety of fields, such as computer vision, natural language processing, and speech recognition. However, these models are notorious for their high computational costs and substantial data requirements. Furthermore, they present significant challenges to non-technical users, who often lack the expertise needed to tailor them to their specific applications.
In this thesis, we tackle these challenges by amortizing the cost of training models across similar learning tasks. Instead of training multiple models independently, we propose learning a single, reconfigurable model that captures the entire spectrum of underlying problems. Once trained, this model can be dynamically reconfigured at inference time, adapting its properties without incurring additional training costs.
First, we introduce Scale-Space Hypernetworks, a method for learning a continuum of CNNs with varying efficiency characteristics. By training a single hypernetwork, we can characterize an entire accuracy-efficiency Pareto curve of models, dramatically reducing training costs. Then, we characterize a previously unidentified optimization problem in hypernetwork training and propose a revised hypernetwork formulation that leads to faster convergence and more stable training. Lastly, we present UniverSeg, an in-context learning method for universal biomedical image segmentation. Given a query image and an example set of image-label pairs that defines a new segmentation task, UniverSeg produces accurate segmentations without additional training, outperforming several related methods on unseen segmentation tasks.
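To make the reconfigurable-model idea concrete, the PyTorch sketch below illustrates the general flavor of a hypernetwork: a small network maps a scalar efficiency setting to the weights of a convolutional layer, so one set of trained parameters can be reconfigured at inference time without retraining. This is a minimal illustration only, not the thesis implementation; the class name ConvHypernetwork and all parameter choices are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvHypernetwork(nn.Module):
    """Illustrative sketch: generate conv weights from a scalar setting."""
    def __init__(self, in_ch=3, out_ch=16, k=3, hidden=64):
        super().__init__()
        self.out_shape = (out_ch, in_ch, k, k)
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_ch * in_ch * k * k),
        )

    def forward(self, scale, x):
        # Generate convolution weights conditioned on the requested setting,
        # then apply them to the input image batch.
        w = self.mlp(scale.view(1, 1)).view(self.out_shape)
        return F.conv2d(x, w, padding=1)

hyper = ConvHypernetwork()
x = torch.randn(1, 3, 32, 32)
for s in (0.25, 0.5, 1.0):              # different efficiency settings
    y = hyper(torch.tensor([s]), x)     # same trained weights, reconfigured
    print(s, tuple(y.shape))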
We empirically demonstrate the validity of our methods in real-world applications, focusing on computer vision and biomedical imaging, where we evaluate a wide array of tasks and datasets. Across these works, we find that it is not only feasible to train reconfigurable models, but that doing so yields substantial efficiency gains at both training and inference time.
Date issued
2024-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology