dc.contributor.advisor | Han, Song | |
dc.contributor.author | Stiles, Nicole | |
dc.date.accessioned | 2024-09-24T18:26:22Z | |
dc.date.available | 2024-09-24T18:26:22Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-11T14:37:30.978Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/157006 | |
dc.description.abstract | The Segment-Anything Model (SAM) is a vision foundation model facilitating promptable and zero-shot image segmentation. SAM-based models have a wide range of applications including autonomous driving, medical image segmentation, VR, and data annotation. However, SAM models are computationally intensive and lack a flexible prompting mechanism. On an NVIDIA A100 GPU, SAM runs at 11 frames per second (FPS), missing the mark for real-time performance and preventing the use of SAM on edge devices. To address both the latency and prompt-flexibility constraints, we introduce GazeSAM, a new real-time gaze-prompted image segmentation model. GazeSAM uses face and gaze detection to determine the direction of a user's gaze, object detection to find candidate objects of interest, depth estimation to perform background detection, and image segmentation to generate masks. The final output is a mask segmenting the object at the focus of the user's gaze. By performing algorithmic optimizations, employing inference engines, and applying FP16 and INT8 quantization, we achieve a 24x speedup relative to the baseline FP32 PyTorch implementation. GazeSAM runs at over 30 FPS, enabling real-time performance on an RTX 4070 GPU. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | Efficient Segment Anything on the Edge | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.orcid | https://orcid.org/0009-0001-2832-3552 | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |