Show simple item record

dc.contributor.advisorHan, Song
dc.contributor.authorStiles, Nicole
dc.date.accessioned2024-09-24T18:26:22Z
dc.date.available2024-09-24T18:26:22Z
dc.date.issued2024-05
dc.date.submitted2024-07-11T14:37:30.978Z
dc.identifier.urihttps://hdl.handle.net/1721.1/157006
dc.description.abstractThe Segment-Anything Model (SAM) is a vision foundation model facilitating promptable and zero-shot image segmentation. SAM-based models have a wide range of applications including autonomous driving, medical image segmentation, VR, and data annotation. However, SAM models are highly computationally intensive and lack a flexible prompting mechanism. On an NVIDIA A100 GPU, SAM runs at 11 frames/second, missing the mark for real-time performance and preventing the usage of SAM on edge devices. To tackle both the latency constraint and the prompt flexibility constraint, we introduce GazeSAM, a new real-time gaze-prompted image segmentation model. GazeSAM uses face and gaze detection to determine the direction of a user's gaze, object detection to find candidate objects of interest, depth estimation to perform background detection, and image segmentation to generate masks. The final output is a mask segmenting the object at the focus of the user's gaze. By performing algorithmic optimizations, employing inference engines, and applying FP16 and INT8 quantization, we achieve a 24x speedup relative to the baseline FP32 PyTorch implementation. GazeSAM runs at a speed of over 30 FPS, enabling real-time performance on an RTX 4070 GPU.
dc.publisherMassachusetts Institute of Technology
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rightsCopyright retained by author(s)
dc.rights.urihttps://creativecommons.org/licenses/by-nc-nd/4.0/
dc.titleEfficient Segment Anything on the Edge
dc.typeThesis
dc.description.degreeM.Eng.
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcidhttps://orcid.org/0009-0001-2832-3552
mit.thesis.degreeMaster
thesis.degree.nameMaster of Engineering in Electrical Engineering and Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record