Hardware Acceleration for Real-Time Compression of 3D
Gaussians

Kahler, Kailas B.

Author(s)

Kahler, Kailas B.

DownloadThesis PDF (2.070Mb)

Advisor

Sze, Vivienne

Terms of use

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/

Metadata

Show full item record

Abstract

3D Gaussian Splatting (3DGS) is a technique for novel view synthesis, where images of a scene from a specific viewpoint are generated using images from different viewpoints, that has gained popularity for its reduced computational overhead, resulting in faster training and rendering times compared to other methods like Neural Radiance Fields (NeRFs). Its applications outside of strictly novel view synthesis have also been explored, with monocular simultaneous localization and mapping (SLAM) in robotics being an emergent application. However, because of limited on-board battery capacity, the computer hardware used in small robots is much less capable than the high-powered GPUs that the 3DGS algorithm was originally developed on, having both less compute and memory capacity and bandwidth. While there has been work developing specialized compute for the rendering pipeline of 3DGS, memory remains an obstacle to deployment. The Gaussian map can occupy from 1MB − 700MB in memory, which is both too large to store on-chip within micro-robots and such that moving Gaussians from memory to compute can dominate power consumption. While there has been prior work on algorithms for compressing Gaussian representations, they are not yet capable of running in real-time on the hardware present in these robots, as would be required for SLAM. Thus, this thesis explores the limits of these compression methods on current hardware, resulting in an optimized CUDA implementation with better than 100× the throughput of prior work and achieving real-time operation on workstation-class hardware. However, after concluding that custom hardware is necessary for further improvement, this thesis also presents a hardware accelerator that nears real-time compression performance within a reduced power budget, outperforming an NVIDIA Jetson Orin Nano with 64% higher throughput while using 1/16th of the multipliers and drawing 38% of the power when running at 100MHz on an AMD UltraScale+ FPGA.

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/162745

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses