Unified Compilation for Lossless Compression and Sparse Computing

Donenfeld, Daniel

dc.contributor.advisor	Amarasinghe, Saman
dc.contributor.author	Donenfeld, Daniel
dc.date.accessioned	2023-03-31T14:38:20Z
dc.date.available	2023-03-31T14:38:20Z
dc.date.issued	2023-02
dc.date.submitted	2023-02-28T14:36:02.481Z
dc.identifier.uri	https://hdl.handle.net/1721.1/150186
dc.description.abstract	Achieving high performance for computations on tensors depends heavily on the formats used to store them. While sparse tensors are very common, there are more general patterns in data which can sometimes be better captured using lossless compression. We show how to extend sparse tensor algebra compilers to support lossless compression techniques, including variants of run-length encoding and Lempel-Ziv compression. We develop new abstractions to represent losslessly compressed data as a generalized form of sparse tensors, with repetitions of values (which are compressed out in storage) represented by non-scalar, dynamic fill values. We then show how a compiler can use these abstractions to emit efficient code that computes on losslessly compressed data. By unifying lossless compression with sparse tensor algebra, our technique can generate code that computes with both losslessly compressed data and sparse data, and that computes directly on compressed data without first decompressing it. We evaluate two implementations of our techniques: a prototype compiler based on the TACO tensor algebra compiler, and an implementation of our formats within Finch. Our evaluation using our TACO-based compiler shows that our technique generates efficient image and video processing kernels that compute on losslessly compressed data. The generated kernels are up to 16.3× faster than equivalent dense kernels generated by TACO, and up to 16.1× faster than OpenCV, a widely used image processing library. Using our Finch formats, we see compression ratios up to 25× with run-time speedups up to 3.1× over dense computation for reduction computations.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Unified Compilation for Lossless Compression and Sparse Computing
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Electrical Engineering and Computer Science

Files in this item

Name: Donenfeld-danielbd-SM-EECS-202 ...
Size: 2.388Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
