Computing included and excluded sums using parallel prefix

Fraser, Sean(Sean Cameron Burrows)

Author(s)

Fraser, Sean(Sean Cameron Burrows)

Download1192544501-MIT.pdf (2.119Mb)

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Charles E. Leiserson.

Terms of use

MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

Many scientific computing applications involve reducing elements in overlapping subregions of a multidimensional array. For example, the integral image problem from image processing requires finding the sum of elements in arbitrary axis-aligned subregions of an image. Furthermore, the fast multipole method, a widely used kernel in particle simulations, relies on reducing regions outside of a bounding box in a multidimensional array to a representative multipole expansion for certain interactions. We abstract away the application domains and define the underlying included and excluded sums problems of reducing regions inside and outside (respectively) of an axis-aligned bounding box in a multidimensional array. In this thesis, we present the dimension-reduction excluded-sums (DRES) algorithm, an asymptotically improved algorithm for the excluded sums problem in arbitrary dimensions and compare it with the state-of-the-art algorithm by Demaine et al.

The DRES algorithm reduces the work from exponential to linear in the number of dimensions. Along the way, we present a linear-time algorithm for the included sums problem and show how to use it in the DRES algorithm. At the core of these algorithms are in-place prefix and suffix sums. Furthermore, applications that involve included and excluded sums require both high performance and numerical accuracy in practice. Since standard methods for prefix sums on general-purpose multicores usually suffer from either poor performance or low accuracy, we present an algorithm called the block-hybrid (BH) algorithm for parallel prefix sums to take advantage of data-level and task-level parallelism. The BH algorithm is competitive on large inputs, up to 2.5x faster on inputs that fit in cache, and 8.4x more accurate compared to state-of-the art CPU parallel prefix implementations.

Furthermore, a BH algorithm variant achieves at least a 1.5x improvement over a state-of-the-art GPU prefix sum implementation on a performance-per-cost ratio (using Amazon Web Services' pricing). Much of thesis represents joint work with Helen Xu and Professor Charles Leiserson.

Description

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020

Cataloged from the official PDF of thesis.

Includes bibliographical references (pages 83-85).

Date issued

2020

URI

https://hdl.handle.net/1721.1/127395

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses