Interactive data analytics using GPUs
Author(s)
Shanbhag, Anil (Anil Atmanand)
Download: 1227757140-MIT.pdf (1.724Mb)
Alternative title
Interactive data analytics using graphics processing units
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Samuel R. Madden.
Abstract
Modern GPUs provide an order-of-magnitude greater memory bandwidth compared to CPUs. In theory, this means data processing systems can process O(TB) of data with sub-100 ms latency, thereby enabling interactive query response times on analytical SQL queries. However, the massively parallel architecture of GPUs requires rearchitecting in-memory data analytics systems in order to achieve optimal performance. This thesis describes how we adapted and redesigned in-memory data analytics systems to better exploit the GPU's memory and execution model. We present Crystal, a library of building blocks that can be used for writing high-performance SQL query implementations for GPUs. We use Crystal to implement basic SQL query operators and an analytical benchmark. We present theoretical models based on memory bandwidth as the critical bottleneck for query performance and show that implementations using Crystal are able to achieve these theoretical limits. We also present a study of the fundamental performance characteristics of GPUs and CPUs for database analytics. Our analysis shows that using modern GPUs instead of CPUs can lead to a runtime gain equal to 1.5x the bandwidth ratio of GPU to CPU (25x in our setup) and be 4x more cost-effective than CPUs. Finally, we used Crystal's design principles to develop massively parallel variants of two classic sequential algorithms: top-k and bit-packing-based compression. Bitonic Top-K is a top-k algorithm based on bitonic sort that is 4x faster than previous approaches. GPU-FOR is a compression format that can be decompressed efficiently in parallel and can be used to fit more data into the limited GPU memory. In summary, this thesis makes the case for using GPUs as the primary execution engine for interactive data analytics, and shows that such implementations are efficient and practical.
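As a rough illustration of the bandwidth-bound reasoning in the abstract, the following sketch treats a table scan as limited only by memory bandwidth and works out the GPU-to-CPU bandwidth ratio implied by the stated figures (a gain of 1.5x the bandwidth ratio, 25x in the authors' setup). The function names and the 50 GB/s CPU bandwidth are assumptions chosen for illustration; they are not taken from the thesis or from Crystal.

```python
def scan_time_ms(bytes_read: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on the runtime of a bandwidth-bound scan, in milliseconds."""
    return bytes_read / (bandwidth_gb_per_s * 1e9) * 1e3


def expected_gain(bandwidth_ratio: float, factor: float = 1.5) -> float:
    """Runtime gain as stated in the abstract: factor x (GPU bandwidth / CPU bandwidth)."""
    return factor * bandwidth_ratio


# Bandwidth ratio implied by the abstract's figures: 25x gain / 1.5 (about 16.7).
implied_ratio = 25 / 1.5

# Hypothetical CPU bandwidth, for illustration only (not a measurement from the thesis).
cpu_bandwidth_gb_per_s = 50.0
gpu_bandwidth_gb_per_s = cpu_bandwidth_gb_per_s * implied_ratio

# In a purely bandwidth-bound model, the scan-time ratio equals the bandwidth ratio.
one_gb = 1e9
cpu_ms = scan_time_ms(one_gb, cpu_bandwidth_gb_per_s)
gpu_ms = scan_time_ms(one_gb, gpu_bandwidth_gb_per_s)

print(f"implied bandwidth ratio: {implied_ratio:.1f}x")
print(f"scan-time ratio (CPU/GPU): {cpu_ms / gpu_ms:.1f}x")
print(f"gain stated in the abstract: {expected_gain(implied_ratio):.0f}x")
```

The scan-time ratio in this model accounts only for the bandwidth ratio itself; the additional 1.5x factor is taken from the abstract as stated and is not derived here.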
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 157-163).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.