Interactive data analytics using GPUs

Author(s)
Shanbhag, Anil (Anil Atmanand)
Download: 1227757140-MIT.pdf (1.724Mb)
Alternative title
Interactive data analytics using graphics processing units
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Samuel R. Madden.
Terms of use
MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Modern GPUs provide an order-of-magnitude greater memory bandwidth compared to CPUs. In theory, this means data processing systems can process O(TB) of data with sub-100 ms latency, thereby enabling interactive query response times on analytical SQL queries. However, the massively parallel architecture of GPUs requires rearchitecting in-memory data analytics systems in order to achieve optimal performance. This thesis describes how we adapted and redesigned in-memory data analytics systems to better exploit the GPU's memory and execution model. We present Crystal, a library of building blocks that can be used for writing high performance SQL query implementations for GPUs. We use Crystal to implement basic SQL query operators and an analytical benchmark. We present theoretical models based on memory bandwidth as the critical bottleneck for query performance and show that implementations using Crystal are able to achieve these theoretical limits. We also present a study of the fundamental performance characteristics of GPUs and CPUs for database analytics. Our analysis shows that using modern GPUs vs CPUs can lead to a runtime gain equal to 1.5x the GPU-to-CPU bandwidth ratio (25x in our setup) and be 4x more cost-effective than CPUs. Finally, we used Crystal's design principles to develop massively parallel variants of two classic sequential algorithms: top-k and bit-packing based compression. Bitonic Top-K is a top-k algorithm based on bitonic sort that is 4x faster than previous approaches. GPU-FOR is a compression format that can be decompressed efficiently in parallel and can be used to fit more data into the limited GPU memory. In summary, this thesis makes the case for using GPUs as the primary execution engine for interactive data analytics, and shows that implementations are efficient and practical.
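The abstract's bandwidth-bound reasoning can be illustrated with a minimal sketch. It is not code from the thesis or from Crystal. The function names are hypothetical, and the 500 GB/s bandwidth in the usage lines is a made-up figure for illustration only. The sketch assumes a query's runtime is bounded below by the time needed to read its input once from memory, and it inverts the stated relation (runtime gain = 1.5x bandwidth ratio) to recover the bandwidth ratio implied by the reported 25x gain.

```python
def scan_time_ms(data_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound, in milliseconds, on the time to read data_bytes once from memory."""
    return data_bytes / bandwidth_bytes_per_s * 1e3


def implied_bandwidth_ratio(runtime_gain: float, factor: float = 1.5) -> float:
    """Solve runtime_gain = factor * bandwidth_ratio for bandwidth_ratio."""
    return runtime_gain / factor


# The 25x gain and the 1.5x factor come from the abstract.
ratio = implied_bandwidth_ratio(25.0)
print(f"implied GPU-to-CPU bandwidth ratio: {ratio:.1f}x")

# Hypothetical figures (not from the thesis): 1 GB scanned at 500 GB/s.
print(f"bandwidth-bound scan time: {scan_time_ms(1e9, 500e9):.1f} ms")
```

Under these assumptions, a 25x gain corresponds to a bandwidth ratio of roughly 16.7x, consistent with the abstract's "order-of-magnitude" description.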
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020
 
Cataloged from student-submitted PDF of thesis.
 
Includes bibliographical references (pages 157-163).
 
Date issued
2020
URI
https://hdl.handle.net/1721.1/129305
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.