Computing Big-Data Applications Near Flash
Author(s)
Xu, Shuotao![Thumbnail](/bitstream/handle/1721.1/139492/Xu-shuotao-PhD-EECS-2021-thesis.pdf.jpg?sequence=3&isAllowed=y)
DownloadThesis PDF (7.549Mb)
Advisor
Arvind
Terms of use
Metadata
Show full item recordAbstract
Current systems produce a large and growing amount of data, which is often referred to as Big Data. Providing valuable insights from this data requires new computing systems to store and process it efficiently. For a fast response time, Big Data typically relies on in-memory computing, which requires a cluster of machines with enough aggregate DRAM to accommodate the entire datasets for the duration of the computation. Big Data typically exceeds several terabytes, therefore this approach can incur significant overhead in power, space and equipment. If the amount of DRAM is not sufficient to hold the working-set of a query, the performance deteriorates catastrophically. Although NAND flash can provide high-bandwidth data access and has higher capacity density and lower cost per bit than DRAM, flash storage has dramatically different characteristics than DRAM, such as large access granularity and longer access latency. Therefore, there are many challenges for Big-Data applications to enable flash-centric computing and achieve performance comparable to that of in-memory computing.
This thesis presents flash-centric hardware architectures that provide high processing throughput for data-intensive applications while hiding long flash access latency. Specifically we describe two novel flash-centric hardware accelerators, BlueCache and AQUOMAN. These systems lower the cost of two common data-center workloads, key-value cache and SQL analytics. We have built BlueCache and AQUOMAN using FPGAs and flash storage, and show that they can provide competitive performance of computing Big-Data applications with multi-terabyte datasets. BlueCache provides a 10-100X cheaper key-value cache than DRAM-based solution, and can outperform DRAM-based system when the latter has more than 7.4% misses for a read-intensive workloads. A desktop-class machine with single instance of 1TB AQUOMAN disk can achieve performance similar to that of a dual-socket general-purpose server with off-the-shelf SSDs. We believe BlueCache and AQUOMAN can bring down the cost of acquiring and operating high-performance computing systems for data-center-scale Big-Data applications dramatically.
Date issued
2021-06Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology