Distributed data as a choice in PetaBricks
Author(s)
Watanaprakornkul, Phumpong
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Saman Amarasinghe.
Abstract
Traditionally, programming large computer systems requires programmers to hand-place data and computation across all system components, such as memory, processors, and GPUs. Because each system can have a significantly different composition, the application partitioning, as well as the algorithms and data structures, has to be different for each system. Thus, hardcoding the partitioning is not only difficult but also prevents programs from being performance portable. PetaBricks solves this problem by allowing programmers to specify multiple algorithmic choices for computing the outputs and letting the system decide how to apply these choices. Since PetaBricks can determine an optimized computation order and data placement through auto-tuning, programmers do not need to modify their programs when migrating to a new system. In this thesis, we address the problem of automatically partitioning PetaBricks programs across a cluster of distributed-memory machines. Deciding which algorithm to use, where to place data, and how to distribute computation is complicated. We simplify the decision by auto-tuning data placement and moving computation to wherever the most data resides. Another problem is that using distributed data and a distributed scheduler can be costly. To eliminate this distributed overhead, we generate multiple versions of the code for different types of data access and automatically switch to a shared-memory version when the data is local, which achieves better performance. To show that the system can scale, we run PetaBricks benchmarks on an 8-node system with a total of 96 cores and on a 64-node system with a total of 512 cores. We compare the performance with a non-distributed version of PetaBricks and, in some cases, obtain linear speedups.
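The two heuristics named in the abstract (moving computation to where the most data is, and switching to a shared-memory version when the data is local) can be illustrated with a minimal sketch. This is not the PetaBricks implementation: the function names `pick_node` and `choose_variant`, the `(node_id, num_elements)` region representation, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def pick_node(regions):
    """Return the node that holds the most data a task would touch.

    `regions` is a hypothetical list of (node_id, num_elements) pairs,
    one per data region the task reads or writes.
    """
    if not regions:
        raise ValueError("a task must touch at least one region")
    totals = Counter()
    for node_id, num_elements in regions:
        totals[node_id] += num_elements
    # Largest total wins; ties go to the lowest node id (an assumption).
    return min(totals, key=lambda n: (-totals[n], n))


def choose_variant(regions, here):
    """Pick the code version to run on node `here`.

    The shared-memory version is used only when every region is local;
    otherwise the (more expensive) distributed version is used.
    """
    if all(node_id == here for node_id, _ in regions):
        return "shared_memory"
    return "distributed"


if __name__ == "__main__":
    task = [(0, 10), (1, 30), (0, 5)]
    target = pick_node(task)
    print(target, choose_variant(task, here=target))
```

In this sketch the task above would be scheduled on node 1, which holds 30 of the 45 elements, and would still run the distributed version there because part of its data lives on node 0.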
Description
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 69-71).
Date issued
2012
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.