dc.contributor.advisor | Charles E. Leiserson. | en_US |
dc.contributor.author | Pantawongdecha, Payut | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2016-12-22T15:16:50Z | |
dc.date.available | 2016-12-22T15:16:50Z | |
dc.date.copyright | 2016 | en_US |
dc.date.issued | 2016 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/105968 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. | en_US |
dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US |
dc.description | Cataloged from student-submitted PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (pages 73-75). | en_US |
dc.description.abstract | Divide and conquer is an important concept in computer science. It is used ubiquitously to simplify and speed up programs. However, it needs to be optimized, with respect to parameter settings for example, in order to achieve the best performance. The problem boils down to searching for the best implementation choice on a given set of requirements, such as which machine the program is running on. The goal of this thesis is to apply and evaluate the Ztune approach [14] on serial divide-and-conquer matrix-vector multiplication. We implemented Ztune to autotune serial divide-and-conquer matrix-vector multiplication on machines with different hardware configurations, and found that Ztuneoptimized codes ran 1%-5% faster than the hand-optimized counterparts. We also compared Ztune-optimized results with other matrix-vector multiplication libraries including the Intel Math Kernel Library and OpenBLAS. Since the matrix-vector multiplication problem is a level 2 BLAS, it is not as computationally intensive as level 3 BLAS problems such as matrix-matrix multiplication and stencil computation. As a result, the measurement in matrix-vector multiplication is more prone to error from factors such as noise, cache alignment of the matrix, and cache states, which lead to wrong decision choices for Ztune. We explored multiple options to get more accurate measurements and demonstrated the techniques that remedied these issues. Lastly, we applied the Ztune approach to matrix-matrix multiplication, and we were able to achieve 2%-85% speedup compared to the hand-tuned code. This thesis represents joint work with Ekanathan Palamadai Natarajan. | en_US |
dc.description.statementofresponsibility | by Payut Pantawongdecha. | en_US |
dc.format.extent | 75 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Autotuning divide-and-conquer matrix-vector multiplication | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 965614821 | en_US |