MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Autotuning divide-and-conquer matrix-vector multiplication

Author(s)
Pantawongdecha, Payut
Thumbnail
DownloadFull printable version (336.6Kb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Charles E. Leiserson.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Metadata
Show full item record
Abstract
Divide and conquer is an important concept in computer science. It is used ubiquitously to simplify and speed up programs. However, it needs to be optimized, with respect to parameter settings for example, in order to achieve the best performance. The problem boils down to searching for the best implementation choice on a given set of requirements, such as which machine the program is running on. The goal of this thesis is to apply and evaluate the Ztune approach [14] on serial divide-and-conquer matrix-vector multiplication. We implemented Ztune to autotune serial divide-and-conquer matrix-vector multiplication on machines with different hardware configurations, and found that Ztuneoptimized codes ran 1%-5% faster than the hand-optimized counterparts. We also compared Ztune-optimized results with other matrix-vector multiplication libraries including the Intel Math Kernel Library and OpenBLAS. Since the matrix-vector multiplication problem is a level 2 BLAS, it is not as computationally intensive as level 3 BLAS problems such as matrix-matrix multiplication and stencil computation. As a result, the measurement in matrix-vector multiplication is more prone to error from factors such as noise, cache alignment of the matrix, and cache states, which lead to wrong decision choices for Ztune. We explored multiple options to get more accurate measurements and demonstrated the techniques that remedied these issues. Lastly, we applied the Ztune approach to matrix-matrix multiplication, and we were able to achieve 2%-85% speedup compared to the hand-tuned code. This thesis represents joint work with Ekanathan Palamadai Natarajan.
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
 
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
 
Cataloged from student-submitted PDF version of thesis.
 
Includes bibliographical references (pages 73-75).
 
Date issued
2016
URI
http://hdl.handle.net/1721.1/105968
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Graduate Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.