MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Exploiting Vector Parallelism in Software Pipelined Loops

Author(s)
Larsen, Sam; Rabbah, Rodric; Amarasinghe, Saman
Thumbnail
DownloadMIT-CSAIL-TR-2005-039.ps (19246Kb)
Additional downloads
Other Contributors
Computer Architecture
Metadata
Show full item record
Abstract
An emerging trend in processor design is the incorporation of short vector instructions into the ISA. In fact, vector extensions have appeared in most general-purpose microprocessors. To utilize these instructions, traditional vectorization technology can be used to identify and exploit data parallelism. In contrast, efficient use of a processor\'s scalar resources is typically achieved through ILP techniques such as software pipelining. In order to attain the best performance, it is necessary to utilize both sets of resources. This paper presents a novel approach for exploiting vector parallelism in a software pipelined loop. At its core is a method for judiciously partitioning operations between vector and scalar resources. The proposed algorithm (i) lowers the burden on the scalar resources by offloading computation to the vector functional units, and (ii) partially (or fully) inhibits the optimizations when full vectorization will decrease performance. ! This results in better resource usage and allows for software pipelining with shorter initiation intervals. Although our techniques complement statically scheduled machines most naturally, we believe they are applicable to any architecture that tightly integrates support for ILP and data parallelism.An important aspect of the proposed methodology is its ability to manage explicit communication of operands between vector and scalar instructions. Our methodology also allows for a natural handling of misaligned vector memory operations. For architectures that provide hardware support for misaligned references, software pipelining effectively hides the latency of these potentially expensive instructions. When explicit alignment is required in software, our algorithm accounts for these extra costs and vectorizes only when it is profitable. Finally, our heuristic can take advantage of alignment information where it is available.We evaluate our methodology using several DSP and SPEC FP benchmarks. Compared to software pipelining, our approach is able to achieve an average speedup of 1.30x and 1.18x for the two benchmark sets, respectively.
Date issued
2005-06-03
URI
http://hdl.handle.net/1721.1/30423
Other identifiers
MIT-CSAIL-TR-2005-039
MIT-LCS-TM-651
Series/Report no.
Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory

Collections
  • CSAIL Technical Reports (July 1, 2003 - present)

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.