
dc.contributor.advisor: Saman Amarasinghe
dc.contributor.author: Amarasinghe, Saman [en_US]
dc.contributor.author: Rabbah, Rodric [en_US]
dc.contributor.author: Larsen, Samuel [en_US]
dc.contributor.other: Computer Architecture [en]
dc.date.accessioned: 2009-12-18T19:30:12Z
dc.date.available: 2009-12-18T19:30:12Z
dc.date.issued: 2009-12-18
dc.identifier.uri: http://hdl.handle.net/1721.1/50235
dc.description.abstract: Multimedia extensions are nearly ubiquitous in today's general-purpose processors. These extensions consist primarily of a set of short-vector instructions that apply the same opcode to a vector of operands. Vector instructions introduce a data-parallel component to processors that exploit instruction-level parallelism, and present an opportunity for increased performance. In fact, ignoring a processor's vector opcodes can leave a significant portion of the available resources unused. In order for software developers to find short-vector instructions generally useful, however, the compiler must target these extensions with complete transparency and consistent performance. This paper describes selective vectorization, a technique for balancing computation across a processor's scalar and vector units. Current approaches for targeting short-vector instructions directly adopt vectorizing technology first developed for supercomputers. Traditional vectorization, however, can lead to a performance degradation since it fails to account for a processor's scalar resources. We formulate selective vectorization in the context of software pipelining. Our approach creates software pipelines with shorter initiation intervals, and therefore, higher performance. A key aspect of selective vectorization is its ability to manage transfer of operands between vector and scalar instructions. Even when operand transfer is expensive, our technique is sufficiently sophisticated to achieve significant performance gains. We evaluate selective vectorization on a set of SPEC FP benchmarks. On a realistic VLIW processor model, the approach achieves whole-program speedups of up to 1.35x over existing approaches. For individual loops, it provides speedups of up to 1.75x. [en_US]
dc.format.extent: 25 p. [en_US]
dc.relation.ispartofseries: MIT-CSAIL-TR-2009-064
dc.rights: Creative Commons Attribution 3.0 Unported [en]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/
dc.subject: SIMD [en_US]
dc.subject: Vectorization [en_US]
dc.subject: Compiler [en_US]
dc.title: Selective Vectorization for Short-Vector Instructions [en_US]
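
The abstract above frames selective vectorization as splitting a loop's work between a processor's scalar and vector units rather than vectorizing everything. The C sketch below is an illustration only, not code from the report: it hand-partitions a saxpy-style loop so that each iteration issues four elements as SSE vector operations and four as scalar operations, keeping both units busy. The fixed 4/4 split and the use of SSE intrinsics are assumptions made for this example; the report's technique chooses the partition automatically inside a software pipeliner and also models the cost of transferring operands between scalar and vector registers.

/* Illustrative sketch only (not from the report): a hand-made partial
 * vectorization of y[i] = a * x[i] + y[i]. Half of each iteration's work
 * goes to the SIMD unit, half stays on the scalar FPU. */
#include <xmmintrin.h>   /* SSE intrinsics */
#include <stddef.h>

void saxpy_selective(float a, const float *x, float *y, size_t n)
{
    __m128 va = _mm_set1_ps(a);   /* broadcast the scalar a into a vector */
    size_t i = 0;

    /* Process 8 elements per iteration: 4 on the vector unit, 4 scalar. */
    for (; i + 8 <= n; i += 8) {
        /* Vector half: elements i .. i+3 via SSE multiply and add. */
        __m128 vx = _mm_loadu_ps(&x[i]);
        __m128 vy = _mm_loadu_ps(&y[i]);
        _mm_storeu_ps(&y[i], _mm_add_ps(_mm_mul_ps(va, vx), vy));

        /* Scalar half: elements i+4 .. i+7 on the scalar FPU. */
        for (size_t j = i + 4; j < i + 8; j++)
            y[j] = a * x[j] + y[j];
    }

    /* Plain scalar epilogue for any remaining elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}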

