Cache optimizations for stream programs

SermuliÅ Å¡, JÄ nis

dc.contributor.advisor	Saman Amarasinghe.	en_US
dc.contributor.author	SermuliÅ Å¡, JÄ nis	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2006-07-13T15:18:04Z
dc.date.available	2006-07-13T15:18:04Z
dc.date.copyright	2005	en_US
dc.date.issued	2005	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/33359
dc.description	Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.	en_US
dc.description	Includes bibliographical references (leaves 73-75).	en_US
dc.description.abstract	As processor speeds continue to increase, the memory bottleneck remains a primary impediment to attaining performance. Effective use of the memory hierarchy can result in significant performance gains. This thesis focuses on a set of transformations that either reduce cache-miss rate or reduce the number of memory accesses for the class of streaming applications, which are becoming increasingly prevalent in embedded, desktop and high-performance processing. A fully automated optimization algorithm is presented that reduces the memory bottleneck for stream applications developed in the high-level stream programming language StreamIt. This thesis presents four memory optimizations: 1) cache aware fusion, which combines adjacent program components while respecting instruction and data cache constraints, 2) execution scaling, which judiciously repeats execution of program components to improve instruction and state locality, 3) scalar replacement, which converts certain data buffers into a sequence of scalar variables that can be register allocated, and 4) optimized buffer management, which reduces the overall number of memory accesses issued by the program. The cache aware fusion and execution scaling reduce the instruction and data cache-miss rates and are founded upon a simple and intuitive cache model that quantifies the temporal locality for a sequence of actor executions.	en_US
dc.description.abstract	(cont.) The scalar replacement and optimized buffer management reduce the number of memory accesses. An experimental evaluation of the memory optimizations is presented for three different architectures: StrongARM 1110, Pentium 3 and Itanium 2. Compared to unoptimized StreamIt code, the memory optimizations presented in this thesis yield a 257% speedup on the StrongARM, a 154% speedup on the Pentium 3, and a 152% speedup on Itanium 2. These numbers represent averages over our streaming benchmark suite. The most impressive speedups are demonstrated on an embedded processor StrongARM, which has only a single data and a single instruction cache, thus increasing the overall cost of memory operations and cache misses.	en_US
dc.description.statementofresponsibility	by JÄnis. SermuliÅ†Å¡.	en_US
dc.format.extent	75 leaves	en_US
dc.format.extent	3129060 bytes
dc.format.extent	3132114 bytes
dc.format.mimetype	application/pdf
dc.format.mimetype	application/pdf
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Cache optimizations for stream programs	en_US
dc.type	Thesis	en_US
dc.description.degree	M.Eng.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	62413820	en_US

Files in this item

Name:: 62413820-MIT.pdf
Size:: 3.274Mb
Format:: PDF
Description:: Full printable version

View/Open

This item appears in the following Collection(s)

Graduate Theses

Show simple item record