Login

Latency reduction techniques in chip multiprocessor cache systems

Show full item record




Title: Latency reduction techniques in chip multiprocessor cache systems
Author: Zhang, Michael Ruogu, 1977-
Other Contributors: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor: Krste Asanović.
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Publisher: Massachusetts Institute of Technology
Issue Date: 2006
Abstract: Single-chip multiprocessors (CMPs) solve several bottlenecks facing chip designers today. Compared to traditional superscalars, CMPs deliver higher performance at lower power for thread-parallel workloads. In this thesis, we consider tiled CMPs, a class of CMPs where each tile contains a slice of the total on-chip L2 cache storage, and tiles are connected by an on-chip network. Two basic schemes are currently used to manage L2 slices. First, each slice can be used as a private L2 for the tile. Private L2 caches provide the lowest hit latency but reduce the total effective cache capacity because each tile creates a local copy of any block it touches. Second, all slices are aggregated to form a single large L2 shared by all tiles. A shared L2 cache increases the effective cache capacity for shared data, but incurs longer hit latencies when L2 data is on a remote tile. In practice, either private or shared works better for a given workload. We present two new policies, victim replication and victim migration, both of which combine the advantages of private and shared designs. They are variants of the shared scheme which attempt to keep copies of local L1 cache victims within the local L2 cache slice.(cont.) Hits to these replicated copies reduce the effective latency of the shared L2 cache, while retaining the benefits of a higher effective capacity for shared data. We evaluate the various schemes using full-system simulation of single-threaded, multi-threaded, and multi-programmed workloads running on an eight-processor tiled CMP. We show that both techniques achieve significant performance improvement over baseline private and shared schemes for these workloads.
Description: Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.Includes bibliographical references (p. 117-122).
URI: http://hdl.handle.net/1721.1/36183
Keywords: Electrical Engineering and Computer Science.

Files in this item

Files Size Format View Description
Preview, non-printable (open to all) 33.94Mb PDF View/Open Preview, non-printable (open to all)
Full printable version (MIT only) 33.94Mb PDF View/Open Full printable version (MIT only)

This item appears in the following Collection(s)

Show full item record

Search DSpace@MIT


Advanced Search

Browse

My Account

Links