
dc.contributor.advisor Saman Amarasinghe
dc.contributor.author Zhao, Qin
dc.contributor.author Rabbah, Rodric
dc.contributor.author Amarasinghe, Saman
dc.contributor.author Rudolph, Larry
dc.contributor.author Wong, Weng-Fai
dc.contributor.other Computer Architecture
dc.date.accessioned 2006-09-25T21:21:43Z
dc.date.available 2006-09-25T21:21:43Z
dc.date.issued 2006-09-25
dc.identifier.other MIT-CSAIL-TR-2006-067
dc.identifier.uri http://hdl.handle.net/1721.1/34013
dc.description.abstract Modern memory systems play a critical role in the performance of applications, but a detailed understanding of the application behavior in the memory system is not trivial to attain. It requires time-consuming simulations of the memory hierarchy using long traces, and often using detailed modeling. It is increasingly possible to access hardware performance counters to measure events in the memory system, but the measurements remain coarse-grained, better suited for performance summaries than for providing instruction-level feedback. The availability of a low-cost, online, and accurate methodology for deriving fine-grained memory behavior profiles can prove extremely useful for runtime analysis and optimization of programs. This paper presents a new methodology for Ubiquitous Memory Introspection (UMI). It is an online and lightweight mini-simulation methodology that focuses on simulating short memory access traces recorded from frequently executed code regions. The simulations are fast and can provide profiling results at varying granularities, down to that of a single instruction or address. UMI naturally complements runtime optimization techniques and enables new opportunities for memory-specific optimizations. In this paper, we present a prototype implementation of a runtime system implementing UMI. The prototype is readily deployed on commodity processors, requires no user intervention, and can operate with stripped binaries and legacy software. The prototype operates with an average runtime overhead of 20%, but it is only 6% slower than a state-of-the-art binary instrumentation tool. We used 32 benchmarks, including the full suite of SPEC2000 benchmarks, for our evaluation. We show that the mini-simulation results accurately reflect the cache performance of two existing memory systems, an Intel Pentium 4 and an AMD Athlon MP (K7) processor. We also demonstrate that low-level profiling information from the online simulation can serve to identify high-miss-rate load instructions with a 77% rate of accuracy compared to full offline simulations that required days to complete. The online profiling results are used at runtime to implement a simple software prefetching strategy that achieves a speedup greater than 60% in the best case.
dc.format.extent 23 p.
dc.format.extent 278689 bytes
dc.format.extent 1358949 bytes
dc.format.mimetype application/pdf
dc.format.mimetype application/postscript
dc.language.iso en_US
dc.relation.ispartofseries Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
dc.subject Performance Monitoring
dc.subject Online Simulation
dc.subject Runtime Optimization
dc.subject Cache Modelling
dc.subject Memory Hierarchy
dc.title Ubiquitous Memory Introspection (Preliminary Manuscript)

