dc.contributor.advisor | Sanchez, Daniel | |
dc.contributor.author | Lee, Hyun Ryong | |
dc.date.accessioned | 2022-08-29T16:10:21Z | |
dc.date.available | 2022-08-29T16:10:21Z | |
dc.date.issued | 2022-05 | |
dc.date.submitted | 2022-06-21T19:25:45.947Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/144767 | |
dc.description.abstract | Benchmarks that closely match the behavior of production workloads are crucial for designing and provisioning computer systems. However, current approaches fall short. First, open-source benchmarks use public datasets that cause behavior different from that of production workloads. Second, black-box workload cloning techniques generate synthetic code that imitates the target workload, but the resulting program fails to capture most workload characteristics, such as microarchitectural bottlenecks or time-varying behavior.
Generating code that mimics a complex application is an extremely hard problem. Instead, this thesis proposes a different and easier approach to benchmark synthesis. The key insight is that, for many production workloads, the program is publicly available or a reasonably similar open-source program exists. In this case, generating the right dataset is sufficient to produce an accurate benchmark.
Based on this observation, this thesis presents Datamime, a profile-guided approach to generate representative benchmarks for production workloads. Datamime uses the performance profiles of a target workload to generate a dataset that, when used by a benchmark program, behaves very similarly to the target workload in terms of its microarchitectural characteristics.
We evaluate Datamime on several datacenter workloads. Datamime generates synthetic benchmarks that closely match the microarchitectural features of these workloads, with a mean absolute percentage error of 4% on IPC. Microarchitectural behavior stays close across processor types. Finally, time-varying behaviors are also replicated, making these benchmarks useful for tasks such as characterizing and optimizing tail latency. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright MIT | |
dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Generating Representative Benchmarks by Automatically Synthesizing Datasets | |
dc.type | Thesis | |
dc.description.degree | S.M. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.orcid | 0000-0002-8627-2781 | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |