Generating Representative Benchmarks by Automatically Synthesizing Datasets

Lee, Hyun Ryong

dc.contributor.advisor	Sanchez, Daniel
dc.contributor.author	Lee, Hyun Ryong
dc.date.accessioned	2022-08-29T16:10:21Z
dc.date.available	2022-08-29T16:10:21Z
dc.date.issued	2022-05
dc.date.submitted	2022-06-21T19:25:45.947Z
dc.identifier.uri	https://hdl.handle.net/1721.1/144767
dc.description.abstract	Benchmarks that closely match the behavior of production workloads are crucial for designing and provisioning computer systems. However, current approaches fall short. First, open-source benchmarks use public datasets that cause behavior different from that of production workloads. Second, black-box workload cloning techniques generate synthetic code that imitates the target workload, but the resulting program fails to capture most workload characteristics, such as microarchitectural bottlenecks or time-varying behavior. Generating code that mimics a complex application is an extremely hard problem. Instead, this thesis proposes a different and easier approach to benchmark synthesis. The key insight is that for many production workloads, the program is publicly available, or there is a reasonably similar open-source program. In this case, generating the right dataset is sufficient to produce an accurate benchmark. Based on this observation, this thesis presents Datamime, a profile-guided approach to generating representative benchmarks for production workloads. Datamime uses the performance profiles of a target workload to generate a dataset that, when used by a benchmark program, behaves very similarly to the target workload in terms of its microarchitectural characteristics. We evaluate Datamime on several datacenter workloads. Datamime generates synthetic benchmarks that closely match the microarchitectural features of these workloads, with a mean absolute percentage error of 4% on IPC. Microarchitectural behavior stays close across processor types. Finally, time-varying behaviors are also replicated, making these benchmarks useful for tasks such as characterizing and optimizing tail latency.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Generating Representative Benchmarks by Automatically Synthesizing Datasets
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid	0000-0002-8627-2781
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Electrical Engineering and Computer Science

Files in this item

Name: Lee-hr_lee-SM-EECS-2022-thesis.pdf
Size: 657.2Kb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
