Partial replay of long-running applications

Cheung, Alvin; Solar-Lezama, Armando; Madden, Samuel

Author(s)

Madden, Samuel R.; Solar-Lezama, Armando; Cheung, Alvin K.

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/

Abstract

Bugs in deployed software can be extremely difficult to track down. Invasive logging techniques, such as logging all non-deterministic inputs, can incur substantial runtime overhead. This paper shows how symbolic analysis can be used to re-create path-equivalent executions for very long-running programs such as databases and web servers. The goal is to help developers debug such programs by allowing them to walk through an execution of the last few requests or transactions leading up to an error. The challenge is to provide this functionality without the high runtime overhead associated with traditional replay techniques based on input logging or memory snapshots. Our approach achieves this by recording a small amount of information about program execution, such as the direction of branches taken, and then using symbolic analysis to reconstruct the execution of the last few inputs processed by the application, as well as the state of memory before these inputs were executed. We implemented our technique in a new tool called bbr. In this paper, we show that it can be used to replay bugs in long-running single-threaded programs starting from the middle of an execution. We show that bbr incurs low recording overhead (10% on average) during program execution, which is much lower than that of existing replay schemes. We also show that it can reproduce real bugs from web servers, database systems, and other common utilities.
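
The idea of recording only branch directions and later recovering a path-equivalent input can be illustrated with a toy sketch. This is not bbr's implementation: the handler `process`, the helpers `record` and `reconstruct`, and the small search domain are all hypothetical, and a brute-force search over inputs stands in for the symbolic analysis the abstract describes.

```python
from itertools import product


def process(a, b, branch):
    """Hypothetical request handler; every condition goes through `branch`."""
    result = 0
    if branch(a > 10):
        result += 1
        if branch(b % 2 == 0):
            result += 10
    elif branch(a + b == 7):
        result += 100
    return result


def record(a, b):
    """Run the handler and log only the direction of each branch taken."""
    log = []

    def branch(cond):
        log.append(bool(cond))
        return cond

    process(a, b, branch)
    return log


def reconstruct(log, domain=range(32)):
    """Find some input whose branch trace matches the log.

    Brute-force enumeration is an assumption made for brevity; it replaces
    the symbolic analysis a real tool would use to solve for inputs.
    """
    for a, b in product(domain, repeat=2):
        if record(a, b) == log:
            return a, b
    return None


if __name__ == "__main__":
    original = (3, 4)
    trace = record(*original)      # the inputs themselves are never logged
    recovered = reconstruct(trace)
    print("trace:", trace)
    print("recovered input:", recovered)
```

The recovered input need not equal the original one. It only has to drive the program down the same path, which is what path equivalence means here.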

Date issued

2011-09

URI

http://hdl.handle.net/1721.1/73450

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering (ESEC/FSE '11)

Publisher

Association for Computing Machinery (ACM)

Citation

Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2011. Partial replay of long-running applications. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering (ESEC/FSE '11). ACM, New York, NY, USA, 135-145.

Version: Author's final manuscript

ISBN

978-1-4503-0443-6

Collections

MIT Open Access Articles

DSpace@MIT