Sawmill: From Logs to Causal Diagnosis of Large Systems
Author(s)
Markakis, Markos; Chen, An Bo; Youngmann, Brit; Gao, Trinity; Zhang, Ziyu; Shahout, Rana; Chen, Peter Baile; Liu, Chunwei; Sabek, Ibrahim; Cafarella, Michael; ...
Terms of use
Publisher with Creative Commons License: Creative Commons Attribution
Abstract
Causal analysis is an essential lens for understanding complex system dynamics in domains as varied as medicine, economics and law. Computer systems are often similarly complex, but much of the information about them is only available in long, messy, semi-structured log files. This demo presents Sawmill, an open-source system that makes it possible to extract causal conclusions from log files. Sawmill employs methods drawn from the areas of data transformation, cleaning, and extraction in order to transform logs into a representation amenable to causal analysis. It gives log-derived variables human-understandable names and distills the information present in a log file around a user's chosen causal units (e.g. users or machines), generating appropriate aggregated variables for each causal unit. It then leverages original algorithms to efficiently use this representation for the novel process of Exploration-based Causal Discovery - the task of constructing a sufficient causal model of the system from available data. Users can engage with this process via an interactive interface, ultimately making causal inference possible using off-the-shelf tools. SIGMOD'24 participants will be able to use Sawmill to efficiently answer causal questions about logs. We will guide attendees through the process of quantifying the impact of parameter tuning on query latency using real-world PostgreSQL server logs, before letting them test Sawmill on additional logs with known causal effects but varying difficulty. A companion video for this submission is available online.
Description
SIGMOD-Companion ’24, June 9–15, 2024, Santiago, AA, Chile
Date issued
2024-06-09
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
ACM|Companion of the 2024 International Conference on Management of Data
Citation
Markakis, Markos, Chen, An Bo, Youngmann, Brit, Gao, Trinity, Zhang, Ziyu et al. 2024. "Sawmill: From Logs to Causal Diagnosis of Large Systems."
Version: Final published version
ISBN
979-8-4007-0422-2