Asynchronous Failure Detectors

Cornejo, Alejandro; Lynch, Nancy; Sastry, Srikanth

Other Contributors

Theory of Computation

Advisor

Nancy Lynch

Abstract

Failure detectors -- oracles that provide information about process crashes -- are an important abstraction for crash tolerance in distributed systems. The generality of failure-detector theory, while providing great expressiveness, poses significant challenges in developing a robust hierarchy of failure detectors. We address some of these challenges by proposing (1) a variant of failure detectors called asynchronous failure detectors (AFDs) and (2) an associated modeling framework. Unlike the traditional failure-detector framework, our framework eschews real time completely. We show that asynchronous failure detectors are sufficiently expressive to include several popular failure detectors, including but not limited to the canonical Chandra-Toueg failure detectors, Sigma and other quorum failure detectors, Omega, anti-Omega, Omega^k, and Psi_k. Additionally, asynchronous failure detectors satisfy many desirable properties: they are self-implementable, they guarantee that stronger asynchronous failure detectors solve harder problems, and they ensure that their outputs encode no information other than the set of crashed processes. We introduce the notion of a failure detector being representative for a problem to capture the idea that some problems encode the same information about process crashes as their weakest failure detectors do. We show that the problems in a large class, called bounded problems, do not have representative failure detectors. Finally, we use the asynchronous failure-detector framework to show how sufficiently strong AFDs circumvent the impossibility of consensus in asynchronous systems.
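As an illustration only, and not the construction used in the report, the idea that a detector's output can be a function of the set of crashed processes alone can be sketched with Omega, whose standard defining property is that eventually all correct processes output the same correct process as leader. The sketch assumes an idealized setting in which the exact crash set is known and ignores asynchrony, message delays, and per-process views. The functions `omega_output` and `simulate` are hypothetical names introduced here.

```python
from typing import Iterable, List, Optional, Set


def omega_output(processes: Iterable[int], crashed: Set[int]) -> Optional[int]:
    """Hypothetical Omega-style leader choice that depends only on the crash set.

    Picks the smallest process id that has not crashed. Assumes the exact
    crash set is known (an idealization, not an asynchronous implementation).
    """
    live = [p for p in processes if p not in crashed]
    return min(live) if live else None


def simulate(n: int, crash_schedule: List[int]) -> List[Optional[int]]:
    """Return the leader output initially and after each crash event.

    Illustrative only: once crashes stop, the output stays fixed on a
    correct process, mirroring Omega's eventual-leader property.
    """
    processes = range(n)  # a range can be iterated repeatedly
    crashed: Set[int] = set()
    outputs = [omega_output(processes, crashed)]
    for p in crash_schedule:
        crashed.add(p)
        outputs.append(omega_output(processes, crashed))
    return outputs


if __name__ == "__main__":
    # Four processes; process 0 crashes, then process 2 crashes.
    print(simulate(4, [0, 2]))  # [0, 1, 1]
```

Because the leader is computed from the crash set and nothing else, the output carries no information beyond which processes have crashed, which is the intuition behind the abstract's claim about AFD outputs; the real AFD framework formalizes this with I/O automata rather than a global crash set.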

Description

This report is superseded by MIT-CSAIL-TR-2013-025.

Date issued

2013-01-30

URI

http://hdl.handle.net/1721.1/76716

Series/Report no.

MIT-CSAIL-TR-2013-002

Keywords

Asynchronous System, Fault-Tolerance, I/O Automata

DSpace@MIT