Failure Detectors
Tushar Deepak Chandra, and Sam Toueg. Unreliable failure detectors for reliable distributed systems.
Journal of the ACM, 43(2): 225--267, March 1996.
Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving consensus.
Journal of the ACM, 43(4): 685--722, July 1996.
Wei Chen, Sam Toueg, and Marcos Kawazoe Aguilera. On the quality of service of failure detectors.
International Conference on Dependable Systems and Networks (DSN), 2000.
Atomic Broadcast and Consensus
Vassos Hadzilacos, and Sam Toueg. A modular approach to fault-tolerant broadcasts and related problems.
Technical Report TR94-1425. Department of Computer Science, Cornell University, Ithaca NY. May 1994.
Vassos Hadzilacos, and Sam Toueg. Fault-tolerant broadcasts and related problems. In Sape Mullender, editor, Distributed Systems, chapter 5, pages 97--145. ACM Press and Addison-Wesley, 1993. Second Edition.
Commit Protocols
D. Skeen, and M. Stonebraker. A formal model of crash recovery in a distributed system. IEEE Transactions on Software Engineering, SE-9(3), May 1983.
I. Keidar, and D. Dolev. Efficient message ordering in dynamic networks.
15th ACM Symposium on Principles of Distributed Computing (PODC), pages 68--76, May 1996.
R. Guerraoui. Revisiting the relationship between non-blocking atomic commitment and consensus.
9th International Workshop on Distributed Algorithms (WDAG), volume 972 of Lecture Notes in Computer Science, pages 87--100. Springer Verlag, September 1995.
Consensus and Related Problems
Idit Keidar, and Sergio Rajsbaum. On the cost of fault-tolerant consensus when there are no faults: a tutorial.
Technical Report MIT-LCS-TR-821, May 2001.
L. Lamport. The part-time parliament.
ACM Transactions on Computer Systems, 16(2): 133--169, May 1998.
Butler W. Lampson. The ABCD's of Paxos. In Twentieth ACM Symposium on Principles of Distributed Computing (PODC'01), Newport, RI, August 2001.
Clock Synchronization
Hagit Attiya, and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics. McGraw-Hill, 1998.
Leslie Lamport, and P. M. Melliar-Smith. Byzantine clock synchronization. In Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, pages 68--74, Vancouver, B.C., Canada, 1984.
Joseph Y. Halpern, Barbara Simons, H. Raymond Strong, and Danny Dolev. Fault-tolerant clock synchronization. In Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, pages 89--102, Vancouver, B.C., Canada, 1984.
Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. Distributed Computing, 1(1): 26--39, January 1986.
Byzantine Quorum Systems
Dahlia Malkhi, and Michael Reiter. Byzantine quorum systems.
Distributed Computing, 11(4): 203--213, 1998.
Dahlia Malkhi, Michael Reiter, and Rebecca Wright. Probabilistic quorum systems.
Proceedings of the 16th Annual ACM Symposium on the Principles of Distributed Computing (PODC 97), pages 267--273, Santa Barbara, CA, August 1997.
Lorenzo Alvisi, Dahlia Malkhi, Evelyn Pierce, Michael Reiter, and Rebecca Wright. Dynamic Byzantine Quorum Systems.
International Conference on Dependable Systems and Networks (DSN, FTCS-30 and DCCA-8), New York, 2000.
Replicated Data Management
Leslie Lamport. Using time instead of timeout for fault-tolerant distributed systems.
ACM Transactions on Programming Languages and Systems, 6(2): 254--280, April 1984.
F. B. Schneider. Replication management using the state-machine approach. In Sape Mullender, editor, Distributed Systems, chapter 7, pages 169--197. ACM Press and Addison-Wesley, 1993. Second Edition.
Miguel Castro, and Barbara Liskov. Practical Byzantine fault tolerance.
Proceedings of the Third Symposium on Operating Systems Design and Implementation, pages 173--186, New Orleans, Louisiana, USA, February 1999. USENIX Association.
G. Chockler, D. Malkhi, and M. Reiter. Backoff protocols for distributed mutual exclusion and ordering.
21st IEEE International Conference on Distributed Computing Systems (ICDCS), April 2001.
Communication
Mahesh Jayaram, and George Varghese. Crash failures can drive protocols to arbitrary states.
Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, pages 247--256, Philadelphia, PA, May 1996.
Kenneth Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, and Yaron Minsky. Bimodal multicast.
ACM Transactions on Computer Systems, 17(2), May 1999.
Ziv Bar-Joseph, Idit Keidar, Tal Anker, and Nancy Lynch. QoS Preserving Totally Ordered Multicast.
5th International Conference On Principles Of Distributed Systems (OPODIS), pages 143--162, Paris, France, December 2000.
Ziv Bar-Joseph, Idit Keidar, and Nancy Lynch. A fault-tolerant dynamic atomic broadcast algorithm with QoS guarantees.
MIT Technical Report, December 2000. Work in progress.
Group Communication
Kenneth P. Birman. Building Secure and Reliable Network Applications. Manning Publications, Greenwich, CT, 1996.
K. Birman and T. Joseph. Exploiting virtual synchrony in distributed systems. In 11th Symposium on Operating Systems Principles (SOSP), pages 123--138. ACM, November 1987.
D. Dolev and D. Malki. The Transis approach to high availability cluster communication.
Communications of the ACM, 39(4): 64--70, 1996.
Y. Amir, L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, and P. Ciarfella. The Totem single-ring ordering and membership protocol.
ACM Transactions on Computer Systems, 13(4), November 1995.
Flaviu Cristian, and Frank Schmuck. Agreeing on processor group membership in asynchronous distributed systems.
Technical Report CSE95-428, University of California-San Diego, La Jolla, CA 92093-0114, 1995.
Y. Amir, G. V. Chockler, D. Dolev, and R. Vitenberg. Efficient state transfer in partitionable environments.
2nd European Research Seminar on Advances in Distributed Systems (ERSADS'97), pages 183--192. BROADCAST (ESPRIT WG 22455), Operating Systems Laboratory, Swiss Federal Institute of Technology, Lausanne, March 1997.
Alan Fekete, Nancy Lynch, and Alex Shvartsman. Specifying and using a partitionable group communication service.
ACM Transactions on Computer Systems, 19(2): 171--216, May 2001.
Idit Keidar, and Roger Khazan. Client-server approach to virtually synchronous group multicast: Specifications and algorithms.
IEEE 20th International Conference on Distributed Computing Systems (ICDCS), pages 344--355, Taipei, Taiwan, April 2000.
R. Vitenberg, I. Keidar, G. V. Chockler, and D. Dolev. Group Communication System Specifications: A Comprehensive Study.
Technical report, Institute of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel, 1999.
Self-Stabilization
Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(11): 643--644, November 1974.
Shlomi Dolev. Self-Stabilization. MIT Press, 2000.