This is an archived course. A more recent version may be available at ocw.mit.edu.

Readings

Amazon logo When you click the Amazon logo to the left of any citation and purchase the book (or other media) from Amazon.com, MIT OpenCourseWare will receive up to 10% of this purchase and any other purchases you make during that visit. This will not increase the cost of your purchase. Links provided are to the US Amazon site, but you can also support OCW through Amazon sites in other regions. Learn more.

A breakdown of supplementary readings per topic follows the session wise listing of reading assignments in the table below.

Unless otherwise noted, reading assignments are from the course text:

Lynch, Nancy A. Distributed Algorithms. San Francisco, CA: Morgan Kaufmann, 1997. ISBN: 1558603484.

LEC # TOPICS READINGS
1 Course Overview

Synchronous Networks

Leader Election in Synchronous Ring Networks
Chapter 1, 2 (Skim)

Chapter 3, sections 3.1 - 3.4
2 Basic Computational Tasks in Synchronous Networks: Leader Election

Breadth-first Search

Shortest Paths

Broadcast and Convergecast
Chapter 3, sections 3.1 - 3.6
3 Breadth-first Search (cont.)

Shortest Paths (cont.)

Broadcast and Convergecast (cont.)

Spanning Trees

Minimum Spanning Trees
Chapter 4, sections 4.1 - 4.4
4 Fault-tolerant Consensus

Link Failures: The Two Generals Problem

Process Failures (Stopping, Byzantine)

Algorithms for Agreement with Stopping Failures
Chapter 5, section 5.1

Chapter 6, sections 6.1, 6.2

Lamport, L., M. Pease, and R. Shostak. "The Albanian Generals Problem." Technical Report CSL-97, SRI International Computer Science Laboratory, August 22, 1979.
5 Exponential Information Gathering

Algorithms for Agreement with Byzantine Failures
Chapter 6, sections 6.2, 6.7
6 Number-of-processes Lower Bounds for Byzantine Agreement

Weak Byzantine Agreement

Time Bounds for Consensus Problems
Chapter 6, sections 6.3 - 6.6

Aguilera, Marcos Kawazoe, and Sam Toueg. "A simple bivalency proof that t-resilient consensus requires t+1 rounds." Information Processing Letters 71, no. 3-4 (August 1999): 155-158.
7 Other Kinds of Consensus Problems: $k$-agreement

Approximate Agreement

Distributed Commit
Chapter 7, sections 7.1 - 7.3 (skip 7.3.4)
8 Asynchronous Distributed Computing

Formal Modeling of Asynchronous Systems using I/O Automata
Chapter 7, section 7.3

Chapter 8
9 Proof Methods

Non-fault-tolerant Algorithms for Asynchronous Networks

Leader Election, Breadth-first Search, Shortest Paths, Broadcast and Convergecast
Chapter 8, sections 8.2 - 8.5

Chapter 14 and 15
10 Non-fault-tolerant Algorithms for Asynchronous Networks (cont.)

Leader Election, Breadth-first Search, Shortest Paths, Broadcast and Convergecast (cont.)
Chapter 15, sections 15.3 - 15.4
11 Spanning Trees

Gallager et al. Minimum Spanning Tree Algorithm
Chapter 15, sections 15.4, 15.5
12 Synchronizers Chapter 16
13 Synchronizer Applications

Time, Clocks, and the Ordering of Events

Vector Timestamps
Chapter 16, section 16.6

Chapter 18

Lamport, Leslie. "Time, Clocks and the Ordering of Events in a Distributed System." Communications of the ACM 21, no. 7 (July 1978): 558-565.
14 State-machine Simulation

Synchronous vs. Asynchronous Distributed Systems

Stable Property Detection

Distributed Termination

Global Snapshots

Deadlock Detection

Asynchronous Shared-memory Systems
Mattern, Friedemann. "Virtual time and global states of distributed systems." In Parallel and Distributed Algorithms: Proceedings of the International Workshop on Parallel and Distributed Algorithms. Edited by Michel Cosnard et al. Gers, France: Chateau de Bonas, October 1988: pp. 215-226. North Holland, 1989. ISBN: 0444873678. (Reprinted in: Yang, Z., and T. A. Marsland, eds. "Global States and Time in Distributed Systems." IEEE (1994): 123-133.)

Chapter 19
15 Mutual Exclusion

Mutual Exclusion Algorithms
Chapter 9

Chapter 10, sections 10.1 - 10.5
16 Mutual Exclusion Algorithms (cont.) Chapter 10, sections 10.5 - 10.8
17 Bounds on Shared-memory for Mutual Exclusion Chapter 10, sections 10.8 - 10.9

Chapter 11 (Skim)
18 Impossibility of Consensus in Asynchronous, Fault-prone, Shared-memory Systems Chapter 12

Chapter 13, section 13.1
19 Atomic Objects

Atomic Snapshot Algorithms
Chapter 13
20 Atomic Read/Write Register Algorithms

Wait-free Computability

The Wait-free Consensus Hierarchy
Chapter 13

Herlihy, Maurice. "Wait-free synchronization." ACM Transactions on Programming Languages and Systems 13, no. 1 (January 1991): 124-149.
21 The Wait-free Consensus Hierarchy (cont.) Herlihy, Maurice. "Wait-free synchronization." ACM Transactions on Programming Languages and Systems 13, no. 1 (January 1991): 124-149.

Jayanti, Prasad. "Robust wait-free hierarchies." Journal of the ACM 44, no. 4 (1997): 592-614. (Skim)

———. "Wait-free computing." In Distributed Algorithms, 9th International Workshop, WDAG '95, Le Mont-Saint-Michel, France, September 13-15, 1995, Proceedings. Edited by Jean-Michel Hélary, and Michel Raynal. Volume 972 of Lecture Notes in Computer Science, Springer 1995. ISBN: 3540602747.
22 Wait-free vs. $f$-fault-tolerant Atomic Objects Borowsky, Elizabeth, Eli Gafni, Nancy Lynch, and Sergio Rajsbaum. "The BG distributed simulation algorithm." Distributed Computing 14 (2001): 127-146.
23 Asynchronous Network Model vs. Asynchronous Shared-memory Model

Impossibility of Consensus in Asynchronous Networks

Failure Detectors and Consensus
Chapter 17, 21
24 Paxos Consensus Algorithm

Reliable Communication using Unreliable Channels
Lamport, L. "The part-time parliament." ACM Transactions on Computer Systems 16, no. 2 (May 1998): 133-169. Also Research Report 49, Digital Equipment Corporation Systems Research Center, Palo Alto, CA, September 1989.

Chapter 22
25 Reliable Communication using Unreliable Channels (cont.)

Self-stabilizing Algorithms
Chapter 22

Dijkstra's self-stabilization paper (PDF)

Dolev, Shlomi. Self-Stabilization. Cambridge, MA: MIT Press, 2000, chapter 2. ISBN: 0262041782.
26 Atomic Memory Algorithms for Dynamic Networks

Rambo
Lynch, Nancy, and Alex Shvartsman. "Rambo: A reconfigurable atomic memory service for dynamic networks." Technical Report MIT-LCS-TR-856. MIT Laboratory for Computer Science, Cambridge, MA, 2002.

Gilbert, Seth, Nancy Lynch, and Alex Shvartsman. "RAMBO II: Rapidly reconfigurable atomic memory for dynamic networks." Proceedings of the International Conference on Dependable Systems and Networks (DSN) (June 22nd-25th, 2003): 259-268. San Francisco, CA. Also, Technical Report MIT-LCS-TR-890, MIT CSAIL, Cambridge, MA, 2004.

Supplementary Reading List

Other General Distributed Algorithms Textbooks

Attiya, Hagit, and Jennifer Welch. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. New York, NY: Wiley-Interscience, 2004. ISBN: 0471453242.

The Number of Rounds Required for Consensus

Aguilera, Marcos Kawazoe, and Sam Toueg. "A simple bivalency proof that t-resilient consensus requires t+1 rounds." Information Processing Letters 71, no. 3-4 (August 1999): 155-158.

Keidar, Idit, and Sergio Rajsbaum. "On the cost of fault-tolerant consensus when there are no faults - a tutorial." Technical Report MIT-LCS-TR-821 (May 2001). Preliminary version in: "Distributed Computing column." SIGACT News 32, no. 2 (June 2001): 45-63. (Published in May 15th.)

Minimum Spanning Tree Protocol

Gallager, R. G., P. A. Humblet, and P. M. Spira. "A distributed algorithm for minimum-weight spanning trees." ACM Trans Programming Language Syst 5 (1983): 66-77.

Vector TimeStamps

Mattern, Friedemann. "Virtual time and global states of distributed systems." In Parallel and Distributed Algorithms: Proceedings of the International Workshop on Parallel and Distributed Algorithms. Edited by Michel Cosnard et al. Gers, France: Chateau de Bonas, October 1988: pp. 215-226. North Holland, 1989. ISBN: 0444873678. (Reprinted in: Yang, Z., and T. A. Marsland, eds. "Global States and Time in Distributed Systems." IEEE (1994): 123-133.)

Fidge, Colin. "Logical time in distributed computing systems." IEEE Computer 24, no. 8 (August 1991): 28-33.

Impossibility of Consensus

Fischer, Michael J., Nancy A. Lynch, and Michael S. Paterson. "Impossibility of Distributed Consensus with One Faulty Process." Journal of the ACM 32, no. 2 (April 1985): 374-382.

Chaudhuri, Soma. "More Choices Allow More Faults: Set Consensus Problems in Totally Asynchronous Systems." Information and Computation 105, no. 1 (July 1993): 132-158.

Wait-free Computability and the Wait-free Consensus Hierarchy

Herlihy, Maurice. "Wait-free synchronization." ACM Transactions on Programming Languages and Systems 13, no. 1 (January 1991): 124-149.

Jayanti, Prasad. "Robust wait-free hierarchies." Journal of the ACM 44, no. 4 (1997): 592-614.

———. "Wait-free computing." In Distributed Algorithms, 9th International Workshop, WDAG '95, Le Mont-Saint-Michel, France, September 13-15, 1995, Proceedings. Edited by Jean-Michel Hélary, and Michel Raynal. Volume 972 of Lecture Notes in Computer Science, Springer 1995. ISBN: 3540602747.

Lo, Wai-Kau, and Vassos Hadzilacos. "All of us are smarter than any of us: nondeterministic wait-free hierarchies are not robust." SIAM Journal on Computing 30, no. 3 (2001): 689-728.

Wait-free vs. f-fault-tolerant Data Objects

Chandra, T. D., V. Hadzilacos, P. Jayanti, and S. Toueg. "Wait-freedom vs. t-resiliency and the robustness of the hrm hierarchy." Proceedings of the 13th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing. Los Angeles, CA, August 1994, pp. 334-343,

———. "Generalized irreducibility of consensus and the equivalence of t-resilient and wait-free implementations of consensus." To appear in SIAM Journal on Computing.

Borowsky, Elizabeth, Eli Gafni, Nancy Lynch, and Sergio Rajsbaum. "The BG distributed simulation algorithm." Distributed Computing 14 (2001): 127-146.

Lo, W. K., and V. Hadzilacos. "On the power of shared object types to implement one-resilient consensus." Distributed Computing 13, no. 4 (2000): 219-238.

Attie, Paul, Rachid Guerraoui, Petr Kouznetsov, Nancy Lynch, and Sergio Rajsbaum. "Impossibility of boosting distributed service resilience." Technical Report MIT-LCS-TR-982, MIT CSAIL, Cambridge, MA, February 2005.

Reliable Broadcast

Hadzilacos, Vassos, and Sam Toueg. "Fault-tolerant broadcasts and related problems." In Distributed Systems. 2nd ed. Edited by Sape Mullender. Reading, MA: ACM Press and Addison-Wesley, 1993, chapter 5, pp. 97-145. ISBN: 0201624273.

———. "A modular approach to the specification and implementation of fault-tolerant broadcasts." Technical Report TR94-1425. Ithaca NY: Cornell University, Department of Computer Science, May 1994.

Failure Detectors and Consensus

Chandra, Tushar Deepak, and Sam Toueg. "Unreliable failure detectors for reliable distributed systems." Journal of the ACM 43, no. 2 (March 1996): 225-267.

Chandra, Tushar Deepak, Vassos Hadzilacos, and Sam Toueg. "The weakest failure detector for solving consensus." Journal of the ACM 43, no. 4 (July 1996): 685-722.

Gafni, Eli. "A Round-by-Round Failure Detector - Unifying Synchrony and Asynchrony." Proceedings of the 17th Annual ACM Symposium on Principles of Distributed Computing. Puerto Vallarta, Mexico, June 1998, pp. 143-152.

Lamport, L. "The part-time parliament." ACM Transactions on Computer Systems 16, no. 2 (May 1998): 133-169. Also Research Report 49, Digital Equipment Corporation Systems Research Center, Palo Alto, CA, September 1989.

Self-stabilization

Dijkstra, Edsger W. "Self-stabilizing systems in spite of distributed control." Communications of the ACM 17, no. 11 (November 1974): 643-644.

Dolev, Shlomi. Self-Stabilization. Cambridge, MA: MIT Press, 2000. ISBN: 0262041782.

Clock Synchronization

Attiya, Hagit, and Jennifer Welch. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. New York, NY: Wiley-Interscience, 2004. ISBN: 0471453242.

Fan, Rui, and Nancy Lynch. "Gradient Clock Synchronization." Proceedings of the Twenty-third Annual ACM PODC. St. Johns, Newfoundland, Canada, July 2004.

Dynamic Distributed Algorithms

Lynch, Nancy, and Alex Shvartsman. "Rambo: A reconfigurable atomic memory service for dynamic networks." Technical Report MIT-LCS-TR-856. Cambridge, MA: MIT Laboratory for Computer Science, 2002.

Gilbert, Seth, Nancy Lynch, and Alex Shvartsman. "RAMBO II: Rapidly reconfigurable atomic memory for dynamic networks." Proceedings of the International Conference on Dependable Systems and Networks (DSN) (June 22nd-25th, 2003): 259-268. San Francisco, CA. Also, Technical Report MIT-LCS-TR-890, MIT CSAIL, Cambridge, MA, 2004.

Dolev, Shlomi, Seth Gilbert, Nancy Lynch, Alex Shvartsman, and Jennifer Welch. "GeoQuorums: Implementing atomic memory in ad hoc networks." Technical Report MIT-LCS-TR-900a, MIT CSAIL, Cambridge, MA, 2004.

Dolev, Shlomi, Seth Gilbert, Nancy A. Lynch, Elad Schiller, Alex A. Shvartsman, and Jennifer L. Welch. "Virtual Mobile Nodes for Mobile Ad hoc Networks." Proceedings of the 18th International Conference on Distributed Computing (DISC), October 2004.

Dolev, Shlomi, Seth Gilbert, Limor Lahiani, Nancy Lynch, and Tina Nolte. "Timed Virtual Stationary Automata for Mobile Networks." Technical Report MIT-LCS-TR-979a, MIT CSAIL, Cambridge, MA, August 2005.

Aspnes, James, Dana Angluin, Zoe Diamadi, Michael J. Fischer, and Rene Peralta. "Computation in networks of passively mobile finite-state sensors." Proceedings of the Twenty-Third ACM Symposium on Principles of Distributed Computing, July 2004, pp. 290-299. Accepted to Distributed Computing (PODC 2004 special issue), September 2004. Last revised May 2005.