Proving Atomicity: An Assertional Approach
Gregory Chockler, Nancy Lynch, Sayan Mitra, and Joshua Tauber

MIT CSAIL, The Stata Center, Bldg. 22, 32 Vassar Street, Cambridge, MA 02139, USA
{grishac,lynch,mitras,josh}@csail.mit.edu

Abstract. Atomicity (or linearizability) is a commonly used consistency criterion for distributed services and objects. Although atomic object implementations are abundant, proving that algorithms achieve atomicity has turned out to be a challenging problem. In this paper, we initiate the study of systematic ways of verifying distributed implementations of atomic objects, beginning with read/write objects (registers). Our general approach is to replace the existing operational reasoning about events and partial orders with assertional reasoning about invariants and simulation relations. To this end, we define an abstract state machine that captures the atomicity property and prove correctness of the object implementations by establishing a simulation mapping between the implementation and the specification automata. We demonstrate the generality of our specification by showing that it is implemented by three different read/write register constructions: the message-passing register emulation of Attiya, Bar-Noy and Dolev, its optimized version based on real time, and the shared memory register construction of Vitanyi and Awerbuch. In addition, we show that a simplified version of our specification is implemented by a general atomic object construction based on Lamport's replicated state machine algorithm.

1 Introduction

Many distributed and network-based services can be modeled as shared objects accessible to (possibly remote) clients through well-defined interfaces. Atomicity [16, 21] (also known as linearizability [10]) is a desirable property for such objects as it allows clients using the objects to perceive the operations that occur in each run as occurring atomically, in some sequential order. This perception makes it easier to understand the behavior of a system using distributed services, and so, simplifies the task of system design.

Atomic services could be implemented simply on single server machines. However, to achieve high availability in a distributed system and to tolerate failures, atomic services are typically implemented by distributed algorithms. Many distributed algorithms have been proposed for implementing atomic objects; see,
for example, [17, 15, 36, 27, 33, 32, 10, 35, 14, 19, 18, 22, 23, 8]. These use a range of techniques to achieve the appearance of total ordering, for example, assigning timestamps and processing operations in timestamp order, or using quorum configurations.

Although atomic object implementations are abundant, proving that algorithms achieve atomicity has turned out to be a challenging problem. Most existing proofs for such algorithms are long, subtle, and difficult to understand and check. As evidence of the difficulty, we note that several published proofs for implementations of atomic shared read/write memory objects have later been shown to be incorrect. We believe that a fundamental reason for the difficulty of these proofs is their style: they are based on detailed, not very systematic, reasoning about events and their ordering. Useful structure in such proofs is often provided by lemmas about partial orders of operations on objects, for example, Proposition 3 of [16] (for single-writer read/write objects) and Lemma 13.16 of [21] (for multi-writer read/write objects). These lemmas provide sufficient conditions for correctness of atomic read/write object implementations, based on a list of properties that a partial ordering of operations must satisfy. However, showing that these properties hold still requires detailed, ad hoc reasoning about events (see, e.g., [22, 23]).

In this paper, we study systematic ways of verifying distributed implementations of atomic objects, beginning with read/write objects (registers). Our general approach is to replace operational reasoning about events and partial orders with assertional reasoning about invariants and simulation relations. The assertional methods differ from the traditional operational arguments in two important ways. First, the system properties are stated precisely in terms of predicates over the system state components. Second, assertional proofs can be checked by examining individual state transitions of the algorithm without reasoning about entire executions. As such they lend themselves to mechanization, i.e., the process of checking a proof can be carried out using interactive tools, such as theorem provers.

Our approach to carrying out assertional atomicity proofs is first to define an abstract state machine that captures the atomicity property and then, prove correctness of the object implementations by establishing a simulation mapping between the implementation and the specification automata. The challenge is to find a specification automaton that is general enough to apply to many existing implementations, and at the same time sufficiently close to the actual implementations to simplify the task of finding the mapping. One example of an atomicity specification that turned out to be too abstract for carrying out simulation proofs is the canonical atomic object automaton of Section 13.1.2 of [21]. The canonical object automaton maintains a buffer used to store incoming client requests. Buffer requests can later be applied to the object state, and the generated responses are returned to their originators. Unfortunately, this specification, though simple, does not provide sufficient detail to allow for easy match with concrete implementations.
We therefore give a more detailed specification. Namely, we define an abstract state machine, which we call the Partial-Order Machine (PO-Machine), which records information about operations and their orders in its state. The PO-Machine expresses the common behavior of many existing atomic register implementations, in which client operation requests are gradually ordered relative to other operation requests until all the necessary ordering constraints are achieved. The ordering constructed is, in the limit, guaranteed to be a partial order of the requested operations that satisfies sufficient conditions for showing atomicity.

We use the PO-Machine as a formal specification for distributed algorithms that implement atomic memory. We show that it is implemented by three different read/write register constructions: the message-passing emulation of Attiya, Bar-Noy, and Dolev (ABD) [3] (extended to handle multiple writers as in [23]), an optimized version of ABD that takes advantage of synchronized clocks at writers [8], and the unbounded version of the shared memory construction of a multi-writer/multi-reader register from single-writer/single-reader registers of [36]. We also show that a slight modification of the PO-Machine, called the TO-Machine, can be used to prove atomicity of a general (i.e., not necessarily read/write) object implementation based on the replicated state machine protocol of Lamport [15].

We specify the PO-Machine and the algorithms formally using the I/O Automata (IOA) [20] and Timed IOA [12, 11] models, in fact, using formal specification languages that have been defined for these models. The IOA/TIOA specification languages lead to very stylized assertional proofs for invariants and simulation relations that can be partially automated using theorem provers. Moreover, the same IOA specifications can be used by the IOA compiler [31, 30] to produce executable Java code.

Other related work: Our use of a partial order automaton as an abstract specification was inspired by prior work of Fekete et al. on specifying the behavior of an Eventually Sequentializable Data Service [9]. Their specification used a (different) partial-order machine, which expresses weaker consistency requirements than atomicity. The algorithm studied in [9], based on an earlier algorithm of Liskov et al. [13], was shown to achieve this weaker form of consistency.

The only other published simulation-based atomicity proofs we are aware of are those of Bogdanov [5] (replicated state machine), and Doherty et al. (lock-free queue) [7]. The proofs in both these papers are complicated: They involve multiple levels of abstraction as well as both forward and backward simulations. In contrast, every construction considered in this paper is shown to be atomic by exhibiting a single forward simulation directly from the implementation automaton to a specification automaton.

Another example of using assertional reasoning for proving atomicity is the work by Wang and Stoller [37], which uses static analysis combined with model checking to verify atomicity of code blocks involving lock-free synchronization primitives. A more general discussion of assertional proof techniques can be found in [28].

The rest of the paper is organized as follows: In Section 2, we introduce preliminary definitions and notation used throughout the paper. The sufficient condition for proving atomicity is specified in Section 3. The PO-Machine is described in Section 4. The ABD algorithm is presented and proved correct in Section 5. A time-based version of ABD is discussed in Section 6. Section 7 briefly discusses the proofs of the Vitanyi-Awerbuch register construction and of Lamport's replicated state machine. Section 8 discusses future directions. For lack of space, we only outline the intuition and highlight the basic ideas underlying the correctness proofs. The detailed proofs can be found in the full version of the paper [6].

2 Preliminary Definitions

We use the I/O Automata (IOA) model to formally specify services, describe algorithms and carry out proofs. An I/O automaton is a non-deterministic state machine whose state can change atomically through a discrete transition labeled by a discrete action. The set of the automaton’s actions is called the action signature of the automaton. The actions can be either external or internal. The external actions, which can be either input or output, model interaction with the automaton’s environment, and the internal actions model local computation steps. In Section 6, we also use the Timed I/O Automata (TIOA) model [12, 11], which, in addition to discrete transitions, also allows the automata state to evolve by trajectories, which describe evolution of the state over time.

We use forward simulations to carry out atomicity proofs. Informally, a forward simulation is a relationship between the states of two automata requiring that the transitions of one system can in some sense be mimicked by the other. A precise definition of the simulation formalism can be found in [21].

The read/write service: A read/write object (a register) type consists of the following components: (1) an arbitrary set of values $V$ with an initial value $v_0$, (2) the set of operations of the form $write(v)$, $v \in V$, and $read$, (3) the set of responses, consisting of $ack$ and $v \in V$, and (4) the sequential specification $f$ such that $f(w, write(v)) = (v, ack)$ and $f(w, read) = (w, w)$.

A read/write service implements a shared read/write register. To access the service, a client issues an operation descriptor consisting of a location identifier $loc$ and an operation identifier $id$. In addition, a write operation descriptor also contains a value $val$. We often refer to an operation descriptor $x$ simply as operation $x$, and denote its various components by $x.loc$, $x.id$, and $x.val$. We denote by $O_w$ and $O_r$ the sets of the write and the read operations respectively, and by $O = O_w \cup O_r$ the set of all operations. For a set $X \subseteq O$, we denote by $X.id = \{x.id : x \in X\}$ the set of identifiers of operations in $X$.

Clients use the actions of the form $request(x)$, $x \in O$, and $response(x, v)$, $x \in O$, $v \in V \cup \{ack\}$, to issue operation requests and receive responses respectively. Given a sequence $\beta$ of the request and response actions, a requested operation $x$ is said to be complete in $\beta$ if $\beta$ contains $response(x, v)$ for some $v \in V \cup \{ack\}$, which we call the return value of $x$.
We say that $\beta$ is well-formed if there exists a function $cause$ mapping each response event to a preceding request event in $\beta$ so that the following is satisfied: (1) For each response event $e = response(x, v)$, $cause(e) = request(x)$ (i.e., responses are not spuriously generated); and (2) $cause$ is one-to-one (i.e., responses are not duplicated)\(^1\).
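
As an illustration of the well-formedness condition, the following Python sketch checks whether a suitable $cause$ function exists. The event encoding (tuples tagged `"request"` or `"response"`) and the function name `is_well_formed` are assumptions made for this example only. A one-to-one $cause$ exists exactly when, in every prefix, no operation has more responses than requests, so a running count suffices.

```python
from collections import Counter


def is_well_formed(beta):
    """Return True if a one-to-one `cause` mapping exists for beta.

    beta is a list of events, each either ("request", x) or
    ("response", x, v); this encoding is hypothetical.
    """
    unmatched = Counter()  # requests of x not yet used as the cause of a response
    for event in beta:
        kind, x = event[0], event[1]
        if kind == "request":
            unmatched[x] += 1
        elif kind == "response":
            if unmatched[x] == 0:
                # Spurious or duplicated response: no preceding request is left.
                return False
            unmatched[x] -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return True
```

Note that the check places no restriction on requests from the same location overlapping in time, in line with the footnote.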

The following definition will be used throughout the paper: Let $\Pi$ be a set of read and write operations, and $R$ be a binary relation over $\Pi$. For an operation $\pi \in \Pi$ we define $\text{last-prec-writes}(\pi, R) = \{ \omega \in \mathcal{O}_w : (\omega, \pi) \in R \land \not\exists \omega' \in \mathcal{O}_w : (\omega, \omega') \in R \land (\omega', \pi) \in R \}$.
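
The definition can be read directly as a set comprehension. The sketch below is only an illustration: the relation $R$ is assumed to be given as a Python set of pairs, and the argument `writes` stands for the set of write operations under consideration.

```python
def last_prec_writes(pi, R, writes):
    """Writes that precede pi in R with no other write in between.

    R is a set of pairs (a, b) meaning "a is related to b"; `writes`
    is the set of write operations (both encodings are assumptions).
    """
    return {
        w
        for w in writes
        if (w, pi) in R
        and not any((w, w2) in R and (w2, pi) in R for w2 in writes)
    }


# Two writes ordered w1 before w2, both preceding the read r.
R = {("w1", "w2"), ("w1", "r"), ("w2", "r")}
print(last_prec_writes("r", R, {"w1", "w2"}))  # {'w2'}
```

Here `w1` is excluded because `w2` lies between it and `r`.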

3 Atomicity

Atomicity (or linearizability) is specified as a property satisfied by the object implementation traces. It is typically defined in terms of the existence of serialization points for operations so that shrinking the operations to occur at their serialization points results in a valid sequential execution of the read/write register (see, e.g., Chapter 13 of [21], Chapter 9 of [4], or [10]). For our purposes in this paper, it is enough to give a sufficient condition for proving atomicity; this condition is equivalent to the one in Lemma 13.16 of [21].

Let $\beta$ be a well-formed sequence of the actions of the read/write service interface that contains no incomplete operations, and $\Pi$ be the set of operations requested in $\beta$. We say that $\beta$ satisfies the Partial Order property (henceforth referred to as PO) if there exists an irreflexive partial ordering $\prec$ of all the operations in $\Pi$, satisfying the following:

**Property 1 (PO Constraints)**

1. If the response event for $\pi$ precedes the request event for $\phi$ in $\beta$, then $\phi \not\prec \pi$.
2. For any two write operations $\pi$ and $\phi$ in $\Pi$, either $\pi \prec \phi$ or $\phi \prec \pi$.
3. If $\pi$ is a write operation in $\Pi$ and $\phi$ is a read operation in $\Pi$ whose request event follows the response event for $\pi$, then $\pi \prec \phi$.
4. If $\pi$ is a read operation in $\Pi$ and $\phi$ is a read operation whose request event follows the response event for $\pi$, then for each $\omega \in \text{last-prec-writes}(\pi, \prec)$, $\omega \prec \phi$.
5. Let $\pi$ be a read operation in $\Pi$, and $v$ be the value returned by $\pi$. If $\text{last-prec-writes}(\pi, \prec) \neq \emptyset$, then $v = \omega.val$ for some $\omega \in \text{last-prec-writes}(\pi, \prec)$. Otherwise, $v = v_0$.

The following lemma is proved in [6]:

**Lemma 1.** $\beta$ satisfies PO iff there exists an irreflexive partial ordering of all the operations in $\Pi$, satisfying the (more restrictive) constraints of Lemma 13.16 of [21].

From the above result and Lemma 13.16 of [21], we obtain:

**Lemma 2.** If $\beta$ is well-formed and satisfies PO, then $\beta$ satisfies atomicity.
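
To make Property 1 concrete, here is a small Python sketch that checks the five constraints for a given sequence and a given candidate ordering. It is an illustration under stated assumptions, not part of the proof: the event encoding, the dictionaries `kind` and `val`, and the function names are hypothetical, `order` is assumed to already be an irreflexive partial order given as a set of pairs, and every operation is assumed to be requested once and to be complete.

```python
def last_prec_writes(pi, order, writes):
    """Writes immediately preceding pi in `order` (a set of pairs)."""
    return {
        w for w in writes
        if (w, pi) in order
        and not any((w, w2) in order and (w2, pi) in order for w2 in writes)
    }


def satisfies_po(beta, order, kind, val, v0):
    """Check constraints 1-5 of Property 1 for sequence beta and ordering `order`.

    beta:  list of ("request", x) / ("response", x, v) events
    kind:  dict mapping each operation to "read" or "write"
    val:   dict mapping each write operation to the value it writes
    """
    req = {e[1]: i for i, e in enumerate(beta) if e[0] == "request"}
    resp = {e[1]: i for i, e in enumerate(beta) if e[0] == "response"}
    ret = {e[1]: e[2] for e in beta if e[0] == "response"}
    ops = set(req)
    writes = {x for x in ops if kind[x] == "write"}
    reads = ops - writes

    for a in ops:
        for b in ops:
            if a == b or not resp[a] < req[b]:
                continue  # only pairs where a responds before b is requested
            if (b, a) in order:
                return False  # constraint 1
            if kind[a] == "write" and kind[b] == "read" and (a, b) not in order:
                return False  # constraint 3
            if kind[a] == "read" and kind[b] == "read":
                if any((w, b) not in order
                       for w in last_prec_writes(a, order, writes)):
                    return False  # constraint 4

    for a in writes:
        for b in writes:
            if a != b and (a, b) not in order and (b, a) not in order:
                return False  # constraint 2: writes are totally ordered

    for r in reads:
        lpw = last_prec_writes(r, order, writes)
        if lpw:
            if ret[r] not in {val[w] for w in lpw}:
                return False  # constraint 5, some preceding write exists
        elif ret[r] != v0:
            return False  # constraint 5, no preceding write
    return True
```

For example, for a write of 5 that completes before a read is requested, the ordering that places the write before the read is accepted only if the read returns 5.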

\(^1\) Note that our notion of well-formedness is weaker than that usually found in the literature as it allows requests from the same location to be issued concurrently.

4 The PO-Machine

In this section we define the Partial-Order Machine. First, we formally specify the environment assumptions of the read/write service. This environment is represented by a single automaton, called Users, whose code can be found in [6]. The Users automaton contains a single variable requested to keep track of the ids of requested operations, in order to avoid repeats. An implementation of the environment would not have such a variable, but would use some other mechanism to ensure unique operation ids (e.g., client id and a counter).

Lemma 3. For \(x, y \in \text{requested}, x = y \Leftrightarrow x.id = y.id\).

The PO-Machine signature and state variables appear in Figure 1, and its transitions appear in Figure 2. This automaton maintains a partial order in its state, represented by variables vertices and edges. Vertices correspond to requested operations, and edges to ordering relationships that have been determined for these operations. When a request arrives, it is put into vertices; later, it becomes classified as ordered, then completed, and finally, responded. Edges may be added at any time from ordered write operations to unordered ones (see action add-edge).

An unordered operation \(\pi\) may become ordered at any time after it has acquired incoming edges from all write operations that completed before \(\pi\) began (i.e., all writes in \(\text{prec}(\pi)\)). This ensures that constraints 1 and 3 of Property 1 hold among all writes, and between writes and reads. Constraint 1 is also trivially preserved among reads as edges originating at read requests are disallowed by the PO-Machine signature (see Figure 1). When a write operation \(\pi\) becomes ordered, new edges are inserted to ensure that \(\pi\) is ordered with respect to all previously-ordered write operations (see action order) so that constraint 2 of Property 1 is satisfied.

An ordered operation may become completed at any time: when a read operation \(\phi\) completes, it also forces each write operation \(\pi\) immediately preceding \(\phi\) in the partial order to complete. This ensures that every read operation invoked after \(\phi\) completes will find \(\pi\) in its \(\text{prec}\) set, and will therefore become ordered only after it has an incoming edge from \(\pi\). This guarantees that constraint 4 of Property 1 is satisfied, and also captures the essence of the “helping” mechanism found in many atomic register implementations.

A completed operation is allowed to return a response. The response returned by a read operation is the value written by the last preceding (in the partial order) write operation, or the initial value if no such write exists (see action response). Thus, constraint 5 of Property 1 is satisfied.

In [6], we prove that the limit of the transitive closure of (vertices, edges), maintained in the derived variable dag, satisfies Property 1. Since every trace of PO-Machine is obviously well-formed, by Lemma 2, PO-Machine implements an atomic register:

Theorem 1. Each trace of the PO-Machine satisfies atomicity.


5 The Attiya, Bar-Noy, and Dolev Algorithm

In this section, we present a distributed wait-free implementation of an atomic multi-writer/multi-reader register based on the well-known message-passing algorithm of Attiya, Bar-Noy, and Dolev [3] (which we call ABD). We prove correctness of ABD by showing that ABD implements PO-Machine, which by Theorem 1, implies that ABD implements an atomic register.

The original ABD protocol implements a wait-free atomic read/write register using a collection of n processes communicating among themselves through reliable point-to-point channels. The implementation is resilient to fewer than n/2 process crashes. Each process in ABD is responsible both for handling the client operation requests and for storing and updating the local copy of the register value.
Here, we present a generalized version of ABD where we let the two roles in
the ABD protocol be performed by two classes of agents: clients and replicas.
This design allows for flexibility in assigning roles to actual network locations,
thus simplifying the algorithm deployment in real systems. We also use a separate
client to handle each user request so that the actual clients can handle any
number of requests and in whatever order (for example, requests can be partitioned
among several threads, or executed sequentially). Our implementation
also supports multiple writers using the technique of [23].

We now describe the ABD implementation (the ABD automaton) in more
detail. Let $P$ be a finite set of replicas. We define a quorum system $Q$ on $P$ to
be the union of a set of write quorums $Q_w$ and a set of read quorums $Q_r$.
$Q_w$ and $Q_r$ are sets of subsets of $P$ such that for each $Q' \in Q_w$ and $Q'' \in Q_r$,
$Q' \cap Q'' \neq \emptyset$. The ABD automaton is the composition of the Users automaton
of Section 4, the client automata $C_x$, $x \in O$, the replica automata $R_p$, $p \in P$,
and the reliable point-to-point channel automata connecting each client $C_x$ with
each replica $R_p$ and vice versa. The client’s interface and state variables appear in
Figure 3. The code of the reader client, the writer client, and the replica appears in
Figures 4, 5, and 6 respectively. We do not present the specification for the
channel automata as their functionality is standard.
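
As a small illustration of the quorum-system requirement, the sketch below builds majority quorums over a replica set and checks that every read quorum intersects every write quorum. Majority quorums are only one possible choice and are an assumption of this example; the function names are hypothetical.

```python
from itertools import combinations


def majority_quorums(replicas):
    """All subsets of `replicas` containing a strict majority of them."""
    members = sorted(replicas)
    size = len(members) // 2 + 1
    return [frozenset(c) for c in combinations(members, size)]


def is_quorum_system(read_quorums, write_quorums):
    """True if every read quorum intersects every write quorum."""
    return all(qr & qw for qr in read_quorums for qw in write_quorums)


P = {1, 2, 3}
quorums = majority_quorums(P)
print(is_quorum_system(quorums, quorums))  # True
```

Other choices also satisfy the requirement, for instance singleton read quorums paired with the single write quorum $P$.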

The value stored at each replica is associated with a tag. Tags are two-field
records consisting of a sequence number $sn$, which is a non-negative integer, and
a request identifier $id$. Tags are ordered lexicographically with the precedence
to the sequence number field.
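
The lexicographic order on tags corresponds to ordinary tuple comparison. A minimal sketch, assuming request identifiers are strings (the type of identifiers is not fixed by the text):

```python
from typing import NamedTuple


class Tag(NamedTuple):
    sn: int   # sequence number, compared first
    id: str   # request identifier, breaks ties (string type is an assumption)


# Named tuples compare field by field, i.e. lexicographically.
print(Tag(1, "b") < Tag(2, "a"))   # True: smaller sequence number wins
print(Tag(2, "a") < Tag(2, "b"))   # True: equal sn, ordered by identifier
print(max(Tag(1, "z"), Tag(2, "a"), Tag(2, "b")))
```

Because identifiers are unique, two distinct write requests never carry equal tags.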

Clients access read (resp. write) quorums by first sending a message to all
the replicas, and then awaiting responses from a read (resp. a write) quorum.
The request handling at clients involves two rounds of quorum accesses, called
the read phase and the write phase respectively, such that a read quorum is
contacted during the read phase, and a write quorum is contacted during the
write phase. A client keeps track of the request progress through the phases
using the variable status. The operation’s status is initially idle. It is changed
to pending (p) at the beginning of the read phase. It becomes sending (s) at the
beginning of the write phase. It is changed to committed (c) upon completion of
the write phase, and finally to responded (r) after a response is returned.

Specifically, to handle a write request $x$, the client $C_x$ (see Figure 5) performs
a read phase to determine the highest tag $t$ associated with the values stored at
some read quorum. It then performs a write phase to store the value $v$ associated
with tag $(t.sn + 1, x.id)$ at a write quorum. It then responds with ack. To handle a
read request $y$, client $C_y$ (see Figure 4) first performs a read phase to determine
the value $v$ associated with the highest tag $t$ among those associated with the
values stored at some read quorum. It then performs a write phase to guarantee
that the pair $(t, v)$ is stored at a write quorum. It then responds with $v$.

The replica’s algorithm (see Figure 6) is simple: In response to a read phase
message, a replica $p$ either responds with its current tag (for write requests), or
the current tag and the value (for read requests). In response to a write phase
message carrying a tag which is bigger than $p$’s current tag, $p$ overwrites its
current tag and the value with those in the message. Otherwise, $p$’s state is left unchanged. In both cases, $p$ responds with $ack$.
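
The two-phase structure described above can be sketched as a sequential, in-memory simulation. This is an illustration only, not the I/O automaton code of Figures 4 to 6: there are no channels, crashes, or concurrency, the quorum contacted in each phase is passed in explicitly instead of being whichever quorum responds first, replicas always return both tag and value, and all class and function names are hypothetical.

```python
class Replica:
    def __init__(self, v0):
        self.tag = (0, "")      # (sequence number, request id); initial tag is an assumption
        self.value = v0

    def on_read_phase(self):
        return self.tag, self.value

    def on_write_phase(self, tag, value):
        # Overwrite only if the incoming tag is bigger; acknowledge in both cases.
        if tag > self.tag:
            self.tag, self.value = tag, value
        return "ack"


def read_phase(replicas, read_quorum):
    """Return the (tag, value) pair with the highest tag in the read quorum."""
    return max((replicas[p].on_read_phase() for p in read_quorum),
               key=lambda tv: tv[0])


def write_phase(replicas, write_quorum, tag, value):
    for p in write_quorum:
        replicas[p].on_write_phase(tag, value)


def abd_write(replicas, op_id, value, read_quorum, write_quorum):
    (sn, _), _ = read_phase(replicas, read_quorum)
    write_phase(replicas, write_quorum, (sn + 1, op_id), value)
    return "ack"


def abd_read(replicas, read_quorum, write_quorum):
    tag, value = read_phase(replicas, read_quorum)
    write_phase(replicas, write_quorum, tag, value)   # write back before responding
    return value


replicas = {p: Replica(0) for p in "abc"}
abd_write(replicas, "w1", 7, {"a", "b"}, {"b", "c"})
print(abd_read(replicas, {"a", "b"}, {"a", "c"}))  # 7
```

In the usage example the read sees the written value because its read quorum `{a, b}` shares replica `b` with the write quorum `{b, c}`; the write-back then propagates the pair to `a`.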

**Fig. 3.** The state and signature of client automata $C_x$, $x \in \mathcal{O}$ for ABD.

**Fig. 4.** Transitions of reader $C_x$, $x \in \mathcal{O}_r$ for ABD.

**Correctness of ABD:** We now prove that ABD implements an atomic register. Our strategy will be to show that ABD implements PO-Machine by exhibiting a forward simulation from ABD to PO-Machine. In the following, for each \( x \in \mathcal{O} \), we will use subscript \( x \) to refer to the state variables of \( C_x \). It is convenient for the ABD correctness proof to define several derived variables for the ABD automaton. These are summarized in Figure 7.

Fig. 5. Transitions of writer $C_x$, $x \in \mathcal{O}_w$ for ABD.

Fig. 6. Replica automaton $R_p$, $p \in P$ for ABD.

Among these variables, the most interesting one is $\text{min-tag}$ which is used to keep track of the lowest possible tag that could ever be determined by a client at the end of the read phase. At the beginning and before any replica has responded, $\text{min-tag}$ is the smallest tag among the maximum tags carried by replicas in the read phase. As the client is progressing through the read phase it might get a response from a replica whose tag is bigger than the current value of $\text{min-tag}$. In this case, the definition of $\text{min-tag}$ ensures that $\text{min-tag}$ is assigned to that higher value. Finally, upon completion of the read phase, the value of $\text{min-tag}$ is fixed to be the maximum tag received during the phase. The simulation proof relies on the following key property of $\text{min-tag}$:

**Lemma 4.** For each $x \in \mathcal{O}$, $\text{min-tag}(x)$ is non-decreasing.

- $\text{vertices} = \{ x \in \mathcal{O} : \text{status}_x \geq p \}$
- $\text{ordered} = \{ x \in \mathcal{O} : \text{status}_x \geq s \}$
- $\text{completed} = \{ x \in \mathcal{O} : \text{status}_x \geq c \}$
- $\text{responded} = \{ x \in \mathcal{O} : \text{status}_x \geq r \}$
- For $r \in \mathcal{O}_r$: $\text{responder}(r) = \{ w \in \mathcal{O}_w \cap \text{ordered} : \text{tag}_w = \text{tag}_r \}$
- For $x \in \mathcal{O}$ and $p \in P$: the tag that replica $p$ reports to $C_x$ in the read phase, and $\text{min-tag}(x)$, which is computed from these tags over the read quorums while $C_x$ is in its read phase and equals $\text{tag}_x$ afterwards (see [6] for the exact definitions).

Fig. 7. Derived variables for the ABD automaton

The simulation mapping from the states of ABD to the states of the PO-Machine appears in Figure 8. The first four components of the mapping are straightforward: All the operations that have ever been requested (indicated by status > idle) are mapped to vertices; the operations that have completed the read phase and acquired final tags (indicated by status > p) are mapped to ordered; and the operations that have responded (indicated by status > c) are mapped to responded.

The set of edges consists only of edges among operations that have completed their read phases (8.7). The edges among these operations are determined by their tag order and type. Specifically, any two writes $x$ and $y$ such that $\text{tag}_x < \text{tag}_y$ are connected by edge $(x, y)$ (8.8); and each read $x$ and write $y$ such that $\text{tag}_x = \text{tag}_y$ are connected through edge $(y, x)$ (8.9). To maintain the mapping for edges, each $\text{re-collected}(x)$ for $x \in \mathcal{O}_r$ is simulated by a sequence of $\text{add-edge}(y, x)$ for each ordered write operation $y$ such that $\text{tag}_y \leq \text{tag}_x$, followed by $\text{order}(x)$; and each $\text{re-collected}(x)$ for $x \in \mathcal{O}_w$ is simulated by a sequence of $\text{add-edge}(y, x)$ for each ordered operation $y$ such that $\text{tag}_y = \text{tag}_x$. No actions involving unordered operations (i.e., the operations with status < $s$) result in adding new edges.

Let $f$ be the relation over $\text{states}(\text{ABD}) \times \text{states}(\text{PO-Machine})$ such that $(s, u) \in f$ iff:

- (8.1) to (8.3) and (8.5): each of these components equates a state variable of $u$ with the identically named variable or derived variable of $s$; in particular, $u.\text{vertices} = s.\text{vertices}$, $u.\text{ordered} = s.\text{ordered}$, and $u.\text{responded} = s.\text{responded}$.
- (8.4) $u.\text{completed} = s.\text{completed} \cup \bigcup_{r \in \mathcal{O}_r \cap s.\text{completed}} s.\text{responder}(r)$
- (8.6) For all $x \in u.\text{vertices}$, if $y \in u.\text{prec}(x)$, then $s.\text{min-tag}(y) \leq s.\text{min-tag}(x)$
- (8.7) $u.\text{edges} \subseteq u.\text{ordered} \times u.\text{ordered}$
- (8.8) For all $x, y \in \mathcal{O}_w \cap u.\text{ordered}$, $(x, y) \in u.\text{edges}$ iff $s.\text{tag}_x < s.\text{tag}_y$
- (8.9) For all $x \in \mathcal{O}_r \cap u.\text{ordered}$ and $y \in \mathcal{O}_w \cap u.\text{ordered}$, $(y, x) \in u.\text{edges}$ if $s.\text{tag}_x = s.\text{tag}_y$

Fig. 8. Forward simulation from ABD to PO-Machine

The most interesting part of the proof is to show that $\text{order}(x)$ becomes enabled once all the $(y, x)$ edges have been added. For that we need to show that the tag acquired by $x$ at the end of the read phase is at least as big as the tag of every operation that had completed before $x$ began. Since at the
end of the read phase $\text{tag}_x = \text{min-tag}(x)$, the necessary enabling condition is
provided by part 8.6 of the mapping, which requires that for each $y \in \text{prec}(x)$,
$\text{min-tag}(y) \leq \text{min-tag}(x)$.

To show that 8.6 is maintained throughout the read phase of $x$, $\text{request}(x)$
is simulated by the $\text{request}(x)$ action of the PO-Machine, and each $\text{receive}$
is simulated by the empty sequence. Since at the time $x$ is invoked, the tag of every
$y \in \text{prec}(x)$ has been stored at a write quorum of replicas, and because every pair
of write and read quorums intersects, $\min_{Q' \in Q_r} \max_{p \in Q'} \text{tag}_p \geq \text{tag}_y$. Hence 8.6
is preserved by $\text{request}(x)$. Finally, since $\text{min-tag}(x)$ is non-decreasing (Lemma 4)
and $\text{prec}(x)$ is not affected by any action except $\text{request}$, 8.6 is preserved by
$\text{receive}$. Hence, by the end of the read phase of $x$, for each $y \in \text{prec}(x)$,
$\text{min-tag}(y) \leq \text{min-tag}(x)$ as required.
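
The quorum-intersection step of this argument can be checked on a small instance. The sketch below is an illustration under assumptions: it uses five replicas with majority read quorums, represents tags as `(sn, id)` tuples, and the helper name `min_max_tag` is hypothetical. Once a tag is stored at a write quorum, the smallest maximum tag over all read quorums is at least that tag.

```python
from itertools import combinations


def min_max_tag(tags, read_quorums):
    """Smallest, over all read quorums, of the largest tag held in the quorum."""
    return min(max(tags[p] for p in q) for q in read_quorums)


P = range(1, 6)
read_quorums = [frozenset(c) for c in combinations(P, 3)]   # majorities of 5

tags = {p: (0, "") for p in P}          # initial tags (an assumption)
for p in {1, 2, 3}:                     # operation y stores its tag at a write quorum
    tags[p] = (3, "y")

# Every majority of five replicas meets {1, 2, 3}, so the bound holds.
print(min_max_tag(tags, read_quorums) >= (3, "y"))  # True
```

If the read quorums did not all intersect the write quorum, for example a read quorum `{4, 5}`, the bound would fail on this state, which is why the intersection requirement on $Q_w$ and $Q_r$ is needed.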

We argued informally that the mapping in Figure 8 is a forward simulation
from ABD to the PO-Machine. A detailed proof appears in [6].

**Lemma 5.** The mapping in Figure 8 is a forward simulation from ABD to the
PO-Machine.

Since by Theorem 1, each trace of the PO-Machine satisfies atomicity, the
same is true for every trace of ABD:

**Theorem 2.** Each trace of ABD satisfies atomicity.

**Automated Tools Support:** We have used the TIOA to PVS translator and TAME
library [2] to generate descriptions of the PO-Machine and the ABD algorithm
in the language of the Prototype Verification System (PVS) [26]. We used PVS
to substantially increase the level of detail and assurance of some of our previous
hand proofs. In fact, we discovered several gaps and bugs in our hand proofs.
Automatic translation enabled us to easily tweak the simulation relations and rerun
the proof scripts. We also used the IOA code generator tool [31, 30] to compile
the verified ABD automaton into executable Java code. This way,
a single formal representation of the ABD algorithm was used for specification,
verification, and execution.

### 6 Timed ABD

In this section, we present an optimized version of the ABD protocol, called
Timed-ABD, that takes advantage of perfectly synchronized clocks at the writers
to eliminate the read phase of the write implementation (see [8]).

Timed-ABD is the composition of the following timed automata: the
replica and reader client automata in Figures 6 and 4, respectively, augmented
with arbitrary trajectories that keep their state unchanged; and the writer client
automata whose code appears in Figure 9. To model synchronized clocks, each
writer maintains a local variable $clock$ whose trajectory is $d(clock) = 1$ (i.e.,
the clock value grows continuously, at the same rate as the real time).

Fig. 9. Writer client $C_x$, $x \in C_w$, for Timed-ABD

The writer algorithm is as follows: To write a value, the writer first takes its
current clock reading, and then delays its execution until its clock exceeds the
initial reading. The second clock reading is used as the tag with which the client performs the write phase.
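The writer's tag selection described above can be sketched as follows. This is only an illustration, not the automaton of Figure 9: `SimClock`, `choose_write_tag`, and the `step` parameter are hypothetical names, and the simulated clock stands in for the continuously growing $clock$ variable.

```python
class SimClock:
    """Hypothetical simulated clock; time advances only when tick() is called."""

    def __init__(self, start=0.0):
        self.now = start

    def read(self):
        return self.now

    def tick(self, dt):
        self.now += dt


def choose_write_tag(clock, step=0.001):
    """Take a first reading, wait until the clock exceeds it, return the second reading."""
    first = clock.read()
    while clock.read() <= first:
        clock.tick(step)  # stands in for letting real time pass
    return clock.read()   # second reading, used as the tag for the write phase


clock = SimClock(start=5.0)
tag1 = choose_write_tag(clock)
tag2 = choose_write_tag(clock)
```

Because each call returns a reading strictly greater than the one taken when the write was requested, successive writes on the same clock obtain strictly increasing tags without a read phase.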

The simulation mapping from the states of Timed-ABD to the states of Timed-PO (i.e., the PO-Machine augmented with arbitrary trajectories that do not change its state) appears in Figure 10. To see that the mapping is preserved, we observe that a write operation becomes ordered once it is verified that a non-zero amount of time has elapsed since it was requested. We therefore simulate each Timed-ABD trajectory corresponding to a non-zero time interval by a trajectory of Timed-PO of the same length, followed by a sequence of *add-edge* actions, followed by *order*. The rest of the simulation proof is straightforward (see [6] for details).

$f$ is the relation over $\text{states}(\text{Timed-ABD}) \times \text{states}(\text{Timed-PO})$ such that $(s, w) \in f$ iff

\[ \text{if } x \in \text{wordered} \cap \text{C}_{\text{ABD}} \text{, then } s_{\text{ABD}} \leq s_{\text{PO}}(x) \]

\[ \text{if } y \in \text{wordered} \cap \text{C}_{\text{ABD}}, \text{ then } s_{\text{ABD}}(y) \leq s_{\text{PO}}(x) \]

\[ \text{if } x \in \text{wordered} \text{, then } s_{\text{ABD}}(x) < s_{\text{PO}}(x) \]

\[ \text{if } y \in \text{wordered} \text{, then } s_{\text{ABD}}(y) < s_{\text{PO}}(y) \]

Fig. 10. Forward simulation from Timed-ABD to Timed-PO

7 Other Algorithms

We briefly discuss how to prove atomicity of the unbounded multi-writer/multi-reader register construction of Vitanyi and Awerbuch [36] (referred to henceforth as VA), and of a general atomic object implementation based on the replicated state machine algorithm of Lamport [15] (referred to henceforth as RSM).

First, we observe that VA can be recast as a special case of ABD with the write quorums being the rows and the read quorums being the columns of the matrix. Therefore, the simulation proof of VA is almost identical to that of ABD. In particular, it is easy to see that the simulation from ABD to PO-Machine in Figure 8 is also a forward simulation from VA to PO-Machine.
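The row/column view of quorums can be checked with a short sketch. The function `grid_quorums` and the 3 x 3 size are assumptions made for illustration; each matrix cell is represented by its `(row, column)` index.

```python
def grid_quorums(n_rows, n_cols):
    """Return (rows, columns) of an n_rows x n_cols matrix as sets of cell indices."""
    rows = [frozenset((i, j) for j in range(n_cols)) for i in range(n_rows)]
    cols = [frozenset((i, j) for i in range(n_rows)) for j in range(n_cols)]
    return rows, cols


# Rows play the role of write quorums, columns the role of read quorums.
write_quorums, read_quorums = grid_quorums(3, 3)

# Row i and column j always share exactly the cell (i, j).
all_intersect = all(w & r for w in write_quorums for r in read_quorums)
```

Since every row meets every column, the quorum-intersection property that the ABD simulation relies on holds for this choice of quorums.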

To prove atomicity of RSM, we use a simplified version of the PO-Machine, called the TO-Machine. The TO-Machine constructs a single total order of all the requested operations. In particular, every operation becomes ordered only after it is ordered relative to all the other ordered operations. The TO-Machine is parameterized by the sequential specification and the initial state of the emulated object, which are used to compute responses. The simulation proof is based on the observation that in RSM, an operation \( x \) becomes ordered once the local timestamp at each replica becomes greater than that of \( x \). The full proof appears in [6].
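The idea of computing responses from a single total order, a sequential specification, and an initial state can be sketched as below. This is a deliberately simplified illustration, not the TO-Machine of the paper: the class `TotalOrderSketch`, the function `register_spec`, and the operation encoding are hypothetical, and request and response actions are omitted.

```python
class TotalOrderSketch:
    """Appends operations to one total order and computes responses sequentially."""

    def __init__(self, apply_op, initial_state):
        self.apply_op = apply_op    # sequential spec: (state, op) -> (new_state, response)
        self.state = initial_state  # state of the emulated object
        self.order = []             # total order of operation identifiers
        self.responses = {}

    def order_op(self, op_id, op):
        """Place op after every previously ordered operation and record its response."""
        self.order.append(op_id)
        self.state, response = self.apply_op(self.state, op)
        self.responses[op_id] = response
        return response


def register_spec(state, op):
    """Sequential specification of a read/write register (illustrative)."""
    kind, arg = op
    if kind == "write":
        return arg, "ack"
    return state, state  # a read returns the current value and leaves it unchanged


machine = TotalOrderSketch(register_spec, initial_state=0)
first = machine.order_op("w1", ("write", 5))
second = machine.order_op("r1", ("read", None))
```

Swapping `register_spec` for another sequential specification changes the emulated object without touching the ordering logic, which mirrors the parameterization described above.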

8 Conclusions and Future Work

Our work with four algorithms so far suggests that our PO-Machine (or small variants of it) may be general enough to capture many of the existing atomic register algorithms. We plan to use these methods to study a wider variety of algorithms, such as bounded-timestamp-based constructions (see, e.g., [34]), whose proofs have been notoriously difficult and bug-prone. An interesting challenge will be to extend our framework to capture implementations that are not explicitly based on timestamps, for example, the construction that creates atomic bits from safe bits [32]. Another interesting direction is to adapt the PO-Machine to capture weaker register semantics, such as safe registers, regular registers (including the multi-writer regular registers of Welch [29]), and sequentially consistent registers. There has been increased recent interest in these semantics, as they capture the guarantees provided by many Byzantine-resilient storage systems [24, 25, 1] based on Byzantine quorum systems [24].

Yet another interesting application domain for our techniques is the verification of multi-threaded programs based on lock-free synchronization primitives (such as CAS and LL/SC). This area has recently been receiving increased attention due to the growing popularity of multi-processor computing platforms and the introduction of lock-free synchronization primitives into the Java concurrency package.

Finally, we are interested in identifying common patterns behind many diverse implementations of atomic objects. This will make it easier to understand and compare different algorithms. We expect that such patterns should be expressible in terms of common specification automata (e.g., a unified version of the PO- and TO-Machines).
References


