Strong Logic for Weak Memory: Reasoning About Release-Acquire Consistency in Iris

Jan-Oliver Kaiser, Hoang-Hai Dang, Derek Dreyer, Ori Lahav, and Viktor Vafeiadis

MPI-SWS, Saarbrücken and Kaiserslautern, Germany∗
{janno,haidang,dreyer,orilahav,viktor}@mpi-sws.org

Abstract

The field of concurrent separation logics (CSLs) has recently undergone two exciting developments: (1) the Iris framework for encoding and unifying advanced higher-order CSLs and formalizing them in Coq, and (2) the adaptation of CSLs to account for weak memory models, notably C11’s release-acquire (RA) consistency. Unfortunately, these developments are seemingly incompatible, since Iris only applies to languages with an operational interleaving semantics, while C11 is defined by a declarative (axiomatic) semantics. In this paper, we show that, on the contrary, it is not only feasible but useful to marry these developments together. Our first step is to provide a novel operational characterization of RA+NA, the fragment of C11 containing RA accesses and “non-atomic” (normal data) accesses. Instantiating Iris with this semantics, we then derive higher-order variants of two prominent RA+NA logics, GPS and RSL. Finally, we deploy these derived logics in order to perform the first mechanical verifications (in Coq) of several interesting case studies of RA+NA programming. In a nutshell, we provide the first foundationally verified framework for proving programs correct under C11’s weak-memory semantics.

1 Introduction

Separation logic [24] is a refinement of Hoare logic with an intrinsic notion of ownership: whereas an assertion in Hoare logic denotes a fact about the global machine state, an assertion in separation logic denotes ownership of (and knowledge about) a piece of that state, and the separating conjunction $P \ast Q$ denotes that the assertions $P$ and $Q$ own disjoint pieces of state. This ownership reading of assertions is useful for giving “local” (or “small-footprint”) specifications for primitive commands, which are much easier to compose soundly into specifications for larger programs. Moreover, as O’Hearn was the first to observe [23], separation logic is also eminently suitable for concurrent programs. In particular, ownership provides a direct and convenient way of explaining how synchronization mechanisms serve to transfer control of shared state between threads. Although O’Hearn’s original concurrent version of separation logic was geared toward reasoning about coarse-grained synchronization via semaphores, the subsequent decade of research into concurrent separation logics (CSLs) has shown that ownership and separation are just as useful for reasoning about more fine-grained

∗ Saarland Informatics Campus.

© Jan-Oliver Kaiser, Hoang-Hai Dang, Derek Dreyer, Ori Lahav, and Viktor Vafeiadis;
licensed under Creative Commons License CC-BY.

Editor: Peter Müller; Article No. 17; pp.17:1–17:50
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
and low-level synchronization mechanisms, such as those employed in the implementations of non-blocking data structures [34, 10, 6, 31, 29, 22, 5].

In this paper, we consider two of the most recent, boundary-pushing developments in concurrent separation logics: (1) the Iris framework for encoding and unifying advanced higher-order CSLs and formalizing them in Coq [13, 12, 15, 16], and (2) the adaptation of CSLs to account for weak memory models, notably C11’s release-acquire (RA) consistency [33, 32, 7, 19]. Although these developments have thus far (for reasons explained below) appeared to be incompatible, we show that in fact they are not! Quite the contrary: we demonstrate that it is not only feasible but useful to marry them together, and in so doing, provide the first foundationally verified framework for proving programs correct under C11’s weak memory semantics.

### 1.1 Iris: A Unifying Framework for Concurrent Separation Logics

After O’Hearn’s original CSL, there came a steady stream of “new and improved” CSLs appearing on at least a yearly basis. Unfortunately, as these new CSLs grew ever more expressive, they also grew increasingly complex, baking in increasingly sophisticated proof rules as primitive, with the relationships and compatibility between different proof rules (e.g., whether they could be soundly combined in one logic) remaining unclear.

The central source of complexity in most existing CSLs lies in their mechanisms for controlling interference between threads accessing shared state, which have evolved from Jones’s rely-guarantee [11] to the much more sophisticated and elaborate protocol mechanisms appearing in logics like CaReSL [31], iCAP [29], and TaDA [5]. In an attempt to consolidate the field, Jung et al. developed Iris [13, 12, 15], a logic with the express goal of showing that even the fanciest of these interference-control mechanisms could be encoded via a combination of two orthogonal “off-the-shelf” ingredients: (1) partial commutative monoids (PCMs) for formalizing protocols on shared state, and (2) invariants for enforcing them. Invariants are an old and ubiquitous concept in program verification, and PCMs have been used in a number of prior logics to represent different kinds of ghost (or auxiliary) state i.e., logical state that is manipulated as part of the proof of a program but is not manipulated directly by the program itself. Jung et al.’s observation was that in fact these two simple mechanisms are all you need: using just PCMs and invariants, one can derive a variety of powerful forms of protocol-based reasoning from prior CSLs within Iris, and by virtue of working in a unified framework, these derived mechanisms are automatically compatible (different mechanisms can be used soundly to verify different modules in a program). Iris also goes beyond most prior CSLs by supporting higher-order quantification and impredicative invariants—invariants that can talk recursively about the existence of (other) invariants—which are crucial for reasoning about languages with higher-order state (e.g., Rust).

In the past, the complexity of CSLs was further exacerbated by the fact that (until very recently [26]) they only supported manual and error-prone “pencil-and-paper” proofs. The initial version of Iris [13] was no exception: the soundness of the core logic was verified in Coq, but the Coq development provided no support for using the logic (either to encode other logics or to verify programs interactively). However, in the past year, Krebbers et al. [16] have developed IPM, an interactive proof mode geared toward using Iris as a proof development environment for verifying concurrent programs within Coq. With IPM, Iris has begun the transition to a more practically useful proof tool, and is already being deployed effectively for larger verification efforts, e.g., in the RustBelt project [9].
1.2 Separation Logics for Release-Acquire Consistency

Iris is a “generic” logical framework in that it is parameterized over the programming language in question—it merely requires, like the vast majority of prior work on concurrent program verification, that the language have an operational, interleaving semantics, typically known as a sequentially consistent (SC) semantics [20]. Under SC, threads take turns accessing the shared memory, and updates to memory are immediately visible to all other threads.

SC semantics has the benefit that it is easy to define and manipulate formally, but it is also woefully unrealistic: no serious language guarantees a fully SC semantics, because of the significant performance costs associated with maintaining the fiction of a single, globally consistent view of memory on modern multi-core architectures. One of the reasons for this discrepancy between the theory and the reality of concurrent programming is that, until relatively recently, formal accounts of more realistic—so-called weak (or relaxed)—memory models for concurrent programming languages were not available. However, in the past decade, great progress has been made on formalizing weak memory models, with a notable high point being the formalization of the C/C++11 memory model (hereafter, C11) [3].

In response to this development, a number of verification researchers have followed suit by building new verification tools—program logics, model checkers, testing frameworks, etc.—that account for these more realistic memory models. In particular, Vafeiadis and collaborators have thus far developed several different separation logics for C11, including RSL [33] and GPS [32]. The main focus of these logics is on RA+NA, an important fragment of C11 consisting of release-acquire (RA) accesses and non-atomic (NA) accesses. RA accesses are useful because they support a common idiom of message-passing synchronization at low cost compared to SC. NA accesses are intended for “normal” data accesses and are even more efficiently implementable than RA accesses, with the proviso that they are not permitted to race (i.e., races on non-atomics cause the entire program to have undefined behavior).

A major challenge that Vafeiadis et al. had to overcome was the fact that C11 is defined using a radically different semantics than SC. Specifically, it is defined by a declarative (or axiomatic) semantics, in which the allowed behaviors of a program are defined by enumerating candidate executions (represented as “event graphs”) and then restricting attention to the executions that obey various coherence axioms. In building separation logics for C11, Vafeiadis et al. were thus not able to use the standard model of Hoare-style program specifications from prior separation logics because notions like “the machine states before and after executing a command C” do not have a clear meaning in C11’s declarative semantics.

To account for this radically different type of semantics, they were instead forced to essentially throw away the “separation-logic textbook” and come up with an entirely new, non-standard model of separation logic in terms of predicates on event graphs. While groundbreaking, this approach has had several downsides. Firstly, certain essential mechanisms of SC-based separation logic (such as ghost state), which are easy to justify in standard models, became very difficult to justify in the new event-graph-based models of RA+NA logics. Secondly, the complexity of these new models has made them challenging to adapt and extend, and their non-standard nature has posed a major accessibility hurdle for researchers accustomed to traditional models of separation logic. Last but not least, although the soundness of these logics has been verified formally in Coq, there has thus far been no tool support for using the logics to prove programs correct under RA+NA semantics.
1.3 Our Contributions

Given our above description, it may seem that the Iris framework’s reliance on interleaving semantics renders it fundamentally inapplicable to reasoning about C11’s weak-memory semantics. In this paper, we show that this is not the case at all—not only is it possible to derive RA+NA logics like GPS and RSL within Iris, but there are several tangible benefits to doing so. Deriving such logics within Iris:

- Lets us take advantage of the rich features of the Iris host logic (e.g., separation, invariants) when proving soundness of the derived logics, thereby significantly lifting the abstraction level at which those soundness proofs are carried out (compared to prior work).
- Allows us to support some very useful features in our derived logics by directly importing them from Iris. Such features include PCM-based ghost state, higher-order impredicative quantification, and Iris’s interactive proof mode in Coq. By virtue of being encoded in Iris, our derived logics inherit these features for free.
- Makes it easy to experiment with the derived logics and quickly develop new and useful extensions (e.g., single-writer protocols, see below).
- Makes it possible to soundly compose proofs from different derived logics, since they are all carried out in the uniform framework of Iris.

Our first step (§2) is to avoid the essential complicating factor—C11’s declarative account of RA+NA—and instead work with an operational account. Building closely on Lahav et al.’s recently proposed “strong release-acquire” (SRA) semantics [18], we define a novel, operational, interleaving semantics for RA+NA. Our operational account of the RA fragment of the language is very similar to Lahav et al.’s operational account of SRA in that it models writing and reading of memory via the sending and receiving of timestamped messages; the main difference is that the RA rule for assigning timestamps is slightly more liberal. Our account of NA is new, though; it uses timestamps to model races on non-atomics as stuck (unsafe) machine states. We have proven that, under the reasonable restriction that programs do not mix RMWs (atomic updates) and non-atomic reads at the same location, our semantics is equivalent to the standard declarative semantics of the RA+NA fragment of C11.

Next, since our new semantics for RA+NA is an interleaving semantics, we can instantiate Iris with it. In §3, we review the basic reasoning mechanisms of Iris, and show how to use them to derive small-footprint proof rules for reasoning about RA+NA programs. We apply these rules to verify a simple message-passing example of RA+NA programming in Iris. However, as will become clear, reasoning directly with the Iris primitive mechanisms is rather too low-level and a more abstract logic is needed.

In §4, we present iGPS, a higher-order variant of Turon et al.’s GPS logic [32], which supports much higher-level reasoning about RA+NA programs. Unlike the original GPS, iGPS is derived within Iris on top of the small-footprint proof rules from §3. It also extends GPS with single-writer protocols, an extremely useful feature that simplifies proofs of RA+NA programs in the common case where there are no write-write races on atomic accesses.

In §5, we briefly describe some other contributions, including iRSL, a higher-order variant of RSL [33] derived within Iris, and several case studies that we have verified using iGPS and iRSL in Coq. These examples showcase one of the major advantages of working in the Iris framework: our ability to verify weak-memory programs, foundationally and mechanically, with the same degree of ease that was previously only possible for SC programs.

Finally, in §6, we conclude with related work.
2 Release-Acquire and Non-Atomics

In this section, we introduce our operational semantics for RA+NA, which we then use as the machine for our working language $\lambda_{RN}$. Subsequent sections will show how to build a logic for $\lambda_{RN}$ using Iris.

C11 provides several memory access modes, each ensuring a different degree of consistency. In this paper we focus on RA+NA, the fragment of C11 consisting only of release-acquire (RA) and non-atomic (NA) accesses. Non-atomic accesses (which we denote with “[na]”) are the default type of memory accesses, intended to be used for normal data rather than for synchronization. Thus, C11 forbids any data races on non-atomic accesses, and programs that may have such races are considered buggy (they have undefined semantics). In contrast, RA accesses (which we denote with “[at]” for atomic) are permitted to race, but provide just enough consistency guarantees to enable the well-known message passing (MP) idiom:

\[
x_{[\text{na}]} := 0; y_{[\text{na}]} := 0; \\
x_{[\text{na}]} := 37; \quad \text{repeat } y_{[\text{at}]}; \\
y_{[\text{at}]} := 1 \quad x_{[\text{na}]}^{}
\]

Initially, both variables $x$ and $y$ are set to 0. The first thread will initialize $x$ to 37 (non-atomically) and then set the variable $y$ to 1 (via a release write) as a way of sending a message to the second thread that $x$ has been properly initialized and is ready for consumption. The second thread will repeatedly read $y$ (via an acquire read) until it observes $y \neq 0$, at which point—thanks to release-acquire semantics—it will know that it can safely access $x$. Summing up, the use of RA here ensures that the non-atomic write to $x$ in the first thread “happens before” the non-atomic read of $x$ in the second thread—i.e., that they do not race—and furthermore that the read of $x$ will return 37.

The formal semantics of RA+NA is “declarative”, formulated as a set of constraints on execution graphs. We will instead now present an alternative operational semantics of RA+NA. Our operational semantics is not completely coherent with C11’s for programs that mix atomic and non-atomic accesses to the same location (although the semantics of such programs is already known to be problematic [2]—see §6 for further discussion of this point). However, for the large class of programs that do not mix atomic updates (like CAS) and non-atomic reads at the same location, our semantics is provably equivalent to C11’s declarative semantics. This class of programs includes all C11 programs considered (and verified) in this paper. (For formal details of the correspondence between our semantics and C11’s, see Appendix A.) We will first start with the pure RA fragment, and then add a “race detector” for non-atomic accesses.

2.1 Release-Acquire

Our operational semantics for RA starts from the observation that in RA—in contrast to a standard heap language—different threads have a different view of what the state is. Accordingly, we need to keep track of past write events as they might still be relevant for some subset of threads. Moreover, we need to keep writes to the same location in a total order enforced in C11 under the name modification order (mo for short). Finally, we also need to keep track of each thread’s “progress” in terms of which writes are visible to it, as this determines what a thread may read and where its writes may end up.

For the $\text{mo}$ order, the RA machine manages for each location a totally ordered set of timestamps $t \in \text{Time} \triangleq \mathbb{N}$. Each write of some value $v$ to a location $\ell$ gets assigned a
The full semantics also supports allocation, which induces an allocation value \( A \). We do not mention it here for the sake of simplicity.

1 Here, we use the join operator \( \sqcup \) on views: \( (V_1 \sqcup V_2)(l) = \max\{V_1(l), V_2(l)\} \) if \( l \in \text{dom}(V_1) \cap \text{dom}(V_2) \); \( (V_1 \sqcup V_2)(l) = V_1(l) \) if \( l \in \text{dom}(V_1) \setminus \text{dom}(V_2) \); and \( (V_1 \sqcup V_2)(l) \) is undefined if \( l \notin \text{dom}(V_1) \cup \text{dom}(V_2) \).

2 A write (\textsc{Thread-Write}) picks an unused timestamp \( t \) for location \( l \) that is greater than the thread’s view of \( l \), updates the thread’s view to the new view \( V’ \) that includes \( t \), and adds a corresponding message to the memory. A read (\textsc{Thread-Read}) incorporates the view \( V’ \) of the message that it reads into the thread’s own view. Note that the message being read

\[\begin{align*}
\text{Thread-Read} & \quad (\ell, v, t, V) \in M \quad T(\pi)(\ell) \leq t \\
(M, T) \xrightarrow{(\text{Read}, \ell, v)}^* (M, T[\pi \mapsto T(\pi) \sqcup V])
\end{align*}\]

\[\begin{align*}
\text{Thread-Write} & \quad \neg \exists \ell’, \nu, \nu’, t, V \in M \quad T(\pi)(\ell) < t \quad V’ = T(\pi)[\ell \mapsto t] \\
(M, T) \xrightarrow{(\text{Write}, \ell, v)}^* (M \cup \{(\ell, v, t, V)\}, T[\pi \mapsto V'])
\end{align*}\]

\[\begin{align*}
\text{Thread-Update} & \quad \neg \exists \ell, \nu, t, V, (\ell, v, t + 1, V) \in M \quad V' = T(\pi)[\ell \mapsto t + 1] \sqcup V \\
(M, T) \xrightarrow{(\text{Update}, \ell, v, \nu)}^* (M \cup \{(\ell, v, t + 1, V')\}, T[\pi \mapsto V'])
\end{align*}\]

\[\begin{align*}
\text{Thread-Fork} & \quad \rho \notin \text{dom}(T) \\
(M, T) \xrightarrow{(\text{Fork}, \rho)}^* (M, T[\rho \mapsto T(\pi)])
\end{align*}\]

\[\begin{align*}
\text{Thread-Uninitialized} & \quad \varepsilon \in \{(\text{Read}, \ell, v), (\text{Update}, \ell, v_\nu, v_n), (\text{Fork}, \rho)\} \\
(M, T) \xrightarrow{}^* \bot_{\text{uninit}}
\end{align*}\]
is required to have a timestamp that is not smaller than the thread’s view of the relevant location. Updates (Thread-Update) combine reading and writing in one step. In addition, updates must “pick” $t + 1$ as a timestamp for the new message, where $t$ is the timestamp of the read message. This implies, in particular, that two different updates cannot read the same message, and corresponds to C11’s atomicity condition, which requires every update to read from its $mo$-immediate predecessor. Thread-Fork adds a new thread whose view is copied from its parent. Finally, Thread-Uninitialized detects reads from uninitialized locations, and moves to the error state $⊥_{uninit}$.

**Functional correctness of MP**

With the operational semantics of RA, we can now sketch why MP (assuming for now that all its accesses are RA) is functionally correct, i.e., why the read of $x$ by the second thread will return 37 when the program terminates. The write of 37 to $x$ is recorded at a view $V_{37}$, which is then included in the view $V_1$ of the write of 1 to $y$ by the first thread. When the second thread reads 1 from $y$, its local view is updated to incorporate $V_1$ (and thus also $V_{37}$). A read from $x$ is now guaranteed to read from the message setting $x$ to 37 or from a more recent one, but no more recent one exists. Consequently, the return value will be 37.

2.2 **Non-Atomics**

Formally, C11 defines a data race as two memory accesses to the same location—of which at least one is a write and at least one is non-atomic—that are not ordered by “happens-before.” A program that exhibits data races in some of its execution graphs is called racy, and its behavior is considered undefined. We now show how to account for non-atomics and data races in the context of our operational semantics.

Let us first extend the set of physical states by another error state $⊥_{race}$, whose intent is captured by the following correspondence: a program is racy if and only if at least one of its machine executions can reach $⊥_{race}$ (stated and proved formally in Appendix A).

To detect data races during the execution of a program, we add an additional component to the physical state: the non-atomic view $N$, which tracks the timestamp of the most recent non-atomic write to every location. Then, we place the following restrictions on all atomic and non-atomic operations (if violated, the program will enter the $⊥_{race}$ state):

- To perform any access (atomic or non-atomic) to a location $\ell$, a thread $\pi$ must have observed the most recent non-atomic write to $\ell$, i.e., $N(\ell) \leq T(\pi)(\ell)$.
- A thread $\pi$ can only perform a non-atomic read from a location $\ell$ if it has observed the most recent (atomic or non-atomic) write to $\ell$, i.e., $\not\exists t, (\ell, \_t, t, \_) \in M. T(\pi)(\ell) < t$.

In addition to these restrictions, we require non-atomic writes to pick timestamps greater than all existing timestamps of messages of the same location. Intuitively, these restrictions enforce that each non-atomic write to $\ell$ starts a new “era” in $\ell$’s timestamps, after which any attempt to access writes from a previous era (or to write with a timestamp from a previous era) constitutes a race. Note that there is an asymmetry between non-atomic reads and writes: non-atomic writes to $\ell$ are allowed even when the thread has not observed the most recent write to $\ell$, as it is only required to observe the most recent non-atomic write to $\ell$.

One might fear that this fails to detect the case when a non-atomic write is racing with a concurrent atomic write (and the atomic write happens first); but in this case the race will be detected in a different execution where the non-atomic write happens first (and the atomic write enters the $⊥_{race}$ state), so the program will nevertheless be declared racy.
Figure 2 Per-thread reductions for the RA+NA machine.

Revisiting MP, we note that it is safe to have non-atomic accesses to \( x \): the write is performed while the left thread is necessarily aware of the most recent non-atomic write to \( x \) (the initialization); and the read is performed while the right thread is necessarily aware of the most recent non-atomic write to \( x \), whose timestamp was incorporated into the right thread’s view when it read \( y = 1 \).

Figure 2 presents the full operational semantics. It is based on the following definition of a physical state.

**Definition 2 (Physical State).**

Let \( \sigma \in \Sigma \triangleq \left( \mathcal{P}(\text{Msg}) \times \text{ThreadId} \overset{\text{fin}}{\rightarrow} \text{View} \right) \times \{ \bot, \bot_{\text{uninit}} \} \) represent physical machine states. We write \( M, T, \) and \( N \) to denote the components of a non-error state. The initial physical state, denoted \( \sigma_{\text{init}} \), is given by \( (\emptyset, [0 \mapsto \emptyset], \emptyset) \).

2.3 The \( \lambda_{\text{RN}} \) language

\( \lambda_{\text{RN}} \) is a standard lambda calculus with recursive functions, forks, and references with atomic and non-atomic accesses. The \text{repeat} construct that we have used in MP can be defined in terms of recursive functions. The interesting part of the language and its expression reductions is given in Figure 3. The expression reduction relation \((e \overset{\gamma}{\rightarrow} e', \sigma_f)\) has four components: the original expression \( e \), an (optional) machine memory event \( \varepsilon \), the resulting
expression $e'$, and a list of newly created threads $\pi_f$. Only the rule for $\textsf{fork } e$ creates a new thread (i.e., a singleton list $[e]$), while all other reductions produce an empty list (i.e., nil).

The per-thread language reductions $(\sigma; e; e' ; \ell, \pi_f)$ are then the combination of the expression reductions and the machine reductions, given by the $\textsc{Combined-}^\pi$ rules in Figure 4. Non-stateful reductions ($\textsc{Combined-Pure}$) simply defer to the expression reductions, while stateful reductions ($\textsc{Combined-Mem}$ and $\textsc{Combined-Fork}$) use the event label $\varepsilon$ and the thread id $\pi$ to tie the expression and machine reductions together correctly. These per-thread reductions then are lifted in a straightforward manner to the full (threadpool) reductions.

### Figure 3

- $\textsc{Combined-Mem}$
- $\textsc{Combined-Fork}$

### Figure 4

- $\textsc{Threadpool-Red-Pure}$
- $\textsc{Threadpool-Red-Mem}$
- $\textsc{Threadpool-Red-Fork}$

---

### 3 Iris

Iris is a generic framework for constructing concurrent separation logics. One can instantiate the framework with any language that has an operational interleaving semantics, and then easily derive time-tested reasoning principles for one’s target logic, including various “protocol” mechanisms for controlling interference. Figure 5 provides an excerpt of Iris syntax.
Iris supports the common connectives (False, True, ⇒, ∧, ∨, →, *, , ∃, ∀, μ) and proof rules standard in higher-order separation logics. Iris’s extended set of constructs includes physical state ownership Phys(σ), ghost state ownership |[σ]|, the later □ and always □ modalities, invariants |P|, view shifts P N1 ⇒ N2 Q, and Hoare triples {P} e {x. Q} N. We will first explain these constructs via a running example, in which we use Iris to verify the MP example in a simple, sequentially consistent language called λSC. This will not only illustrate how one can derive within Iris a target logic for a language defined by an operational semantics, but will also serve as a warm-up for our subsequent explanation of how we can instantiate Iris to reason about weak memory.

Road map

The process of instantiating Iris to derive new logics follows a simple pattern, which is worth articulating up front:

1. When we first instantiate Iris, the only primitive assertion we get about the state of the program is the physical state ownership assertion Phys(σ), which asserts that σ is the current global state of the machine. Together with this assertion we also get for free a bunch of large-footprint specifications for the primitive commands of the language, based directly on their operational semantics. For example, the primitive specification we get for updating a location in λSC will be {Phys(σ)} ℓ := v {Phys(σ[ℓ → v])}.

2. Of course, one of the main points of separation logic is to be able to reason modularly using local assertions about the machine state, such as the points-to assertion, ℓ −→ v, and correspondingly give small-footprint specifications of the primitive commands, such as {ℓ −→ w} ℓ := v {ℓ −→ v}. In Iris, such local assertions are not baked into the logic, but rather are encodable using ghost state ownership assertions, and the user of the logic has a great deal of flexibility concerning how these assertions are defined. In the case of the points-to assertion, ℓ −→ v, we will define this assertion so as to represent the knowledge that ℓ currently points to v and the rights to read and write ℓ.

3. On its own, a local, user-defined ghost state assertion like the points-to assertion is merely a representation of knowledge and rights. In order to give meaning to such a ghost state assertion—i.e., to make sure it is in sync with the primitive physical state assertion—we establish an invariant tying the assertions together. In the case of the points-to assertion, this invariant will enforce that when a thread owns the ghost state assertion ℓ −→ v, its “knowledge” that ℓ currently points to v in the physical machine state is actually correct.

In short, ghost state assertions represent local knowledge and rights concerning the machine state, and invariants enforce that ghost state assertions mean what they say they mean.

3.1 Iris by Example

Our example programming language λSC is a standard lambda calculus with references. It is basically the same as λRN, except that all accesses are sequentially consistent, and races are permitted (they do not induce stuckness). The language’s physical state is a heap σ, which is a finite map from allocated locations to values. The main heap-related reductions of λSC are given in Figure 6. When we instantiate Iris with λSC’s operational semantics,

\[(\ell, \sigma) \rightarrow (v, \sigma) \quad \text{if} \quad \sigma(\ell) = v\]
\[(\ell := v, \sigma) \rightarrow ((\), \sigma[\ell \mapsto v]) \quad \text{if} \quad \ell \in \text{dom}(\sigma)\]

\(\ldots\)

**Figure 6** Main heap-related reductions of the \(\lambda_{SC}\) language.

(as explained in the above road map) what we get automatically from Iris are the following *large-footprint* Hoare triples concerning the physical state ownership assertion \(\text{Phys}(\sigma)\):

- **Phys-Heap-Read**
  \[
  \{\text{Phys}(\sigma) \ast \sigma(\ell) = v\} \! \ell \{z. \ z = v \ast \text{Phys}(\sigma)\}
  \]
- **Phys-Heap-Write**
  \[
  \{\text{Phys}(\sigma)\} \ell := v \{\text{Phys}(\sigma[\ell \mapsto v])\}
  \]

Note that \(z\) in the first triple binds the return value of the expression \(\! \ell\). In the second triple, the expression returns the unit value, so we elide the binder.

### 3.1.1 Encoding Separation Logic for \(\lambda_{SC}\)

We would now like to encode these *small-footprint* Hoare triples for \(\lambda_{SC}\):

- **Heap-Read**
  \[
  \{\ell \mapsto v\} \! \ell \{z. \ z = v \ast \ell \mapsto v\}
  \]
- **Heap-Write**
  \[
  \{\ell \mapsto w\} \ell := v \{\ell \mapsto v\}
  \]

The first step is to define the points-to assertion, \(\ell \mapsto v\), using Iris’s *ghost state*.

**Ghost state and partial commutative monoids**

Ghost state is non-physical state that is only used as part of a program verification but is not itself part of the machine state. In Iris, ghost state is formalized using partial commutative monoids (PCMs).\(^3\) The assertion \(a : M^\gamma\) asserts the ownership of the ghost resource \(a\) for an instance \(\gamma\) of the PCM \(M\). Separating conjunction for ghost state assertions simply lifts the PCM composition operation to the assertion level:

\[
\gamma_1 \cdot M_1^\gamma \ast \gamma_2 \cdot M_2^\gamma \iff \gamma_3 \cdot M_3^\gamma.
\]

If two PCM fragments are not compatible (i.e., their composition is not defined), then it is not possible to own both of them at the same time, i.e., if \(a \cdot b = \perp\) then \(a : M_1^\gamma \ast b : M_2^\gamma \Rightarrow False.\)

In order to maintain consistency of the logic, therefore, changes to ghost state are restricted to *frame-preserving* updates, in which a PCM fragment \(a\) can only be updated to \(b\) if \(b\) preserves compatibility with any other fragments in the environment (the *frame*):

\[
\text{Ghost-Update}\]

\[
\forall a_f. \ a \cdot a_f \neq \perp \Rightarrow b \cdot a_f \neq \perp
\]

\[
\frac{\gamma_{1,2}^{a_f} \Rightarrow \gamma_{1,2}^{b}}{\gamma_{1,2}^{a_f} \Rightarrow \gamma_{1,2}^{b}}
\]

Ghost updates belong to the set of *logical computations*, or in Iris terminology, *view shifts*. A view shift \(P \Rightarrow Q\) represents the capability of transforming a resource satisfying \(P\) into a resource satisfying \(Q\) without changing the physical state.

\(^3\) Actually, ghost state in Iris is based on the more general mechanism of “cameras” (aka step-indexed resource algebras), which can support a more general form of higher-order ghost state [12].

\(^4\) In the rest of the paper we also suppress the PCM \(M\) in \(a : M_1^\gamma\) when it can be inferred in context.
A PCM for heaps

As a step towards defining $\ell \mapsto v$, let us now construct a PCM called $\text{Heap}$ that has the same basic structure as the physical heap, but allows splitting and recomposition. (We will ultimately need a slightly more sophisticated PCM to define $\ell \mapsto v$, but $\text{Heap}$ is an important part of the construction.) $\text{Heap}$ is a finite partial map from locations to values, with the empty heap as its unit element, and the composition on heaps is defined as disjoint union ($i.e.$, union if the heaps have disjoint domain, and undefined otherwise). The composition implies that the singleton heap $[\ell := v]$ does not combine with itself, so it can only be uniquely owned, and it also represents the permission required to update $\ell$:

$$\text{Ghost-Heap-Exclusive} \quad [\ell := v] \ast [\ell := w] \vdash \text{False}$$

$$\text{Ghost-Heap-Update} \quad [\ell := w] \Rightarrow [\ell := v]$$

The singleton heap $[\ell := v]$ therefore has the desired properties for defining the local assertion $\ell \mapsto v$, but unfortunately it is still not quite enough: we also need some way to tie this ghost state assertion to the underlying physical state of the program. Toward this end, we employ Iris’s invariants.

Invariants

Invariants in Iris can be thought of as assertions that hold of some shared resource at all times, although the choice of which shared resource satisfies them is allowed to vary over time. The Iris invariant assertion $[N] P$ stipulates that $P$ is an invariant. The resource that satisfies it is shared with all threads, and thus any thread can access it freely in a single physical step:\footnote{In Iris terminology, a resource in an invariant can be accessed within an atomic operation, which is an operation that takes only a single physical step of execution. We do not use the term here to avoid confusion with C11 atomics.}: it can open the invariant and gain local ownership of the resource for the duration of the operation, so long as it can close the invariant by relinquishing ownership of some (potentially different) resource satisfying $P$ at the end of the operation. For bookkeeping purposes—specifically, to ensure that we do not unsoundly open the same invariant more than once in a nested fashion—invariants in Iris are named, and the $N$ in the above invariant assertion is a namespace (set of names) from which the name of the invariant must come.

Invariants belong to the set of persistent assertions, denoted with the always modality $\Box$. The assertion $\Box P$ establishes the knowledge that $P$ holds without any ownership, and therefore holds forever after. Putting resources into an invariant is thus a common way to share or transfer ownership through the use of freely distributable knowledge.

Meanwhile, the actions of opening and closing invariants belong to the set of logical computations, or view shifts. To account for invariants, view shifts are extended with namespaces as well: $P \overset{N_1}{\Rightarrow} N_2 Q$ asserts that, assuming the invariants named in $N_1$ hold before the view shift, then the invariants named in $N_2$ hold after the view shift. Opening and closing of invariants are then formalized as follows:

$$\text{Inv-Open} \quad \begin{array}{c}
[N] P \vdash \text{True} \\
\Rightarrow^{\emptyset} N \quad P
\end{array}$$

$$\text{Inv-Close} \quad \begin{array}{c}
[N] P \vdash P \overset{}{\Rightarrow^{N}} \text{True}
\end{array}$$

$\text{Inv-Open}$ allows a thread to open the invariant $P \overset{N}{\Rightarrow} P$ and gain ownership of $P$ but prevents it from doing so more than once. Only after applying $\text{Inv-Close}$ and re-establishing the invariant will the thread be able to open it again.
Note: The Inv-Open rule as stated here is only sound if \( P \) talks about ownership (of physical or ghost state) and not about the existence of other invariants. In general, however, Iris makes no such restriction; rather, it supports impredicative invariants, meaning that \( P \) can be an arbitrary assertion. In order to avoid paradoxes caused by impredicative circularities (like the one described in [15]), the fully general version of this rule in Iris requires that \( P \) be guarded by the step-indexed later modality (\( \triangleright \)). Fortunately, in most cases the \( \triangleright \)'s can be stripped away automatically (the Iris proof mode in Coq provides support for doing this) and do not play an interesting role in proofs. To focus the presentation of this paper, we will therefore suppress further discussion of the \( \triangleright \) modality.

Hoare triples in Iris are also annotated with invariant namespaces, since Hoare triples combine both physical and logical computations. A Hoare triple \( \{ P \} e \{ x. Q \} \) with a namespace \( N \) implies that if the invariants in \( N \) hold before the expression’s execution, then they will be preserved between every step and also after its execution. Consequently, when verifying any single physical step of computation, we are free to open the invariants in \( N \) so long as we immediately close them. This reasoning is encapsulated in the following “atomic rule of consequence”:

\[
\begin{align*}
AConsq & \quad P^{N \cup N'} \Rightarrow N' \quad P' \quad \{ P' \} e \{ v. Q' \}^{N'} & \quad \forall v. Q' \Rightarrow N^{\cup N'} \quad Q \quad e \text{ takes 1 physical step} \\
\end{align*}
\]

Since bookkeeping of namespaces is largely a tedious detail (and one which Coq will force us to get right), we will for the remainder of the paper suppress namespaces from definitions and proofs. We will always use disjoint namespaces to ensure correctness in opening invariants.

**Linking physical and ghost state using invariants and the “authoritative” PCM**

Now, returning to our example, the key idea is to use an invariant to tie the physical state assertion together with local ghost state assertions, thereby giving them meaning. To achieve this, we will employ an extremely useful construction called the authoritative PCM [13].

Given a base PCM \( M \), the authoritative PCM Auth(\( M \)) has two types of elements: authoritative \( \bullet a \) and non-authoritative \( o a \) (for \( a \in M \)). For any instance \( \gamma, N \) is exclusive (i.e., \( \bullet a^{\gamma} + o a^{\gamma} + \mathit{False} \)), and is the main point of reference for all the non-authoritative fragments, in the sense that any ownable fragment \( o a^{\gamma} \) must have \( b \) included in \( a \), that is \( \exists c. a = b \cdot c \). The PCM’s update therefore requires more: if one wants to update \( b \) to \( b' \), it has not only to ensure \( b' \) is compatible with \( c \), but also has to update \( a \) to \( a' = b' \cdot c \). These properties are summarized in the following two rules:

\[
\begin{align*}
\text{Auth-Agree} & \quad \exists c. a = b \cdot c \\
\text{Auth-Update} & \quad b' \cdot c \neq \bot
\end{align*}
\]

With these two rules in hand, we can derive the following rules for operations on Auth(Heap):

\[
\begin{align*}
\text{AGhost-Heap-Exclusive} & \quad \{ \ell \mapsto v \} \cdot \boxed{\ell}^{\gamma} \cdot \mathit{False} \\
\text{AGhost-Heap-Agree} & \quad \{ \sigma \gamma \} \cdot \boxed{\ell}^{\gamma} + \sigma(\ell) = v \\
\text{AGhost-Heap-Update} & \quad \boxed{\sigma \gamma} \cdot o \boxed{\ell \mapsto w}^{\gamma} \Rightarrow \boxed{\sigma \gamma} \cdot \boxed{\ell \mapsto v}^{\gamma} \end{align*}
\]

We are now ready to establish the invariant \( \boxed{\sigma}. \mathit{Phys}(\sigma) \cdot \boxed{\sigma \gamma} \), which binds together the physical state ownership and the authoritative ghost heap ownership. With the invariant
in place, AGHOST-HEAP-AGREE implies that if a thread owns the singleton ghost heap $\ell := v$ locally, then, in combination with the invariant, it is guaranteed that $\ell$ currently has value $v$ in the physical heap. AGHOST-HEAP-EXCLUSIVE and AGHOST-HEAP-UPDATE ensure that only the one thread who owns $\ell := v$ can make updates to the contents of $\ell$.

The points-to assertion is then defined as $\ell \xrightarrow{\nu} v \triangleq \ell := v$, and we can easily prove the small-footprint triples from the beginning of this section by combining the above rules for authoritative ghost heaps with those for opening and closing invariants.

### 3.1.2 Verifying MP in $\lambda_{SC}$

Using the small-footprint triples, we are ready to verify MP in $\lambda_{SC}$. We discuss the proof in a bit of detail here, since later on we will show how to adapt this proof to verify MP under weak-memory semantics.

The proof of MP is given in Figure 7. As a proof convention, we only mention persistent assertions (like invariants) once and use them freely later, since they are always true after being established. The proof works essentially as follows.

First of all, both threads want to operate on $y$ simultaneously, so we need to put ownership of $y$ into an invariant $\text{Inv}_y$. This invariant says that $y$ is in one of two states—0 or 1. We can establish the invariant right after the initialization of $y$ (the write of 0 to $y$), because $y$ is in state 0 at that moment. The first thread is responsible for setting $y$ to state 1. When the second thread observes that $y$ is in state 1, it will expect to be able to gain ownership of $x \xrightarrow{37}$. To achieve this, in state 1, $\text{Inv}_y$ asserts the existence of another invariant $\text{Inv}_x$ concerning $x$, and it is this latter invariant that we use to transfer ownership of location $x$ from the first thread to the second thread.

To understand $\text{Inv}_x$, it helps to have seen the film Raiders of the Lost Ark, or at least the first few minutes of it, in which Indiana Jones (played by Harrison Ford) attempts to steal a precious golden idol from an ancient Peruvian temple—without setting off booby traps—by swapping it for a similarly weighted bag of sand. Unfortunately for him, the temple detects his ruse and tries to kill him. But we can play a similar trick, and Iris will be perfectly happy! In our case, the “golden idol” is $x \xrightarrow{37}$, which is transferred into the invariant $\text{Inv}_x$ when it is established by the first thread. The “bag of sand” is a “token” $\bigcirc$ (a uniquely ownable piece of ghost state) that is given to the second thread at the beginning of its execution. $\text{Inv}_x$ simply asserts that it owns either the golden idol or the bag of sand. Thus, when the second thread learns of the existence of $\text{Inv}_x$, it can safely use the invariant opening and closing rules to swap the bag of sand in its possession for the golden idol owned by $\text{Inv}_x$, and thereafter claim local ownership of $x \xrightarrow{37}$. 

![Figure 7](image-url)
We now consider an instantiation of Iris with our language from §2.3. A key difference between \( \lambda_{RN} \) and \( \lambda_{SC} \) is that the expression reductions of \( \lambda_{SC} \) do not depend on which thread is executing the expression, since every thread has the same global view of the memory, whereas the reductions of \( \lambda_{RN} \) depend on the current thread’s subjective view of the memory. Thus, we need to be able to talk about thread ids in our logic as well. To this end, we pair up the per-location history, we define our first local assertion: The \( \text{Hist}(\ell, h) \) asserts ownership of \( \ell \) and knowledge of its write history \( h \).

Since the ability to read or write values in \( \lambda_{RN} \) depends on threads’ local views of memory, we would also like to support an assertion of thread-view ownership, \( \text{Seen}(\pi, V) \), which asserts ownership of the current view \( V \) of the thread \( \pi \) and is required to update \( \pi \)'s view. Since any operation by a thread \( \pi \) on a location \( \ell \) relies on both \( \pi \)'s current view and \( \ell \)'s history, the pair of history and thread-view ownership assertions comprise the general ghost ownership required for accessing the location.

To tie these local assertions to the global physical state, we use a construction very similar to the one for \( \lambda_{SC} \). The local assertions are defined as before by wrapping finite-map PCMs with the authoritative PCM construction, and the invariant enforces that these ghost maps are coherent with the physical state. The definitions of the invariant \( \text{PSInv} \) and the two local assertions are given in Figure 8. The \( \text{Hist}(\ell, h) \) and \( \text{Seen}(\pi, V) \) assertions also have the exact same rules for exclusiveness, agreement, and updating as the \( \ell \to v \) assertion of \( \lambda_{SC} \).

\[\begin{align*}
\text{Hist}(\ell, h) &\triangleq \exists v. \left[ \begin{array}{c}
\{ v \mapsto h \} \vdash \ell \to v \\\n\{ v \mapsto h \} \vdash \ell \to v
\end{array} \right] \\
\text{Seen}(\pi, V) &\triangleq \exists v. \left[ \begin{array}{c}
\{ v \mapsto V \} \vdash \pi \leftrightarrow V \\
\{ v \mapsto V \} \vdash \pi \leftrightarrow V
\end{array} \right]
\end{align*}\]

\( \text{PSInv} \triangleq \exists \sigma. \forall H \in \text{Loc}. \exists M(\text{Phys}(\sigma) + \left[ \begin{array}{c}
\{ \sigma \mapsto H \} \vdash \text{Hist}(\sigma, H) \\
\{ \sigma \mapsto H \} \vdash \text{Hist}(\sigma, H)
\end{array} \right]) \\
\text{Hist}(\sigma, H) \triangleq \text{dom}(H) = \{ (m, \ell) \mid m \in M \} \land (\forall m. m.t = m.V(m.\ell)) \\
\land \forall \ell \in \text{dom}(H). \ H(\ell) = \{ (m.v, m.t, m.V) \mid m \in M \in \ell \land m.t = \sigma.N(\ell) \}
\]

Figure 8 Ghost state and invariant setup for \( \lambda_{RN} \).

3.2 Instantiating Iris with \( \lambda_{RN} \)

We now consider an instantiation of Iris with our \( \lambda_{RN} \) language from §2.3. A key difference between \( \lambda_{RN} \) and \( \lambda_{SC} \) is that the expression reductions of \( \lambda_{SC} \) do not depend on which thread is executing the expression, since every thread has the same global view of the memory, whereas the reductions of \( \lambda_{RN} \) depend on the current thread’s subjective view of the memory. Thus, we need to be able to talk about thread ids in our logic as well. To this end, we pair up expressions with \( \lambda_{RN} \) with thread ids, making them visible in our specifications. Eventually, in §4, we will see how we can reason about \( \lambda_{RN} \) without talking explicitly about thread ids.

3.2.1 Encoding Separation Logic for \( \lambda_{RN} \)

After instantiating Iris with \( \lambda_{RN} \), as in the case of \( \lambda_{SC} \), Iris provides us with large-footprint specifications of the primitive commands for free, which concern the physical state assertion and mirror the rules of \( \lambda_{RN} \)'s operational semantics. Recall that in \( \lambda_{RN} \) the physical state is a tuple \( (M, T, N) \) of the message pool \( M \), the current view map \( T \), and the non-atomic timestamp map \( N \). As before, we aim to develop “local” assertions using ghost state, establish an invariant that connects those local assertions to the physical state assertion, and then derive small-footprint specifications of the primitive commands for use in modular verification. But what kind of “local” assertions do we want?

For \( \lambda_{SC} \), we had the points-to assertion \( \ell \to v \), but in \( \lambda_{RN} \) we no longer have a simple mapping from locations to values. Rather, associated with each location \( \ell \) is a set of messages corresponding to writes to \( \ell \). We will represent that associated information as a history \( h \), consisting of a set of triples \( (v, t, V) \), where \( v \) is a value written to \( \ell \), \( t \) is the timestamp at which that value was written, and \( V \) is the view of the writing thread at the time the write occurred. Note that \( V \) incorporates the new timestamp, i.e., \( V(\ell) = t \). To reflect the per-location history, we define our first local assertion: The history ownership assertion \( \text{Hist}(\ell, h) \) asserts ownership of \( \ell \) and knowledge of its write history \( h \).

Since the ability to read or write values in \( \lambda_{RN} \) depends on threads’ local views of memory, we would also like to support an assertion of thread-view ownership, \( \text{Seen}(\pi, V) \), which asserts ownership of the current view \( V \) of the thread \( \pi \) and is required to update \( \pi \)'s view. Since any operation by a thread \( \pi \) on a location \( \ell \) relies on both \( \pi \)'s current view and \( \ell \)'s history, the pair of history and thread-view ownership assertions comprise the general ghost ownership required for accessing the location.
Strong Logic for Weak Memory: Reasoning About Release-Acquire Consistency in Iris

\[
\begin{align*}
\text{BASE-NA-READ} & \quad \text{PSInv} \vdash \{ \text{Seen}(\pi, V) \land \text{Hist}(\ell, h) \land \text{init}(h, V) \land \text{na}(h, V) \} \\
& \quad \ell_{[\text{na}]} : = v, \pi \\
& \quad \{ v. \ \text{Seen}(\pi, V) \land \text{Hist}(\ell, h) \land \text{na}(h, V) \land \max(h).v = v \}
\end{align*}
\]

\[
\begin{align*}
\text{BASE-NA-WRITE} & \quad \text{PSInv} \vdash \{ \text{Seen}(\pi, V) \land \text{Hist}(\ell, h) \land \text{alloc}(h, V) \} \\
& \quad \ell_{[\text{na}]} : = v, \pi \\
& \quad \{ 2V' \supseteq V, t, h' = \{ (v, t, V') \}. \ \text{Seen}(\pi, V') \land \text{Hist}(\ell, h') \land \text{na}(h', V') \land \text{init}(h', V') \}
\end{align*}
\]

\[
\begin{align*}
\text{BASE-AT-READ} & \quad \text{PSInv} \vdash \{ \text{Seen}(\pi, V) \land \text{Hist}(\ell, h) \land \text{init}(h, V) \} \\
& \quad \ell_{[\text{at}]} : = v, \pi \\
& \quad \{ v. \ \exists t_1, V_1, V' \supseteq V \cup V_1. \ \text{Seen}(\pi, V') \land \text{Hist}(\ell, h) \land (v, t_1, V_1) \in h \land t_1 \geq V(\ell) \}
\end{align*}
\]

\[
\begin{align*}
\text{BASE-AT-WRITE} & \quad \text{PSInv} \vdash \{ \text{Seen}(\pi, V) \land \text{Hist}(\ell, h) \land \text{alloc}(h, V) \} \\
& \quad \ell_{[\text{at}]} : = v, \pi \\
& \quad \{ 3V' \supseteq V, t, h' = h \cup \{ (v, t, V') \}. \ \text{Seen}(\pi, V') \land \text{Hist}(\ell, h') \land \text{init}(h', V') \}
\end{align*}
\]

\[\text{Figure 9} \quad \text{Selected Hoare triples of } \lambda_{\text{RN}} \text{ base logic.}\]

It is important to note that the history ownership assertion Hist(\ell, h) does not record the full history of \ell, but only write events of the current era (see §2.2), i.e., only events as recent as or more recent than the last non-atomic write to \ell. This is reflected in the last condition of HInv: m.t \geq \sigma.N(\ell). Defining Hist(\ell, h) this way helps to simplify the job of the user of the logic in establishing the absence of races: by construction, it is impossible to even attempt to read (racyly) from write events before the current era. In order to preserve this property of Hist(\ell, h), a non-atomic write (v_{na}, t_{na}, V_{na}) must completely remove all write events currently in h (which would be racy to access now), and replace it with a new history h' that contains only the newly created non-atomic write event: h' = \{(v_{na}, t_{na}, V_{na})\}.

With the physical state shared and its ghost counterpart splittable, we are ready to derive the small-footprint Hoare triples, which constitute a base logic that is powerful enough to verify programs in \lambda_{\text{RN}}. A selected set of these triples is given in Figure 9. The general pattern of these rules is that a thread \pi needs to own the history h of a location \ell and its own thread view V in order to operate on \ell. Additionally, \pi needs to show certain relations between h and V in order to guarantee the safety of its operations. These relations are represented by the alloc, init, and na predicates. The alloc(h, V) predicate (resp. init(h, V)) asserts that V has observed an event in h which ensures that \ell is allocated (resp. initialized). The na(h, V) predicate asserts that V has observed the mo-latest write event of h, which is needed to do a non-atomic read. Note that all of these predicates require that V has seen some event from h—i.e., an event from the current era of \ell—which, as discussed above, is a prerequisite for non-raciness.

These base logic rules provide a very concise explanation of \lambda_{\text{RN}}’s operational semantics. BASE-AT-READ, for example, requires the current view’s knowledge of \ell being initialized and ensures that the new updated view V' is at least the join of the local view V and the view V_1 of the write event that the thread reads from. This event must be from h and not mo-earlier
Invariants:

\[ \text{Inv}_y(V_0) \triangleq \exists h. \text{Hist}(y, h) * (0, \_, V_0) \in h \land \forall V_1, v_1 \neq 0. (v_1, \_, V_1) \in h \Rightarrow \exists V_{37} \subseteq V_1. \text{Inv}_x(V_{37}) \]

\[ \text{Inv}_x(V_{37}) \triangleq \exists z \forall \text{Hist}(x, [(37, \_, V_{37})]) \]

Thread 1 proof outline:

\[
\begin{align*}
\{ \text{Seen}(\pi, V_0) * \text{Hist}(x, [(0, \_, V_x)]) * V_x \subseteq V_0 * \text{Inv}_y(V_0) \} \\
x_{[na]} := 37 \\
\{ \exists V_{37} \subseteq V_0. \text{Seen}(\pi, V_{37}) * \text{Hist}(x, [(37, \_, V_{37})]) \} \\
\{ \text{Seen}(\pi, V_{37}) * \text{Inv}_x(V_{37}) \} \\
\text{open Inv}_y \\
\begin{cases}
\{ \text{Seen}(\pi, V_{37}) * \exists h. \text{Hist}(y, h) * \ldots \} \\
y_{[iat]} := 1 \\
\{ \exists V_1 \sqsubseteq V_{37}. \text{Seen}(\pi, V_1) * \text{Hist}(y, h \sqcup [(1, \_, V_1)]) * \text{Inv}_y(V_{37}) \} \\
\{ \text{Seen}(\pi, V_1) * \text{Inv}_y(V_0) \} \\
\end{cases}
\end{align*}
\]

Thread 2 proof outline:

\[
\begin{align*}
\{ \text{Seen}(\pi, V_0) * \text{Inv}_y(V_0) * \square \} \\
\text{repeat } y_{[iat]}: \\
\{ \exists V_1, V_{37}, V_2. V_2 \sqsubseteq V_1 \sqsubseteq V_{37} * \text{Seen}(\pi, V_2) * \text{Inv}_x(V_{37}) * \square \} \\
\{ \text{Seen}(\pi, V_2) * V_{37} \subseteq V_2 \sqsubseteq \text{Hist}(x, [(37, \_, V_{37})]) \} \\
x_{[na]} \\
\{ z, \text{Seen}(\pi, V_2) * z = 37 * \text{Hist}(x, [(37, \_, V_{37})]) \} \\
\end{align*}
\]

**Figure 10** Verification of MP in \( \lambda_{RN} \) base logic.

than the write event observed by the thread previously. **Base-NA-Read** is similar, except that it requires that the current view must have observed the mo-latest write event to \( \ell \), and therefore reads from that write event, which we denote by max(\( h \)). **Base-NA-Write** preserves the Hist invariant by proactively dropping from the history all the old write events, which are mo-before this non-atomic write. Notice that, unlike **Base-NA-Read**, **Base-NA-Write** does not require na(\( h, V \)), as explained in §2.2.

### 3.2.2 MP in \( \lambda_{RN} \)

We show that the base logic is enough to verify MP in \( \lambda_{RN} \). The invariants and the proof, given in Figure 10, follow closely those used for MP in \( \lambda_{SC} \). The singleton heap ownership in \( \lambda_{SC} \) is replaced with the history ownership, and extra conditions on view extension are added to reflect the view updates inherent in \( \lambda_{RN} \). More specifically, in Inv\_y, we enforce that any write of a non-zero value\(^6\) to \( y \) be at a view \( V_1 \) that extends \( V_{37} \), which is the view at which the write of 37 to \( x \) is made by the first thread. Consequently, when the second thread observes \( V_1 \) (by reading \( y \) to be non-zero), it must have also observed \( V_{37} \), and thus can read \( x = 37 \), using the Indiana Jones invariant Inv\_x. The extra conditions on \( V_0 \) ensure that \( y \) is initialized with 0 at \( V_0 \), so that the second thread (having observed \( V_0 \)) can safely read \( y \).

We have shown that the base logic is powerful enough to verify MP in \( \lambda_{RN} \), and in principle it is powerful enough to verify many other realistic weak memory programs that

\(^6\) In the MP example this value is always 1.
are expressible in \( \lambda_{RN} \) as well. However, it is also clear that the base logic is not abstract enough: one has to burden oneself with keeping track of the low-level details of changes to locations’ histories and threads’ views. What we really want is a way of abstracting away from those low-level details and finding simple high-level reasoning principles for \( \lambda_{RN} \), the type of reasoning principles supported by RA+NA logics like GPS and RSL. We will now see how such high-level principles can in fact be derived on top of our low-level base logic.

\section{iGPS}

Vafeiadis et al. have introduced several logics for C11, and two in particular that were focused on RA+NA: GPS \cite{gps} and RSL \cite{rsl}. The key difference between these logics and the RA+NA base logic presented in §3.2.1 is that, in GPS and RSL, the user does not reason explicitly about views—instead, the assertions of these logics are \textit{implicitly} predicated over the view of the thread asserting them. This helps to hide much tedious reasoning about views, and leads naturally to a model of assertions as predicates over views (§4.3).

We have encoded iGPS and iRSL, variants of both GPS and RSL, in Iris, and we will focus here on iGPS, since it is the more sophisticated of the two logics. (We briefly describe iRSL in §5.) GPS introduced several useful abstractions that were not present in RSL: PCM-based ghost state, single-location protocols, and escrows. In encoding GPS in Iris, as noted in the introduction, we get PCM-based ghost state completely for free, just by working in Iris. Below, we will describe the other features, and how we account for them in Iris.

Note that our goal with iGPS is not to slavishly imitate all details of GPS, but to provide proof rules very similar to GPS’s that are both strong enough to support all the examples from the GPS papers \cite{gps, gps2} and significantly easier to prove sound within Iris. To this end, we slightly restrict select rules, but only in a way that does not impact their utility on known case studies. We discuss the differences from the original rules in detail in §6. Furthermore, in §4.2, we show how iGPS supports an additional feature—\textit{single-writer protocols}—that was not supported by the original GPS and that significantly simplifies proofs.

\subsection{Key Features of GPS}

In this section, we explain the key features of GPS and how they are formalized in iGPS (besides PCM-based ghost state, which is directly imported into iGPS from Iris). A selected set of iGPS proof rules is given in Figure 11.

\textbf{Non-atomics}

Since non-atomic locations may not be raced on, GPS (and iGPS) reason about them much in the same way that locations are reasoned about in standard separation logic: using the points-to assertion, \( \ell \xrightarrow{v} \). Note that the proof rules for reading (iGPS-NA-Read) and writing (iGPS-NA-Write) and the exclusivity property (iGPS-NA-Exclusive) are equivalent to those from the logic for \( \lambda_{SC} \) (§3.1.1). Additionally, GPS (and iGPS) support fractional ownership of non-atomics to allow such locations to be read (but not written) by multiple threads at once. We omit the rules of fractional ownership for brevity.

\textbf{Protocols}

To reason about RA atomics, we need a mechanism for controlling interference on such accesses. Toward this end, CSLs for SC have supported a variety of protocol mechanisms, which control how shared state may evolve over time, and several of the more recent logics \cite{gps2, gps3, gps4}
employ state transition systems (STSs) to formalize such protocols. Crucially, protocols enforce irreversibility: the state of an STS protocol can only make forward progress over the course of a proof. For example, in §3.1.2, a protocol could enforce that the variable $y$ could only progress from 0 to 1 but not back again. (We did not need to enforce that property to verify the MP example, but it is useful to be able to in general.) In Iris, protocols are encoded using a combination of invariants and ghost state.

Under weak memory, invariants and protocols are unsound in general because they require a single coherent history of updates to all locations. GPS showed how to partially restore protocol reasoning for weak memory with single-location protocols: protocols which restrict the evolution of a single shared location. Intuitively, single-location protocols are sound due to the per-location coherence property of C11 (often called “SC per location”): the writes to any single location are totally ordered (by $\mu$). In particular, they maintain the invariant that the order of protocol states associated with writes is consistent with their timestamp order. If write event $x$ to location $\ell$ with associated protocol state $s_x$ is $\mu$-before write event $y$ (to the same location) with protocol state $s_y$, then $s_x$ will be before $s_y$ in protocol order. Thus, once a thread has observed that the protocol on $\ell$ has reached state $s_x$, it can from that point on only observe the protocol on $\ell$ to be in states that are accessible from $s_x$. This fulfills the expectation that protocol transitions are irreversible.

GPS protocols come equipped with an interpretation predicate which specifies the resources held by the protocol depending on the protocol’s state and the location’s value. The primitive operations on an atomic location serve to transfer resources in and out of its protocol:

- Writes may transfer resources into the protocol, but may not transfer anything out.
- Reads may not transfer any resources into the protocol, but they may transfer “knowledge” (i.e., duplicable resources) out of it. They are restricted to transferring out duplicable resources because there may be many reads of the same write event.
- Updates (RMWs), by virtue of the physical synchronization they provide, may transfer resources both in and out of the protocol.

In iGPS, we represent protocols in a slightly different way, using two interpretations: a “full” interpretation, and a duplicable “read” interpretation that is implied by the full

<table>
<thead>
<tr>
<th>iGPS-NA-Read</th>
<th>iGPS-NA-Write</th>
<th>iGPS-NA-Exclusive</th>
</tr>
</thead>
<tbody>
<tr>
<td>${ \ell \rightarrow v } \ell_{[\mu]} { w = v \rightarrow v }$</td>
<td>${ \ell \rightarrow v } { \ell \rightarrow v }$</td>
<td>$\ell \rightarrow v \land \ell \rightarrow w \Rightarrow \bot$</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>iGPS-Read</th>
<th>iGPS-Cas</th>
<th>iGPS-Write</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\forall s', v. P \land \tau_{\text{read}}(s', v) \Rightarrow Q$</td>
<td>$\forall s'. s' \supseteq s. (\ell : s \tau \not\in [\mu] \Rightarrow [v \exists s' \tau \not\in \nu])$</td>
<td>$(\forall s''. s'' \supseteq s'). P \Rightarrow \tau_{\text{full}}(s'', v) \land Q$</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>iGPS-Persistent</th>
<th>iGPS-Agree</th>
<th>iGPS-Escrow-Intro</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\ell : s \tau \Rightarrow \bot$</td>
<td>$\ell : s_1 \tau \Rightarrow \ell : s_2 \tau \Rightarrow s_1 \subseteq s_2 \lor s_2 \subseteq s_1$</td>
<td>$Q \Rightarrow \lbrack P \Rightarrow Q \rbrack$</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>iGPS-Escrow-Elm</th>
<th>iGPS-Escrow-Persistent</th>
</tr>
</thead>
<tbody>
<tr>
<td>$P \land \lbrack P \Rightarrow Q \rbrack \Rightarrow Q$</td>
<td>$\lbrack P \Rightarrow Q \rbrack \Rightarrow [\square \lbrack P \Rightarrow Q \rbrack]$</td>
</tr>
</tbody>
</table>

Figure 11 iGPS proof rules for non-atomics, protocols, and escrows.
interpretation. The intuition is that the read interpretation is used for reads (since they may only transfer duplicable resources out of the protocol) and the full interpretation is used for the other operations.

We will now present the formal definition of protocols and the corresponding proof rules.

Definition 3 (Protocols). A protocol \( \tau \) comprises a non-empty state set \( S \), a reflexive, transitive transition relation \( \sqsubseteq \subseteq S \times S \), and two interpretation predicates \( \tau_m(\cdot, \cdot) \in S \times \text{Val} \rightarrow \text{Prop} \) with \( m \in \{ \text{read}, \text{full} \} \) representing read and full interpretations, respectively. The interpretation predicate has to fulfill the following two laws:

\[
\tau_{\text{full}}(s, v) \implies \tau_{\text{full}}(s, v) \ast \tau_{\text{read}}(s, v)
\]

\[
\tau_{\text{read}}(s, v) \implies \tau_{\text{read}}(s, v) \ast \tau_{\text{read}}(s, v)
\]

We write \( [\ell : s \tau] \) (as in GPS) to denote the persistent assertion that \( \ell \) is governed by protocol \( \tau \) and that the protocol has been observed in state \( s \).

The first rule, IGPS-Agree, represents the guarantee that every protocol is always in a state consistent with all observations, i.e., that all observed states can be linearly ordered w.r.t. the transition relation \( \sqsubseteq \).

IGPS-Read enables reading from a location governed by protocol \( \tau \)—allowing the user to observe a future state \( s' \) of whatever state \( s \) it has previously observed, and providing the associated read interpretation \( \tau_{\text{read}} \).

Writes to the location are subject to IGPS-Write, which allows the user to move the protocol to a “final state” \( s' \)—i.e., a state accessible from every state in the protocol—so long as they can provide \( \tau_{\text{full}} \) for \( s' \). This rule may seem very weak, since it forces the protocol into a final state, but this weakness derives from the need to handle the general case where there can be write-write races. Write-write races allow only very limited reasoning: both writes have to prove that their protocol state is later in protocol order to the other one. The write rule presented here solves this problem in a very simple way (see §6 for a comparison with the original GPS rule). In §4.2, we present a much stronger write rule that is optimized for the common case where there are no write-write races on the location.

Finally, IGPS-CAS governs updates. Its two premises represent the success and failure case, respectively. If the operation succeeds, the value read, \( v_o \), is guaranteed to belong to a future state \( s' \). The user picks the new state \( s'' \) depending on \( s' \) and establishes \( \tau_{\text{full}}(s'', v_n) \), making use of \( \tau_{\text{full}}(s', v_o) \). In case of a failure, the rule degenerates to IGPS-Read.

Escrows

A limitation of GPS protocols is that they offer no way to transfer ownership of (non-duplicable) resources from one thread to another unless the receiving thread performs physical synchronization via an update operation. For example, in our MP example, there is no update operation, and yet we want to transfer ownership of the non-atomic location \( x \) from the first thread to the second. For such an example, an additional mechanism for ownership transfer is required.

This motivates escrows, a mechanism for logical synchronization which, unlike protocols, is not tied to physical locations.

Definition 4 (Escrows). An escrow \( [P \rightsquigarrow Q] \) consists of a guard resource \( P \) and a payload resource \( Q \) to be transferred. The guard resource \( P \) must be exclusive, i.e. \( P \ast P \Rightarrow \bot \). The escrow assertion itself is persistent knowledge (freely duplicable).

The idea of escrows is really just a slight generalization of the “Indiana Jones invariant” \( \text{Inv}_x \) that we used in the proof of the MP example from §3.1.2. Following the explanation
there, the payload resource \( Q \) is the “golden idol”, the guard resource \( P \) is the “bag of sand”, and the escrow allows one to swap \( P \) for \( Q \). The restriction on exclusivity of \( P \) ensures that this swap can only be performed once.

The proof rules for escrows follow the above intuition. \texttt{iGPS-Escrow-Intro} places the payload resource \( Q \) in escrow. Any thread that learns of the existence of this escrow can then use \texttt{iGPS-Escrow-Elim} to trade ownership of the guard resource \( P \) for \( Q \).\footnote{The rule given for elimination is only sound if \( Q \) is “timeless”, meaning that it only describes ownership of (ghost) state, not knowledge about protocols or escrows, as is the case in our message passing example. A more general rule, which returns \( Q \) under the later modality, is given in Appendix C.}

\section*{Message passing in iGPS}

The verification of MP using iGPS is given in Figure 12. Although the verification is sound for \( \lambda_{RN} \), it is much simpler than the proof we carried out in the base logic of Iris, and is in fact very close in structure to the SC verification of MP in \( \lambda_{SC} \). In particular, the Indiana Jones invariant \( \text{Inv} \) has now become an escrow \( \text{XE} \), and the invariant \( \text{Inv}_y \) has now become a (2-state) iGPS protocol \( \text{YP} \), but otherwise the steps are almost the same. The abstraction of iGPS has relieved us from the burden of reasoning with history and view updates explicitly.

\subsection*{4.2 Single-Writer Protocols}

As we observed above, the iGPS protocol write rule suffers from a restriction: the user has to transition to a final state. This restriction is not present in CSLs for SC, which let the user pick the future state depending on the current one, much as \texttt{iGPS-CAS} does. Fortunately, in the common case when there are no write-write races to the location, this restriction can be lifted by \textit{single-writer protocols}, a novel invention of iGPS.

A single-writer protocol splits the protocol assertion into two parts: an exclusive writer assertion \( \ell : [s : \tau] \), and a persistent reader assertion \( \ell : [s : \tau] \). Owning the writer assertion provides both the permission to change the state as well as the guarantee that no one else can change it. Owning the reader assertion only allows reads. Thus, the reader assertion represents a lower bound on the protocol state whereas the state contained in the writer assertion is exactly the most recent state of the protocol. The full proof rules for single-writer protocols are given in Figure 13, and of these, the write rule \texttt{iGPS-SW-Write} is the most important. The writer knows exactly what the current state is, and is free to pick the next state accordingly.
Applications of single-writer protocols

Single-writer protocols provide more explicitly intuitive and concise proofs over normal protocols when there are no write-write races, i.e., only one thread is writing to the location. This may mean that there is exactly one writer in the whole program, or (perhaps more typically) that the programmer is using sufficient synchronization to ensure that there is exactly one active writer at a time. Such is the case for several headlining examples verified in GPS, including circular buffer, bounded ticket lock, and read-copy update [32, 30].

In the original GPS proofs for these examples, the lack of single-writer protocols meant that the proofs had to employ a significant amount of tedious ghost state (mostly in the form of so-called “protocol tokens”) to formalize the fact that the writing thread knew exactly which state the protocol had to be in at any given time. Using single-writer protocols, this reasoning is immediate from the iGPS-SW-Write rule. By removing the need for such boilerplate ghost state, single-writer protocols simplify and clarify the proofs of these examples.

An intriguing feature of the iGPS-SW-Write rule is that it is possible for the writer to relinquish ownership of the exclusive writer permission while doing the write itself. (This is why the writer permission appears in the precondition of the premise.) This extra flexibility is particularly useful when reasoning about RA implementations of locks (such as the bounded ticket lock). When the lock holder releases the lock (typically with a release write), this feature allows them to also give up their permission to do further release writes to the lock, so that it can be transferred to the next thread that acquires the lock.

4.3 The Model of iGPS

We now briefly describe the model of iGPS assertions. Figure 14 contains an excerpt of the encoding of the standard assertions and connectives of CSL as well as non-atomics and escrows. The somewhat more involved model of protocols is detailed in Appendix F.

We model iGPS assertions as monotone predicates over views. The view parameter represents the current view of the thread making the assertion. The monotonicity requirement is motivated by the observation that the view of a thread only grows over the execution of a program. To ensure properties like the frame rule, it is therefore crucial that simply adding information to a view does not invalidate previously held (e.g., framed) assertions. As a consequence of this requirement, we explicitly monotonize the encoding when necessary.
We now discuss these in more detail. The model of iGPS protocols consists of two parts: a protocol invariant, and local protocol assertions. The protocol invariant also owns the exclusive right to change the state. This construction also implies knowledge about the location’s history. In the case of single-writer protocols, the history and lower bounds on the current state of the protocol, and by the protocol invariant, indirectly access the full interpretation of the write event that it reads from (see \textit{iGPS-CAS}).

Protocols

The model of iGPS protocols consists of two parts: a protocol invariant, and local protocol assertions given out to clients. The protocol invariant owns the location’s history as well as the \textit{logical history} \(\Delta\), which tracks the transition history of protocol states and is always kept in agreement with the location’s history. Additionally, the invariant owns \(\tau_{\text{read}}\) for all writes in the history and \(\tau_{\text{full}}\) for select ones, depending on the protocol’s type. For example, in normal \(\tau_{\text{full}}\) is only stored for CAS-\textit{able} write events, justifying that only an update can access the full interpretation of the write event that it reads from (see \textit{iGPS-CAS}). Meanwhile, local protocol assertions hold knowledge about the logical history, which gives effectively lower bounds on the current state of the protocol, and by the protocol invariant, indirectly implies knowledge about the location’s history. In the case of single-writer protocols, the writer assertion also owns the exclusive right to change the state. This construction also relies on the authoritative PCM (see §3.1.1). More details are given in Appendix F.

Soundness of iGPS

The soundness of iGPS is expressed by the following theorem.
Theorem 1 (Adequacy). For any expression $e$, physical state $\sigma'$, and meta-level predicate $\Phi$ on values, we have

$$\left( \vdash \{ \top \} e \{ v, \Phi(v) \} \right) \Rightarrow \forall TS. \sigma_{\text{init}}; [0 \mapsto e] \Rightarrow^* \sigma'; TS \Rightarrow$$

$$\left( TS(0) \in \text{Val} \Rightarrow \Phi(TS(0)) \right)$$

$$\land \forall \rho \in \text{dom}(TS). TS(\rho) \in \text{Val} \lor \exists \sigma'', e'', e'' f, TS(\rho) \Rightarrow^* \sigma'', e'', e'' f$$

The theorem connects iGPS program specifications ($\vdash \{ \top \} e \{ v, \Phi(v) \}$) and the program’s possible executions, and provides two guarantees:

1. If the original thread with id 0 terminates with a value $v$, we know that $\Phi(v)$ holds.
2. For any pair of a state $\sigma'$ and a threadpool $TS$ reachable from the initial state $\sigma_{\text{init}}$ (see Definition 2) and the initial threadpool $[0 \mapsto e]$, we know that any thread $\rho$ in $TS$ either has terminated with a value or can still be reduced in the state $\sigma'$.

The proof follows from the adequacy theorem of Iris.

5 Other Contributions

In our Coq development accompanying this paper, we make several additional contributions that we briefly summarize here.

An RSL encoding

Using the same base logic from §3.2.1, we have mechanized iRSL, a higher-order variant of RSL [33]. RSL focuses on the message-passing style transferring of resources through release-write/acquire-read pairs. The two main assertions of RSL are Rel($\ell, Q$) and Acq($\ell, Q$), representing the permission to write to and read from $\ell$, respectively. The resource $Q(v)$ is released by writing $v$ to $\ell$, and then acquired by reading $v$ from $\ell$. Consequently, to support this MP-like mechanism, the encoding’s model shares a great deal with the model of iGPS, namely the full vs. read interpretation construction for protocols.

One particular feature of RSL, however, demands special attention. Although simpler than GPS, RSL has the extra ability to split the receiver predicate $Q$ into smaller predicates, so that different acquire reads of the same value $v$ can acquire different parts of the transferred resource: Acq($\ell, \lambda v. Q_1(v) * Q_2(v)$) $\Rightarrow$ Acq($\ell, Q_1$) * Acq($\ell, Q_2$). It is not obvious how to prove this sound when the splitting is completely arbitrary. Fortunately, a similar pattern, called the barrier pattern, has been addressed by Jung et al. [12], who propose the mechanism of “higher-order ghost state” to support such splitting. Our iRSL model basically extends Jung et al.’s barrier proof with a more complex (Iris) protocol to carefully manage resource splitting.

The encoding in Iris also provides us with useful extensions to the logic. Without extra work, iRSL naturally supports PCM-based ghost state and higher-order assertions, which were not available in the original RSL. The encoding shows that our approach has the right, reusable foundations to construct different logics for RA+NA.

In our RSL encoding, assertions are encoded as view predicates and proof rules are proven sound with respect to the base logic—in the same way as our GPS encoding. This allows us to soundly combine RSL and GPS reasoning principles in the same proof at no additional cost. It is even possible to design iGPS protocols whose state interpretation mentions iRSL assertions and vice versa. Of course, at a single point in time a location can only be governed by either iGPS or iRSL, as they represent incompatible modes of ownership transfer.
Allocation and deallocation

We have also incorporated support for memory allocation and deallocation into our RA+NA operational semantics. Since C11 is not clear about the semantics of allocation and deallocation, we take the liberty of defining them as reasonably as possible: in short, allocation and deallocation behave as non-atomic writes with special values $A$ and $D$, respectively.

Fractional protocols

So far, all protocols presented are permanent: once the protocols are established, they govern their locations forever. This poses two interesting questions: 1) Can we change the protocol which governs a location? and 2) How can we deallocate a location governed by a protocol? To support these features, we derive, with little modification to the iGPS model, fractional protocols, whose protocol assertions also assert the permission to even use the protocol. Initially, a protocol $\tau$ for a location $\ell$ will be established with the full fraction, and then it will be distributed to those who want to use $\tau$. Later, when the full fraction is recollected, one can disable the protocol (since no one else can use it), regain the raw ownership of $\ell$, and then deallocate $\ell$—or more interestingly, establish a new protocol for $\ell$! These recollectable protocols open up possibilities for verifying programs that do custom memory reclamation, e.g., RCU (see below). In the current Coq development, we have created fractional versions of both normal and single-writer protocols.

Mechanization and Case Studies

Our Coq mechanization employs a shallow embedding of iGPS (and iRSL), making critical use of the Iris Proof Mode [16]. In its current form, the proof mode is specific to the algebra of Iris and offers no additional support for embedded logics like our encoding of iGPS assertions. There are two consequences to this: 1) Unlike in the paper presentation of iGPS, where thread views are completely hidden in the (Iris) model of the logic, in our Coq proofs thread views are visible. However, they are also unobtrusive: all assertions in a given proof context always hold at the current thread’s local view, and the view only changes when the thread takes a step. Thus, while the views are visible, they are always manipulated and kept in sync in a very straightforward way, which we mostly automate with a set of simple tactics. 2) iGPS assertions cannot always be manipulated directly by the proof mode. We sometimes have to unfold our embedding of iGPS assertions—but not in the statements of our lemmas and theorems—to make explicit the underlying Iris assertions so that the proof mode can operate on them. As our embedding is very simple, this has little additional overhead. In particular, all lemmas and theorems stated at the iGPS level remain easily applicable even to the unfolded definitions at the Iris level.

We have verified all of the standard examples that have been proven in previous work. The simplest of these is the spin-lock example, proven in iRSL. More interestingly, using iGPS, we have also mechanized the message passing, circular buffer, bounded ticket lock, and Michael-Scott lock-free queue examples, which were only verified by hand in the original GPS paper. We have also verified a variant of the read-copy update (RCU) technique employed in the Linux kernel, following the proof of Tassarotti et al. [30]. The RCU proof is the most substantial example in iGPS, which simplifies the original proof in GPS by using fractional single-writer protocols that allow garbage collection. To our knowledge, our development provides the very first mechanized proofs of the circular buffer, bounded ticket lock, Michael-Scott queue, and RCU examples in a weak-memory setting.
6 Related Work

This paper demonstrates one of the first major applications of the Iris framework. Other recent applications include Krebbers et al. [16], who developed the interactive Coq proof mode for Iris that we rely on heavily in this paper, and Krogh-Jespersen et al. [17], who use Iris to encode a logical-relations model of a relational model of a type-and-effect system for a higher-order, concurrent programming language. Neither of those papers considers weak memory.

There are a number of program logics for weak memory models [27, 4, 1, 19], some of which have mechanized soundness proofs [33, 7, 25], but none of which provide real support for mechanized proofs of weak-memory programs in the way that we do.

FSL++ [8], an extension of FSL [7] (with ghost state) and RSL (with relaxed accesses), was used to mechanize a proof of an implementation of atomic reference counters based on the one in Rust’s Arc library. The proof is done by applying the basic laws of separation logic, resulting in painful manual work. Our approach alleviates a great deal of such tedium using the Iris proof mode. As of now, however, iGPS cannot reason about relaxed accesses.

More recently, RSL, FSL, and FSL++ have been encoded in Viper [21] to provide an automated verification approach to weak memory programs [28]. The encodings, however, axiomatize all proof rules without providing soundness guarantees, and are specific to the style of RSL and its FSL descendants. Our base logic, in contrast, is not tied to any specific surface logic and can be used to develop and prove sound different surface logics. It remains to be seen if the more expressive GPS protocols can be encoded in Viper.

iGPS is based on the GPS logic [32] and supports all reasoning mechanisms of GPS. However, the exact rules of iGPS differ in various small ways from the original ones in GPS. These differences stem from pragmatic choices made to simplify the soundness proof of iGPS.

A particularly noteworthy difference from GPS appears in the premises of the \textit{GPS-Read} and \textit{GPS-Write} rules: the split of the protocol interpretations \(\tau\) into \(\tau_{\text{read}}\) and \(\tau_{\text{full}}\). We show the original GPS rules below for comparison.

\textbf{GPS-Read}
\[
\forall s' \sqsubset s. \; P \ast \tau(s', v) \Rightarrow \Box Q
\]
\[
\left\{ \ell : s \tau \ast P \right\} \ell_{[a]} : \left\{ v, \ell : s' \tau \ast Q \right\}
\]

\textbf{GPS-Write}
\[
\forall s' \sqsubset s. \; P \ast \tau(s', v) \Rightarrow s' \sqsubset s'' P \Rightarrow \tau(s'', w) \ast Q
\]
\[
\left\{ \ell : s \tau \ast P \right\} \ell_{[a]} := w \left\{ \ell : s'' \tau \ast Q \right\}
\]

The interpretation \(\tau(s, v)\) in these two GPS rules is equivalent to \(\tau_{\text{full}}(s, v)\) in iGPS. In GPS, the user can gain access to the full interpretation of the “current” protocol state \(s'\), but only to obtain some knowledge, not to consume (i.e., transfer out of the protocol) any non-duplicable resources owned by that interpretation. This is enforced in \textit{GPS-Read} by guarding \(Q\) with an always modality \(\Box\), and in \textit{GPS-Write} by requiring the user to establish the interpretation of the new write using only the local resource \(P\). In contrast, iGPS does not provide the user with the full interpretation, but only the read interpretation \(\tau_{\text{read}}\). This weakens, for example, the \textit{iGPS-Write} rule in comparison with \textit{GPS-Write}, because the user cannot use \(\tau_{\text{full}}\) to show \(s' \sqsubset s''\).

The reason for this discrepancy between GPS and iGPS is that the soundness proof of GPS reasons about an entire program execution graph at once. With its bird’s-eye view of the entire execution, GPS can, for the duration of a step, assemble resources that have been transferred elsewhere in the graph to re-construct \(\tau_{\text{full}}\) for the user. The soundness of iGPS, on the other hand, is established in a simpler and more local manner, without involving reasoning about the full execution of the program. We avoid the global soundness argument of GPS and instead provide a pragmatic solution which—judging from our success in porting
GPS examples—is effectively as strong as GPS and, at the same time, makes for a very simple soundness proof: we maintain the duplicable \( \tau_{\text{read}} \) for all (past) events and can thus easily provide it to the client of \texttt{iGPS-Read}.

Essentially, the reason our rules are as effective (if not more so) than those of GPS is that GPS provides one-size-fits-all rules, which are applicable to both programs with write-write races and those without, whereas we provide special support for the common case where there are no write-write races. For programs without those races, the rules provided by GPS are quite cumbersome to use and often require additional ghost state for bookkeeping. iGPS instead supports an optimized write rule for the common case in which there are no such races, via single-writer protocols. For the remaining cases, the rather simple-minded rule \texttt{iGPS-Write} appears to suffice in all the examples we have considered thus far.

Our operational semantics for RA draws heavily on Lahav et al.’s semantics for SRA [18]—a stronger variant of RA, which is equivalent to it in the absence of write-write races. SRA was developed to provide an intuitive operational characterization which is as efficiently implementable as RA. However, as we observe here, moving to an operational characterization does not in fact require any strengthening of the RA semantics (even the slight strengthening of SRA). The main difference between the operational semantics of SRA and the one we give for RA is that writes in SRA always take a globally maximal timestamp, whereas in RA they need not do so. The canonical example demonstrating this difference is the 2+2W example (see Lahav et al. [18] for more details).

Going beyond Lahav et al., we offer the first operational account of the interaction of RA and non-atomic accesses. Our semantics corresponds to C11’s for programs that do not mix atomic RMW operations and non-atomic reads at the same location. We feel this is a reasonable restriction, given that C11’s treatment of programs mixing atomic and non-atomic accesses is already known to be problematic [2]. Our semantics does not correspond to C11’s for arbitrary programs, as evidenced by the following example:

\[
\begin{align*}
\text{cas} & (x, 0, 1) & x_{[\text{at}]} & := 2; \\
& & a & := x_{[\text{na}]}
\end{align*}
\]

C11 considers this program racy because, if the CAS succeeds, the first thread’s update of \( x \) to 1 and the second thread’s non-atomic read of \( x \) are not in a happens-before relation. In contrast, our semantics does not consider this a race because the non-atomic read is always guaranteed to read from the previous write with value 2. We find the C11 semantics for this program to be rather unintuitive, but leave a more thorough investigation of the issue to future work.

Our RA semantics may also be considered a close derivative of Kang et al.’s “promising” semantics [14], which is geared toward solving a broader problem with the full C11 model (the so-called “out-of-thin-air” problem). We look forward to using Iris to construct program logics for this promising semantics.

Acknowledgements

We would like to thank Mohit Vyas for spotting a mistake in our original proof of correspondence to C11, and Mark Batty for helpful conversations.

This research was supported in part by a European Research Council (ERC) Consolidator Grant for the project “RustBelt”, funded under the European Union’s Horizon 2020 Framework Programme (grant agreement no. 683289).
References


A Operational Semantics for RA+NA

We assume that programs do not include atomic update (RMW) to a location that is also read non-atomically.

A.1 Declarative Semantics

We define threadpool reduction that generates traces:

\[
\begin{align*}
\text{TRACE-RED-SILENT} & : TS(\pi) \rightarrow e, \text{nil} \\
\text{TRACE-RED-MEM} & : TS(\pi) \nrightarrow e, \text{nil} \\
\text{TRACE-RED-FORK} & : \langle \text{Fork}, \rho \rangle \rightarrow e, e_f \\
\end{align*}
\]

We write \( TS \Rightarrow^T TS' \) if \( TS \Rightarrow^T TS' \) for some transition label \( x \); \( TS \Rightarrow^T TS' \) if \( TS \Rightarrow^T TS' \) for some thread identifier \( \pi \); and \( TS \Rightarrow^* TS' \) if \( TS \Rightarrow^* TS' \) for some transition label \( x \) and thread identifier \( \pi \). A threadpool is called final if \( TS(\pi) \in \text{Val} \) for every \( \pi \in \text{dom}(TS) \).

**Definition 2.** A trace is a sequence of pairs \( \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \varepsilon_n, \pi_n \rangle \). We say that \( tr = \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \varepsilon_n, \pi_n \rangle \) is a trace of an expression \( e \) if

\[
[0 \rightarrow e] \xrightarrow{\varepsilon_1} \xrightarrow{\pi_1} \ldots \xrightarrow{\varepsilon_n} \xrightarrow{\pi_n} TS
\]

for some thread \( \pi \) and threadpool \( TS \). When \( TS \) is final, we call \( tr \) a full trace.

**Lemma 3 (Receptiveness).** Let \( \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \varepsilon_n, \pi_n \rangle \) be a trace of an expression \( e \).

- If \( \varepsilon_n = \langle \text{Read}, \ell, v \rangle \), then for every value \( v' \), \( \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \text{Read}, \ell, v' \rangle, \pi_n \rangle \) is a trace of \( e \).
- If \( \varepsilon_n = \langle \text{Update}, \ell, v_n, v_n' \rangle \), then for every value \( v_n' \), either \( \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \text{Read}, \ell, v_n' \rangle, \pi_n \rangle \) is a trace of \( e \), or there exists a value \( v_n' \) such that \( \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \text{Update}, \ell, v_n', v_n' \rangle, \pi_n \rangle \) is a trace of \( e \).

**Definition 4.** A trace \( tr = \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \varepsilon_n, \pi_n \rangle \) induces a partial order \( \text{sb}(tr) \) (sequence-before) on \( \{1, \ldots, n\} \) that is given by:

\[
\begin{align*}
&\frac{k_1 < k_2}{\langle k_1, k_2 \rangle \in \text{sb}(tr)} \quad \frac{k_1 < k_2}{\varepsilon_{k_1} = \langle \text{Fork}, \pi_{k_2} \rangle} \\
&\frac{\langle k_1, k_2 \rangle \in \text{sb}(tr)}{\langle k_2, k_3 \rangle \in \text{sb}(tr)} \quad \frac{\langle k_1, k_3 \rangle \in \text{sb}(tr)}{\langle k_1, k_3 \rangle \in \text{sb}(tr)}
\end{align*}
\]

**Lemma 5.** Let \( tr \) be a trace of an expression \( e \). Then:

- Any prefix of \( tr \) is a trace of \( e \).
- Any permutation \( tr' \) of \( tr \) that respects \( \text{sb}(tr) \) (i.e., \( \text{sb}(tr') = \text{sb}(tr) \)) is a trace of \( e \).

**Definition 6.** An execution graph \( G \) consists of:

1. a finite set of nodes \( E \subseteq \mathbb{N} \) (called events).
2. a function \( \text{tid} \) assigning a thread identifier to every event in \( E \). We denote by \( E^\pi \) the set of events in \( a \in E \) with \( \text{tid}(a) = \pi \).
3. a function \( \text{lab} \) assigning a label to every event in \( E \). Labels take one of the following forms: \( \langle \text{Read}_a, \ell, v \rangle \), \( \langle \text{Write}_a, \ell, v \rangle \), \( \langle \text{Update}_a, \ell, v_o, v_n \rangle \), or \( \langle \text{Fork}, \pi \rangle \), where \( a \in \{ \text{at}, \text{na} \} \) is the access mode, \( \ell \) is the location accessed, and \( v, v_o, v_n \) are read/written values, and \( \pi \) is a thread identifier. The function \( \text{lab} \) naturally induces functions \( \text{typ}, \text{mod}, \text{loc}, \text{val}_a, \text{val}_r \) that return (when applicable) the type (\( \text{FORK}, \text{R}, \text{W} \) or \( \text{U} \)), access mode (in particular, \( \text{mod}(a) = \text{at} \) if \( \text{typ}(a) = \text{U} \)), location, and read/written values of an event. For \( T \in \{ \text{S, R, W, U} \} \), \( T \) denotes the set \( \{ a \in E \mid \text{typ}(a) = T \} \). We also concatenate the event sets with \( \text{s} \) and \( \text{r} \) to denote the accessed location, and superscripts for access modes (\( \text{e.g.}, \text{W}_R^T \) denotes all events \( a \in \text{W} \cup \text{U} \) where \( \text{loc}(a) = \ell \) and \( \text{mod}(a) = \text{at} \)).

4. a strict partial order \( \text{sb} \subseteq E \times E \), called sequenced-before, (a.k.a. program order).

5. a binary relation \( \text{rf} \subseteq E \times E \) called reads-from, satisfying: (i) \( a \in \text{W} \) and \( b \in \text{R} \) for every \( (a, b) \in \text{rf} \); (ii) \( \text{loc}(a) = \text{loc}(b) \) and \( \text{val}_a(a) = \text{val}_r(b) \) for every \( (a, b) \in \text{rf} \); and (iii) \( a_1 = a_2 \) whenever \( (a_1, b), (a_2, b) \in \text{rf} \).

6. a strict partial order \( \text{mo} \) on \( \text{W}_U \), called modification order, which is a disjoint union of relations \( \{ \text{mod}_E \}_{E \in \text{Loc}_C} \), such that each \( \text{mod}_E \) is a strict total order on \( \text{W}_U \).

In what follows, to resolve ambiguities, we may include a prefix “\( G \)” to refer to the components of an execution graph \( G \).

\textbf{Definition 7}. An execution graph \( G \) follows a trace \( tr = \langle \varepsilon_1, \pi_1 \rangle, ..., \langle \varepsilon_n, \pi_n \rangle \) if the following hold:

- \( E \) consists of \( n \) events \( a_1, ..., a_n \) with \( \text{lab}(a_k) = \varepsilon_k \) and \( \text{tid}(a_k) = \pi_k \) for every \( 1 \leq k \leq n \).
- \( \text{sb} = \{ \langle a_k, a_k \rangle \mid (k_1, k_2) \in \text{sb}(tr) \} \).

We call \( G \) an execution graph of an expression \( e \) if \( G \) follows some trace of \( e \).

Note that \( \text{rf} \) and \( \text{mo} \) are not restricted.

\textbf{Definition 8}. Given an execution graph \( G \), the relations \( \text{sw} \) (synchronization) and \( \text{hb} \) (happens-before) are given by:

\[ \text{sw} = \{ (a, b) \in \text{rf} \mid \text{mod}(a) = \text{mod}(b) = \text{at} \} \quad \text{hb} = (\text{sb} \cup \text{sw})^+ \]

\textbf{Definition 9 (Consistency)}. An execution graph \( G \) is consistent if the following hold:

- For every \( b \in \text{RU} \), if there exists some \( a \in \text{W}_\text{loc}(b) \) such that \( (a, b) \in \text{hb}^+ \); \( \text{sb} \).
- \( \text{sb} \cup \text{rf} \) is acyclic.
- \( \text{rf} \) \( \text{hb} \) is irreflexive.

\textbf{Definition 10}. A trace \( tr \) is allowed for an expression \( e \) if \( tr \) is a trace of \( e \) and there exists a consistent execution graph \( G \) that follows \( tr \).

\textbf{Definition 11 (Conflict)}. Two events \( a \) and \( b \) are called conflicting in an execution graph \( G \) if \( a, b \in E \), \( \{\text{typ}(a), \text{typ}(b)\} \cap \{\text{W}, \text{U}\} \neq \emptyset \), \( a \neq b \), and \( \text{loc}(a) = \text{loc}(b) \).

\textbf{Definition 12 (Races)}. A pair \( (a, b) \) is called a race in \( G \) if \( a \) and \( b \) are conflicting events in \( G \), and \( (a, b) \not\in \text{hb} \cup \text{hb}^{-1} \). An execution graph \( G \) is called racey if there is some race \( (a, b) \) in \( G \) with \( a \) \( \text{na} \in \{ \text{mod}(a), \text{mod}(b) \} \).

\textbf{Definition 13 (Initialization)}. A location \( \ell \) is called uninitialized in an execution graph \( G \) if there exists some \( a \in \text{RU} \) such that \( (a, b) \not\in \text{hb} \); \( \text{sb} \) for every \( a \in \text{W}_U \). An execution graph \( G \) is called uninitialized if some location is uninitialized in \( G \).
Definition 14 (Buggy expression). An expression $e$ is called buggy if some consistent execution graph of $e$ is either racy or uninitialized.

The following definition and lemma are useful in the sequel.

Definition 15. Given an execution graph $G$, we denote by $\text{mo}^{\text{na}}$ the relation $\text{mo} \cap ((\text{w}^{\text{na}} \times E) \cup (E \times \text{w}^{\text{na}}))$.

Lemma 16. Let $G$ be a non-racy consistent execution graph. Then, $\text{hb} = (\text{sb} \cup \text{rf} \cup \text{mo}^{\text{na}})^+$.

Proof. Since $G$ is not racy, we have $\text{rf} \cup \text{mo}^{\text{na}} \subseteq \text{hb} \cup \text{hb}^{-1}$. Since $G$ is consistent, $\text{rf}; \text{hb}$ and $\text{mo}^{\text{na}}; \text{hb}$ are irreflexive, and so $\text{rf} \cup \text{mo}^{\text{na}} \subseteq \text{hb}$. It easily follows that $\text{hb} = (\text{sb} \cup \text{rf} \cup \text{mo}^{\text{na}})^+$. □

A.1.1 Alternative declarative semantics

Next, we present an alternative declarative semantics in which $\text{hb}$ is substituted by $(\text{sb} \cup \text{rf})^+$, and races require stronger conditions as defined below. This semantics simplifies our proof below of the equivalence to the operational semantics.

Definition 17 (Strong consistency). Strong consistency is defined exactly as consistency with $(\text{sb} \cup \text{rf})^+$ instead of $\text{hb}$.

Observing that $\text{hb} \subseteq (\text{sb} \cup \text{rf})^+$ in every execution graph, it is easy to see that strong consistency implies consistency. In addition, strong consistency can be simplified as follows:

Lemma 18. An execution graph $G$ is strongly consistent iff the following hold:

- For every $b \in \text{RU}$, if there exists some $a \in \text{W}_{\text{loc}(b)}$ such that $\langle a, b \rangle \in (\text{sb} \cup \text{rf})^+; \text{sb}$, then $c, b \in \text{rf}$ for some $c \in E$.
- $\text{sb} \cup \text{rf}$ is acyclic.
- $\text{mo}; (\text{sb} \cup \text{rf})^+$ is irreflexive.
- $(\text{rf}; [\emptyset]) \cap (\text{mo}; \text{mo}) = \emptyset$.

Definition 19 (Strongly racy). An execution graph $G$ is called strongly racy if there exist $a, b \in E$ such that the following hold:

- $a \neq b$.
- $\text{loc}(a) = \text{loc}(b)$.
- $\langle a, b \rangle \notin \text{mo}^2; (\text{sb} \cup \text{rf})^+$ and $\langle b, a \rangle \notin (\text{sb} \cup \text{rf})^+$.
- Either $a \in \text{w}^{\text{na}}$ and $b \in \text{w}^{\text{na}}$ or $a \in \text{w}^{\text{na}}$.

Definition 20 (Strong initialization). A location $\ell$ is called strongly uninitialized in an execution graph $G$ if there exists some $b \in \text{RU}_{\ell}$ such that $\langle a, b \rangle \notin (\text{sb} \cup \text{rf})^+; \text{sb}$ for every $a \in \text{W}_{\ell}$. An execution graph $G$ is called strongly uninitialized if some location is strongly uninitialized in $G$.

Notation 1. Let $G$ be an execution graph. For a set $E \subseteq E$, we denote by $G \cap E$ the execution graph $G'$ given by: $G'.E = E$, $G'.\text{tid} = G.\text{tid}|_{E}$, $G'.\text{lab} = G.\text{lab}|_{E}$, $G'.\text{sb} = G.\text{sb} \cap (E \times E)$, $G'.\text{rf} = G.\text{rf} \cap (E \times E)$, and $G'.\text{mo} = G.\text{mo} \cap (E \times E)$. Given $a \in E$, we write $G \setminus \{a\}$ for $G \cap (E \setminus \{a\})$. Given $R \subseteq E \times E$, by $R$-prefix of $G$ we mean any execution graph $G'$ such that $G' = G \cap E$ for some set $E$ satisfying $a \in E$ whenever $b \in E$ and $\langle a, b \rangle \in R$.

Lemma 21 (Adding writes). Let $G$ be an execution graph and $a \in \text{W}$ be an $(\text{sb} \cup \text{rf})^+$-maximal event in $E$. Suppose that $G \setminus \{a\}$ is strongly consistent. Let $G'$ be the execution graph that is obtained from $G$ by making $a$ an $\text{mo}$-maximal event. Then, $G'$ is strongly consistent.
Lemma 22 (Adding reads). Let $G$ be an execution graph and $a \in R$ be an $(sb \cup rf)^+$-maximal event in $E$. Suppose that $G \setminus \{a\}$ is strongly consistent, and that $(b, a) \in (sb \cup rf)^+; sb$ for some $b \in \calW_{loc(a)}$. Then, there exists a strongly consistent execution graph $G'$ that satisfies the following:

- $G'.E = G.E$, $G'.\text{tid} = G.\text{tid}$, $G'.sb = G.sb$, and $G'.mo = G.mo$.
- $G'.\text{lab}(b) = G.\text{lab}(b)$ for every $b \in G.E \setminus \{a\}$ and $G'.\text{lab}(a) = \langle \text{Read}_{G.\text{mod}(a)}, G.\text{loc}(a), v \rangle$ for some value $v$.
- $G'.rf \cap (G.E \times (G.E \setminus \{a\})) = G.rf \cap (G.E \times (G.E \setminus \{a\}))$.
- $G'.rf \cap (G.E \times \{a\}) \subseteq ((G'.sb \cup G'.rf)^+; G'.sb) \cup (G.WU^* \times \{a\})$.
- $G'.mo \cap ((G.E \setminus \{a\}) \times (G.E \setminus \{a\})) = G.mo \cap ((G.E \setminus \{a\}) \times (G.E \setminus \{a\}))$.

Proof. Let $B = \{b \in \calW_{loc(a)} \mid (b, a) \in (sb \cup rf)^+; sb\}$, and let $b = \max_{\text{mo}} B$. Let $G'$ be the execution graph obtained from $G$ by replacing the value read at $a$ by val$_a(b)$, and adding the $rf$-edge $(b, a)$ (instead of the $rf$-edge entering $a$ in $G$, if such exists). It is straightforward to see that $G'$ is a strongly consistent.

Lemma 23 (Adding updates). Let $G$ be an execution graph and $a \in U$ be an $(sb \cup rf)^+$-maximal event in $E$. Suppose that $G \setminus \{a\}$ is strongly consistent, and that $(b, a) \in (sb \cup rf)^+; sb$ for some $b \in \calW_{loc(a)}$. Then, there exists a strongly consistent execution graph $G'$ that satisfies the following:

- $G'.E = G.E$, $G'.\text{tid} = G.\text{tid}$, $G'.sb = G.sb$.
- $G'.\text{lab}(b) = G.\text{lab}(b)$ for every $b \in G.E \setminus \{a\}$ and $G'.\text{lab}(a) = \langle \text{Update}, G.\text{loc}(a), v, v' \rangle$ for some values $v, v'$.
- $G'.rf \cap (G.E \times (G.E \setminus \{a\})) = G.rf \cap (G.E \times (G.E \setminus \{a\}))$.
- $G'.rf \cap (G.E \times \{a\}) \subseteq ((G'.sb \cup G'.rf)^+; G'.sb) \cup (G.WU^* \times \{a\})$.
- $G'.mo \cap ((G.E \setminus \{a\}) \times (G.E \setminus \{a\})) = G.mo \cap ((G.E \setminus \{a\}) \times (G.E \setminus \{a\}))$.

Proof. Let $B = (\calW^*_{loc(a)} \setminus \{a\}) \cup \{b \in \calW^*_{loc(a)} \mid (b, a) \in (sb \cup rf)^+; sb\}$, where $\ell = \text{loc}(a)$. Let $b = \max_{\text{mo}} B$, and let $G'$ be the execution graph obtained from $G$ by (i) replacing the value read at $a$ by val$_a(b)$; (ii) adding the $rf$-edge $(b, a)$ (instead of the $rf$-edge entering $a$ in $G$, if such exists); and (iii) making $a$ to be the mo-successor of $b$. It is straightforward to see that $G'$ is a strongly consistent.

Lemma 24 (Modifying updates). Let $G$ be a strongly consistent execution graph, and let $a \in U$ be an $(sb \cup rf)^+$-maximal event in $E$. Then, any execution graph $G'$ that is obtained from $G$ by changing the value written at $a$, or by making $a$ a read event, is strongly consistent.

Lemma 25. Let $G$ be a racy consistent execution graph of an expression $e$. Then, some strongly consistent execution graph of $e$ is either strongly racy or strongly uninitialized.

Proof. Let $a_1, \ldots, a_n$ be an enumeration of $E$ that respects $sb \cup rf$, and let $k$ be the minimal index such that $G \cap \{a_1, \ldots, a_k\}$ is racy.

First, we claim that for every event $a$, if $(b, a) \in (sb \cup rf \cup mo)^+; sb$, then $(b, a) \in \calH$. Indeed, suppose that $(b, a) \in (sb \cup rf \cup mo)^+; sb$. Let $G' = G \cap \{a_1, \ldots, a_{k-1}\}$. The minimality of $k$ ensures that $G'$ is not racy, and by Lemma 16, it follows that $(b, a) \in G.hb^*; G.sb \subseteq G.hb$.

Next, let $j$ be the minimal index such that $(a_j, a_k)$ is a race in $G$ and $na \in \{\text{mod}(a_j), \text{mod}(a_k)\}$. Let $E = \{a \in E \mid (a, a_k) \in (sb \cup rf)^* \lor (a, a_j) \in (sb \cup rf)^*\}$, and let $G_0 = G \cap E$. Note that $a_j$ has no outgoing $sb \cup rf$-edges in $G_0$ except (possibly) an $rf$-edge to $a_k$ (otherwise, $(a_j, a_k) \in (G.sb \cup G.rf)^*; G.sb$, and so $(a_j, a_k) \in G.hb$, which contradicts the fact that $(a_j, a_k)$ is a race in $G$). It is also easy to verify that $G_0$ is a strongly consistent execution graph of $e$. If $G_0$ is strongly uninitialized, then we are done. Otherwise, if $a_k \in G_0.\text{RU}$ then,
we have \((b,a_k) \in (G_0.\text{sb} \cup G_0.\text{rf})^*; G_0.\text{sb}\) for some \(b \in G_0.\text{WU}_{\text{loc}(a_k)}\). Consider the possible cases:

- \(a_k \in \mathbb{R}\): Then, we must have \(a_j \in \mathbb{W}\). Let \(G'\) be a strongly consistent execution graph that satisfies the properties mentioned in Lemma 22 applied on \(G_0\) and \(a_k\). The receptiveness property ensures that \(G'\) is an execution graph of \(e\). Let \(G''\) be a strongly consistent execution graph that satisfies the properties mentioned in Lemma 21 applied on \(G'\) and \(a_j\). Since \(G''\cdot \text{E} = G'\cdot \text{E}, G''\cdot \text{lab} = G'\cdot \text{lab}\) and \(G''\cdot \text{sb} = G'\cdot \text{sb}\), we have that \(G''\) is an execution graph of \(e\) as well. It follows that \(\langle a_j, a_k \rangle \notin G''\cdot \text{mo}^*; (G''\cdot \text{sb} \cup G''\cdot \text{rf})^*\), and \(\langle a_k, a_j \rangle \notin (G''\cdot \text{sb} \cup G''\cdot \text{rf})^+\). In addition, since \(\langle a_j, a_k \rangle\) is a race in \(G\), we have \(G''\cdot \text{loc}(a_j) = G''\cdot \text{loc}(a_k)\). Hence, \(G''\) is a strongly racy strongly consistent execution graph of \(e\).

- \(a_k \in \mathbb{W}\): Let \(G'\) be a strongly consistent execution graph that satisfies the properties mentioned in Lemma 21 applied on \(G_0\) and \(a_k\). As in the previous case, \(G'\) is an execution graph of \(e\). It follows that \(\langle a_k, a_j \rangle \notin G'\cdot \text{mo}^*; (G'\cdot \text{sb} \cup G'\cdot \text{rf})^*\), and \(\langle a_j, a_k \rangle \notin (G'\cdot \text{sb} \cup G'\cdot \text{rf})^+\). In addition, since \(\langle a_j, a_k \rangle\) is a race in \(G\), we have \(G'\cdot \text{loc}(a_j) = G'\cdot \text{loc}(a_k)\). Hence, \(G'\) is a strongly racy strongly consistent execution graph of \(e\).

- \(a_k \in \mathbb{U}\): Then, we must have \(a_j \in \mathbb{W}^\text{na}\). Let \(G'\) be a strongly consistent execution graph that satisfies the properties mentioned in Lemma 23 applied on \(G_0\) and \(a_k\). The receptiveness property ensures that some execution graph \(G''\) obtained from \(G'\) by possibly making \(a_k\) a read event is an execution graph of \(e\). By Lemma 24, \(G''\) is also strongly consistent. Note also that we do not have \(\langle a_j, a_k \rangle \in G'\cdot \text{rf}\). Let \(G''\) be a strongly consistent execution graph that satisfies the properties mentioned in Lemma 21 applied on \(G''\) and \(a_j\). Since \(G''\cdot \text{E} = G''\cdot \text{E}, G''\cdot \text{lab} = G''\cdot \text{lab}\) and \(G''\cdot \text{sb} = G''\cdot \text{sb}\), we have that \(G''\) is an execution graph of \(e\) as well. It follows that \(\langle a_j, a_k \rangle \notin G''\cdot \text{mo}^*; (G''\cdot \text{sb} \cup G''\cdot \text{rf})^*\), and \(\langle a_k, a_j \rangle \notin (G''\cdot \text{sb} \cup G''\cdot \text{rf})^+\). In addition, since \(\langle a_j, a_k \rangle\) is a race in \(G\), we have \(G''\cdot \text{loc}(a_j) = G''\cdot \text{loc}(a_k)\). Hence, \(G''\) is a strongly racy strongly consistent execution graph of \(e\).

\[\square\]

**Lemma 26.** Let \(G\) be an uninitialized consistent execution graph of an expression \(e\). Then, some strongly consistent execution graph of \(e\) is either strongly racy or strongly uninitialized.

**Proof.** Let \(\ell\) be an uninitialized location in \(G\), and let \(b \in \text{R \ell}\) such that \(\langle a, b \rangle \notin \mathbb{hb}^*; \text{sb}\) for every \(a \in \text{WU}_\ell\). If \(G\) is racy, then we are done by Lemma 25. Otherwise, by Lemma 16, we have \(\mathbb{hb} = (\mathbb{sb} \cup \mathbb{rf} \cup \text{mo}^\text{na})^+\). Hence, \(\langle a, b \rangle \notin (\mathbb{sb} \cup \mathbb{rf})^*; \text{sb}\) for every \(a \in \text{WU}_\ell\), and so \(G\) is strongly uninitialized.

\[\square\]

**Corollary 27.** Let \(e\) be an expression. Then:

- If \(e\) is not buggy, then a trace \(tr\) is allowed for \(e\) iff \(tr\) is a trace of \(e\) and there exists a strongly consistent execution graph \(G\) that follows \(tr\).

- \(e\) is buggy iff some strongly consistent execution graph of \(e\) is either strongly racy or strongly uninitialized.

**Proof.** Easily follows from our definitions, Lemma 25, Lemma 26, and Lemma 16:

- Suppose that \(e\) is not buggy. If \(tr\) is a trace of \(e\), and \(G\) is a strongly consistent execution graph that follows \(tr\), then \(G\) is also consistent, and, by definition, \(tr\) is a trace that is allowed for \(e\). For the converse, let \(tr\) be a trace that is allowed for \(e\). Then, by definition, \(tr\) is a trace of \(e\) and there exists a consistent execution graph \(G\) that follows \(tr\). Since \(e\) is not buggy, \(G\) is not racy. By Lemma 16, we have \((G\cdot \text{sb} \cup G\cdot \text{rf})^+ \subseteq G\cdot \mathbb{hb}\), and hence \(G\) is strongly consistent.
If some strongly consistent execution graph \( G \) of \( e \) is either strongly racy or strongly uninitialized, then \( G \) is also a consistent execution graph of \( e \) which is either racy or uninitialized, and hence \( e \) is buggy. For the converse, suppose that \( e \) is buggy. By definition, there exists a consistent execution graph \( G \) of \( e \) that is either racy or uninitialized. Then, by Lemma 25 and Lemma 26, some strongly consistent execution graph of \( e \) is either strongly racy or strongly uninitialized.

**Lemma 28.** Let \( G \) be a strongly consistent execution graph that is not strongly racy. Then, \( \text{mo}^{\text{na}} \subseteq (\text{sb} \cup \text{rf})^+ \).

**Proof.** Let \( \langle a, b \rangle \in \text{mo}^{\text{na}} \). First, if \( b \in W^{\text{na}} \), then since \( G \) is not strongly racy, and \( \text{mo}; (\text{sb} \cup \text{rf})^+ \) is irreflexive, we must have \( \langle a, b \rangle \in (\text{sb} \cup \text{rf})^+ \). Otherwise, we have \( a \in W^{\text{na}} \). Let \( b_1, ..., b_m \) be an enumeration of \( E \) that respects \( \text{sb} \cup \text{rf} \). We prove by induction on \( k \) that \( \langle a, b_k \rangle \in \text{mo}^{\text{na}} \) implies \( \langle a, b_k \rangle \in (\text{sb} \cup \text{rf})^+ \). Suppose the claim hold for all \( k < m \), then we prove it for \( m \). Assume that \( \langle a, b_m \rangle \in \text{mo}^{\text{na}} \). Since \( G \) is not strongly racy and \( \text{mo}; (\text{sb} \cup \text{rf})^+ \) is irreflexive, we have \( \langle a, b_m \rangle \in \text{mo}; (\text{sb} \cup \text{rf})^+ \). Let \( b_i \in E \) such that \( \langle a, b_i \rangle \in \text{mo}^+ \) and \( \{b_i, b_m\} \in (\text{sb} \cup \text{rf})^+ \). If \( a = b_i \), then we are done. Otherwise, by the induction hypothesis, \( \langle a, b_i \rangle \in (\text{sb} \cup \text{rf})^+ \), and again it follows that \( \langle a, b_m \rangle \in (\text{sb} \cup \text{rf})^+ \).

**A.2 First operational semantics**

We present an operational semantics, equivalent to the declarative RA+NA semantics, in the form of an abstract machine whose states are execution graphs, incrementally growing during the run. The initial execution graph, denoted \( G_0 \), is the empty execution graph.

**Notation 2.** For two execution graphs \( G \) and \( G' \), thread identifiers \( \pi, \rho \), and a label \( \varepsilon \), we write \( G' \in \text{Add}(G, \pi, \rho, \varepsilon) \) if there exists an event \( a \) such that the following hold:

- \( G'.E = G.E \cup \{a\} \), \( G'.t \!	ext{id} = G.t \!	ext{id} \cup \{a \mapsto \rho\} \), and \( G'.l \!	ext{ab} = G.l \!	ext{ab} \cup \{a \mapsto \varepsilon\} \).
- \( G'.\text{sb} = (G.\text{sb} \cup (G.E^* \times \{a\}))^+ \).
- \( G'.\text{rf} \supseteq G.\text{rf} \).
- \( G'.\text{mo} \supseteq G.\text{mo} \) and if \( \varepsilon = (\text{Write}_{\text{na}}, -,-) \), then \( a \) is \( G'.\text{mo}\)-maximal.

The machine steps are based on execution graph steps:

\[
\varepsilon \in \{\langle \text{Read}_{\alpha}, \ell, v \rangle, \langle \text{Write}_{\alpha}, \ell, v \rangle, \langle \text{Update}, \ell, v, v_n \rangle\} \\
\forall a \in W^{\text{na}}. \exists b \in E^*. \langle a, b \rangle \in \text{mo}^+; (\text{sb} \cup \text{rf})^+ \\
\varepsilon = \langle \text{Read}_{\text{na}}, \ell, v \rangle \Rightarrow \forall a \in W^{\text{na}}, \exists b \in E^*. \langle a, b \rangle \in \text{mo}^+; (\text{sb} \cup \text{rf})^+ \\
\varepsilon \in \{\langle \text{Read}_{\alpha}, \ell, v \rangle, \langle \text{Update}, \ell, v, v_n \rangle\} \Rightarrow \exists a \in W^{\text{na}}, \exists b \in E^*. \langle a, b \rangle \in (\text{sb} \cup \text{rf})^+ \\
G' \in \text{Add}(G, \pi, \rho, \varepsilon) \quad G' \text{ is strongly consistent} \\
G \xrightarrow{(\text{Fork}, \rho)} G' \\
G \xrightarrow{\text{Race-I}} G' \\
G \xrightarrow{\text{Race-II}} G' \\
G \xrightarrow{\text{UNINITIALIZED}} G'
\]

\[
G' \in \text{Add}(G, \pi, \rho, \langle \text{Fork}, \rho \rangle) \\
G \xrightarrow{(\text{Fork}, \rho)} G' \\
G \xrightarrow{\text{Race-I}} G' \\
G \xrightarrow{\text{Race-II}} G' \\
G \xrightarrow{\text{UNINITIALIZED}} G'
\]
The machine steps are combined with expression reductions and lifted to thread-pool reductions exactly as in Figure 4 (using execution graphs $G$ instead of physical states $\sigma$), defining reductions of the form $G; TS \rightarrow_\pi G'; TS'$. As before, we write $G; TS \rightarrow_\pi x G', TS'$ if $G, TS \rightarrow_\pi G', TS'$ for some transition label $x$; $G; TS \overset{\sigma}{\rightarrow} G', TS'$ if $G, TS \rightarrow_\pi G', TS'$ for some thread identifier $\sigma$; and $G, TS \rightarrow G', TS'$ if $G, TS \rightarrow_\pi G', TS'$ for some transition label $x$ and thread identifier $\pi$.

Lemma 29. Suppose that $G_0 \overset{\varepsilon_1}{\rightarrow}_{\pi_1} \ldots \overset{\varepsilon_n}{\rightarrow}_{\pi_n} G$ for some trace $tr = (\varepsilon_1, \pi_1), \ldots, (\varepsilon_n, \pi_n)$. Then $G$ is a strongly consistent execution graph that follows $tr$.

Proof. Easily follows from our definitions (note that strong consistency is trivially preserved in fork steps).

Lemma 30. Let $G$ be a strongly consistent execution graph of an expression $e$. Suppose that $G$ is not strongly racy or strongly uninitialized. Then, there exists a trace $(\varepsilon_1, \pi_1), \ldots, (\varepsilon_n, \pi_n)$ of $e$, such that $G_0 \overset{\varepsilon_1}{\rightarrow}_{\pi_1} \ldots \overset{\varepsilon_n}{\rightarrow}_{\pi_n} G$.

Proof. By Lemma 28, $G, sa \cup G, rf \cup G, mo^u$ is acyclic. Let $a_1, \ldots, a_n$ be an enumeration of $G, E$ that respects $(G, sa \cup G, rf \cup G, mo^u)^\ast$. For every $1 \leq i \leq n$, let $\pi_i = tid(a_i)$ and $\varepsilon_i = lab(a_i)$. Then, $(\varepsilon_1, \pi_1), \ldots, (\varepsilon_n, \pi_n)$ is a trace of $e$, and it is easy to see that by following this enumeration, i.e., adding $a_i$ in step $i$ for every $1 \leq i \leq n$, we obtain that $G_0 \overset{\varepsilon_1}{\rightarrow}_{\pi_1} \ldots \overset{\varepsilon_n}{\rightarrow}_{\pi_n} G$. In particular, note that the fact that $G$ is strongly consistent entails that any $(G, sa \cup G, rf)^\ast$-prefix of $G$ is also strongly consistent.

Theorem 31. Let $e$ be a non-buggy expression. Then, a trace $tr$ is allowed for $e$ iff $tr$ has a permutation $(\varepsilon_1, \pi_1), \ldots, (\varepsilon_n, \pi_n)$ that respects $sb(tr)$, and $G_0; [0 \mapsto e] \overset{\varepsilon_1}{\rightarrow}_{\pi_1} \ldots \overset{\varepsilon_n}{\rightarrow}_{\pi_n} G; TS$ for some execution graph $G$ and threadpool $TS$.

Proof. Follows from our definitions Cor. 27, Lemma 29 and Lemma 30:

$(\Rightarrow)$ Suppose that a trace $tr$ is allowed for $e$. By Cor. 27, $tr$ is a trace of $e$ and there exists a strongly consistent execution graph $G$ that follows $tr$. Furthermore, Cor. 27 entails that $G$ is not strongly racy or strongly uninitialized. By Lemma 30, there exists a trace $tr' = (\varepsilon_1, \pi_1), \ldots, (\varepsilon_n, \pi_n)$ of $e$, such that $G_0 \overset{\varepsilon_1}{\rightarrow}_{\pi_1} \ldots \overset{\varepsilon_n}{\rightarrow}_{\pi_n} G$. By definition, $tr'$ must be a permutation of $tr$ that respects $sb(tr)$.

$(\Leftarrow)$ Suppose that $tr$ has a permutation $tr' = (\varepsilon_1, \pi_1), \ldots, (\varepsilon_n, \pi_n)$ that respects $sb(tr)$, and $G_0; [0 \mapsto e] \overset{\varepsilon_1}{\rightarrow}_{\pi_1} \ldots \overset{\varepsilon_n}{\rightarrow}_{\pi_n} G; TS$ for some execution graph $G$ and threadpool $TS$. By definition, $tr'$ is a trace of $e$. By Lemma 29, $G$ is a strongly consistent execution graph that follows $tr'$. Since $tr'$ is a permutation of $tr$ that respects $sb(tr)$, $G$ follows $tr$, and by Lemma 5, $tr$ is a trace of $e$. Hence, $tr$ is allowed for $e$.

Theorem 32. An expression $e$ is buggy iff $G_0; [0 \mapsto e] \rightarrow_{\text{race}}^* \bot$ or $G_0; [0 \mapsto e] \rightarrow_{\text{uninit}}^* \bot$.

Proof.

$(\Rightarrow)$ Suppose that $e$ is buggy. By Cor. 27, some strongly consistent execution graph $G$ of $e$ is strongly racy or strongly uninitialized. Let $a_1, \ldots, a_n$ be an enumeration of $G, E$ that respects $sa \cup rf$. Let $k$ be the minimal index such that $G \cap \{a_1, \ldots, a_k\}$ is strongly racy or strongly uninitialized. Consider the two cases:

- $G \cap \{a_1, \ldots, a_k\}$ is strongly racy. Let $j < k$ be the minimal index such that $loc(a_k) = loc(a_j)$ and one of the following hold:
1. \( \langle a_k, a_j \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{m} \cup \mathbb{r} \) and \( \langle a_j, a_k \rangle \notin (\mathbb{s} \cup \mathbb{r})^\uparrow \), and either \( a_k \in \mathbb{W}^R \) and \( a_j \in \mathbb{R}^R \) or \( a_k \in \mathbb{W}^R \).

2. \( \langle a_j, a_k \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \) and \( \langle a_k, a_j \rangle \notin (\mathbb{s} \cup \mathbb{r})^\uparrow \), and either \( a_j \in \mathbb{W}^R \) and \( a_k \in \mathbb{R}^R \) or \( a_j \in \mathbb{W}^R \).

Distinguish the following cases:

- \( a_k \in \mathbb{W}^R \) and \( \langle a_k, a_j \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \):

  Let \( B = \{ a \in \mathbb{E} \mid \langle a, a_j \rangle \in (\mathbb{s} \cup \mathbb{r})^\uparrow \} \), and let \( G' = G \cap B \). Then, \( G' \) is strongly consistent, not strongly racy, and not strongly uninitialized, and by Lemma 30, \( G'_0 \) for some trace \( \langle \pi_1, \ldots, \pi_n \rangle \) \( \in \mathbb{E}_G \) of \( e \). Since \( \langle a_k, a_j \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \), we have that \( \langle a_k, b \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \) for every \( b \in G'_0 \mathcal{E} \). Since \( a_k \in G'_0 \mathbb{W}_G^R \), we have \( G' \mathbb{I}^{\text{lab}(a_j), \text{tid}(a_j)} \) \( \in \mathbb{E}_G \). Furthermore, it is easy to see that \( \langle \pi_1, \ldots, \pi_n, \langle \text{lab}(a_j), \text{tid}(a_j) \rangle \rangle \) is also a trace of \( e \), and so \( G_0 ; [0 \Rightarrow e] \Rightarrow \ast \).

- \( a_j \in \mathbb{W}^R \) and \( \langle a_j, a_k \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \):

  This case is handled symmetrically.

- \( a_k \in \mathbb{W}^R \) and \( a_j \in \mathbb{W}^R \) and \( \langle a_k, a_j \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \):

  Let \( B = \{ a \in \mathbb{E} \mid \langle a, a_j \rangle \in (\mathbb{s} \cup \mathbb{r})^\uparrow \} \), and let \( G' = G \cap B \). Then, \( G' \) is strongly consistent, not strongly racy, and not strongly uninitialized, and by Lemma 30, \( G'_0 \mathcal{E} \) for some trace \( \langle \pi_1, \ldots, \pi_n \rangle \) \( \in \mathbb{E}_G \) of \( e \). Since \( \langle a_k, a_j \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \), we have that \( \langle a_k, b \rangle \notin \mathbb{m}_G^\uparrow \); \( \mathbb{s} \cup \mathbb{r} \) for every \( b \in G'_0 \mathcal{E} \). Since \( a_k \in G'_0 \mathbb{W}_G^R \), we have \( G' \mathbb{I}^{\text{lab}(a_j), \text{tid}(a_j)} \) \( \in \mathbb{E}_G \). Furthermore, it is easy to see that \( \langle \pi_1, \ldots, \pi_n, \langle \text{lab}(a_j), \text{tid}(a_j) \rangle \rangle \) is also a trace of \( e \), and so \( G_0 ; [0 \Rightarrow e] \Rightarrow \ast \).

(\( \Rightarrow \)) Suppose that \( G_0 ; [0 \Rightarrow e] \Rightarrow \ast \) \( \mathbb{E}_G \) or \( G_0 ; [0 \Rightarrow e] \Rightarrow \ast \). Then, there exists a trace \( \langle \pi_1, \ldots, \pi_n \rangle \) of \( e \) such that \( G_0 \mathcal{E} \) \( \Rightarrow \ast \) \( \in \mathbb{E}_G \). Let \( G_{n-1} \) be an execution of \( e \) in \( G' \). Clearly, \( G' \) is an execution of \( e \). Using Lemma 22, as well as the receptiveness property of traces, we can obtain an execution \( G''_n \) of \( e \) that is consistent, and satisfies \( \langle a, b \rangle \notin G''_n \mathbb{I}^\uparrow \). Since \( \mathbb{m}(b) = na \), it follows that \( G''_n \) is racy.

(\( \Leftarrow \)) Suppose that \( G_0 ; [0 \Rightarrow e] \Rightarrow \ast \) \( \mathbb{E}_G \) or \( G_0 ; [0 \Rightarrow e] \Rightarrow \ast \). Then, there exists a trace \( \langle \pi_1, \ldots, \pi_n \rangle \) of \( e \) such that \( G_0 \mathcal{E} \) \( \Rightarrow \ast \) \( \in \mathbb{E}_G \). Let \( G' \) be an execution of \( e \) in \( G' \). Clearly, \( G' \) is an execution of \( e \). Using Lemma 21, Lemma 22, and Lemma 23, as well as the receptiveness property of traces, and
Lemma 24 in case $\varepsilon_n$ is an update, we can obtain an execution $G''$ of $e$ that is consistent, and satisfies $(a, b) \notin G''.hb$. Since $\text{mod}(a) = na$, it follows that $G''$ is racy.

$\varepsilon_n = \{\langle \text{Read}_a, \ell, v \rangle, \langle \text{Update}, \ell, v_n, v_n \rangle\}$, and there do not exist $a \in G_{n-1}.\text{WU}$ and $b \in G_{n-1.\varepsilon'}$ such that $(a, b) \notin (G_{n-1.\text{ab}} \cup G_{n-1.\text{rf}})^*$. Let $G' \in \text{Add}(G_{n-1}, \pi_n, \pi_n, \varepsilon_n)$ with $G'.\text{rf} = G.\text{rf}$. Clearly, $G'$ is an execution of $e$, and it is easy to see that it is consistent and uninitialized.

### A.3 Timestamp machine

Next, we show that the timestamp machine presented in §2 is equivalent to the operational machine above.

**Definition 33.** A timestamp assignment for an execution graph $G$ is a function $ts : \text{WU} \to \text{Time}$, that satisfies $ts(a) < ts(b)$ whenever $(a, b) \in \text{mo}$. Given a timestamp assignment $ts$ for $G$ and an event $a \in \text{WU} \cup \text{U}$, the view of an event $a$ in $G$ according to $ts$, denoted $\text{view}(a, G, ts)$, and the message induced by $a$ in $G$ according to $ts$, denoted $\text{msg}(a, G, ts)$, are given by:

\[
\text{view}(a, G, ts) = \lambda \ell. \max\{ts(b) \mid b \in \text{WU}_\ell, (b, a) \in (\text{ab} \cup \text{rf})^*\}
\]

\[
\text{msg}(a, G, ts) = (\lambda c(a), \lambda x(a), ts(a), \text{view}(a, G, ts))
\]

If $\{ts(b) \mid b \in \text{WU}_\ell, (b, a) \in (\text{ab} \cup \text{rf})^*\}$ is empty, we take $\text{view}(a, G, ts)(\ell) = \bot$ (undefined).

**Definition 34.** Let $G$ be an execution graph and $ts$ be a timestamp assignment $ts$ for $G$. The physical state $(M_G^t, T_G^t, N_G^a)$ is given by:

\[
\begin{align*}
M_G^t &= \{\text{msg}(a, G, ts) \mid a \in \text{WU}\} \\
T_G^t &= \lambda \pi. \lambda \ell. \max\{ts(a) \mid a \in \text{WU}_\ell, \exists b \in E, (a, b) \in (\text{ab} \cup \text{rf})^*\} \quad \text{if} \quad \{ts(a) \mid a \in \text{WU}_\ell, \exists b \in E, (a, b) \in (\text{ab} \cup \text{rf})^*\} \text{ is empty, we take } T_G^t(\pi)(\ell) = \bot \\
N_G^a &= \lambda \ell. \max\{ts(a) \mid a \in \text{WU}_\ell\} \quad \text{if} \quad \{ts(a) \mid a \in \text{WU}_\ell\} \text{ is empty, we take } N_G^a(\ell) = \bot
\end{align*}
\]

We say that $G$ relates to a physical state $(M, T, N)$, denoted $G \sim (M, T, N)$ if $(M, T, N) = (M^t_G, T^t_G, N^a_G)$ for some timestamp assignment $ts$ for $G$.

The following lemmas (tediously but straightforwardly) follow from the definitions.

**Lemma 35.** Suppose that $(M, T, N) \xrightarrow{\sim} (M', T', N')$, and let $G$ be an execution graph such that $G \sim (M, T, N)$. Then, $G \xrightarrow{\sim} G'$ for some execution graph $G'$ such that $G' \sim (M', T', N')$.

**Lemma 36.** Suppose that $(M, T, N) \xrightarrow{\sim} \bot_{\text{race}}$, and let $G$ be an execution graph such that $G \sim (M, T, N)$. Then, $G \xrightarrow{\sim} \bot_{\text{race}}$.

**Lemma 37.** Suppose that $(M, T, N) \xrightarrow{\sim} \bot_{\text{uninit}}$, and let $G$ be an execution graph such that $G \sim (M, T, N)$. Then, $G \xrightarrow{\sim} \bot_{\text{uninit}}$.

**Lemma 38.** Suppose that $G \xrightarrow{\sim} G'$, and let $ts'$ be a timestamp assignment for $G'$. Let $ts = ts'|_{G.\text{WU}}$. Then, $ts$ is a timestamp assignment for $G$, and $(M^t_G, T^t_G, N^a_G) \xrightarrow{\sim} (M_{G'}^t, T_{G'}^t, N_{G'}^a)$.

**Lemma 39.** Suppose that $G_0 \xrightarrow{\varepsilon_0} \ldots \xrightarrow{\varepsilon_n} G_n$. Then, $(M_0, T_0, N_0) \xrightarrow{\varepsilon_0} \ldots \xrightarrow{\varepsilon_n} (M_n, T_n, N_n)$ for some physical state $(M_n, T_n, N_n)$.

**Proof.** Let $ts_n$ be some timestamp assignment for $G_n$, and for every $0 \leq k < n$, let $ts_k = ts_n|_{G_k.\text{WU}}$. For every $0 \leq k \leq n$, let $(M_k, T_k, N_k) = (M^t_{G_k}, T^t_{G_k}, N^a_{G_k})$. (Clearly, this definition of $(M_k, T_k, N_k)$ agrees with the definition of the initial physical state in Definition 2). By Lemma 38, we have $(M_{k-1}, T_{k-1}, N_{k-1}) \xrightarrow{\varepsilon_k} (M_k, T_k, N_k)$ for every $1 \leq i \leq n$. □
Lemma 40. Suppose that \( G \xrightarrow{\pi} \perp_{\text{race}} \), and let \((M, T, N)\) be a physical state such that \( G \sim (M, T, N) \). Then, \((M, T, N) \xrightarrow{\pi} \perp_{\text{race}} \).

Proof. Let \( ts \) be a timestamp assignment such that \((M, T, N) = (M_G^T, T_G^N, N_G^N)\). Consider the two possible cases:

- \( \varepsilon \in \{(\text{Read}_a, \ell, v), (\text{Write}_a, \ell, v), (\text{Update}, \ell, v_o, v_n)\} \), and \( \exists a \in W^{\text{sa}}. \forall b \in E^\pi. (a, b) \notin \text{mo}^7; (sb \cup rf)^* \): We show that \( T(\pi)(\ell) < N(\ell) \) (and it would follow that \((M, T, N) \xrightarrow{\pi} \perp_{\text{race}} \)). Let \( a \in W^{\text{sa}} \) be the \text{mo}-maximal event such that \( \forall b \in E^\pi. (a, b) \notin \text{mo}^7; (sb \cup rf)^* \). Then, \( N(\ell) \geq ts(a) \). Hence, it suffices to show that \( T(\pi)(\ell) < ts(a) \). Let \( c \in W^{\ell} \) and \( b \in E^\pi \) such that \((c, b) \in (sb \cup rf)^* \). We show that \( ts(c) < ts(a) \). Indeed, otherwise, we have \( ts(c) \leq ts(a) \), and so \((a, c) \in \text{mo}^7 \). Hence, \((a, b) \in \text{mo}^7; (sb \cup rf)^* \), which contradicts our assumption.

- \( \varepsilon = (\text{Read}_a, \ell, v) \) and \( \exists a \in W^{\ell}. \forall b \in E^\pi. (a, b) \notin \text{mo}^7; (sb \cup rf)^* \): We show that \( \exists \ell', t', V'.(\ell, v', t', V') \in M \land T(\pi)(\ell) < t' \) (and it would follow that \((M, T, N) \xrightarrow{\pi} \perp_{\text{race}} \)). Let \( a \in W^{\ell} \) such that \( \forall b \in E^\pi. (a, b) \notin \text{mo}^7; (sb \cup rf)^* \). Let \( (\ell, v', t', V') \in M \), and it suffices to show that \( T(\pi)(\ell) < t' \). Let \( c \in W^{\ell} \) and \( b \in E^\pi \) such that \((c, b) \in (sb \cup rf)^* \). We show that \( ts(c) < t' = ts(a) \). Indeed, otherwise, we have \( t(c) \geq t(a) \) and so \((a, c) \in \text{mo}^7 \). Hence, \((a, b) \in \text{mo}^7; (sb \cup rf)^* \), which contradicts our assumption.

Lemma 41. Suppose that \( G \xrightarrow{\pi} \perp_{\text{uninit}} \), and let \((M, T, N)\) be a physical state such that \( G \sim (M, T, N) \). Then, \((M, T, N) \xrightarrow{\pi} \perp_{\text{uninit}} \).

Proof. Let \( ts \) be a timestamp assignment such that \((M, T, N) = (M_G^T, T_G^N, N_G^N)\). Since \( G \xrightarrow{\pi} \perp_{\text{uninit}} \), we have \( \varepsilon \in \{(\text{Read}_a, \ell, v), (\text{Update}, \ell, v_o, v_n)\} \), and \( \exists a \in W^{\ell}. \exists b \in E^\pi. (a, b) \in (sb \cup rf)^* \). By definition, we have \( T_G^N(\pi)(\ell) = \perp \), and so \((M, T, N) \xrightarrow{\pi} \perp_{\text{uninit}} \).

Finally, by combining the results above, we obtain the following:

Corollary 42. Let \( e \) be an expression.

- If \( e \) is not buggy, then a trace \( tr \) is allowed for \( e \) iff \( tr \) has a permutation \( \langle \varepsilon_1, \pi_1 \rangle, \ldots, \langle \varepsilon_n, \pi_n \rangle \) that respects \( sb(tr) \), and

  \[
  (M_0, T_0, N_0); [0 \mapsto e] \xrightarrow{\varepsilon_1 \pi_1} \ldots \xrightarrow{\varepsilon_n \pi_n} (M, T, N); TS
  \]

  for some thread \( \pi \), physical state \((M, T, N)\), and threadpool \( TS \).

- \( e \) is buggy iff \( (M_0, T_0, N_0); [0 \mapsto e] \xrightarrow{\text{race}} \perp_{\text{uninit}} \) or \((M_0, T_0, N_0); [0 \mapsto e] \xrightarrow{\text{uninit}} \perp_{\text{uninit}} \) for some thread \( \pi \).
### B  Single-writer protocols’ proof rules

\[
\begin{align*}
\ell &: s \tau \Rightarrow \Box \ell &: s \tau_R \\
\ell &: s_1 \tau_R \ast \ell &: s_2 \tau_R \Rightarrow s_1 \subseteq s_2 \lor s_2 \subseteq s_1 \\
\ell &: s_1 \tau_W \ast \ell &: s_2 \tau_R \Rightarrow s_2 \subseteq s_1
\end{align*}
\]

\[
\begin{align*}
\ell &: s \tau &\Rightarrow \ell &: s \tau_W \\
\ell &: s \tau_W &\Rightarrow T \tau_{\text{full}}(s, v) \ast T \tau_{\text{read}}(s, v) \ast Q \\
\ell &: s \tau_W &\Rightarrow v \{Q\} \\
\ell &: s \tau_W &\Rightarrow v. \exists \ell[\text{at}] \{v, \ell &: s \tau_W \ast \tau_{\text{read}}(s, v)\} \\
\ell &: s \tau_W &\Rightarrow v. \exists s' \subseteq s, \ell &: s' \tau_R \ast Q \\
P \ast \ell &: s' \tau_W \ast \tau_{\text{full}}(s, \_ \ast \tau_{\text{full}}(s', v) \ast Q \\
P \ast \ell &: s' \tau_W \ast \tau_{\text{full}}(s', v) \ast Q \\
P \ast \ell &: s' \tau_W \ast \tau_{\text{full}}(s, \_ \ast \tau_{\text{full}}(s', v) \ast Q
\end{align*}
\]

### C  Escrows and exchanges’ proof rules

In addition to escrows, we have mechanized a weaker variant of exchanges \[30\]. Our variant is weaker in the sense that the resources put in the exchanges have to be justified at the same view. Unlike the general exchanges, our variant can be trivially proven sound. In order to enforce that the resources are put back at the same view as when they are taken out, we need to expose the identity of views in the logic. However, since we only need to know equalities of views, we just need to bind views to names and expose the names. We call those names abstract views. The assertion \[P \Rightarrow Q\] asserts that \(P\) holds at some abstract view \(\beta\). To use \(P\) a thread needs the extra knowledge \(\text{seen}(\beta)\) which states that it has observed the view with name \(\beta\).

The assertion \[P \Rightarrow Q\] asserts the knowledge of an exchange at some abstract view \(\beta\) under \(\mathcal{N}\). Escrows, unsurprisingly, are just a special instance of exchanges.

\[
\begin{align*}
\text{Exchange-Intro-Left} & \quad T \Rightarrow \exists \beta. \ [P \Rightarrow Q] \Rightarrow [P \Rightarrow Q] \\
\text{Exchange-Intro-Right} & \quad T \Rightarrow \exists \beta. \ [P \Rightarrow Q] \\
\text{Exchange-Elim-Left} & \quad P \ast P \Rightarrow \perp \\
\text{Exchange-Elim-Right} & \quad Q \ast Q \Rightarrow \perp \\
\text{Escrow-Init} & \quad \Rightarrow \exists \beta. \ [P \Rightarrow Q] \\
\text{Escrow-Apply} & \quad \Rightarrow \exists \beta. \ [P \Rightarrow Q]
\end{align*}
\]
D RSL proof rules

RSL focuses on the message-passing style transferring of resources through release-write/acquire-read pairs. The two main assertions of RSL are $\text{Rel}(\ell, Q)$, representing the permission to write to $\ell$, and $\text{Acq}(\ell, Q)$, representing the permission to read from $\ell$. The resource $Q(v)$ is transferred through the write/read pair of value $v$. The assertion $\text{RMWAcq}(\ell, P)$ is another permission that is used to CAS on the location.

We present in Figure 15 the proof rules of iRSL, which were already checked by Coq. The encoding extends the original assertions with a predicate $D$, which is duplicable, to represent the restricted interpretation of the underlying invariant governing $\ell$ (in a similar fashion to what we did in iGPS). For more details please refer to the Coq development, which also contains a fractional variant.
RSL-WRel
\{ v : Q(v) \triangleright D(v) \triangleright \text{Rel}(\ell, Q, D) \}\ [\ell_{\text{at}}] := v \ [\text{Init}(\ell) \triangleright \text{Rel}(\ell, Q, D)]

RSL-RAcq
\{ \text{Init}(\ell) \triangleright \text{Acq}(\ell, R, D) \}\ [\ell_{\text{at}}] \ { v : R(v) \triangleright D(v) \triangleright \text{Acq}(\ell, R[v := \top], D) \}

RSL-CAS
\text{P} \triangleright Q(v_{\alpha}) \triangleright D(v_{\alpha}) \implies \neg \forall v_{\alpha} \ . \ P \triangleright D(v) \implies R(1, v_{\alpha})
\forall v_{\alpha} \ . \ \text{Init}(\ell) \triangleright \text{RMWAcq}(\ell, Q, D) \triangleright P \} \ { \text{cas}(\ell, v_{\alpha}, v_{\alpha}) \ | \ \exists v_{\alpha} \ . \ R(b, v) \triangleright \text{Init}(\ell) \triangleright \text{RMWAcq}(\ell, Q, D) \}

RSL-Rel-SplitJoin
\text{Rel}(\ell, Q_{1}, D) \triangleright \text{Rel}(\ell, Q_{2}, D) \ \vdash \ \text{Rel}(\ell, \lambda v. Q_{1}(v) \lor Q_{2}(v), D)

RSL-Acq-SplitJoin
\text{Acq}(\ell, R_{1}, D) \triangleright \text{Acq}(\ell, R_{2}, D) \ \iff \ \text{Acq}(\ell, \lambda v. R_{1}(v) \lor R_{2}(v), D)

RSL-RMW-Acq-Split
\text{RMWAcq}(\ell, P, D) \ \triangleright \ \text{RMWAcq}(\ell, P, D) \triangleright \text{Acq}(\ell, \text{True}, D)

RSL-Init-Persistent
\text{Init}(\ell) \vdash \square \text{Init}(\ell)

RSL-RMW-Persistent
\text{RMWAcq}(\ell, P, D) \vdash \square \text{RMWAcq}(\ell, P, D)

RSL-Rel-Persistent
\text{Rel}(\ell, Q, D) \vdash \square \text{Rel}(\ell, Q, D)

RSL-Alloc-R
\{ \top \} \ \text{alloc}(1) \ { \ell. \ \text{Rel}(\ell, P) \triangleright \text{Acq}(\ell, P) \}

RSL-Alloc-M
\{ \top \} \ \text{alloc}(1) \ { \ell. \ \text{Rel}(\ell, P) \triangleright \text{RMWAcq}(\ell, P) \}

RSL-Init-R
\{ \text{Own}(\ell) \triangleright \top \triangleright Q(v) \triangleright D(v) \}\ [\ell_{\text{na}}] := v \ [\text{Init}(\ell) \triangleright \text{Rel}(\ell, Q, D) \triangleright \text{Acq}(\ell, Q, D)]

RSL-Init-M
\{ \text{Own}(\ell) \triangleright \top \triangleright Q(v) \triangleright D(v) \}\ [\ell_{\text{na}}] := v \ [\text{Init}(\ell) \triangleright \text{Rel}(\ell, Q, D) \triangleright \text{RMWAcq}(\ell, Q, D)]

\textbf{Figure 15} Proof rules of iRSL.
Let ns ≜ 0, tc ≜ 1.

\[\text{newLock()} ≜\]
\[
\begin{align*}
\text{let } x &= \text{alloc}(2) \text{ in} \\
\text{x + ns[na]} &= 0; \\
\text{x + tc[na]} &= 0; \\
\text{x}
\end{align*}
\]

\[\text{lock(x)} ≜\]
\[
\begin{align*}
\text{let } y &= \text{FAI}(x + tc) \text{ in} \\
\text{repeat} \\
\text{let } z &= x + \text{ns[at]} \text{ in} \\
\text{y} &= z
\end{align*}
\]

\[\text{unlock(x)} ≜\]
\[
\begin{align*}
\text{let } z &= x + \text{ns[at]} \text{ in} \\
\text{x + ns[at]} &= (z + 1) \mod C
\end{align*}
\]

\[\text{Figure 16} \text{ Bounded ticket lock in } \lambda_{RN}.\]

### E Bounded ticket locks

Ticket locks are a fair locking mechanism found in the Linux kernel. The first formal proof of ticket locks in RA+NA was given in the appendix of the GPS paper. We follow that variant and its proof’s intuition closely, although our proof, thanks to single-writer protocols, is significantly simpler.

The code of the bounded ticket lock variant in \(\lambda_{RN}\) is given in Figure 16. The lock has two atomic fields, a ticket counter \((x + tc)\) and a now-serving counter \((x + ns)\), which are initialized with 0. A thread can acquire the lock by first asking for a ticket \(t\) from the ticket counter, which is a fetch-and-increment (FAI) operation that hands out the current value of the ticket counter as the new ticket, and increases the counter by 1. After obtaining the ticket, the thread waits for the now-serving counter to reach its ticket \((x + ns)\), by which time it is guaranteed that previous tickets have been served, so it is safe to acquire the lock. To release the lock, the thread only needs to write \(t + 1\) to the now-serving counter, signifying that the next ticket can now be served.

In the bounded variant of ticket locks, the counters are stored in a fixed-size machine word with the maximum unsigned value \(C\). Once the counters reach \(C - 1\), the next increment will reset them back to 0. This additional flavor of reality induces the main complication of the verification. In unbounded ticket locks, it is easy to maintain the invariant that no two threads have the same ticket. Here, however, tickets are being reused, and the ticket counter must ensure that a ticket \(t\) has already been served before re-issuing it. In order to circumvent this, the number of concurrent threads trying to acquire the lock is restricted at most \(C\).

Additionally, before getting a new ticket, each thread must prove to the ticket counter that its most recent ticket has been served. Consequently, the ticket counter must track which tickets have been issued to which threads and by its best the value of the now-serving counter, in order to keep the number of tickets waiting under \(C\).

The now-serving counter is where the actual transfer of resources takes place: the lock resource \(P\) is transferred from the release write to \(x + ns\) (by the unlock call of the previous ticket holder) to the acquire read by the lock call of the current ticket holder. Since a read cannot get ownership, escrows are used instead to transfer \(P\). The releasing thread creates an escrow to store \(P\) and deliver it to the holder of ticket \(t + 1\). The escrow is acquired by the \(t + 1\) ticket holder when the loop of its call to lock stops. By that time the acquiring thread should also have the correct exclusive token to use the escrow and obtain \(P\). The releasing thread saves the proof that it has released the lock, in order to present that to the ticket counter when it requests another ticket later.

Our proof significantly simplifies the original proof by:

- **Using single-writer protocols for the now-serving counter.** The original proof uses a normal protocol for the counter, in combination with two other ghost components
We explain the technical construction of the proof. The complete definitions of monoids, protocols and escrows needed are given in Figure 17.

The interpretations of \( \text{NSP}(\gamma) \) and \( \text{TCP}(\gamma, x) \) depend respectively on the first and second components of a product monoid for our ghost states. The first component \( \text{Disj}(\mathbb{N}) \)

\[
\begin{align*}
\operatorname{Perm}^\gamma(t) & \triangleq \{(i, \circ \emptyset) : \text{Disj}(\mathbb{N}) \times \text{AUTH(\mathbb{N}, \mathbb{N}, \mathbb{N}, \mathbb{N})} \}; \\
\operatorname{AllTkt}^\gamma(M) & \triangleq \{0, 1 \}^S \\
\operatorname{MyTkt}^{\gamma}([i \mapsto t]) & \triangleq \{0, 1 \}^S \\
\operatorname{Esects}(x, \gamma, n) & \triangleq \forall t \leq n. \left[ \operatorname{Perm}^\gamma(t) \rightsquigarrow P \ast \left[ x + \text{ns} : t \left[ \text{NSP}(\gamma) \right]_R \right] \right] \\
\operatorname{TCP}(\gamma, x)(\text{Inv}, y) & \triangleq \exists t, n, M, y = t \text{ mod } C \land n \leq t \land t \leq n + |\text{Waiting}(M, n)| \\
& \land t = \max_i M(i) \land \text{dom}(M) = [0, ..., C - 1] \\
& \land (\forall i, j \in \text{dom}(M). M(i) = M(j) \neq -1 \Rightarrow i = j) \\
& \ast \operatorname{AllTkt}^\gamma(M) \ast \bigotimes_{j \geq t} \operatorname{Perm}^\gamma(j) \ast \left[ x + \text{ns} : n \left[ \text{NSP}(\gamma) \right]_R \right] \\
\operatorname{MayAcquire}(x) & \triangleq \exists \gamma, i, t. \operatorname{MyTkt}^{\gamma}([i \mapsto t]) \ast \left[ x + \text{tc} : \text{Inv} \right] \text{TCP}(\gamma, x) \\
& \ast t \neq -1 \Rightarrow \left[ x + \text{ns} : t + 1 \left[ \text{NSP}(\gamma) \right]_R \right] \\
\operatorname{MayRelease}(x) & \triangleq \exists \gamma, i, t. \operatorname{MyTkt}^{\gamma}([i \mapsto t]) \ast \left[ x + \text{tc} : \text{Inv} \right] \text{TCP}(\gamma, x) \ast \left[ x + \text{ns} : t \left[ \text{NSP}(\gamma) \right]_R \right]
\end{align*}
\]

**Figure 17** Proof setup for bounded ticket locks.

to ensure that the counter’s value cannot surpass the ticket \( t \) held by the acquiring thread, and that once the lock is acquired, the value stays exactly \( t \) until the thread releases the lock by increasing it to \( t + 1 \). In our proof, we add the single-writer permission of \( x + \text{ns} \) to the lock’s resources, so a thread also acquires the full-write ownership of \( x + \text{ns} \) when it acquires the lock, and releases the ownership by the same time with releasing the lock. This removes two components of the ghost monoid needed by the original proof to maintain knowledge about some upper-bound of the protocol’s state.

- **Tracking only the most recent ticket of a thread.** The original proof tracks the complete history of issued tickets, which we found unnecessary. Instead, we only need to track the most recent ticket issued to a thread, which simplifies the tracking component of our ghost monoid. However, we do not claim that this results from the use of single-writer protocols: it seems that the simplification can also be applied to the original proof.

### E.1 Proof setup for bounded ticket locks

We explain the technical construction of the proof. The complete definitions of monoids, protocols and escrows needed are given in Figure 17.

We use protocols \( \text{TCP}(\gamma, x) \) and \( \text{NSP}(\gamma) \) to govern the ticket counter and the now-serving counter respectively. The \( \text{TCP}(\gamma, x) \) protocol only has a single state \( \text{Inv} \), while the states of \( \text{NSP}(\gamma) \) are natural numbers, which represent *abstract* tickets. Though working differently, both counters record the abstract ticket \( t \) they are currently working with, who requires the concrete counter value \( t_c \) to follow the equation \( t_c = t \text{ mod } C \). The abstract ticket of each counter can only increase over time (the state relation is \( \leq \) for \( \text{NSP}(\gamma) \)). By exploiting abstract tickets, we do not need to deal with using concrete tickets.

The interpretations of \( \text{NSP}(\gamma) \) and \( \text{TCP}(\gamma, x) \) depend respectively on the first and second components of a product monoid for our ghost states. The first component \( \text{Disj}(\mathbb{N}) \)
is a \textit{disjoint-union} monoid of natural numbers, whose composition is only defined for disjoint sets of natural numbers: $X \cdot Y = X \cup Y$ if $X \cap Y = \emptyset$. It is used for abstract ticket ownerships. The second component $\text{AUTH}(\mathbb{N} \rightarrow \text{Ex}(\mathbb{N} \cup \{-1\}))$ is an authoritative monoid of a finite partial map from $\mathbb{N}$ to exclusive elements from $\mathbb{N} \cup \{-1\}$. It is used for tracking issued abstract tickets for all participating threads. The composition works in the same way as that of the heap monoid in $\lambda_{\text{RN}}$ and the history and thread-view monoids in $\lambda_{\text{RN}}$: it has the same rules for exclusiveness, agreement, and updates as those monoids. The composition of the product monoid is then lifted piecewise.

\textbf{The now-serving counter protocol}

The assertion $\text{Perm}^\gamma(t)$ embeds the ownership of the abstract ticket $t$ in its first component (while the second component is an empty map which contains zero knowledge or ownership). $\text{Perm}^\gamma(t)$ satisfies exclusiveness, realizing the requirement that a ticket can only be uniquely owned:

$$\text{Perm}^\gamma(t) \ast \text{Perm}^\gamma(t) \Rightarrow \text{False}$$

Exclusiveness of $\text{Perm}^\gamma(t)$ allows us to define the escrow used to transfer the lock’s resource, which contains $P$ and the full single-writer permission of the now-serving counter. This is reflected in the definition of $\text{Escs}(x, \gamma, n)$, which asserts the existence of the escrows for all abstract tickets up to $n$. Additionally, $\text{Escs}(x, \gamma, n)$ subsumes the proofs that all abstract tickets up to (but excluding) $n$ have been served, due to the fact that the lock’s resource can only be stored in exactly one of the escrows at a time, and that the verification maintains that the resource only moves from the escrow of some ticket $t$ to that of $t + 1$. These proofs are needed when a thread updates the now-serving counter $x + ns$, to release the lock. This is enforced in $\text{NSP}(\gamma)$. When increasing the counter with value $z = n \mod C$ to $(n + 1) \mod C$, the lock-releasing thread moves the protocol state from $n$ to $n + 1$, and must re-establish $\text{NSP}(\gamma)$ by allocating the escrow for the ticket $n + 1$ of the next lock-acquiring thread.

\textbf{The ticket counter protocol}

Tracking tickets is the main role of the ticket counter. The tracking $M$ is a finite map from participant ids to their corresponding most recent abstract tickets. The domain of $M$, i.e. the set of ids, is restricted to the set $[0, \ldots, C - 1]$, since there are at most $C$ threads participating in the lock. If a thread has never obtained any ticket, then it is mapped to $-1$. The authoritative part of the map $\bullet M$ is part of $\text{AllTkts}^\gamma(M)$, which in turn is stored in the protocol $\text{TCP}(\gamma, x)$ that governs the ticket counter $x + tc$. The non-authoritative $\circ M$ is split into $\text{MyTkt}^\gamma[i \mapsto M(i)]$ for $i \in [0, \ldots, C - 1]$. Each $\text{MyTkt}^\gamma[i \mapsto M(i)]$ represents the knowledge of the most recent ticket $M(i)$, as well as the exclusive permission required to acquire a new ticket for the thread $i$. These permissions are allocated when the lock is created, and then distributed externally to the threads that wish to use the lock.

$\text{TCP}(\gamma, x)$ always keep its authoritative tracker $\text{AllTkts}^\gamma(M)$. The maximum abstract ticket of $M$ is $t - 1$, which is also the most recently issued ticket by the ticket counter. Meanwhile, the next ticket to be issued is $t$, and the current value $y$ of the ticket counter must be the corresponding concrete ticket of $t$. $M$ of course is injective, since no two threads should get the same ticket. $\text{TCP}(\gamma, x)$ also holds permissions $\bigwedge_{j \geq t} \text{Perm}^\gamma(j)$ of to-be-issued abstract tickets. When requesting a ticket, a thread obtains the abstract ticket $t$ (and the concrete ticket $y$), as well as $\text{Perm}^\gamma(t)$ to use the escrow later. The thread also increases the
ticket counter value (concretely and abstractly), and, by doing so, it must show that it has observed the now-serving counter to be at least some state that is not too far away from the ticket counter, which is realized by the conditions between $t$ and $n$ in the definition of TCP. These conditions say that the ticket counter is always ahead of the now-serving counter, but their distance cannot surpass the number of waiting tickets known to the ticket counter, i.e. those that have been issued but are still not reported to the ticket counter as served.

### Specification for bounded ticket lock

The specification that we need to verify is very simple, as given in Figure 18. $\text{MayAcquire}(x)$ is the permission for a thread to request a ticket and later acquire the lock. The thread must have the right $\text{MyTkt}^\gamma[i \mapsto t]$ to change its most recent ticket $t$, and it must show that $t$ has been served, by knowing $[x + ns : t + 1]_{R}^{\text{NSP}(\gamma)}$.

By acquiring the lock, a thread earns $\triangleright P$ and $\text{MayRelease}(x)$, which is the permission to release the lock. $\text{MayRelease}(x)$ includes the writer permission of $[x + ns : t]_{W}^{\text{NSP}(\gamma)}$ for the ticket $t$ that was used to acquire the lock. The thread has to give up this writer permission to update the now-serving counter to $t + 1$, at the same time gaining only the reader permission for $t + 1$, which is enough to re-establish $\text{MayAcquire}(x)$ that can be used for its next turn. Note that $P$ is guarded by a later, since it was put in an escrow. Fortunately there should be always enough steps in a client program to remove the later.

The proof outlines are given Figure 19, Figure 20, and Figure 21. For $\text{newLock}$, we only need to set up the default initial states of the protocols, for exactly $C$ participants. The proof of $\text{unlock}$ simply exploit the power of the single writer protocol to release the writer permission while doing the write itself (see §4.2). We explain the proof of $\text{lock}$ in more detail.

### E.2 Proof steps for lock

The proof outline for the $\text{lock}$ procedure is given in Figure 21. We explain the main reasoning below.

With the permission $\text{MyTkt}^\gamma[i \mapsto t_i]$, a thread $i$ acquires a new abstract ticket $t$ (concretely $y$) along with the permission $\text{Perm}^\gamma(t)$, and updates the tracker for $i$ to $t$. It also increases the ticket counter concretely to $(y + 1) \mod C$ along with an abstract ticket $t + 1$, by which the ticket counter demands $[x + ns : n_0]_{R}^{\text{NSP}(\gamma)}$ for some good enough $n_0$. This demand is fulfilled either by what the thread has locally, or what is already in the interpretation previously. In either way the thread updates its observation of $\text{NSP}(\gamma)$ to $n_0$. 

Figure 18 Specification of bounded ticket locks.
\{ \triangleright P \}
\begin{align*}
\text{let } x &= \text{alloc}(2) \text{ in} \\
\{ \triangleright P \times x + \ns \leftrightarrow A \times x + \tc \leftrightarrow A \} \\
\{ \triangleright P \times x + \ns \leftrightarrow A \times x + \tc \\& \leftrightarrow A \times \exists \gamma, M. \begin{array}{c}
\text{Perm}^{-1}(t) \times \text{AllTkts}(M-1) \times \text{Perm}^{-1}(t) \times \text{MyTkt}^{-1}[i \mapsto -1]
\end{array} \}
\end{align*}
\begin{align*}
x + \ns_{[\text{na}]} &:= 0; \\
\{ x + n_{s_{[\text{na}]}} := 0; \}
\begin{align*}
\{ x + \ns_{[0]} \text{NSP}(\gamma) \leftrightarrow x + \tc \leftrightarrow A \leftrightarrow \begin{array}{c}
\text{Perm}^{-1}(t) \times \text{AllTkts}(M-1) \times \text{Perm}^{-1}(t) \times \text{MyTkt}^{-1}[i \mapsto -1]
\end{array} \}
\end{align*}
\begin{align*}
x + \tc_{[\text{na}]} &:= 0; \\
\{ x + \ns : 0 \text{NSP}(\gamma) \leftrightarrow x + \tc \leftrightarrow \text{InvTCP}(\gamma, x) \leftrightarrow \begin{array}{c}
\text{MyTkt}^{-1}[i \mapsto -1]
\end{array} \}
\end{align*}
\begin{align*}
x &\{ \text{MayAcquire}(x) \}
\end{align*}
\text{\textbf{Figure 19} Proof outline for \textit{newLock}}

\begin{align*}
\{ \triangleright P \times \text{MayRelease}(x) \}
\begin{align*}
\{ \exists \gamma, i, t. \triangleright \begin{array}{c}
\text{MyTkt}^{-1}[i \mapsto t] \times \begin{array}{c}
\text{TCP}(\gamma, x) \leftrightarrow x + \ns \leftrightarrow t \text{NSP}(\gamma)
\end{array}
\end{array} \}
\end{align*}
\text{Ctx: } \begin{array}{c}
\begin{array}{c}
\text{TCP}(\gamma, x) \leftrightarrow x + \ns \leftrightarrow t \text{NSP}(\gamma)
\end{array}
\end{array}
\begin{align*}
\text{let } z &= x + \ns_{[\text{at}]} \text{ in} \\
\{ z = t \mod C \leftrightarrow \begin{array}{c}
\text{MayTkt}^{-1}[i \mapsto t] \times \begin{array}{c}
\text{NSP}(\gamma)
\end{array}
\end{array} \}
\end{align*}
\begin{align*}
x + \ns_{[\text{at}]} &:= (z + 1) \mod C \\
\{ \text{MyTkt}^{-1}[i \mapsto t] \times \begin{array}{c}
\text{NSP}(\gamma)
\end{array} \}
\end{align*}
\begin{align*}
\{ \text{MayAcquire}(x) \}
\end{align*}
\text{\textbf{Figure 20} Proof outline for \textit{unlock}}
By entering the loop, the thread keeps updating its observation of $\text{NSP}(\gamma)$ to some state $n$ later than $n_0$. When the loop exits, the thread knows that the two concrete tickets $y$ and $z$ are the same, and this fact should imply that their corresponding abstract tickets $t$ and $n$ are also the same. To show this we need to use a subtle reasoning with the single writer protocol.

Since we already have $t \mod c = n \mod c$ and $t < n + c$ by acquiring the ticket and loop-checking the counter, we only need to show $n \leq t$ to show $n = t$. This matches the intuition that the now-serving counter cannot surpass the unserved ticket, since we have locally the permission $\text{Perm}^\gamma(t)$. We assume the contrary $t < n$ and derive a contradiction.

Note that by reading the now-serving counter we also obtain $\text{Escs}(x, \gamma, n)$. Then we can use $\text{Perm}^\gamma(t)$ with the escrow for $t$ to obtain $\circ [x + ns : t \mid \text{NSP}(\gamma)]_v^w$. Combining with our locally-known $[x + ns : n \mid \text{NSP}(\gamma)]_v^w$, it must be that $n \leq t$: contradiction!

Knowing that $n = t$, we use the escrow with $\text{Perm}^\gamma(t)$ to acquire the lock, and gain $\triangleright P$.

We also gain $\triangleright [x + ns : t \mid \text{NSP}(\gamma)]_v^w$, which we combine with $[x + ns : t \mid \text{NSP}(\gamma)]_v^w$ to remove the later, and complete the $\text{MayRelease}(x)$ permission.
F iGPS Protocol Model

The model of iGPS protocols consists of two parts: a global protocol invariant \( \text{PrInv}_\tau(\ell) \) shared by all protocol variants, and local protocol assertions given out to clients, which depend on the particular protocol variant. We will now explain both parts in detail. The definitions are given in Figure 22b.\(^8\)

The protocol invariant owns, among other resources, the location’s history \( \text{Hist}(\ell, h) \) – though \( h \) will be expressed differently, as we explain below. In addition to the physical history, the invariant also owns the authoritative fragment of the logical history \( \Delta \) – an augmented history that write events paired with protocol states.

The protocol invariant further owns the protocol interpretation for every state in \( \Delta \). As all states are potentially targets of reads, we store \( \tau_{\text{read}} \) for every state in the history. To enable ownership transfer via CAS, we additionally need to store \( \tau_{\text{full}} \) for those states whose associated write event can still be target of an update operation.

This condition is sufficient for non-exclusive protocols. However, single-writer protocols offer an additional mechanism of transferring out resources from the protocol interpretation: exclusive writes. To support both protocol variants, the invariant tracks the most recent exclusive write \((s_x, v_x, V_x)\) and stores \( \tau_{\text{full}} \) for \( \delta_x \) as well as any later write that can be target of CAS. In the case of non-exclusive protocols, \( \delta_x \) is set to the initial protocol write and all CAS-able writes maintain their full interpretation \( \tau_{\text{full}} \).

We track \( \delta_x \) with an additional authoritative piece of ghost state. The protocol invariant holds the authoritative fragment \( \bullet \Delta_x \) while the non-authoritative fragment \( \circ \Delta_x \) is given to the writer – or put into the shared invariant \( \text{PrShared}_\tau \) in the case of non-exclusive protocols.

Apart from those resources, the protocol invariant imposes three conditions on the physical history \( h \) and the logical history \( \Delta \). The most important one is consistency: the modification order on \( h \) must correspond to the protocol order on \( \Delta \). The Consistent assertion captures exactly this property.

---

\[^8\] The definitions differ slightly from the Coq mechanization to simplify the presentation.
Additionally, we need to make sure that the logical history does not assign multiple protocol states to a write event. To this end, Unique asserts that $\Delta$ contains any write event at most once. With this restriction in place, we can now express $h$ in terms of $\Delta$ by forgetting the protocol states in $\Delta$, i.e., $h = \{ (v, t, V) \mid (s, v, t, V) \in \Delta \}$.

Finally, we need an additional piece of information on the logical history: $\Delta$ contains only final states, i.e. states reachable from any other state, in the disconnected segment. We call this property FinalInv. The disconnected segment – illustrated in Figure 22a – spans all writes which are not connected to the most recent exclusive write. The write events contained in that segment share an important property: they could be $mo$-after newly inserted writes (which are always $mo$-after the most recent exclusive write). To re-establish Consistent for new writes, we need to be able to relate their protocol state to any existing $mo$-later write. FinalInv guarantees that the state associated with the newly inserted write comes before any state in the disconnected segment.

Before we turn to the local protocol assertions, we define the PCMs used in $PrInv_\tau$. The PCM for the most recent exclusive write $\delta_x$ is a simple Auth construction similar to the ones shown in §3, providing both an authoritative fragment for $PrInv_\tau$ as well as an exclusive non-authoritative fragment to be given to the writer or to $PrShared_\tau$. The PCM for logical histories is more involved, as it needs to provide three kinds of fragments: (1) an authoritative fragment held by $PrInv_\tau$, (2) an exclusive fragment given to the writer or to $PrShared_\tau$, representing the right to change the logical history – and thus also the physical one – and (3) a persistent fragment used in the (reader) protocol assertion, representing a lower bound on protocol state. To fulfill these requirements, we construct the rather intimidating double-AUTH PCM LH below.

\[
\begin{align*}
LE & \triangleq \text{State} \times \text{Val} \times \text{Time} \times \text{View} & \quad (\text{logical event}) \\
LES & \triangleq \mathcal{P}_v(LE) & \quad (\text{logical event set}) \\
XW & \triangleq \text{AUTH}((\text{EX}(LE))) & \quad (\text{most recent excl. event}) \\
LH & \triangleq \text{AUTH}((\text{AUTH}(LES))) & \quad (\text{auth. logical event set})
\end{align*}
\]

With these definitions we can now define protocol assertions, which are presented in Figure 23. As before, we elide ghost and invariant names, which add some bookkeeping overhead to the definitions.