Waqas Younas' blog

How to Solve Santa Claus Concurrency Puzzle with a Model Checker

Sat, 10 Jan 2026 00:00:00 -0500

When I enjoy a book, I often look for other work by the same author. While reading a book by Ben-Ari, I looked up his other writing and came across a paper on the Santa Claus concurrency puzzle ¹. The puzzle is stated as follows:

Santa Claus sleeps at the North Pole until awakened by either all nine reindeer or a group of three out of ten elves. He performs one of two indivisible actions:

If awakened by the group of reindeer, Santa harnesses them to a sleigh, delivers toys, and finally unharnesses the reindeer, who then go on vacation.

If awakened by a group of elves, Santa shows them into his office, consults with them on toy R&D, and finally shows them out so they can return to work constructing toys.

A waiting group of reindeer must be served by Santa before a waiting group of elves. Since Santa’s time is extremely valuable, marshalling the reindeer or elves into a group must not be done by Santa.

The puzzle captures the kind of synchronization challenges that arise whenever multiple processes must coordinate. I wanted to check a solution to it with a model checker. I expected this to be a simple problem; what surprised me is how easy it is to get it wrong.

I also chose a different path to the solution. Before writing the correct model, I spent some time exploring what incorrect designs might look like. I find failures instructive because they expose unsafe assumptions and make clear what a correct solution must prevent. As a learning exercise, I wrote three small models, each reproducing a different failure scenario, and used correctness properties to catch the bugs. We will look at these failures in a bit. After that, I present the correct model and validate it with a model checker.

To carry out this analysis, I used the SPIN model checker and wrote the models in Promela, SPIN’s specification language.

A natural question is why use a model checker instead of simply writing the solution in Python or Go. The answer is coverage: a model checker explores every interleaving of the model, including ones that tests or experiments might miss, and either confirms that the stated properties hold in every reachable state or produces a counterexample.

Let’s look at three key puzzle constraints:

Santa must not marshal the groups himself.
If both reindeer and elves are waiting, Santa must serve the reindeer first (Christmas delivery is critical).
When Santa serves a group, the entire group must participate together: exactly nine reindeer for delivery, or exactly three elves for consultation. Santa cannot deliver toys with seven reindeer or consult with two elves, for example.

At first, the problem seems straightforward. You wait until nine reindeer arrive or three elves arrive, wake Santa, do the work, and repeat. It is tempting to think that a few counters or locks are enough to solve it. The difficulty is in the interleavings. A solution that appears correct when reasoning step by step can fail when operations interleave in unexpected ways.

Let’s now consider three failure scenarios that solve the puzzle incorrectly.

Santa delivers toys even though fewer than nine reindeer are actually ready.
Under a subtle interleaving, Santa ends up doing something impossible in the real world: delivering toys and consulting with elves at the same time.
Santa chooses to consult with elves even though nine reindeer are already waiting and ready to go.

We’ll look at each of these failure scenarios next, and then work toward a solution that satisfies all puzzle constraints. Before that, let’s briefly introduce the Promela concepts we’ll need: channels, options, and guards.

To communicate between processes, we use channels in Promela. A channel is a data type that supports two operations: send and receive. Channels transfer messages of a specified type from a sender to a receiver.

SPIN supports two kinds of channels: rendezvous channels and buffered channels. In a rendezvous channel, send and receive synchronize: the sender blocks until the receiver is ready (and vice versa), and the transfer occurs as one atomic handshake. Buffered channels can hold messages temporarily, even if no process is ready to receive them yet. Buffered channels let a sender run ahead of a receiver; rendezvous channels force them to move together.
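Go, which I use for the implementation later, draws the same distinction, so here is a small sketch of it. An unbuffered Go channel behaves like a Promela rendezvous channel, and a buffered one behaves like a Promela buffered channel. The helper trySend is my own illustration, not part of any model; it attempts a send without blocking so we can see which sends complete.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it completed.
// The default case runs only if the send cannot proceed right now.
func trySend(ch chan byte, v byte) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	rendezvous := make(chan byte)  // capacity 0, like Promela's [0]
	buffered := make(chan byte, 2) // capacity 2, like Promela's [2]

	// No receiver is waiting, so the rendezvous send cannot complete.
	fmt.Println("rendezvous send:", trySend(rendezvous, 1))

	// The buffered channel accepts sends until its buffer is full.
	fmt.Println("buffered send 1:", trySend(buffered, 1))
	fmt.Println("buffered send 2:", trySend(buffered, 2))
	fmt.Println("buffered send 3:", trySend(buffered, 3))
}
```

This prints false, then true, true, false: the rendezvous sender needs a partner, while the buffered sender can run ahead until the buffer fills.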

A send operation consists of a channel variable followed by an exclamation mark (!) and a sequence of expressions whose number and types must match the channel’s message type. A receive operation, by contrast, consists of a channel variable followed by a question mark (?) and a sequence of variables into which the message is received.

For example, we can declare a rendezvous channel named request that carries messages of type byte as follows (a rendezvous channel has a capacity of 0, indicated by the [0] syntax):

chan request = [0] of { byte };

We can send data on this channel as:

request ! 1

and receive data as:

byte client;
request ? client

Now a few words about options and guards. Promela uses options to express branching. For example, an if statement in Promela looks like this:

if
:: (a < 50) -> printf("a < 50\n");
:: (a > 50) -> printf("a > 50\n");
fi

Each line beginning with :: is an option. An option consists of a guard followed by an action. The guard is the Boolean expression before the arrow (->). An option is enabled if its guard evaluates to true.

When the if statement is reached:

if exactly one option is enabled, SPIN takes that option;
if multiple options are enabled, SPIN may choose any one of them nondeterministically;
if no option is enabled, the process blocks at the if.
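To make these three rules concrete, here is a small Go sketch that evaluates the guards of the if statement above. The types and function names are my own, purely illustrative. One difference matters: at run time this sketch picks a single enabled option at random, whereas SPIN, during verification, explores every enabled option. Note also that for a == 50 neither guard holds, so the process would block.

```go
package main

import (
	"fmt"
	"math/rand"
)

// option is one "::" line of a Promela if: a guard and an action label.
type option struct {
	guard  bool
	action string
}

// enabled returns the actions whose guards evaluate to true.
func enabled(opts []option) []string {
	var out []string
	for _, o := range opts {
		if o.guard {
			out = append(out, o.action)
		}
	}
	return out
}

// step mimics reaching the if once: it reports "blocked" (false) when no
// guard holds, and otherwise picks one enabled option at random.
func step(opts []option) (string, bool) {
	en := enabled(opts)
	if len(en) == 0 {
		return "", false // the process would block at the if
	}
	return en[rand.Intn(len(en))], true
}

// ifExample builds the two options of the if statement shown above.
func ifExample(a int) []option {
	return []option{
		{a < 50, `printf("a < 50\n")`},
		{a > 50, `printf("a > 50\n")`},
	}
}

func main() {
	for _, a := range []int{10, 50, 90} {
		if act, ok := step(ifExample(a)); ok {
			fmt.Printf("a=%d: takes %s\n", a, act)
		} else {
			fmt.Printf("a=%d: no option enabled, the process blocks\n", a)
		}
	}
}
```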

Now let’s look at the first failure scenario in which Santa may deliver toys without all nine reindeer being ready. In the failure models I sometimes shrink the number of elves (e.g., to 3) to keep the state space small; the failure mechanism is the same.

Here is a code excerpt from the model that simulates the first failure scenario:

#define NUM_REINDEER 9
#define NUM_ELVES 3

// buffered channels
chan harnessed = [NUM_REINDEER] of { bit };
chan done_consulting = [NUM_ELVES] of { bit };

// Some code omitted for brevity

active [NUM_REINDEER] proctype Reindeer()
{
    do
    ::
        r_arrive ! 1;
        harnessed ? 1;
        actually_harnessed++;
        actually_harnessed--;
    od
}

active proctype Santa()
{
    byte i = 0;
    byte e = 0;
    byte j;
    
    do
    :: (i < NUM_REINDEER) ->
        r_arrive ? 1;
        i++;
        if
        :: (i == NUM_REINDEER) -> reindeer_ready = true
        :: else -> skip
        fi
        
    
    :: (i == NUM_REINDEER) ->
        
        for (j : 1 .. NUM_REINDEER) {
            harnessed ! 1;
        }
        
        delivering = true;
        /*
            simulating delivering toys
        */
        delivering = false;
        reindeer_ready = false;
        i = 0;
    
    // Some code omitted for brevity - complete code linked below
    od
}

// Correctness property 
ltl safety { [] (delivering -> actually_harnessed == NUM_REINDEER) }

This Promela model is a small “executable spec” of the Santa problem. The full model and instructions for running it are available in the repository ². To keep things simple, I mainly discuss the Santa–reindeer interaction; elves are present only to keep the overall structure similar.

The syntax active [N] proctype P(){…} tells SPIN to start N concurrent copies of process P in the initial state, so here you get 9 reindeer processes, 3 elf processes, and 1 Santa process. Each reindeer loops: it sends on the rendezvous channel r_arrive, blocking until Santa receives, and then waits for a harness message on harnessed. Santa loops, counting reindeer arrivals. Once all nine reindeer have arrived, he “harnesses” them by sending nine messages on harnessed and then briefly sets delivering = true to represent toy delivery.

The bug is that harnessed is a buffered channel, so Santa can enqueue all nine harness messages and immediately proceed; nothing forces the reindeer to have actually received them yet. That violates the puzzle constraint that Santa must deliver with the whole group participating together. To check whether this implementation respects the puzzle’s constraints, we write correctness properties in Linear Temporal Logic (LTL).

In simple terms, an LTL property describes a rule that must hold over time as the system runs. The model checker constructs the system’s state space by enumerating every state reachable under all possible interleavings; each state is a snapshot of all variables and of each process’s current position in its code. You can think of this state space as a graph: each node is a state, and each edge represents a single step the program can take as processes interleave. An LTL property is then checked against this entire state space: when we say a condition must “always” hold, we mean it must be true in every reachable state that SPIN explores.
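To show what “every reachable state” means in practice, here is a toy explicit-state search in Go. It does not model Santa. It models a hypothetical pair of processes that each increment a shared counter non-atomically (read x into a local, then write local + 1). A state holds both program counters, both locals, and x; a breadth-first search visits every reachable state and checks the invariant “once both processes are done, x == 2”. SPIN’s real algorithms are far more sophisticated; for LTL it searches the product with an automaton for the negated formula. The core idea of exhausting the state graph is the same, though.

```go
package main

import "fmt"

// state is a snapshot of all variables plus each process's program counter.
type state struct {
	pc  [2]int // 0: about to read x, 1: about to write x, 2: done
	tmp [2]int // each process's local copy of x
	x   int    // shared counter
}

// successors returns every state reachable by letting one process take one step.
func successors(s state) []state {
	var out []state
	for p := 0; p < 2; p++ {
		n := s // arrays inside the struct are copied by value
		switch s.pc[p] {
		case 0:
			n.tmp[p] = s.x
			n.pc[p] = 1
		case 1:
			n.x = s.tmp[p] + 1
			n.pc[p] = 2
		default:
			continue // process p has terminated
		}
		out = append(out, n)
	}
	return out
}

// invariant: once both processes are done, x must equal 2.
func invariant(s state) bool {
	return !(s.pc[0] == 2 && s.pc[1] == 2) || s.x == 2
}

// explore visits all reachable states breadth-first and returns how many
// there are, plus the first state that violates the invariant (if any).
func explore() (int, *state) {
	start := state{}
	seen := map[state]bool{start: true}
	queue := []state{start}
	var bad *state
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		if !invariant(s) && bad == nil {
			v := s
			bad = &v
		}
		for _, n := range successors(s) {
			if !seen[n] {
				seen[n] = true
				queue = append(queue, n)
			}
		}
	}
	return len(seen), bad
}

func main() {
	count, bad := explore()
	fmt.Println("reachable states:", count)
	if bad != nil {
		fmt.Printf("invariant violated in state %+v\n", *bad)
	}
}
```

The violation found is the classic lost update: both processes read 0, both write 1, and the final state has x == 1. A test that runs the read-then-write sequence in a fixed order could easily miss it.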

The LTL property ltl safety { [] (delivering -> actually_harnessed == NUM_REINDEER) } encodes that requirement. Here [] means “always,” so the property states that in every reachable state, if delivering is true, then all nine reindeer must already be harnessed.

When SPIN checks this property, it finds an execution in which Santa sets delivering = true before all nine reindeer have received their harness signals. In other words, Santa starts delivering toys while some reindeer are still unharnessed. The property fails, and SPIN produces a counterexample showing exactly how this execution occurs. The repository ² contains this model, along with instructions for running it and viewing the trace that contains the counterexample.
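The run-ahead behind this bug is easy to reproduce outside SPIN. Here is a hypothetical Go sketch of the buggy pattern: Santa enqueues nine harness messages on a buffered channel and is free to move on. The sketch starts no reindeer goroutines at all. That corresponds to one legal interleaving, the one in which no reindeer has been scheduled yet, so it is deterministic.

```go
package main

import "fmt"

const numReindeer = 9

// santaRunsAhead mirrors the buggy pattern: Santa enqueues one harness message
// per reindeer on a buffered channel and moves on to delivery immediately.
// No reindeer goroutine is started, modelling the interleaving in which none
// of them has been scheduled yet.
func santaRunsAhead() (queued, received int) {
	harnessed := make(chan bool, numReindeer) // buffered, like [NUM_REINDEER]

	for i := 0; i < numReindeer; i++ {
		harnessed <- true // never blocks: the buffer has room for all nine
	}

	// This is the point where the model sets delivering = true.
	// received would be incremented by reindeer that took a message; none has.
	return len(harnessed), received
}

func main() {
	q, r := santaRunsAhead()
	fmt.Printf("messages queued: %d, reindeer harnessed: %d\n", q, r)
}
```

It prints “messages queued: 9, reindeer harnessed: 0”. With an unbuffered channel, each of Santa’s sends would instead complete only when some reindeer receives it.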

Now let’s look at a second failure scenario, in which Santa may end up doing the seemingly impossible: delivering toys with the reindeer and consulting with the elves at the same time. The complete model that simulates this fault is available here ³; an excerpt is shown below.

To make this bug easy to trigger, I did something silly: I split Santa into two processes, one responsible for toy delivery and one for consultation. Each process independently waits for its own group to form and then sets the corresponding flag (delivering or consulting) to indicate that work has begun. Because these two processes run concurrently and share no mutual exclusion, there exists an interleaving in which both flags become true at the same time.

The code violates a core constraint of the puzzle: Santa must perform only one indivisible action at a time. Once we add an assertion that Santa never delivers and consults simultaneously, SPIN finds a concrete counterexample trace showing exactly how this impossible situation arises under a particular interleaving.

active proctype SantaConsulting()
{
    byte e = 0;
    byte j;
    end:
    do
    :: (e < NUM_ELVES) ->
        e_arrive ? 1;
        e++;
    :: (e == NUM_ELVES) ->
        consulting = true;
        assert !(consulting && delivering);
        // We release elves here
        consulting = false;
        e = 0;
    od
}

active proctype SantaToyDelivery()
{
    byte i = 0;
    byte j = 0;
    end:
    do
    :: (i < NUM_REINDEER) ->
        r_arrive ? 1;
        i++;
    :: (i == NUM_REINDEER) ->
        delivering = true;
        // We release the reindeer here
        delivering = false;
        i = 0;
    od
}

// correctness property

ltl reindeer_precedence {
    [] ( (r_count == NUM_REINDEER) -> ( (!consulting) U delivering ) )
}
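To see why the assertion in the split-Santa model above must eventually fail, we can strip each Santa process down to the two statements that write its flag and try every interleaving. This Go sketch is a deliberate simplification: it ignores arrivals entirely and treats each flag assignment as one atomic step. It reports the first schedule that reaches a state in which both flags are true.

```go
package main

import "fmt"

// flags is the shared state the assertion looks at.
type flags struct{ delivering, consulting bool }

// step is one atomic statement of a process.
type step func(f *flags)

// Each split-Santa process, reduced to the statements that write its flag.
var deliverySteps = []step{
	func(f *flags) { f.delivering = true },
	func(f *flags) { f.delivering = false },
}

var consultingSteps = []step{
	func(f *flags) { f.consulting = true },
	func(f *flags) { f.consulting = false },
}

// search tries every interleaving depth-first and returns the schedule
// (D = delivery process moved, C = consulting process moved) of the first
// one that reaches a state violating !(consulting && delivering).
func search(f flags, i, j int, sched string) (string, bool) {
	if f.delivering && f.consulting {
		return sched, true
	}
	if i < len(deliverySteps) {
		g := f
		deliverySteps[i](&g)
		if s, ok := search(g, i+1, j, sched+"D"); ok {
			return s, true
		}
	}
	if j < len(consultingSteps) {
		g := f
		consultingSteps[j](&g)
		if s, ok := search(g, i, j+1, sched+"C"); ok {
			return s, true
		}
	}
	return "", false
}

func main() {
	if s, ok := search(flags{}, 0, 0, ""); ok {
		fmt.Println("violation after schedule", s)
	}
}
```

It prints “violation after schedule DC”: the delivery process sets its flag, and the consulting process sets its own before delivery finishes.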

Now let’s examine the final failure scenario, in which Santa may consult with elves even when a full group of nine reindeer is already waiting. The complete model that simulates this fault is available here ⁴; structurally, it is very similar to the previous one.

In this version, Santa’s consultation logic does not check whether a full reindeer group is ready before proceeding. As a result, there exists an interleaving in which all nine reindeer have arrived and a group of three elves is also ready, yet Santa begins consulting with the elves. This violates a central constraint of the puzzle: when both groups are waiting, reindeer must be served before elves.

I found this hard to detect with only the tools discussed so far, so I turned to a precedence property expressed in LTL. Informally, the property states: once a full group of nine reindeer is ready, consulting must not occur before delivery begins. We encode this using the LTL ‘until’ operator (written as ‘U’), which lets us state that consulting must remain false until delivery occurs. In SPIN, U is a strong until, so the property also requires that delivery eventually happens. When SPIN checks this property, it finds a counterexample in which Santa consults first, demonstrating that the implementation violates the required precedence rule.
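The meaning of the precedence property can be made concrete with a checker over a single finite trace. This Go sketch is only an approximation: SPIN evaluates LTL over infinite executions, and it does so with automata rather than by scanning traces. The field reindeerReady is my stand-in for r_count == NUM_REINDEER.

```go
package main

import "fmt"

// snapshot holds the variables the precedence property mentions.
type snapshot struct {
	reindeerReady, consulting, delivering bool
}

// until checks (!consulting) U delivering from position i of a finite trace:
// delivering must hold at some k >= i, with consulting false at every
// position from i up to, but not including, k.
func until(trace []snapshot, i int) bool {
	for k := i; k < len(trace); k++ {
		if trace[k].delivering {
			return true
		}
		if trace[k].consulting {
			return false
		}
	}
	return false // strong until: delivery never happened
}

// precedence checks [] (reindeerReady -> ((!consulting) U delivering)).
func precedence(trace []snapshot) bool {
	for i := range trace {
		if trace[i].reindeerReady && !until(trace, i) {
			return false
		}
	}
	return true
}

func main() {
	consultFirst := []snapshot{
		{reindeerReady: true},
		{reindeerReady: true, consulting: true},
		{reindeerReady: true, delivering: true},
	}
	deliverFirst := []snapshot{
		{reindeerReady: true},
		{reindeerReady: true, delivering: true},
		{consulting: true},
	}
	fmt.Println("consult first:", precedence(consultFirst))
	fmt.Println("deliver first:", precedence(deliverFirst))
}
```

The first trace fails because consulting becomes true while the reindeer are ready and before any delivery; the second passes.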

There’s one more rule that all of the models above break: Santa must not do the marshalling himself. In the above models, Santa is directly counting arrivals and releasing reindeer and elves, which makes it easy to mix up group formation with the work itself.

The fix is to give Santa some help. Following Ben-Ari ¹, we introduce rooms. A room’s only job is to collect arrivals and let Santa know when a complete group is ready. Santa never counts arrivals anymore; he simply waits for a “group ready” signal and then does the work.

To build a correct model, we use one room for the reindeer and another for the elves. The key idea is to separate marshalling (handled by the rooms) from serving. Let’s now look at the modified sequence of events.

When a group of nine reindeer arrives at the reindeer room, the room signals Santa. While this is happening, two elves also arrive. Santa delivers toys and releases the reindeer. When the third elf later arrives, the elf room signals Santa; Santa consults with the elves and then releases them.

To model this behavior in SPIN, each component is represented as a separate process. SPIN explores all possible interleavings of these processes and checks that specified correctness properties hold in every reachable state.

We model the system using the following processes:

Reindeer: We run nine reindeer processes concurrently. Each reindeer arrives at the reindeer room and waits there until it is released.

Elves: We also run ten elf processes concurrently. Each elf arrives at the elf room and waits there until it is released.

Reindeer room: We run a single instance of this process, which simulates the reindeer room. It collects arriving reindeer, signals Santa when a full group of nine has formed, and releases the group after Santa completes delivery.

Elf room: Similarly, we run a single instance of this process to simulate the elf room. It collects arriving elves, signals Santa when a group of three has formed, and releases the group after Santa completes consultation.

Santa: A single Santa process waits for “group ready” signals from the rooms. It serves reindeer with priority over elves and notifies the appropriate room when its work is complete.

To communicate between processes, we use the following rendezvous channels, each carrying messages of type bit:

chan r_arrive  = [0] of { bit };
chan e_arrive  = [0] of { bit };
chan r_release = [0] of { bit };
chan e_release = [0] of { bit };
chan r_done    = [0] of { bit };
chan e_done    = [0] of { bit };

We prefix channel names with r_ when they carry reindeer traffic and with e_ when they carry elf traffic. Below are the roles of the main channels:

r_arrive / e_arrive: These rendezvous channels represent how reindeer and elves enter their rooms. A reindeer tries to enter by sending on r_arrive; because the channel is rendezvous, it blocks until the room is ready to receive. When the room is full, it stops receiving arrivals, so additional reindeer (or elves) block until the room re-opens.
r_release / e_release: These channels represent how a room releases a complete group. After Santa finishes, the room sends on r_release (or e_release) once per group member. Because the channel is rendezvous, the room cannot “release ahead”: each send synchronizes with an actual waiting reindeer (or elf).
r_done / e_done: These channels represent Santa’s “I’m finished” notification to the room. The room waits for this signal before releasing anyone. For example, after Santa finishes delivering toys, he sends on r_done; without it, the room would not know when it is safe to release the reindeer.

Now let’s look at the reindeer and elf processes, which are quite similar. We’ll start with the reindeer process:

active [9] proctype Reindeer()
{
    do
    :: r_arrive ! 1;     // enter the room
       r_release ? 1;    // wait to be released
    od
}

Because of the active [9] keyword, SPIN starts nine reindeer processes running concurrently in the initial system state. The statement r_arrive ! 1 indicates that a reindeer is attempting to enter the reindeer room by sending on the r_arrive channel. Because r_arrive is a rendezvous channel, the reindeer blocks until the room is ready to accept it. After entering the room, the reindeer waits to be released by blocking on the r_release channel.

The code for the elf process is similar:

active [10] proctype Elf()
{
    do 
    :: e_arrive ! 1;
       e_release ? 1;
    od
}

The room is where marshalling and synchronization take place. Let’s now look at the reindeer room.

active proctype RoomReindeer()
{
    byte waiting = 0;
    byte i = 0;

    do
    /* Collect reindeer until the group is complete (entrance open iff waiting < 9) */
    :: (waiting < NUM_REINDEER) ->
        atomic {
            r_arrive ? 1;
            waiting++;
            reindeer_waiting = waiting;
            if
            :: (waiting == NUM_REINDEER) -> r_request = 1
            :: else -> skip
            fi
        }

    /* Group complete: wait for Santa, then release all 9 and reset */
    :: (waiting == NUM_REINDEER) ->
        atomic { r_done ? 1; i = 0; }

        do
        :: (i < NUM_REINDEER) ->
            atomic { r_release ! 1; i++; }
        :: else -> break
        od;

        atomic {
            waiting = 0;
            reindeer_waiting = 0;
        }
    od
}

The repetition structure has two alternatives. When waiting is less than NUM_REINDEER, the room accepts arriving reindeer and increments the counter. If the arrival completes the group, the room sets r_request = 1 to notify Santa.

When waiting equals NUM_REINDEER, the room stops accepting arrivals and waits for Santa to signal completion on r_done. After receiving this message, the room releases the reindeer via r_release and resets its state to accept new arrivals.

The logic for the elf room follows the same pattern and is omitted here for brevity.
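For readers who think in Go, here is a hypothetical sketch of the reindeer side of the room design. It is not the Go implementation linked at the end of this post. Unbuffered channels stand in for the rendezvous channels, and one assumption departs from the model: the room signals Santa through a ready channel that carries its waiting count, rather than by setting an r_request flag. Santa checks that count, which gives a run-time analogue of safety_delivery.

```go
package main

import (
	"fmt"
	"sync"
)

const numReindeer = 9

// reindeerRoom collects arrivals, reports a full group to Santa, waits for
// Santa's done signal, then releases the whole group before reopening.
func reindeerRoom(arrive, release, done chan struct{}, ready chan<- int, rounds int) {
	for r := 0; r < rounds; r++ {
		waiting := 0
		for waiting < numReindeer { // entrance open only while waiting < 9
			<-arrive
			waiting++
		}
		ready <- waiting // plays the role of r_request = 1
		<-done           // like r_done ? 1
		for i := 0; i < numReindeer; i++ {
			release <- struct{}{} // each send meets an actual waiting reindeer
		}
	}
}

// run plays the given number of rounds and reports how many deliveries
// happened, how many releases the reindeer saw, and whether Santa always
// found exactly nine reindeer in the room.
func run(rounds int) (deliveries, releases int, ok bool) {
	arrive := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})
	ready := make(chan int)

	go reindeerRoom(arrive, release, done, ready, rounds)

	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < numReindeer; i++ {
		wg.Add(1)
		go func() { // one reindeer
			defer wg.Done()
			for r := 0; r < rounds; r++ {
				arrive <- struct{}{} // enter the room
				<-release            // wait to be released
				mu.Lock()
				releases++
				mu.Unlock()
			}
		}()
	}

	ok = true
	for r := 0; r < rounds; r++ { // Santa
		if n := <-ready; n != numReindeer {
			ok = false
		}
		deliveries++ // deliver toys (abstracted)
		done <- struct{}{}
	}
	wg.Wait()
	return deliveries, releases, ok
}

func main() {
	d, r, ok := run(3)
	fmt.Println("deliveries:", d, "releases:", r, "always nine:", ok)
}
```

Running it exercises only the schedules the Go runtime happens to pick, which is exactly the gap the model checker closes.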

The Santa process models two kinds of work: toy delivery with the reindeer and consultation with the elves. Its main loop contains two guarded alternatives.

In the first alternative, guarded by r_request, Santa performs the toy delivery action, clears r_request, and signals completion to the reindeer room by sending on r_done. The corresponding code is shown below.

:: r_request ->
        atomic {
            delivering = true;
            r_request = 0;   /* claim the request */
        }

        /* Deliver toys (abstracted). */
        delivering = false;

        /* Notify the Room that delivery is complete; it can release the reindeer. */
        r_done ! 1;

The second alternative is guarded by !r_request && e_request. This guard enforces the priority rule: Santa consults with the elves only if no reindeer group is waiting and a group of three elves has formed. Santa then performs the consultation and signals completion to the elf room by sending on e_done.

:: (!r_request && e_request) ->
        atomic {
            consulting = true;
            e_request = 0;   /* claim the request */
        }

        /* Consult (abstracted). */
        consulting = false;

        /* Notify the Room that consultation is complete; it can release the elves. */
        e_done ! 1;
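The two guards together form a small decision rule, sketched below as a Go function with a hypothetical name. It is worth spelling out because Go’s select statement offers no priority among ready cases: it picks one pseudo-randomly. A Go version therefore needs something like an explicit flag check to reproduce !r_request && e_request.

```go
package main

import "fmt"

// nextAction mirrors the guards of Santa's main loop: the reindeer branch is
// enabled by r_request, the elf branch by !r_request && e_request.
func nextAction(rRequest, eRequest bool) string {
	switch {
	case rRequest:
		return "deliver"
	case eRequest: // reached only when rRequest is false
		return "consult"
	default:
		return "wait" // no guard is enabled, so Santa blocks
	}
}

func main() {
	for _, c := range []struct{ r, e bool }{{false, false}, {false, true}, {true, false}, {true, true}} {
		fmt.Printf("r_request=%v e_request=%v -> %s\n", c.r, c.e, nextAction(c.r, c.e))
	}
}
```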

How do we validate that the model is correct?

We can specify two kinds of correctness properties: safety properties and liveness properties. Let’s look at them one by one.

Safety properties assert that nothing bad ever happens. We use three safety properties in this model. First, we ensure that Santa always delivers toys with exactly nine reindeer, that is, a full group of nine reindeer must be present in the room. We specify this as:

ltl safety_delivery { [] (delivering -> reindeer_waiting == 9) }

This property states that, at all times, if Santa is delivering toys, then the variable reindeer_waiting must equal 9.

Second, we ensure that Santa always consults with exactly three elves. We specify this as:

ltl safety_consult { [] (consulting -> elf_waiting == 3) }

Finally, we specify a mutual-exclusion property to ensure that Santa never delivers toys and consults at the same time:

ltl mutex_santa { [] !(delivering && consulting) }

You might wonder why the variables delivering and consulting are set to true only briefly. In this model, delivery and consultation are not simulated in detail; each is represented as a single abstract step. The boolean flags simply mark the moment when Santa starts a delivery or consultation, so that we can state and check correctness properties. Their purpose is not to model time passing, but to make illegal overlaps visible to the model checker.

Now let’s turn to the liveness property. Liveness properties state that something good eventually happens. The liveness property we specify is: always, if a request is pending, then eventually Santa will either be delivering toys or consulting with elves. This rules out executions where requests remain pending forever without Santa ever doing work.

We specify this property as follows (where [] denotes “always” and <> denotes “eventually”):

ltl live_progress { [] ((r_request || e_request) -> <> (delivering || consulting)) }

One important role of a liveness property is to guard against vacuous correctness. For example, a safety property such as [] (delivering -> reindeer_waiting == 9) holds even if delivering is never true: ‘->’ is implication, and in logic “A implies B” is true whenever A is false, regardless of B. The liveness property, in contrast, rules out executions where a request remains pending forever without Santa ever taking a delivery or consultation step; in other words, it checks that some service step eventually occurs whenever a request is pending.
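The vacuity point can be demonstrated on two short traces. This Go sketch restricts itself to the reindeer side and checks each property over a finite trace; that is an approximation, since SPIN reasons about infinite executions. A trace in which Santa never acts passes the safety check but fails the progress check.

```go
package main

import "fmt"

// obs is one state of a trace, restricted to the reindeer-side variables.
type obs struct {
	rRequest        bool
	delivering      bool
	reindeerWaiting int
}

// safetyDelivery checks [] (delivering -> reindeer_waiting == 9).
func safetyDelivery(tr []obs) bool {
	for _, s := range tr {
		if s.delivering && s.reindeerWaiting != 9 {
			return false
		}
	}
	return true
}

// progress checks [] (r_request -> <> delivering): every state with a pending
// request must be followed, at or after that state, by a delivering state.
func progress(tr []obs) bool {
	for i, s := range tr {
		if !s.rRequest {
			continue
		}
		served := false
		for _, t := range tr[i:] {
			if t.delivering {
				served = true
				break
			}
		}
		if !served {
			return false
		}
	}
	return true
}

func main() {
	// Santa never acts: the request stays pending and delivering stays false.
	stuck := []obs{
		{rRequest: true, reindeerWaiting: 9},
		{rRequest: true, reindeerWaiting: 9},
	}
	// Santa claims the request and delivers with a full group.
	served := []obs{
		{rRequest: true, reindeerWaiting: 9},
		{delivering: true, reindeerWaiting: 9},
		{reindeerWaiting: 0},
	}
	fmt.Println("stuck:  safety", safetyDelivery(stuck), "progress", progress(stuck))
	fmt.Println("served: safety", safetyDelivery(served), "progress", progress(served))
}
```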

When we run the model in SPIN, all safety and liveness properties pass. In separate runs, SPIN explored about 5 million states while checking safety_delivery and mutex_santa, and about 7 million states while checking the liveness property; it found no counterexamples. In a similar fashion, we can also specify and check the precedence property.

Both the model and the instructions for running it are available in the repository ⁵.

Let’s now look at an example execution to see how these pieces fit together.

Time flows from top to bottom. When the ninth reindeer arrives, the reindeer room sets r_request and closes the room to further arrivals. While r_request remains true, Santa serves the reindeer, even if elves arrive, because the elf branch is disabled by the guard !r_request.

After Santa signals completion by sending on r_done, the room releases all nine reindeer via rendezvous. When a group of three elves later forms, the elf room sets e_request. Since r_request is now false, Santa serves the elves.

To transform this model into a computer program, I thought I would try implementing it in Go. I do not know Go particularly well, so I collaborated with an AI to translate the Promela design into a simple, idiomatic Go implementation. I reviewed the code ⁶ to keep it faithful to the model, but any mistakes are mine.

Please keep in mind that I’m only human, and there’s a chance this post contains errors. If you notice anything off, I’d appreciate a correction. Please feel free to send me an email.

Endnotes:

1. Ben-Ari, M. How to Solve the Santa Claus Problem. Concurrency: Practice and Experience, 1998.
2. Source code for the first failure scenario: https://github.com/wyounas/model-checking/blob/main/puzzles/santa_claus/santa_bug_deliver_without_full_group.pml
3. Source code for the second failure scenario: https://github.com/wyounas/model-checking/blob/main/puzzles/santa_claus/santa_bug_deliver_and_consult_simultaneously.pml
4. Source code for the third failure scenario: https://github.com/wyounas/model-checking/blob/main/puzzles/santa_claus/santa_bug_consult_before_delivery.pml
5. Source code for the room-based solution: https://github.com/wyounas/model-checking/blob/main/puzzles/santa_claus/santa_claus.pml
6. Go implementation of the room-based solution: https://github.com/wyounas/model-checking/blob/main/puzzles/santa_claus/santa_claus.go

Reproducing the AWS Outage Race Condition with a Model Checker

Thu, 30 Oct 2025 00:00:00 -0400

AWS published a post-mortem about a recent outage [1]. Big systems like theirs are complex, and when you operate at that scale, things sometimes go wrong. Still, AWS has an impressive record of reliability.

The post-mortem mentioned a race condition, which caught my eye. I don’t know all the details of AWS’s internal setup, but using the information in the post-mortem and a few assumptions, we can try to reproduce a simplified version of the problem.

As a small experiment, we’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle concurrency bugs. Of course, writing about it after the incident benefits from hindsight: we already know what went wrong. For this experiment, we’ll use the Spin model checker, whose models are written in the Promela language.

There’s a lot of detail in the post-mortem, but for simplicity we’ll focus only on the race-condition aspect. The incident was triggered by a defect in DynamoDB’s automated DNS management system. The components of this system involved in the incident were the DNS Planner, DNS Enactor, and Amazon Route 53 service.

The DNS Planner creates DNS plans, and the DNS Enactors look for new DNS plans and apply them to the Amazon Route 53 service. Three Enactors operate independently in three different availability zones.

Here is an illustration showing these components:

My understanding of how the DNS Enactor works is as follows: it picks up the latest plan and, before applying it, performs a one-time check to ensure the plan is newer than the previously applied one. It then applies the plan and invokes a clean-up process. During the clean-up, it identifies plans significantly older than the one it just applied and deletes them.

Using the details from the incident report, we can sketch an interleaving that could explain the race condition. Picture two Enactors running side by side: Enactor 2 applies a new plan and starts cleaning up, while the other, running just a little behind, applies an older plan, making it the active one. When Enactor 2 finishes its cleanup, it deletes that plan, and the DNS entries disappear. Here’s what that sequence looks like:
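Before bringing in Spin, the interleaving can be replayed by hand. The Go sketch below runs one fixed schedule over shared state. Several things here are my assumptions rather than details from the post-mortem or the author’s model: the plan numbers (3 and 5), and the hypothetical cleanupGap, under which “significantly older” means at least two versions older. The invariantHolds method follows the no_dns_deletion_on_regression invariant given later in this post, with the initialized flag folded into currentPlan > 0.

```go
package main

import "fmt"

// Hypothetical parameter: cleanup deletes plans at least cleanupGap older
// than the plan just applied, so applying plan 5 targets plans 1, 2 and 3.
const cleanupGap = 2

// route53 is the shared state that both Enactors read and write.
type route53 struct {
	currentPlan, highestApplied int
	dnsValid                    bool
	deleted                     map[int]bool
}

// check is the one-time staleness check an Enactor runs before applying.
func (r *route53) check(plan int) bool {
	return plan > r.currentPlan && !r.deleted[plan]
}

// apply makes plan the active plan.
func (r *route53) apply(plan int) {
	r.currentPlan = plan
	r.dnsValid = true
	if plan > r.highestApplied {
		r.highestApplied = plan
	}
}

// deletePlan removes one old plan; deleting the active plan empties DNS.
func (r *route53) deletePlan(plan int) {
	r.deleted[plan] = true
	if plan == r.currentPlan {
		r.dnsValid = false
	}
}

// cleanupTargets lists the plans cleanup would delete after applying plan.
func cleanupTargets(applied int) []int {
	var out []int
	for p := 1; p <= applied-cleanupGap; p++ {
		out = append(out, p)
	}
	return out
}

// invariantHolds checks the regression invariant in the current state.
func (r *route53) invariantHolds() bool {
	regressed := r.highestApplied > r.currentPlan && r.currentPlan > 0
	return !regressed || r.dnsValid
}

// replay executes one fixed interleaving of two Enactors, step by step.
func replay() *route53 {
	r := &route53{deleted: map[int]bool{}}
	slowOK := r.check(3) // Enactor 1 validates plan 3, then is delayed
	fastOK := r.check(5) // Enactor 2 validates plan 5
	targets := cleanupTargets(5)
	if fastOK {
		r.apply(5)
		for _, p := range targets[:2] { // cleanup starts: plans 1 and 2
			r.deletePlan(p)
		}
	}
	if slowOK {
		r.apply(3) // stale check: plan 3 becomes the active plan
	}
	if fastOK {
		for _, p := range targets[2:] { // cleanup resumes: deletes plan 3
			r.deletePlan(p)
		}
	}
	return r
}

func main() {
	r := replay()
	fmt.Printf("current=%d highest=%d dnsValid=%v invariant=%v\n",
		r.currentPlan, r.highestApplied, r.dnsValid, r.invariantHolds())
}
```

The replay ends with plan 3 active, plan 5 as the highest applied, and DNS invalid, so the invariant fails. A replay like this only confirms a schedule we already guessed; the model checker’s job is to find such schedules for us.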

Let’s try to uncover this interleaving using a model checker.

In Promela, you can model each part of the system as its own process. Spin then takes those processes, starts from the initial state, and systematically applies every possible transition, exploring all interleavings to build the set of reachable states [2]. It checks that your invariants hold in each one, and if it finds a violation, it reports a counterexample.

We’ll create a DNS Planner process that produces plans, and DNS Enactor processes that pick them up. The Enactor will check whether the plan it’s about to apply is newer than the previous one, update the state of certain variables to simulate changes in Route 53, and finally clean up the older plans.

In our simplified model, we’ll run one DNS Planner process and two concurrent DNS Enactor processes. (AWS appears to run three across zones; we abstract that detail here.) The Planner generates plans, and through Promela channels, these plans are sent to the Enactors for processing.

The Enactors track the key aspects of system state in a few shared variables: current_plan holds the plan currently applied, dns_valid represents DNS health, and highest_plan_applied records the highest plan applied so far. The incident report also notes that the clean-up process deletes plans that are “significantly older than the one it just applied.” In our model, we capture this by allowing an Enactor to remove only those plans that are much older than its current plan. To simulate the deletion of an active plan, the Enactor’s clean-up process checks whether current_plan equals the plan being deleted; if it does, we simulate the resulting DNS failure by setting dns_valid to false.

Here’s the code for the DNS Planner:

active proctype Planner() {
    byte plan = 1;
    
    do
    :: (plan <= MAX_PLAN) ->
        latest_plan = plan;
        plan_channel ! plan; 
        printf("Planner: Generated Plan v%d\n", plan);
        plan++;
    :: (plan > MAX_PLAN) -> break;
    od;
    
    printf("Planner: Completed\n");
}

It creates plans and sends each one on the channel plan_channel, where a DNS Enactor will later pick it up.

We start two concurrent DNS Enactor processes by specifying the number of enactors after the active keyword.

active [NUM_ENACTORS] proctype Enactor()

The DNS Enactor waits for plans and receives them (the ? operator receives a plan from the channel plan_channel). It then performs a staleness check, updates certain variables to simulate changes in Route 53, and finally cleans up older plans.

:: plan_channel ? my_plan ->
    snapshot_current = current_plan;

    // staleness check
    if
    :: (my_plan > snapshot_current || snapshot_current == 0) ->
        if
        :: !plan_deleted[my_plan] ->
            /* Apply the plan to Route53 */
            current_plan = my_plan;
            dns_valid = true;
            initialized = true;
            /* Track highest plan applied for regression detection */
            if
            :: (my_plan > highest_plan_applied) ->
                highest_plan_applied = my_plan;
            :: else -> skip   /* without an else, an older plan would block here */
            fi

            // runs the clean-up process (omitted for brevity, included in the
            // code linked below)
        :: else -> skip       /* plan already deleted: ignore it */
        fi
    :: else -> skip           /* stale plan: ignore it */
    fi

How do we discover the race condition? The idea is this: we express as an invariant what must always be true of the system, and then ask the model checker to confirm that it holds in every reachable state. In this case, the invariant says that DNS must remain valid even when the active plan is older than the highest plan applied so far; in other words, a regression to an older plan must never leave the DNS entries deleted. (With more information about the real system, we could simplify or refine this rule further.)

We specify this invariant formally as follows:

/*

A quick note on some of the keywords used in the invariant below:

ltl - keyword that declares a temporal property to verify (ltl: linear temporal logic lets you specify properties about all possible executions of your program.)

[] - "always" operator (this must be true at every step forever)

-> - "implies" (if left side is true, then right side must be true)

*/

ltl no_dns_deletion_on_regression {
    [] ( (initialized && highest_plan_applied > current_plan 
            && current_plan > 0) -> dns_valid )
}

When we start the model checker, one DNS Planner process begins generating plans and sending them through channels to the DNS Enactors. Two Enactors receive these plans, perform their checks, apply updates, and run their cleanup routines. As these processes interleave, the model checker systematically builds the set of reachable states, allowing the invariant to be checked in each one.

When we run the model with this invariant in the model checker, it reports a violation. Spin reports one error and writes a trail file that shows, step by step, how the system reached the bad state.

$ spin -a aws-dns-race.pml
$ gcc -O2 -o pan pan.c                                                       
$ ./pan -a -N no_dns_deletion_on_regression  

pan: wrote aws-dns-race.pml.trail

(Spin Version 6.5.2 -- 6 December 2019)

State-vector 64 byte, depth reached 285, errors: 1
    23201 states, stored
    11239 states, matched
    34440 transitions (= stored+matched)
  (truncated for brevity....)

The trail file in the repository linked below shows how the race happens. Two Enactors operate side by side: the faster one applies plan 4 and starts cleaning up. Because cleanup only removes plans much older than the one just applied, it deletes plans 1 and 2 but skips plan 3. The slower Enactor then applies plan 3 and makes it active, and when the faster Enactor resumes cleanup, it deletes plan 3 and the DNS goes down.
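To make the sequence concrete, here is a minimal Go sketch that replays the interleaving described above and evaluates no_dns_deletion_on_regression after every step. It is not the Promela model: the world type, its methods, and the hard-coded step list are hypothetical simplifications for illustration, and the cleanup threshold ("much older") is not modeled; the steps simply delete the plans the trail says were deleted. A model checker discovers such an ordering on its own, whereas this sketch only replays one that is already known.

```go
package main

import "fmt"

// world is a hypothetical, heavily simplified stand-in for the model's globals.
type world struct {
	currentPlan, highestApplied int
	dnsValid, initialized       bool
	deleted                     map[int]bool
}

// apply mirrors the "apply the plan" branch: activate it and track the highest.
func (w *world) apply(plan int) {
	w.currentPlan = plan
	w.dnsValid = true
	w.initialized = true
	if plan > w.highestApplied {
		w.highestApplied = plan
	}
}

// remove deletes a plan; deleting the active plan takes DNS down.
func (w *world) remove(plan int) {
	w.deleted[plan] = true
	if plan == w.currentPlan {
		w.dnsValid = false
	}
}

// invariantHolds evaluates no_dns_deletion_on_regression in the current state.
func (w *world) invariantHolds() bool {
	regressed := w.initialized && w.highestApplied > w.currentPlan && w.currentPlan > 0
	return !regressed || w.dnsValid
}

// replay runs the hard-coded interleaving and returns the index of the first
// step after which the invariant is false, or -1 if it always holds.
func replay() int {
	w := &world{deleted: map[int]bool{}}
	steps := []func(){
		func() { w.apply(4) },  // fast Enactor applies plan 4
		func() { w.remove(1) }, // fast Enactor's cleanup deletes plan 1
		func() { w.remove(2) }, // ... and plan 2, but skips plan 3
		func() { w.apply(3) },  // slow Enactor applies the older plan 3
		func() { w.remove(3) }, // fast Enactor's cleanup resumes and deletes plan 3
	}
	for i, step := range steps {
		step()
		if !w.invariantHolds() {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println("invariant first violated after step", replay())
}
```

Running it reports step 4 (the fifth step): right after the slow Enactor makes plan 3 active the invariant still holds, and it only fails once the fast Enactor deletes that active plan.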

Here’s an illustration of the interleaving reconstructed from the trail:

Before publishing, I reread the incident report and noted: “Additionally, because the active plan was deleted, the system was left in an inconsistent state…”. This suggests a direct invariant: the active plan must never be deleted.

ltl never_delete_active {
    [] ( current_plan > 0 -> !plan_deleted[current_plan] )
}

Running the model checker with this invariant produces essentially the same counterexample as before: one Enactor advances to newer plans while the other lags and applies an older plan, thereby making it active. When control returns to the faster Enactor, its cleanup deletes that now-active plan, violating the invariant.

Invariants are invaluable for establishing correctness. If we can show that an invariant holds in the initial state and that every transition preserves it, then it holds in every reachable state, including the final one, and we gain confidence that the system’s logic is sound.

To fix the code, we execute the problematic statements atomically. You can find both versions of the code, the one with the race and the fixed one, along with the interleaving trail in the accompanying repository [3]. I’ve included detailed comments to make it self-explanatory, as well as instructions on how to run the model and explore the trail.

Some of the assumptions in this model are necessarily simplified, since I don’t have access to AWS’s internal design details. Without that context, there will naturally be gaps between this abstraction and the real system. This model was created in a short time frame for experimental purposes. With more time and context, one could certainly build a more accurate and refined version.

Please keep in mind that I’m only human, and there’s a chance this post contains errors. If you notice anything off, I’d appreciate a correction. Please feel free to send me an email.

References

When the Simplest Concurrent Program Goes Against All Intuition

Mon, 13 Jan 2025 00:00:00 -0500

I came across a fascinating and surprising aspect of a seemingly simple concurrent program when run on a model checker. Consider this:

If we run P and Q concurrently with ‘n’ initialized to zero, what could be the lowest value of ‘n’ when the two processes finish executing their statements on a model checker? Can a model checker also help us find the extreme interleaving that produces this lowest value of ‘n’?

I thought the final value of ‘n’ would be between 10 and 20. What do you think? Take a guess.

I came across this in Ben-Ari’s book on the SPIN model checker [1]. He said he was shocked to discover an extreme interleaving that set the final value of ‘n’ to 2. I was shocked as well—the outcome completely defied my intuition.

To see this in action, we can write a simple program in PROMELA and look for a computation where the final value is 2. We can obtain this computation automatically by adding the assertion assert(n > 2) at the end of the program and running a verification. SPIN searches the state space for counterexamples, and any computation that ends with ‘n’ equal to 2 violates the assertion, so SPIN reports it as a trail.

Here is the program in PROMELA (a simple language used for writing models in SPIN), followed by the trail that shows the extreme interleaving (each statement in PROMELA executes atomically):

When I ran this, the error appeared:

You can check out the complete interleaving trail in this file. I have never seen an interleaving this extreme.

I tried to illustrate the trail, and here is a rough summary (though you should check out the trail itself; it’s not very long):

Process 1 sets ‘temp’ to 1, and then Process 2 is scheduled and continues executing until ‘n’ is set to 9. At this point, Process 1 takes over and sets ‘n’ to 1. Process 2 is then scheduled again, setting ‘temp’ to 2 (because Process 1 had just set ‘n’ to 1). Process 1 resumes execution and exhausts the loop. Finally, Process 2 executes once more and, using the value it had set for temp (which was 2), sets ‘n’ to 2.
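To convince myself that 2 really is the floor, I find it helpful to see the same exhaustive search written out by hand. Below is a rough Go sketch (not SPIN, and not the original PROMELA program) that assumes each process runs ten iterations of the two atomic statements temp = n + 1 and n = temp, as in the trail summary above. It explores every interleaving with a depth-first search over global states and collects the possible final values of ‘n’. The names proc, state, finalValues, and minMax are my own.

```go
package main

import "fmt"

// proc is one process's local state: completed iterations, which of its two
// statements runs next (0: temp = n + 1, 1: n = temp), and its temp variable.
type proc struct {
	iter, step, temp uint8
}

// state is a global state: the shared n plus both processes' local states.
type state struct {
	n    uint8
	p, q proc
}

// finalValues explores every interleaving of two processes that each run
// iters iterations of { temp = n + 1; n = temp } and returns the set of
// values n can hold once both have finished.
func finalValues(iters uint8) map[uint8]bool {
	finals := map[uint8]bool{}
	visited := map[state]bool{}
	var explore func(s state)
	explore = func(s state) {
		if visited[s] {
			return
		}
		visited[s] = true
		if s.p.iter == iters && s.q.iter == iters {
			finals[s.n] = true
			return
		}
		// Each unfinished process may take the next step: one branch per choice.
		for who := 0; who < 2; who++ {
			next := s
			pr := &next.p
			if who == 1 {
				pr = &next.q
			}
			if pr.iter == iters {
				continue
			}
			if pr.step == 0 {
				pr.temp = next.n + 1 // temp = n + 1
				pr.step = 1
			} else {
				next.n = pr.temp // n = temp
				// temp is dead until the next read, so reset it to shrink the state space.
				pr.temp, pr.step = 0, 0
				pr.iter++
			}
			explore(next)
		}
	}
	explore(state{})
	return finals
}

// minMax returns the smallest and largest values in a non-empty set.
func minMax(set map[uint8]bool) (lo, hi uint8) {
	first := true
	for v := range set {
		if first || v < lo {
			lo = v
		}
		if first || v > hi {
			hi = v
		}
		first = false
	}
	return lo, hi
}

func main() {
	lo, hi := minMax(finalValues(10))
	fmt.Println("lowest:", lo, "highest:", hi)
}
```

It reports a lowest value of 2 and a highest of 20. A short argument explains why nothing lower appears: every write stores at least 1, so after a process’s first write it can never read 0 again, which means its final write stores at least 2.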

After seeing this, I wondered: is it possible to observe a computation in practice where ‘n’ ends up as 2? I think such a computation is highly unlikely to arise in practice. A Go expert shared the following code with me, which limits execution to one thread and explicitly yields to the scheduler (runtime.Gosched) before each read and between each read and write. When you run this Go program, the value of ‘n’ is sometimes 11, sometimes 10, but never less than 10.

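package main

import "runtime"

// Requires Go 1.22+ for range-over-int loops.
// n is shared without synchronization on purpose: this is a deliberate data
// race meant to provoke the lost-update interleavings discussed above.
func main() {
	runtime.GOMAXPROCS(1) // at most one OS thread executes Go code at a time

	n := 0
	done := make(chan bool)
	for range 2 {
		go func() {
			for range 10 {
				runtime.Gosched() // yield before reading n
				t := n + 1
				runtime.Gosched() // yield between the read and the write
				n = t
			}
			done <- true
		}()
	}
	<-done // wait for both goroutines
	<-done
	println(n)
}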

Anyway, I found it not only counterintuitive but fascinating, even though we may not encounter it in practice. So, I thought I should share it.

I wonder if it’s possible to create this computation in any other way, or a computation with the value of ‘n’ lower than 10?

This one surprised me. What’s the simplest concurrent program that has surprised you the most?

References:

Principles of the Spin Model Checker by Mordechai Ben-Ari.

How concurrency works: A visual guide

Thu, 12 Dec 2024 00:00:00 -0500

Concurrent programming is hard.

Mentally enumerating all the possible states that complex concurrent code might go through is far from easy. Visualizing concurrency can make it easier to understand how these programs operate, especially for those just beginning to learn about concurrency.

Such visualizations might not always be effective for larger or more complex systems. But even with complex systems, breaking them down into smaller models and visualizing those models can be an excellent way to understand what’s happening behind the scenes.

Recently, I’ve been exploring model checking and found the mindset it enforces to be not only intriguing but also quite powerful. Leslie Lamport, the renowned researcher in distributed systems and concurrency, has a brilliant quote: If you’re thinking without writing, you only think you’re thinking. For large and complex distributed or concurrent programs, I believe this principle extends further: If you’re implementing without formally verifying your solution through model checking, you only think you’re implementing it correctly.

Model checking is a powerful tool, and I’ve come across a few resources that can help in understanding concurrency. This inspired me to write about them—not only to deepen my own understanding but also to share what I’ve learned. We’ll begin by exploring how to visualize the execution of a sequential program, then move on to visualizing a concurrent one. Finally, we’ll touch on how to reason about the correctness of concurrent programs.

Alright, let’s dive in!

We can visualize how concurrent programs operate by exploring their state space. A few questions arise immediately: What is a state? And what is a state space? Let’s tackle the first question first, then circle back to the second.

A program’s state is defined by the values of its variables and the location counter (which indicates the next instruction to be executed). Let’s walk through an example using a simple C-like language. We’ll look at the states and demonstrate how states transition during the sequential (non-concurrent) execution of the following program. (We’ll dive into a concurrent program example right after this.)

The program above consists of three instructions, and for clarity, labels are added as location counters on the far left (i.e., 1, 2, 3, and end). The program includes a single variable, n, which is declared as a byte.

The state of the program can be represented as a tuple containing the value of the variable n and the location counter: (n, location counter). For example, when the program starts, its initial state is (undefined, 1). This is because, at the beginning, none of the program’s statements have been executed, leaving the value of n as “undefined,” while the location counter points to the first instruction, labeled as 1.

We can visualize this state as follows: the orange arrow represents the location counter, and the current value of n is displayed at the bottom for clarity.

To move to the next state, we execute the statement at location counter 1 and advance the location counter to 2. The next state is (0, 2): the first statement has set n to 0, and the location counter now points to 2. It is visualized as follows:

The location counter increments again, and the next state becomes (1, 3). The value of n has changed to 1 because, in the previous state, the location counter was 2, and executing the statement at that location set n to 1. The situation now looks like this:

Finally, the location counter increments one last time to reach the end, marking the completion of execution. At this point, the value of n remains 1.
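The walk-through above can be sketched in Go as a tiny interpreter that records an (n, location counter) tuple after each step. The program’s listing is not reproduced here, so the three statements are hypothetical stand-ins that match the text: statement 1 sets n to 0, statement 2 sets n to 1, and statement 3 leaves n unchanged. The nSet flag models the “undefined” value of n in the initial state.

```go
package main

import (
	"fmt"
	"strings"
)

// seqState is (n, location counter); nSet is false while n is still undefined.
type seqState struct {
	n    int
	nSet bool
	loc  string
}

func (s seqState) String() string {
	if !s.nSet {
		return fmt.Sprintf("(undefined, %s)", s.loc)
	}
	return fmt.Sprintf("(%d, %s)", s.n, s.loc)
}

// computation runs the statements in order and records every state,
// starting with the initial one.
func computation() string {
	// Hypothetical stand-ins: 1 sets n to 0, 2 sets n to 1, 3 leaves n as is.
	stmts := []func(*seqState){
		func(s *seqState) { s.n, s.nSet = 0, true },
		func(s *seqState) { s.n = 1 },
		func(s *seqState) {},
	}
	labels := []string{"1", "2", "3", "end"}
	s := seqState{loc: labels[0]}
	trace := []string{s.String()}
	for i, stmt := range stmts {
		stmt(&s)
		s.loc = labels[i+1] // advance the location counter
		trace = append(trace, s.String())
	}
	return strings.Join(trace, " -> ")
}

func main() {
	fmt.Println(computation())
}
```

It prints the four states of the walk-through in order, from (undefined, 1) to (1, end).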

A given sequence of states, starting from an initial state and continuing as statements execute, is called a computation. For example, for the program above, the following is a computation:

As explained earlier, the state of a program is defined by the values of its variables and the location counter (which points at the next instruction to be executed). The state space of a program is the set of all states that can occur in some computation. Ben-Ari offers a formal definition of state space in [2]: “The state space of a program is a directed graph; each reachable state is a node, and there is an edge from state s1 to state s2 if a transition enabled in s1 moves the computation to s2.”

Essentially, a state space represents every state a concurrent program can reach across all of its possible executions. By examining the state space, you can understand the full range of program behavior, including unexpected scenarios that might arise from the intricate sequencing of concurrent operations. For this reason, I believe understanding how a state space is generated for a concurrent program can help us grasp how that program works on a more fundamental level.

In model checking, the state space is also used to reason about the correctness of concurrent programs—another reason to understand how it works. We’ll explore how to reason about correctness at the end.

We’ll now consider two procedures, running concurrently, to observe how their states transition and how the state space is generated. The program includes a global variable, n, and two procedures, P and Q, each consisting of the following statements:

These are two simple procedures, similar in their operation, each with its own location counter.

How can we represent a state for the program described above?

We can represent the state as a triple consisting of the value of the variable n and the location counters of P and Q: (n, location counter of P, location counter of Q). To make it even clearer, we’ll represent the state as (n, P: location-counter, Q: location-counter). For example, (5, P: 1, Q: 2) indicates that the value of n is 5, the location counter in P is at 1, and the location counter in Q is at 2.

For the program above, the initial state is (0, P: 1, Q: 2), where the value of n is 0 and the location counters point at the first statements of P and Q. We can visualize this initial state as below, with orange arrows indicating the location counters and the value of n displayed at the bottom:

Let’s explore two possible transitions from this initial state.

We can transition to the next state by incrementing the location counter in P, resulting in (1, P: end, Q: 2). Here, n becomes 1, reflecting the result of the evaluated expression when the statement at location counter 1 was executed. The state at this point looks like this:

The other possible next state is reached by executing Q’s statement instead, which moves Q’s location counter to its end: (2, P: 1, Q: end). We can visualize it like this:

The location counter of Q reached its end, while P remained at location counter 1, and the value of n is 2.

So the transitions from the initial state (0, P: 1, Q: 2) to these two states can be drawn as a small state space (the left arrow points to the state reached by executing P’s statement, and the right arrow to the state reached by executing Q’s statement):

Next, we’ll write the above program in PROMELA (a C-like language that SPIN [1] uses; SPIN is a software verification tool and a model checker) and then visualize its state space.

The code is C-like. The keyword proctype is used to declare a process, while the keyword active activates a process, ensuring that P and Q run concurrently when the program starts.

As before, we represent the state as a triple: (n, location counter of P, location counter of Q). In the image above, the line numbers on the far left serve as location counters.

Below is the state space of the program:

Figure 1

The edges in the state space are labeled with orange text, indicating which statement triggers the transition to the next state. This state space shows that at the end of the program, the value of n could be either 1 or 2. (As an aside: this is an example of interleaving, where at each step the next statement to execute is chosen nondeterministically from one of the processes. For example, P’s statement could execute first, followed by Q’s, or the other way around, and the final value of n depends on which interleaving occurs.)
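Here is a minimal Go sketch that builds this state space breadth-first and prints each edge. It assumes, consistent with the transitions described in this post, that P’s statement at line 5 is n = 1 and Q’s statement at line 9 is n = 2, and it uses the line numbers from Figure 1 as location counters (6 and 10 mark the end of P and Q). The st type and the successors and explore functions are my own illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// st is a state (n, location counter of P, location counter of Q). Location
// counters use Figure 1's line numbers: P runs n = 1 at line 5 and ends at 6;
// Q runs n = 2 at line 9 and ends at 10 (assumed statements).
type st struct{ n, p, q int }

// successors returns the states reachable in one step: P or Q, if it has
// not ended yet, executes its single assignment.
func successors(s st) []st {
	var next []st
	if s.p == 5 {
		next = append(next, st{n: 1, p: 6, q: s.q})
	}
	if s.q == 9 {
		next = append(next, st{n: 2, p: s.p, q: 10})
	}
	return next
}

// explore builds the state space breadth-first from start and returns the
// number of reachable states and the sorted values of n in end states.
func explore(start st) (int, []int) {
	seen := map[st]bool{start: true}
	queue := []st{start}
	finals := map[int]bool{}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		succ := successors(s)
		if len(succ) == 0 {
			finals[s.n] = true // no process can move: the computation has ended
		}
		for _, t := range succ {
			fmt.Printf("%v -> %v\n", s, t) // one edge of the state-space graph
			if !seen[t] {
				seen[t] = true
				queue = append(queue, t)
			}
		}
	}
	values := make([]int, 0, len(finals))
	for v := range finals {
		values = append(values, v)
	}
	sort.Ints(values)
	return len(seen), values
}

func main() {
	count, values := explore(st{n: 0, p: 5, q: 9})
	fmt.Println(count, "states; final values of n:", values)
}
```

It finds five reachable states and end values of 1 and 2, matching Figure 1: whichever assignment runs last determines the final value.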

Let’s quickly explore how state transitions occur along the right-hand path of the above state space in Figure 1.

Starting with the initial state at the top of the state space, (0, P: 5, Q: 9), we can represent it as:

Starting from the initial state (0, P: 5, Q: 9), if we increment the location counter in Q, moving it to line 10, the statement at line 9 executes, setting n to 2. This brings us to the next state, (2, P: 5, Q: 10):

From this state, we transition to the next state by incrementing the location counter in P, resulting in (1, P: 6, Q: 10). At this point, the value of n becomes 1 as the statement at location counter 5 executes, as shown below:

This marks the end of the computation along the right-hand path of the state space shown in Figure 1.

What happens if we extend P and Q by adding more statements? Does the state space expand, or does it shrink? Let’s find out.

Suppose we add a few more lines to both P and Q—introducing a local variable, assigning it a value, and setting the global variable total to a constant (as shown in Figure 2 below). While these changes might seem minor, they can impact the state space.

To explore the effect of these changes on the state space, I generated a visualization using jSpin [3], an excellent educational tool by Ben-Ari built on top of Spin.

Figure 2

The code is displayed on the left side, while the state space is generated as a graph on the right side. In the graph, each box contains three lines: the top line shows the location counter in P, the middle line shows the location counter in Q, and the last line displays the value of the variable total.

The state space has expanded dramatically compared to the earlier example.

Now, imagine a concurrent program with dozens of processes, each executing thousands of lines of code with numerous variables. The state space would grow exponentially. Even small programs, if not managed carefully, can generate massive state spaces.
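One way to see the growth is to count interleavings. Assuming each process is a straight line of m atomic statements with no branches, k processes can be interleaved in (k·m)! / (m!)^k different orders. The Go sketch below (the interleavings function is my own) computes this with math/big. Note that it counts execution orders, not states: a model checker merges paths that reach the same state, so the number of distinct states is smaller, but it typically still grows exponentially with the number of processes.

```go
package main

import (
	"fmt"
	"math/big"
)

// interleavings returns (k*m)! / (m!)^k: the number of distinct orders in
// which k processes, each a straight line of m atomic statements, can have
// their statements interleaved.
func interleavings(k, m int64) *big.Int {
	total := new(big.Int).MulRange(1, k*m)  // (k*m)!
	perProc := new(big.Int).MulRange(1, m)  // m!
	perProc.Exp(perProc, big.NewInt(k), nil) // (m!)^k
	return total.Quo(total, perProc)
}

func main() {
	for _, k := range []int64{1, 2, 3, 4, 5} {
		fmt.Printf("%d processes x 3 statements: %s interleavings\n",
			k, interleavings(k, 3).String())
	}
}
```

With three statements per process, two processes give 20 orders, while five processes already give 168,168,000.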

To further examine the impact of concurrency on state space size, I modified the above program to run five concurrent processes instead of two. As a result, the state space grew so large that, in the final visualization, each state appeared as a tiny dot, rendering the entire image unreadable.

This demonstrates just how challenging it can be to debug and test large or complex concurrent programs. You may need to address numerous scenarios, each leading to countless states. While creating visualizations for such large programs might not be practical (at least with jSpin), you can simplify your program into smaller models and generate visualizations from them.

These simplified visualizations can provide valuable insights into your implementation and serve as an excellent starting point for understanding what’s happening behind the scenes. They’re especially useful as a first step in helping folks grasp the complexities of concurrency.

State space can help us reason about the correctness of concurrent and distributed systems.

To increase confidence in a program’s correctness, we can define certain invariants or correctness properties that must hold across a program’s state space. Model checkers like SPIN, for one, can verify that these properties are upheld throughout the entire state space. For instance, we might define a property such as “X should always be true,” meaning that X must not be false in any state.
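To make “verify the property throughout the state space” concrete, here is a minimal explicit-state checker in Go (Go 1.18+ for generics). It is a sketch of the idea, not how SPIN works internally: it visits every reachable state breadth-first, evaluates the invariant in each one, and, when the invariant fails, returns the path from the initial state to the violating state, which plays the role of SPIN’s counterexample trail. As a test subject it reuses the two-procedure program from Figure 1, assuming P runs n = 1 at line 5 and Q runs n = 2 at line 9; checkInvariant and the other names are my own.

```go
package main

import "fmt"

// st is (n, location counter of P, location counter of Q), as in Figure 1.
type st struct{ n, p, q int }

// successors: P runs n = 1 at line 5 (then ends at 6); Q runs n = 2 at
// line 9 (then ends at 10). These statements are assumptions.
func successors(s st) []st {
	var next []st
	if s.p == 5 {
		next = append(next, st{n: 1, p: 6, q: s.q})
	}
	if s.q == 9 {
		next = append(next, st{n: 2, p: s.p, q: 10})
	}
	return next
}

// checkInvariant visits every state reachable from start, breadth-first.
// It returns nil if inv holds in all of them; otherwise it returns a
// shortest path from start to a state where inv is false.
func checkInvariant[S comparable](start S, next func(S) []S, inv func(S) bool) []S {
	parent := map[S]S{}
	seen := map[S]bool{start: true}
	queue := []S{start}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		if !inv(s) {
			// Walk parent links back to start to build the counterexample.
			path := []S{s}
			for s != start {
				s = parent[s]
				path = append([]S{s}, path...)
			}
			return path
		}
		for _, t := range next(s) {
			if !seen[t] {
				seen[t] = true
				parent[t] = s
				queue = append(queue, t)
			}
		}
	}
	return nil
}

func main() {
	start := st{n: 0, p: 5, q: 9}
	inRange := func(s st) bool { return s.n == 0 || s.n == 1 || s.n == 2 }
	neverTwo := func(s st) bool { return s.n != 2 }
	fmt.Println("n in {0,1,2}:", checkInvariant(start, successors, inRange))
	fmt.Println("n != 2:", checkInvariant(start, successors, neverTwo))
}
```

The first property holds in every reachable state, so no counterexample comes back; the second fails, and the checker returns the two-state path in which Q executes first.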

When reasoning about large and complex distributed systems, this invariant-based thinking becomes essential for ensuring correctness. To achieve this, we rely on two key types of properties:

Safety properties: Ensuring that nothing bad happens.
Liveness properties: Ensuring that something good eventually happens.

Model checkers such as SPIN, and TLC for TLA+ specifications, allow us to verify these properties.

Let’s try to come up with a safety property for the program shown in Figure 2 above. We want the safety properties to hold for every state in the program’s state space. Since a state is defined by a combination of the location counter and variables, for this example, we’ll focus on defining our property for the variable total. Please note that this is a somewhat contrived example, intended to demonstrate concepts.

To define a safety property for this example, we can look for an expression over the variable total that holds in all program states. Consider the following expression as a candidate, where we expect total to be 1 in all states:

total == 1

Would this hold true in every state? Let’s revisit the state space, represented as a directed graph on the right side of the code:

Does total equal 1 in all states? No, the above wouldn’t hold in every state because the value of total can be 0, 1, or 2 (as shown at the bottom of each square in the state space). There are states where total is 0 and others where it is 2, so the property total == 1 doesn’t hold universally. Similarly, another property like total == 2 wouldn’t hold either, for the same reason.

How about this instead?

total == 1 || total == 2

Since there are states where the value of total is 0, the previous property wouldn’t hold universally either. However, the following does hold across all states:

total == 1 || total == 2 || total == 0

We essentially derived a safety property by inspecting the state space, which was only for illustration. In practice, you often define safety and liveness properties before writing any models or code.

As we examine the above expression, a question arises: How can we express it as a safety property and use it to validate correctness in SPIN?

Expressions like the one above can be expressed as a program’s safety property using Linear Temporal Logic (LTL). LTL is based on propositional logic and allows formulas to include both logical operators and temporal operators, such as:

Always ([] in SPIN): Ensures a condition holds in all states.
Eventually (<> in SPIN): Ensures a condition will hold in some future state.

As an example, we can express the above expression as a safety property in LTL:

[] (total == 1 || total == 2 || total == 0)

This reads as: Always, total is either 1, 2, or 0. We can use this property in SPIN and validate the program’s correctness by ensuring it holds across the state space [4]. Our confidence in concurrent programs increases when we know that a certain property holds across the state space.

There is much more to LTL, and we’ll dive deeper into it in future articles. Validating safety and liveness properties with LTL is especially useful for large or complex programs.

In complex concurrent or distributed systems with thousands of possible states, relying on unit tests alone makes it hard to be confident in the solution’s correctness. Most of us would struggle to keep such a vast state space in mind while writing tests. Concurrency bugs are also tough to catch during testing as traditional methods, such as unit or integration tests (though essential), might overlook the issues. The timing and interleaving of events make these bugs hard to find and reproduce. This often leads to dealing with “heisenbugs”—bugs that behave unpredictably and are difficult to track down.

This is where model checkers excel: they let you verify a program’s correctness across every interleaving in the modeled state space, not just the ones a test happens to exercise. Please stay tuned, as I plan to write more about this.

Thanks to Gerard Holzmann, Hillel Wayne, Jack Vanlightly, and Murat Demirbas for reading drafts of this.

Please email with questions, ideas, or corrections.

References

[1] SPIN: more detail and installation instructions are available at https://spinroot.com. See also the overview paper by Gerard J. Holzmann, “The Model Checker SPIN”, IEEE Transactions on Software Engineering, Vol. 23, No. 5, May 1997, pp. 279-295.
[2] Mordechai Ben-Ari, Mathematical Logic for Computer Science, 3rd edition.
[3] jSpin, used for the state space visualizations: https://github.com/motib/jspin.
[4] You can specify the safety property in a .prp file and, assuming your PROMELA code is in safety.pml, run the following commands to validate it:
```
     $ spin -a -F safety.prp  safety.pml
     $ gcc -DSAFETY -o pan pan.c
     $ ./pan
```
Top image courtesy of https://geek-and-poke.com/

How Rendezvous Channels Work in PROMELA (SPIN)

Mon, 18 Nov 2024 00:00:00 -0500

PROMELA is a language used to write models that can be validated with the SPIN model checker. To facilitate data exchange between processes, PROMELA provides the concept of channels. These channels are essential when creating a model where processes need to communicate and share data.

Channels allow two processes to exchange information. One process can send data through a channel, and another process can receive it from the same channel. In PROMELA, there are two types of channels:

Rendezvous channels: These have zero capacity and require both sender and receiver to synchronize.
Buffered channels: These have a capacity greater than zero and can store messages until they are read.

In this blog, we’ll focus on rendezvous channels.

In PROMELA, you can initialize channels using the following syntax:

chan channelname = [capacity] of {typename}

For example, you can create a rendezvous channel of type byte like this (meaning the channel can send and receive data of type byte):

chan ch = [0] of {byte}

Rendezvous channels have a capacity of zero. I think of a zero-capacity channel as a medium that simply exchanges data without storing it, since it has no capacity to hold messages.

To send a value on the channel, you use a send statement, which consists of the channel variable followed by an exclamation mark (!), and then the message value. For example, the following statement sends the value 10 on the channel we created earlier:

ch ! 10

To receive a value, you use a receive statement, which consists of the channel variable followed by a question mark (?), and then the variable where the received value will be stored. For example:

byte value
ch ? value

In this example, the channel ch facilitates a handshake followed by the transfer of the value. Communication via a rendezvous channel is synchronous, meaning only two processes can engage in the handshake at a time.

When writing models, one process may send data while another receives it. If a rendezvous channel’s “send” statement is executed but there is no matching “receive” statement, the sending process will block. Similarly, a process with a “receive” statement will block if there is no matching “send” statement.

Here’s an example where a value is sent but never received:

chan ch = [0] of {byte}; // Rendezvous channel

active proctype Sender() {
    ch ! 10
}

In this case, the Sender process will block indefinitely because there is no corresponding receive statement. (To run the above code, save it in “file.pml” and then run “spin file.pml” from the command prompt.)
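The same blocking behavior can be observed in Go, without hanging the program, by using a select with a default case: the default branch is taken whenever the send cannot complete immediately. In this sketch trySend is a hypothetical helper of my own. With no receiver, a send on an unbuffered (zero-capacity) channel cannot complete, whereas a buffered channel with free space accepts it.

```go
package main

import "fmt"

// trySend attempts a send on ch and reports whether it completed
// immediately, instead of blocking the way ch ! 10 does above.
func trySend(ch chan byte, v byte) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // no receiver is ready and there is no free buffer slot
	}
}

func main() {
	rendezvous := make(chan byte)  // zero capacity, like chan ch = [0] of {byte}
	buffered := make(chan byte, 1) // capacity 1: can hold one message
	fmt.Println("unbuffered send completed:", trySend(rendezvous, 10))
	fmt.Println("buffered send completed:", trySend(buffered, 10))
}
```

If you instead write a plain ch <- 10 in main with nothing receiving, the Go runtime aborts with “fatal error: all goroutines are asleep - deadlock!”, which is roughly the Go counterpart of the stuck Sender above.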

Now, let’s walk through an example to understand how rendezvous channels work. Imagine we have two processes: a sender and a receiver.

The Sender process sends a value on a rendezvous channel named ch, while the Receiver process receives this value from the channel and stores it in a local variable named value. The orange arrows represent the location counters, which indicate the next instruction to be executed in each process. The numbers 1, 2, 6, 7, and 8 are the specific locations where the location counter can point. Initially, both counters are assumed to point to the first instructions in their respective processes.

When the location counter of the Sender process reaches the channel’s send statement, it offers to engage in a rendezvous. If the location counter of the Receiver process is at the matching channel’s receive statement, the rendezvous is accepted, and the sent value is copied into the local variable. These send and receive statements for rendezvous channels are executed atomically. Once the rendezvous completes, the location counters in both processes advance to their next instructions.

I know this might feel like a lot of information, but don’t worry: we’ll walk through it step by step below.

Let’s start again. As indicated earlier, this is our code at the start:

Initially, both location counters point to the first instruction. Suppose the location counter in the Sender process reaches the send statement and offers to engage in a rendezvous. At this point, the Sender process becomes blocked because there is no matching receive statement yet.

Now, the location counter in the Receiver process advances to 7:

At this point, the rendezvous is accepted, and both statements execute atomically. The value 10 is sent on the channel, received by the Receiver process, and copied into its local variable. Once the statements are executed, the location counters in both processes advance to their next instructions.
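Go’s unbuffered channels behave much like PROMELA’s rendezvous channels: a send and a receive on a channel made with make(chan byte) complete together, and whichever side arrives first waits for the other. Here is a small Go analogy of the Sender and Receiver walk-through, where the goroutine plays the Sender and the rest of the function plays the Receiver. The rendezvous function is my own name, and this is an analogy, not PROMELA semantics.

```go
package main

import "fmt"

// rendezvous hands the value 10 from a sender goroutine to the caller over an
// unbuffered channel, mirroring the Sender and Receiver processes above.
func rendezvous() byte {
	ch := make(chan byte) // unbuffered: zero capacity, like [0] of {byte}
	go func() {
		ch <- 10 // Sender: waits here until a receiver is ready
	}()
	value := <-ch // Receiver: waits for the sender; the value is handed over directly
	return value
}

func main() {
	fmt.Println("received", rendezvous())
}
```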

We can also send a channel through another channel. This allows one process to pass a channel to another process. Let’s take a look at the following code:

chan ch = [0] of {chan}

active proctype P(){ 

    chan localch = [0] of {byte}

    // sends a channel
    ch ! localch 

    byte data

    // waiting to receive data
    localch ? data

    // ensure data is 100 
    assert  data == 100
    printf("Data = %d \n", data)
}

active proctype Q(){ 

    chan localch 

    // receives a channel in localch 
    ch ? localch 

    // sends 100 on localch
    localch ! 100

}

We declare a global rendezvous channel, ch, which is used to exchange other channels. In process P, we send localch, a rendezvous channel of type byte, on ch. After that, P waits to receive data on localch into the variable data (the statement localch ? data).

The following visualization helps illustrate what’s happening here:

Consider the lines with arrows as channels. Process P sends localch on the channel ch, and process Q receives it. Then, Q sends the value 100 on the channel localch, and P receives it.

One important note: in such scenarios, if the process that instantiated the channel terminates, the corresponding channel also disappears. Any attempt to access it from another process after this will fail and result in an error.
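The same pattern can be written in Go with a channel whose element type is itself a channel (chan chan byte). This is a rough analogy with names of my own: the goroutine plays Q and the rest of exchange plays P. One difference worth noting is that Go channels are ordinary garbage-collected values, so a channel does not disappear when the goroutine that created it exits; it stays usable as long as something references it.

```go
package main

import "fmt"

// exchange mirrors P and Q: P creates localch and sends it over ch; Q
// receives it and sends 100 back on it; P receives the value.
func exchange() byte {
	ch := make(chan chan byte) // carries channels, like chan ch = [0] of {chan}
	go func() { // Q
		localch := <-ch // receive a channel
		localch <- 100  // send 100 on the received channel
	}()
	localch := make(chan byte) // P's own unbuffered channel
	ch <- localch              // send the channel itself
	return <-localch           // wait for Q's reply
}

func main() {
	data := exchange()
	if data != 100 {
		panic("expected 100") // mirrors assert data == 100
	}
	fmt.Printf("Data = %d\n", data)
}
```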

Hopefully, this provides a general understanding of how rendezvous channels work in PROMELA. We’ll build on these concepts in future articles.

Please email or tweet with questions, ideas, or corrections.

How Control Structures Work in PROMELA (SPIN)

Sun, 10 Nov 2024 00:00:00 -0500

The PROMELA language is used to write models in the SPIN model checker. PROMELA’s syntax is similar to C, but its control statements are inspired by a formalism called “guarded commands,” invented by E.W. Dijkstra, which is particularly well-suited for handling nondeterminism.

Let’s start by examining the if statement in PROMELA. It begins with the reserved word “if” and ends with “fi.” Within it, there are one or more execution sequences, each starting with a double colon (::), followed by a guard and an arrow (->). Consider this example:

In the program above, we have an “if” statement with three execution sequences (at lines 6, 8, and 10). Each statement after the double colon and before the arrow is a guard; for example, at line 6 the guard is “n > 0,” at line 8 it is “n < 0,” and at line 10 the guard is “n == 0.” An execution sequence runs when its guard evaluates to true. Here, the guard at line 6 evaluates to true, so its execution sequence executes.

If more than one guard evaluates to true, one of the sequences is selected nondeterministically. If no guard evaluates to true, the process remains blocked until at least one guard becomes selectable.
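To make the selection rule concrete, here is a small Go sketch that evaluates a list of guarded alternatives: it collects every alternative whose guard is true, runs one chosen at random, and reports false when none is enabled. The guarded type and selectIf function are hypothetical, and the sketch simplifies two things: it returns instead of waiting when nothing is enabled, and it picks a single random branch, whereas SPIN’s verifier explores every enabled branch. The value 5 for n is an arbitrary positive number standing in for the missing listing.

```go
package main

import (
	"fmt"
	"math/rand"
)

// guarded is one `:: guard -> body` line of a PROMELA if-statement.
type guarded struct {
	guard func() bool
	body  func()
}

// selectIf runs one alternative chosen at random among those whose guard is
// true, and returns false if no guard is true (where SPIN would block).
func selectIf(alts []guarded) bool {
	var enabled []guarded
	for _, a := range alts {
		if a.guard() {
			enabled = append(enabled, a)
		}
	}
	if len(enabled) == 0 {
		return false // a SPIN process would wait here until some guard is true
	}
	enabled[rand.Intn(len(enabled))].body()
	return true
}

func main() {
	n := 5 // arbitrary positive value
	result := ""
	ok := selectIf([]guarded{
		{func() bool { return n > 0 }, func() { result = "n > 0" }},
		{func() bool { return n < 0 }, func() { result = "n < 0" }},
		{func() bool { return n == 0 }, func() { result = "n == 0" }},
	})
	fmt.Println(ok, result)
}
```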

First, let’s explore how nondeterminism works. Consider:

In this case, when two or more guards evaluate to true, the statements associated with any of them could be executed; the choice is made nondeterministically. Running the program multiple times will sometimes print “Greater than two” and other times “Greater than three.” To try this out, save the code in a file named file.pml and run it with the command “spin file.pml”.

Interestingly, this nondeterminism can be used to generate random numbers in PROMELA. In the program below, the value of “n” is chosen nondeterministically, and running the program multiple times will show that the chosen value can differ from run to run.
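Go happens to have a built-in construct with the same flavor: when several cases of a select statement are ready at once, the Go specification says one of them is chosen via uniform pseudo-random selection. The sketch below, with a hypothetical randomPick function, makes all three cases ready and lets select choose, much like an if-statement whose three guards are all true. Keep in mind that SPIN only picks randomly during simulation; in verification mode it explores every choice.

```go
package main

import "fmt"

// randomPick returns 1, 2, or 3. All three channels already hold a value,
// so every case of the select is ready and Go picks one pseudo-randomly.
func randomPick() int {
	one, two, three := make(chan int, 1), make(chan int, 1), make(chan int, 1)
	one <- 1
	two <- 2
	three <- 3
	select {
	case v := <-one:
		return v
	case v := <-two:
		return v
	case v := <-three:
		return v
	}
}

func main() {
	counts := map[int]int{}
	for i := 0; i < 1000; i++ {
		counts[randomPick()]++
	}
	fmt.Println(counts) // roughly a third each
}
```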

Now let’s see what happens when all guards evaluate to false: the process blocks, which means it cannot proceed until at least one of the guards becomes true. Consider:

The program above prints “Begin” and then blocks. In a more complex setup with multiple concurrent processes modifying the value of a global variable, a blocked process with an if-statement like the one above may eventually become unblocked if one of the guards evaluates to true.

Now, let’s take a look at looping structures in PROMELA. One of these is the do-statement, which functions similarly to an if-statement. Consider this program:

The loop continues to run until the value of ‘n’ reaches 10, at which point it breaks out. The guard selection rules work the same way as in an if-statement: if multiple guards evaluate to true, SPIN selects one nondeterministically, and if all guards evaluate to false, the process blocks.

The keyword “else” can be used as a guard in a selection or repetition structure, and it defines a condition that is true only if all other guards evaluate to false within the same structure. Let’s look at how it can be used:
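Putting the two ideas together, here is a Go sketch of a do-loop with an else guard. The doLoop function and alt type are hypothetical: each pass collects the enabled alternatives and runs one at random, runs the else body only when no other guard is true, and reports that the process would block when nothing is enabled and there is no else. The example loop, :: n < 10 -> n++ followed by :: else -> break, is my guess at the shape of the missing listing, not a copy of it.

```go
package main

import (
	"fmt"
	"math/rand"
)

// alt is one `:: guard -> body` line; body returns true if it executes break.
type alt struct {
	guard func() bool
	body  func() bool
}

// doLoop repeats until a body (or the else body) breaks. It returns true if
// the loop reaches a pass where no guard is true and there is no else,
// which is where a SPIN process would block.
func doLoop(alts []alt, elseBody func() bool) (blocked bool) {
	for {
		var enabled []alt
		for _, a := range alts {
			if a.guard() {
				enabled = append(enabled, a)
			}
		}
		var brk bool
		switch {
		case len(enabled) > 0:
			brk = enabled[rand.Intn(len(enabled))].body()
		case elseBody != nil:
			brk = elseBody() // else: taken only because no other guard is true
		default:
			return true // no guard is true and there is no else: blocked
		}
		if brk {
			return false
		}
	}
}

func main() {
	n := 0
	blocked := doLoop([]alt{
		{func() bool { return n < 10 }, func() bool { n++; return false }},
	}, func() bool { return true }) // :: else -> break
	fmt.Println("n =", n, "blocked =", blocked)
}
```

Dropping the else (passing nil) leaves the same loop stuck once n reaches 10, which matches the blocking rule described above.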

SPIN version 6 also introduced a for-statement, so a similar program can be written as follows:
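A minimal sketch, assuming SPIN 6 or later and an illustrative counter n that should end at 10. The loop variable i runs from 1 to 10:

```promela
/* Illustrative example; the for-statement needs SPIN 6 or later. */
byte i;
byte n = 0;

init {
  for (i : 1 .. 10) {
    n++                 /* runs once for each value of i from 1 to 10 */
  };
  printf("n = %d\n", n) /* n is 10 at this point */
}
```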

Please email or tweet with questions, ideas, or corrections.

Note:

To run a PROMELA program, save the code in a file with a .pml extension and run the program from the command line using spin file.pml.

The Beautiful Simplicity of the Gentzen System

Tue, 29 Oct 2024 00:00:00 -0400

The Gentzen system, created by the German mathematician Gerhard Gentzen, is a deductive system that can be used to prove propositional formulas. I recently learned about it while reading Ben-Ari’s fantastic book on mathematical logic [1], and I like its simplicity.

Should we care about the Gentzen system? Say you’re a programmer: why should you care about logic or mathematical reasoning?

I recently started learning more about mathematical logic, and I’ve realized that, just as writing can clarify your thoughts, formal mathematical reasoning can bring coherence to your thinking. If parts of your reasoning lack logical soundness, you won’t be able to construct a coherent argument as a whole—mathematical reasoning helps prevent this.

I’ve been programming for a while now, mostly self-taught, and I’ve observed that while learning mathematical formalism isn’t necessary for most programming jobs, it definitely helps you think more carefully about the correctness of your code. It helps you reason through problems with greater precision.

Now let’s dive into the Gentzen system, starting with a scenario.

Imagine you’re a detective trying to solve a robbery. To prove that a certain person committed the crime, you gather and connect key pieces of evidence. You see this person entering the house on a video camera on the day of the robbery, and you later find a receipt showing them selling items belonging to the homeowner. While real detective work is rarely this clear-cut, this simplified example highlights the deductive process of assembling truths and making logical inferences to build a case. In a similar way, a deductive system like the Gentzen system starts from basic truths and uses logical rules to reach sound conclusions.

In a deductive system, you start with basic statements assumed to be true, known as axioms, and apply inference rules to them to build a logical case. The Gentzen system stands out for its simplicity: it has just one axiom and a few inference rules, yet it can prove complex formulas with logical precision.

Gentzen’s Genius: A Single Axiom and a Few Inference Rules

For me, what makes the Gentzen system elegant is its minimalism: one axiom and a handful of inference rules are enough to prove complex propositional formulas.

Let’s see how the Gentzen system can be used to prove a propositional formula. (Please note that I am teaching myself these concepts from this wonderful book [1]. Any errors are my own, and I’m happy to correct anything you find incorrect. I’m writing this to reinforce my own understanding, following Confucius’ advice: “I hear and I forget. I see and I remember. I do and I understand.”)

A little context about propositional logic and formulas: a propositional formula such as $ p \lor q $ (read ‘p or q’, a logical disjunction) is built from atomic propositions, which here are $ p $ and $ q $. Atomic propositions can be assigned the value $ \text{true} $ or $ \text{false} $. If $ p $ is $ \text{true} $, then $ \neg p $ (not $ p $) is $ \text{false} $. A complementary pair consists of an atomic proposition and its negation, such as $ p $ and $ \neg p $.

And now the axiom and a few words about inference rules.

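The Axiom: The Gentzen system has only one axiom, built around a complementary pair: any set of literals (atomic propositions or their negations) that contains a complementary pair, such as $ p $ and $ \neg p $, is an axiom. This is why a set like $ p, \neg p, q $ can serve as a starting point in the proof below.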

The Inference Rules: The system also provides inference rules. These rules, alongside the one axiom, allow us to prove new things. There are two types of inference rules: $ \alpha $ (alpha) and $ \beta $ (beta). Here are some details about each of them:

$ \alpha $ inference rules: Think of these rules as ways to combine two formulas that appear in the same set into a single, larger propositional formula. If you have two propositional formulas, $ A1 $ and $ A2 $, you can combine them into a single $ \alpha $ formula. There are several $ \alpha $ rules; for simplicity, I’m not including them all here (they are all in Ben-Ari’s book). Let’s look at the two $ \alpha $ inference rules we will use in the proof below.
- If you have two formulas, $ A1 $ and $ A2 $, then you can write them as $ A1 \lor A2 $.
- If you have $ \neg A1 $ and $ A2 $, then you can write them as $ A1 \rightarrow A2 $ (A1 implies A2).
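$ \beta $ inference rules: There are also several $ \beta $ rules. Like $ \alpha $ rules, they combine two formulas into one, but the two formulas come from two different sets (two earlier steps of the proof), and the remaining formulas of both sets are carried over. Here is the one $ \beta $ rule we’ll use in the proof below.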
- If you have $ \neg B1 $ and $ \neg B2 $, then you can write them as $ \neg (B1 \lor B2) $.
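For reference, the three rules above can be written in premise-over-conclusion notation. This is my own sketch of how I read Ben-Ari’s presentation: $ U $, $ U_1 $ and $ U_2 $ stand for the other formulas in a set, which are carried over unchanged. The $ \alpha $ rules take one set as their premise, while the $ \beta $ rule takes two:

$$
\frac{U \cup \{A1, A2\}}{U \cup \{A1 \lor A2\}}
\qquad
\frac{U \cup \{\neg A1, A2\}}{U \cup \{A1 \rightarrow A2\}}
\qquad
\frac{U_1 \cup \{\neg B1\} \qquad U_2 \cup \{\neg B2\}}{U_1 \cup U_2 \cup \{\neg (B1 \lor B2)\}}
$$

For example, step 3 of the proof below uses the third rule with $ B1 = p $, $ B2 = q $, $ U_1 = \{p, q\} $ and $ U_2 = \{q, p\} $.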

Now, let’s try to construct a proof using the above axiom and inference rules.

We want to prove $ (p \lor q) \rightarrow (q \lor p) $ using the Gentzen system.

Basically, in such proofs, you proceed step by step, using the outcome of each step to derive the next one, eventually leading to the proof.

Prove: $ (p \lor q) \rightarrow (q \lor p) $

Here is the proof (sorry for the extra text due to the added explanation):

We start by examining the propositional formula we want to prove. Which atomic propositions does it involve? Well, $ p $ and $ q $. For $ p $, we apply the axiom, giving us both $ p $ and $ \neg p $. We then add $ q $ to this set, so the outcome of the first step is: $ p, \neg p, q $.
Next, we apply the axiom again, this time for $ q $. This gives us both $ q $ and $ \neg q $, and this time we add $ p $, so after the second step we have: $ \neg q, q, p $.
Now, we can apply a $ \beta $ inference rule to the results of the first two steps. We have $ \neg p $ from the first step and $ \neg q $ from the second step. According to the $ \beta $ inference rule, if we have $ \neg B1 $ and $ \neg B2 $, we can combine them into $ \neg (B1 \lor B2) $. So, we treat $ p $ as B1 and $ q $ as B2, which makes $ \neg p $ our $ \neg B1 $ and $ \neg q $ our $ \neg B2 $. Applying the rule gives us $ \neg (p \lor q) $. The other elements of both steps carry over: removing $ \neg p $ from the first step leaves $ p, q $, and removing $ \neg q $ from the second step leaves $ q, p $. Since we are working with sets, we keep only unique elements, so the outcome of this third step is $ \neg (p \lor q), p, q $.
In step 4, we apply an $ \alpha $ rule. According to this rule, if we have A1 and A2, we can combine them into $ A1 \lor A2 $. From step 3, we have $ q $ and $ p $. Applying the $ \alpha $ rule to these two propositions gives us $ q \lor p $. Importantly, $ \neg (p \lor q) $ is unchanged, so it carries over. The result of this step is: $ \neg (p \lor q), (q \lor p) $.
Now, let’s look at the result of step 4. Can we apply a rule to get the formula we set out to prove? Yes! We can use another $ \alpha $ rule, which says that if we have $ \neg A1 $ and $ A2 $, we can form $ A1 \rightarrow A2 $ (A1 implies A2). In step 4, we have $ \neg (p \lor q) $ as $ \neg A1 $ and $ (q \lor p) $ as $ A2 $. Using this rule, we arrive at $ (p \lor q) \rightarrow (q \lor p) $, which is exactly what we set out to prove. Voila!

Here is a brief summary of the steps above. In the first two steps, we start with axioms. In the third step, we apply the $ \beta $ inference rule to the outcomes of the first and second steps. Then, we apply an $ \alpha $ inference rule to the result of the third step. Finally, we apply another $ \alpha $ rule to the outcome of the fourth step, which gives us the formula we set out to prove.

At first, I didn’t understand this just by reading it; it seemed too clever (the author also hints that it seems clever). I had to work through it a couple of times with pen and paper.

This can also be presented in tree form. In the screenshot below, the top nodes are axioms, each internal node is the result of applying an inference rule to the node or nodes above it, and the node at the bottom is the formula that needed to be proved:
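The same tree can also be sketched as a LaTeX proof tree. This is my reconstruction from the five steps above, not a copy of the original screenshot. Each horizontal line is one rule application, and the label to its right names the rule:

$$
\dfrac{
  \dfrac{
    \dfrac{\{p, \neg p, q\} \qquad \{\neg q, q, p\}}{\{\neg (p \lor q), p, q\}}\;\beta
  }{\{\neg (p \lor q), q \lor p\}}\;\alpha
}{\{(p \lor q) \rightarrow (q \lor p)\}}\;\alpha
$$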

There is an inexplicable beauty in the idea that you start with only one established truth and a few inference rules, and using them, you can prove complex propositional formulas.

The elegance lies both in the economy of the system and in the discipline it imposes on our thinking: each step applies a single rule, brings us closer to the goal, and keeps the structure of the argument coherent. This kind of elegance can inspire us to learn and master complexity in other fields.


References:

[1] Mordechai Ben-Ari, Mathematical Logic for Computer Science, 3rd edition.