ewd998

Experience TLA+ in action by specifying distributed termination detection on a ring, due to Shmuel Safra. Each git commit introduces a new TLA+ concept. Go back to the very first commit to follow along!

v00: IDE

Click either one of the buttons to launch a zero-install IDE to give the TLA+ specification language a try:

(=> Screencast how to create the TLA+ Codespace)

v01: Problem statement - Termination detection in a ring

v01a: Termination of pleasingly parallel

For this tutorial, we assume that the distributed system nodes are organized as a ring, with one the (unique) leader[^1]. If we further assume that nodes execute independent computations, (global) termination detection becomes trivial--the leader initiates a token transfer around the ring, and each node passes the token to its next neighbor, iff the node finished its computation. When the initiator receives back the token, it knows that all (other) nodes have terminated.

This problem is too simple, and we don't need TLA+ to model it.

[^1] Perhaps by some leader election algorithm.

v01b: Termination of collaborative computation

A more interesting problem is to look at a "collaborative" computation, which implies that nodes can re-activate each other. For example, the result of a computation at node 23 is (atomically!) sent to and further processed at node 42. With the previous protocol, node 42 might have already passed on the token, causing the initiator to eventually detect (global) termination; a bug that is at least difficult to reproduce with testing! A solution is offered in EWD840:

Initiator sends a "stateful" token around the ring
Each node remembers if it activated another node
Activation taints the token (when the activator gets the token)
Initiator keeps running rounds until it receives an untainted token

v01c: Termination detection with asynchronous message delivery

What happens if we loosen the restriction that message delivery is atomic (it seldom is)? Clearly, we are back at square one:

Node 23 sends a message to 42
23 taints the token
Initiator starts a new round
Node 42 received the fresh token before receiving the activation message from 23
Boom!

The fix proposed in Shmuel Safra's EWD998, is to count in-flight messages. But will this work?

Throughout the chapters of this tutorial, we will use the TLA+ specification language to model EWD998, and check interesting properties.

v02: High-level spec AsyncTerminationDetection

TLA+ is all about abstraction, and, as we will later see, has first-class support to connect different levels of abstraction. Let's use this and write a basic spec that either falsifies our design above, or gives us sufficient confidence to invest in writing a more detailed spec.

(Credit: Stephan Merz wrote AsyncTerminationDetection)

v02a: Spec skeleton

Instead of modeling message channels, let alone modeling the transport layer, we will write a spec that models:

A ring of N nodes
The activation status of each node
The number of messages pending[^2] at a node
A send action
A receive action
A terminate action
The initial configuration of the system

Please switch to AsyncTerminationDetection.tla and read its comments. From here on, the tutorial continues there...

[^2] It's difficult to (efficiently) count pending messages in an implementation. In a TLA+ spec, we don't care about that notion of efficiency. Also, all variables are global.

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
.vscode		.vscode
docs		docs
figures		figures
.gitignore		.gitignore
.gitpod.yml		.gitpod.yml
APAsyncTerminationDetection.cfg		APAsyncTerminationDetection.cfg
APEWD998.cfg		APEWD998.cfg
APEWD998.tla		APEWD998.tla
AsyncTerminationDetection.tla		AsyncTerminationDetection.tla
AsyncTerminationDetection_apalache.tla		AsyncTerminationDetection_apalache.tla
AsyncTerminationDetection_proof.tla		AsyncTerminationDetection_proof.tla
C.cfg		C.cfg
C.tla		C.tla
CE.cfg		CE.cfg
EWD998.tla		EWD998.tla
EWD998_proof.tla		EWD998_proof.tla
F.cfg		F.cfg
F.tla		F.tla
IncDec.tla		IncDec.tla
LICENSE		LICENSE
MCAsyncTerminationDetection.cfg		MCAsyncTerminationDetection.cfg
MCAsyncTerminationDetection.tla		MCAsyncTerminationDetection.tla
MCAsyncTerminationDetection_actions.dot		MCAsyncTerminationDetection_actions.dot
MCEWD998.cfg		MCEWD998.cfg
MCEWD998.tla		MCEWD998.tla
MCEWD998_actions.dot		MCEWD998_actions.dot
O.tla		O.tla
README.md		README.md
SmokeEWD998.cfg		SmokeEWD998.cfg
SmokeEWD998.tla		SmokeEWD998.tla
SyncTerminationDetection.tla		SyncTerminationDetection.tla
TemporalLogic.cfg		TemporalLogic.cfg
TemporalLogic.tla		TemporalLogic.tla
Utils.tla		Utils.tla

License

tlaplus-workshops/ewd998

Folders and files

Latest commit

History

Repository files navigation

ewd998

v00: IDE

v01: Problem statement - Termination detection in a ring

v01a: Termination of pleasingly parallel

v01b: Termination of collaborative computation

v01c: Termination detection with asynchronous message delivery

v02: High-level spec AsyncTerminationDetection

v02a: Spec skeleton

About

Topics

Resources

License

Stars

Watchers

Forks

Languages