
Race condition in parallel TLC can lead to non-exhaustive state space exploration or bogus counterexample #361

Closed
Calvin-L opened this issue Sep 11, 2019 · 26 comments
Comments

@Calvin-L
Collaborator

Calvin-L commented Sep 11, 2019

I have encountered a situation where command-line TLC outputs a spurious invariant failure when I run it with more than one worker.

System information:

  • I was able to reproduce the problem on both Mac and Linux
  • TLC version: TLC2 Version 2.14 of 10 July 2019 (compiled from source myself from the v1.6.0 tag)
  • JVM version: Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode) (MacOS), OpenJDK 64-Bit Server VM 18.9 (build 11.0.3+7-LTS, mixed mode, sharing) (Linux)
  • JVM flags: -XX:+UseParallelGC (I did not experiment with other flags)

Apologies for the long example. I have done my best to minimize it, but the bug is slippery and disappears due to seemingly irrelevant changes to the specification.

Foo.tla:

---------------------------- MODULE Foo ----------------------------

EXTENDS Integers, Sequences

CONSTANT Clients
VARIABLE log \* history of requests/responses

-----------------------------------------------------------------------------
\* Types

Types             == {"T1", "T2"}
T1RequestType     == [type : {"T1"}]
T1ResponseType    == [status : {"OK"}]
T2RequestType     == [type : {"T2"}]
T2ResponseType    == {}

RequestType ==
    T1RequestType \union
    T2RequestType

ResponseType(requestType) ==
    CASE
        requestType = "T1" -> T1ResponseType []
        requestType = "T2" -> T2ResponseType

AnyResponseType == UNION {ResponseType(type) : type \in Types}

LogEntry ==
    [type : {"BEGIN"}, client : Clients, request : RequestType] \union
    [type : {"END"}, client : Clients, output : AnyResponseType]

TypeOK == log \in Seq(LogEntry)

-----------------------------------------------------------------------------

\* A client's pc indicates the next step it will perform.
pc == [c \in Clients |->
    IF \E i \in DOMAIN log: log[i].type = "END" /\ log[i].client = c THEN "COMPLETE"
    ELSE IF \E i \in DOMAIN log: log[i].type = "BEGIN" /\ log[i].client = c THEN "EXEC"
    ELSE "BEGIN"]

Begin(c) ==
    /\ pc[c] = "BEGIN"
    /\ \E req \in RequestType:
        /\ log' = log \o <<[type |-> "BEGIN", client |-> c, request |-> req]>>
        /\ Len(log') <= 2

End(c) ==
    /\ pc[c] = "EXEC"
    /\ \E t \in Types: \E resp \in ResponseType(t):
        /\ log' = log \o <<[type |-> "END", client |-> c, output |-> resp]>>
        /\ Len(log') <= 2

-----------------------------------------------------------------------------

Init ==
    /\ log = <<>>

Next ==
    \E c \in Clients:
    \/ Begin(c)
    \/ End(c)

Spec ==
    Init /\ [][Next]_log

=============================================================================

Foo.cfg:

CONSTANT Clients = {Alice}
SPECIFICATION Spec
INVARIANT TypeOK

Single-threaded invocation (always succeeds):

$ tlc2 -deadlock Foo
TLC2 Version 2.14 of 10 July 2019 (rev: ${git.shortRevision})
Running breadth-first search Model-Checking with fp 41 and seed 112424518556650637 with 1 worker on 4 cores with 3641MB heap and 64MB offheap memory [pid: 50969] (Mac OS X 10.14.6 x86_64, Azul Systems, Inc. 11.0.1 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /Users/cloncari/sw/src/plsql/tmp/Foo.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Integers.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Sequences.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Naturals.tla
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module Foo
Starting... (2019-09-11 15:23:30)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2019-09-11 15:23:30.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 0.0
5 states generated, 5 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 3.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 2 and the 95th percentile is 2).
Finished in 00s at (2019-09-11 15:23:30)

Multi-threaded invocation (sometimes succeeds, sometimes fails):

$ tlc2 -deadlock -workers 40 Foo
TLC2 Version 2.14 of 10 July 2019 (rev: ${git.shortRevision})
Running breadth-first search Model-Checking with fp 33 and seed -6500267177216783589 with 40 workers on 4 cores with 3641MB heap and 64MB offheap memory [pid: 50996] (Mac OS X 10.14.6 x86_64, Azul Systems, Inc. 11.0.1 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /Users/cloncari/sw/src/plsql/tmp/Foo.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Integers.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Sequences.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Naturals.tla
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module Foo
Starting... (2019-09-11 15:24:12)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2019-09-11 15:24:12.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 0.0
5 states generated, 5 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 3.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 2 and the 95th percentile is 2).
Finished in 00s at (2019-09-11 15:24:12)
$ tlc2 -deadlock -workers 40 Foo
TLC2 Version 2.14 of 10 July 2019 (rev: ${git.shortRevision})
Running breadth-first search Model-Checking with fp 74 and seed -5567078798218024607 with 40 workers on 4 cores with 3641MB heap and 64MB offheap memory [pid: 51001] (Mac OS X 10.14.6 x86_64, Azul Systems, Inc. 11.0.1 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /Users/cloncari/sw/src/plsql/tmp/Foo.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Integers.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Sequences.tla
Parsing file /private/var/folders/k0/tw__ldzn7r19gby9d93tffh40000gn/T/Naturals.tla
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module Foo
Starting... (2019-09-11 15:24:13)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2019-09-11 15:24:13.
Error: Invariant TypeOK is violated.
Error: The behavior up to this point is:
State 1: <Initial predicate>
log = <<>>

State 2: <Begin line 43, col 5 to line 46, col 25 of module Foo>
log = <<[type |-> "BEGIN", client |-> Alice, request |-> [type |-> "T2"]]>>

State 3: <End line 49, col 5 to line 52, col 25 of module Foo>
log = << [type |-> "BEGIN", client |-> Alice, request |-> [type |-> "T2"]],
   [type |-> "END", client |-> Alice, output |-> [status |-> "OK"]] >>

5 states generated, 5 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 3.
The average outdegree of the complete state graph is 2 (minimum is 2, the maximum 2 and the 95th percentile is 2).
Finished in 00s at (2019-09-11 15:24:13)

The failure in the second multi-threaded invocation looks wrong to me; the invariant check should pass for the displayed state.

@Calvin-L
Collaborator Author

In case it is helpful, the problem still occurs with this more permissive TypeOK invariant:

TypeOK ==
    \A i \in DOMAIN log:
    LET entry == log[i] IN
    entry.type = "END" => entry.output \in AnyResponseType

This version does not use Seq, so hopefully it helps narrow down the problem.

@lemmy
Member

lemmy commented Sep 13, 2019

Out of N runs on a 40-core machine, how many produce the spurious invariant violation? Is this limited to the most recent TLC release, or does it also occur with older versions?

@lemmy lemmy added bug error, glitch, fault, flaw, ... Tools The command line tools - TLC, SANY, ... labels Sep 13, 2019
@lemmy lemmy added this to TODO (Engineering) in mku via automation Sep 13, 2019
@Calvin-L
Collaborator Author

On my 2-core laptop, roughly 1 in 4 runs fail. Precisely, 24 of 100 total runs failed while I compiled a large project in the background to generate some scheduler noise.

A few more details:

  • The number of actual cores on the machine doesn't seem to matter much. I was able to reproduce the problem on my 2-core laptop and a 104-core server.
  • The exact -workers 40 value doesn't seem to be relevant: the bug still occurs with -workers 2, but not with -workers 1.

I haven't tried other TLC versions, but I will try to find time to do so in the next day or two.

@Calvin-L
Collaborator Author

I did some digging and tried various older versions. Every version I tested seems to have the bug.

The oldest Git commit I was able to compile was 488e847 from January 2009, more than a decade ago. The bug was present in that version. (I was not able to test older versions because that commit fixed compilation for Java 1.6, and I do not have a pre-1.6 Java compiler installed.)

@lemmy
Member

lemmy commented Sep 14, 2019

Thanks for checking older revisions. Since the bug exists in a rev of 2009, we might as well assume it existed from the very beginning.

I can successfully reproduce this issue with:

while true; do java -Dutil.FileUtil.milliseconds=true -jar /opt/TLA+Toolbox/tla2tools.jar -deadlock -workers 2 MC >> foo.out; done

on

TLC2 Version 2.15 of Day Month 20?? (rev: e7e678c)
Warning: Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the "-XX:+UseParallelGC" property.
(Use the -nowarning option to disable this warning.)
Running breadth-first search Model-Checking with fp 100 and seed -5976171377444021654 with 2 workers on 4 cores with 5964MB heap and 64MB offheap memory [pid: 1413] (Linux 4.18.0-16-generic amd64, Ubuntu 11.0.4 x86_64, MSBDiskFPSet, DiskStateQueue).

@lemmy
Member

lemmy commented Sep 16, 2019

Pending further investigation: replacing

TypeOK == log \in Seq(LogEntry)

with

MySeq(S) == UNION {[1..n -> S] : n \in 0..5}
TypeOK == log \in MySeq(LogEntry)

makes the spurious invariant failure disappear. This suggests the problem is either related to the Java implementation of the Seq operator or to how TLC generally executes Java module overrides.

@Calvin-L
Collaborator Author

I did some poking around over the weekend and I have a different hypothesis. (The alternate TypeOK in my first comment above does not use Seq and still exhibits the bug.)

Is it possible for one instance of SetEnumValue to be shared between different worker threads? If so, two threads might see inconsistent results if one thread tries to normalize the set while another tries to read it.

I was able to make the bug go away by marking the SetEnumValue.normalize method as synchronized. In general, the subclasses of Value do not do enough synchronization to protect themselves against concurrent modifications.

(Of course, this find might be a red herring. Adding additional synchronization might be masking a different concurrency bug somewhere else. Another explanation is that if values are supposed to be normalized before they are shared between threads, then there might be some place normalization isn't happening before sharing.)
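To make the suspected interleaving concrete, here is a minimal, self-contained Java sketch with hypothetical stand-in classes (not TLC's); only the normalize-while-reading pattern is taken from the hypothesis above. One thread normalizes the shared set while another performs a membership test against it and can observe a half-sorted array:

import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a lazily normalized set value.
class LazySet {
    int[] elems;
    boolean isNorm;
    LazySet(int[] e) { elems = e; }
    void normalize() { if (!isNorm) { Arrays.sort(elems); isNorm = true; } }
    boolean member(int x) {
        normalize();                          // readers assume elems is sorted afterwards
        return Arrays.binarySearch(elems, x) >= 0;
    }
}

public class NormalizeRace {
    public static void main(String[] args) throws Exception {
        for (int run = 0; run < 100_000; run++) {
            final LazySet shared = new LazySet(new int[] {5, 4, 3, 2, 1});
            final CountDownLatch start = new CountDownLatch(1);
            final int r = run;
            Thread a = new Thread(() -> { await(start); shared.normalize(); });
            Thread b = new Thread(() -> {
                await(start);
                if (!shared.member(3)) {      // 3 is in the set, yet this can fail
                    System.out.println("inconsistent view observed in run " + r);
                }
            });
            a.start(); b.start(); start.countDown(); a.join(); b.join();
        }
    }

    private static void await(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException ignored) { }
    }
}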

@Calvin-L
Collaborator Author

(I should also point out that adding the synchronized keyword to normalize is not sufficient to make SetEnumValue fully thread-safe. There are still many ways for unsynchronized readers to see an inconsistent view of the object while normalization is happening. Making this class fully thread-safe will require a great deal more thought.)

@lemmy
Member

lemmy commented Sep 16, 2019

Normalization of Value instances happens as part of state fingerprinting which happens before a state is enqueued in the state queue and thus shared with another thread. Additionally, tlc2.tool.ModelChecker.doNextCheckInvariants(TLCState, TLCState, boolean, boolean) checks whether the successor state is valid (the check that spuriously fails) right after the successor has been created, before it is even shared.

@lemmy
Member

lemmy commented Sep 16, 2019

Two successive runs with identical worker schedules, but only one shows the spurious violation.

TLC2 Version 2.15 of Day Month 20??
Warning: Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the "-XX:+UseParallelGC" property.
(Use the -nowarning option to disable this warning.)
Running breadth-first search Model-Checking with fp 1 and seed -7081275985609514574 with 2 workers on 4 cores with 5964MB heap and 64MB offheap memory [pid: 20737] (Linux 4.18.0-16-generic amd64, Ubuntu 11.0.4 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/MC.tla
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/Github361.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/TLC.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Integers.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Sequences.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Naturals.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/FiniteSets.tla
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module Github361
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module MC
Starting... (2019-09-16 12:20:41)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2019-09-16 12:20:42.
0 -> 0
0 -> 0
0 -> 1
0 -> 0
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 0.0
5 states generated, 5 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 3.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 2 and the 95th percentile is 2).
Finished in 00s at (2019-09-16 12:20:42)
TLC2 Version 2.15 of Day Month 20??
Warning: Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the "-XX:+UseParallelGC" property.
(Use the -nowarning option to disable this warning.)
Running breadth-first search Model-Checking with fp 1 and seed 7412956290917719493 with 2 workers on 4 cores with 5964MB heap and 64MB offheap memory [pid: 20771] (Linux 4.18.0-16-generic amd64, Ubuntu 11.0.4 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/MC.tla
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/Github361.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/TLC.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Integers.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Sequences.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Naturals.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/FiniteSets.tla
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module Github361
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module MC
Starting... (2019-09-16 12:20:43)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2019-09-16 12:20:43.
0 -> 0
0 -> 0
0 -> 1
0 -> 0
HERE WE ARE
Error: Invariant TypeOK is violated.
Error: The behavior up to this point is:
State 1: <Initial predicate>
log = <<>>

State 2: <Begin line 48, col 5 to line 51, col 25 of module Github361>
log = <<[type |-> "BEGIN", client |-> "Alice", request |-> [type |-> "T1"]]>>

State 3: <End line 54, col 5 to line 57, col 25 of module Github361>
log = << [type |-> "BEGIN", client |-> "Alice", request |-> [type |-> "T1"]],
   [type |-> "END", client |-> "Alice", output |-> [status |-> "OK"]] >>

5 states generated, 5 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 3.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 2 and the 95th percentile is 2).
Finished in 01s at (2019-09-16 12:20:43)

@lemmy
Member

lemmy commented Sep 16, 2019

Overall the bug seems highly sensitive to timing changes. Just adding a couple of System.err.println here and there reduces its probability of showing up.

@Calvin-L
Collaborator Author

Additionally, tlc2.tool.ModelChecker.doNextCheckInvariants(TLCState, TLCState, boolean, boolean) checks whether the successor state is valid (the check that spuriously fails) right after the successor has been created, before it is even shared.

Yes, but Value instances from the current state can be referenced by the successor state. It looks like this can even happen in subtle ways through "lazy" values like UnionValue and SetCupValue. Even in the successor state, I think there can be a lot of sharing going on.

Normalization of Value instances happens as part of state fingerprinting which happens before a state is enqueued in the state queue and thus shared with another thread.

Awesome, good to know. Working under the assumption that the problem is failure to normalize some value before it is enqueued, I was able to fix the bug by adding the line set.deepNormalize(); to the method UnionValue.deepNormalize. That fixes the problem even without the synchronization I mentioned earlier. Again, this might be a red herring; I am not really an expert on the TLC code.
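
For readers less familiar with the Value hierarchy, here is a self-contained sketch of the pattern behind that one-line fix, using hypothetical stand-in classes rather than TLC's; the only detail taken from this comment is that UnionValue#deepNormalize must recurse into the set it wraps:

class InnerSet {
    int[] elems;
    boolean isNorm;
    InnerSet(int[] e) { elems = e; }
    void normalize() { if (!isNorm) { java.util.Arrays.sort(elems); isNorm = true; } }
}

class UnionSketch {
    final InnerSet set;
    UnionSketch(InnerSet s) { set = s; }

    // Before the fix: deepNormalize() skips the wrapped set, so the set stays
    // un-normalized and is later normalized lazily (and thus concurrently) by
    // whichever workers first evaluate an expression over it.
    void deepNormalizeBefore() { }

    // After the fix: also normalize the wrapped set up front, mirroring the
    // added set.deepNormalize() call described above.
    void deepNormalizeAfter() { set.normalize(); }
}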

@lemmy
Member

lemmy commented Sep 16, 2019

I can confirm that even just calling "normalize" on UnionValue#set appears to make the spurious violation disappear.

Note that UV#deepNormalize is called as part of constant processing at TLC startup, but not during state space exploration (SSE). In other words, it executes when 0-arity operators such as AnyResponseType are evaluated way before SSE. The results of constant processing are kept in the nodes of the semantic graph and are thus - contrary to state values - never fingerprinted. Later, workers will read the results when expressions are evaluated as part of SSE.

What I can see happening is that workers A and B race to normalize such a Value: worker A starts to normalize, e.g., a SetEnumValue instance while worker B reads an inconsistent SetEnumValue.

@lemmy lemmy moved this from TODO (Engineering) to Doing (Engineering) in mku Sep 17, 2019
@lemmy
Member

lemmy commented Sep 17, 2019

Double checking; the following change to SetEnumValue also causes the spurious violation to disappear:

  public final Value normalize() {
    try {
      if (!this.isNorm) {
        synchronized (this) {
          if (!this.isNorm) {
            this.elems.sort(true);   // duplicates eliminated
            this.isNorm = true;
          }
        }
      }
      return this;
    }
    catch (RuntimeException | OutOfMemoryError e) {
      if (hasSource()) { throw FingerprintException.getNewHead(this, e); }
      else { throw e; }
    }
  }

@lemmy
Member

lemmy commented Sep 17, 2019

I managed to manually cause two workers to race to normalize a SetEnumValue that represents a constant expression. The obvious fix would be to eagerly normalize constants.

@Calvin-L
Collaborator Author

Yes, with your proposed implementation of normalize I can no longer reproduce the problem.

However, in case you are considering committing it, I advise against considering the proposed implementation a complete fix, since:

  1. Double-checked locking is broken (see the sketch below).
  2. Synchronization in normalize alone will not prevent other readers (e.g. SetOfRcdsValue.SubsetEnumerator.elementAt) from observing an inconsistent view of elems while it is being sorted by some other thread.

Eagerly normalizing constants seems like a more complete fix to me since it (hopefully?) prevents workers from modifying shared objects. We could also consider a more invasive set of changes that would make SetEnumValue truly thread-safe.
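
Regarding point 1, here is a generic illustration (plain Java, hypothetical class, not TLC code) of why this form of double-checked locking is unsafe: the isNorm flag is read outside any synchronization, so a racing reader may observe isNorm == true before the writes made inside the synchronized block (the sort) become visible to it.

class DclSketch {
    private int[] elems = {5, 4, 3, 2, 1};
    private boolean isNorm;              // not volatile: no happens-before edge
                                         // for the unsynchronized read below
    int[] normalized() {
        if (!isNorm) {                   // unsynchronized first check
            synchronized (this) {
                if (!isNorm) {
                    java.util.Arrays.sort(elems);
                    isNorm = true;       // can become visible to another thread
                }                        // before the sorted contents of elems do
            }
        }
        return elems;                    // a racing reader may still see an
    }                                    // unsorted or partially sorted array
}

And, as point 2 notes, even a correctly published flag would not help readers such as SetOfRcdsValue.SubsetEnumerator.elementAt that access elems without taking the same lock.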

@lemmy
Member

lemmy commented Sep 17, 2019

I have yet to be convinced that we need thread-safe value instances, because state values (as opposed to constant values) will be normalized prior to sharing due to fingerprinting.

Constant values on the other hand can be eagerly normalized before SSE.
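
Conceptually, the eager treatment amounts to one pass over the values produced by constant processing before any worker starts. The sketch below assumes a collection of those constant-level values is available; the real hook would live in TLC's spec processing code, which is not shown here, and the accessibility of deepNormalize from this hypothetical helper is likewise assumed:

final class ConstantPreparation {
    // Sketch only: recursively normalize every value that constant processing
    // stored in the semantic graph, so nothing is left to be normalized lazily,
    // and therefore concurrently, once workers start exploring states.
    static void eagerlyNormalize(final Iterable<tlc2.value.impl.Value> constantLevelValues) {
        for (tlc2.value.impl.Value v : constantLevelValues) {
            v.deepNormalize();
        }
    }
}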

@Calvin-L
Collaborator Author

That makes sense to me!

@lemmy lemmy changed the title Parallel TLC nondeterministically reports incorrect result UnionValue#deepNormalize omits to normalize set leading to parallel TLC nondeterministically reporting spurious safety violation Sep 17, 2019
@lemmy
Member

lemmy commented Sep 18, 2019

After spending more quality time with Value#deepNormalize, I noticed that sub-classes show inconsistent behavior: for some of them, Value#isNormalized returns false even after Value#deepNormalize has been called.

package tlc2.value.impl;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

import util.InternTable;
import util.UniqueString;

public class DeepNormalizeValueTest {
	
	/**
	 * Test method for {@link tlc2.value.impl.UnionValue#deepNormalize()}.
	 */
	@Test
	public void union() {
		final ValueVec vec = new ValueVec();
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(42)}, false));
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(23)}, false));
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(4711)}, false));
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(1)}, false));
		
		final UnionValue uv = new UnionValue(new SetEnumValue(vec, false));
		assertFalse(uv.isNormalized());
		assertFalse(uv.set.isNormalized());
		
		uv.deepNormalize();
		
		assertTrue(uv.set.isNormalized());
		assertTrue(uv.isNormalized());
	}

	/**
	 * Test method for {@link tlc2.value.impl.SetCapValue#deepNormalize()}.
	 */
	@Test
	public void setcap() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SetCapValue scv = new SetCapValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
		assertFalse(scv.isNormalized());
		assertFalse(scv.set1.isNormalized());
		assertFalse(scv.set2.isNormalized());
		
		scv.deepNormalize();
		
		assertTrue(scv.set1.isNormalized());
		assertTrue(scv.set2.isNormalized());
		assertTrue(scv.isNormalized());
	}

	@Test
	public void setcup() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SetCupValue scv = new SetCupValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
		assertFalse(scv.isNormalized());
		assertFalse(scv.set1.isNormalized());
		assertFalse(scv.set2.isNormalized());
		
		scv.deepNormalize();
		
		assertTrue(scv.set1.isNormalized());
		assertTrue(scv.set2.isNormalized());
		assertTrue(scv.isNormalized());
	}

	@Test
	public void setdiff() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SetDiffValue sdv = new SetDiffValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
		assertFalse(sdv.isNormalized());
		assertFalse(sdv.set1.isNormalized());
		assertFalse(sdv.set2.isNormalized());
		
		sdv.deepNormalize();
		
		assertTrue(sdv.isNormalized());
		assertTrue(sdv.set1.isNormalized());
		assertTrue(sdv.set2.isNormalized());
	}

	@Test
	public void subset() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SubsetValue sub = new SubsetValue(new SetEnumValue(vec, false));
		assertFalse(sub.isNormalized());
		assertFalse(sub.set.isNormalized());
		
		sub.deepNormalize();
		
		assertTrue(sub.set.isNormalized());
		assertTrue(sub.isNormalized());
	}

	@Test
	public void record() {
		final InternTable internTable = new InternTable(2);
		final UniqueString a = internTable.put("a");
		final UniqueString b = internTable.put("b");
		
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		final Value aVal = new SetEnumValue(vec, false);
		final Value bVal = new SetEnumValue(vec, false);
		
		final RecordValue rcdv = new RecordValue(new UniqueString[] {b, a}, new Value[] {bVal, aVal}, false);

		assertFalse(rcdv.isNormalized());
		for (Value v : rcdv.values) {
			assertFalse(v.isNormalized());
		}
		
		rcdv.deepNormalize();
		
		for (Value v : rcdv.values) {
			assertTrue(v.isNormalized());
		}
		assertTrue(rcdv.isNormalized());
	}

	@Test
	public void fcnrecord() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		final Value aVal = new SetEnumValue(vec, false);
		final Value bVal = new SetEnumValue(vec, false);

		final FcnRcdValue rcdv = new FcnRcdValue(new Value[] { new StringValue("B"), new StringValue("A") },
				new Value[] { bVal, aVal }, false);

		assertFalse(rcdv.isNormalized());
		for (Value v : rcdv.values) {
			assertFalse(v.isNormalized());
		}
		
		rcdv.deepNormalize();
		
		for (Value v : rcdv.values) {
			assertTrue(v.isNormalized());
		}
		assertTrue(rcdv.isNormalized());
	}
	
	@Test
	public void tuple() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		final Value aVal = new SetEnumValue(vec, false);
		
		final TupleValue tuple = new TupleValue(aVal);

		for (Value v : tuple.elems) {
			assertFalse(v.isNormalized());
		}
		assertFalse(tuple.isNormalized());
		
		tuple.deepNormalize();
		
		assertTrue(tuple.isNormalized());
		for (Value v : tuple.elems) {
			assertTrue(v.isNormalized());
		}
	}
	
	@Test
	public void setOfTuple() {
		final IntervalValue intVal = new IntervalValue(1, 2);
		final SetOfTuplesValue inner = new SetOfTuplesValue(intVal, intVal);
		final SetOfTuplesValue tuples = new SetOfTuplesValue(inner, inner);

		for (Value v : tuples.sets) {
			assertFalse(v.isNormalized());
		}
		assertFalse(tuples.isNormalized());
		
		tuples.deepNormalize();
		
		assertTrue(tuples.isNormalized());
		for (Value v : tuples.sets) {
			assertTrue(v.isNormalized());
		}
	}
	
	@Test
	public void setOfRcds() {

		final Value[] values = new Value[3];
		values[0] = new SetEnumValue(getValue(7, "a"), true);
		values[1] = new IntervalValue(1, 2);
		values[2] = new IntervalValue(1, 4);

		final SetOfRcdsValue setOfRcrds = new SetOfRcdsValue(getNames(3), values, true);

		for (Value v : setOfRcrds.values) {
			assertFalse(v.isNormalized());
		}
		assertFalse(setOfRcrds.isNormalized());

		setOfRcrds.deepNormalize();

		assertTrue(setOfRcrds.isNormalized());
		for (Value v : setOfRcrds.values) {
			assertTrue(v.isNormalized());
		}
	}

	private static final Value[] getValue(final int n, String str) {
		final Value[] values = new Value[n];
		for (int i = 0; i < n; i++) {
			values[i] = new StringValue(str + i);
		}
		return values;
	}

	private static final UniqueString[] getNames(final int n) {
		final UniqueString[] names = new UniqueString[n];
		for (int i = 0; i < names.length; i++) {
			names[i] = UniqueString.uniqueStringOf("N" + i);
		}
		return names;
	}
}

This indicates that it is best to call Value#fingerprint in addition to Value#deepNormalize in SpecProcessor to guarantee fully initialized Value instances (fingerprint internally triggers convertAndCache when it is defined in a Value sub-class). This way, SpecProcessor-created Value instances follow the same life-cycle as Value instances created during SSE.

package tlc2.value.impl;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.BeforeClass;
import org.junit.Test;

import tlc2.util.FP64;
import util.InternTable;
import util.UniqueString;

public class DeepNormalizeValueTest {

	@BeforeClass
	public static void setup() {
		FP64.Init();
	}
	
	/**
	 * Test method for {@link tlc2.value.impl.UnionValue#deepNormalize()}.
	 */
	@Test
	public void union() {
		final ValueVec vec = new ValueVec();
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(42)}, false));
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(23)}, false));
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(4711)}, false));
		vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(1)}, false));
		
		final UnionValue uv = new UnionValue(new SetEnumValue(vec, false));
		assertFalse(uv.isNormalized());
		assertFalse(uv.set.isNormalized());
		
		uv.fingerPrint(0L);
		
		assertTrue(uv.set.isNormalized());
		assertTrue(uv.isNormalized());
	}

	/**
	 * Test method for {@link tlc2.value.impl.SetCapValue#deepNormalize()}.
	 */
	@Test
	public void setcap() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SetCapValue scv = new SetCapValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
		assertFalse(scv.isNormalized());
		assertFalse(scv.set1.isNormalized());
		assertFalse(scv.set2.isNormalized());
		
		scv.deepNormalize();
		scv.fingerPrint(0L);
		
		assertTrue(scv.set1.isNormalized());
		assertTrue(scv.set2.isNormalized());
		assertTrue(scv.isNormalized());
	}

	@Test
	public void setcup() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SetCupValue scv = new SetCupValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
		assertFalse(scv.isNormalized());
		assertFalse(scv.set1.isNormalized());
		assertFalse(scv.set2.isNormalized());
		
		scv.fingerPrint(0L);
		
		assertTrue(scv.set1.isNormalized());
		assertTrue(scv.set2.isNormalized());
		assertTrue(scv.isNormalized());
	}

	@Test
	public void setdiff() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SetDiffValue sdv = new SetDiffValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
		assertFalse(sdv.isNormalized());
		assertFalse(sdv.set1.isNormalized());
		assertFalse(sdv.set2.isNormalized());
		
		sdv.deepNormalize();
		sdv.fingerPrint(0L);
		
		assertTrue(sdv.isNormalized());
		assertTrue(sdv.set1.isNormalized());
		assertTrue(sdv.set2.isNormalized());
	}

	@Test
	public void subset() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		
		SubsetValue sub = new SubsetValue(new SetEnumValue(vec, false));
		assertFalse(sub.isNormalized());
		assertFalse(sub.set.isNormalized());
		
		sub.fingerPrint(0L);
		
		assertTrue(sub.set.isNormalized());
		assertTrue(sub.isNormalized());
	}

	@Test
	public void record() {
		final InternTable internTable = new InternTable(2);
		final UniqueString a = internTable.put("a");
		final UniqueString b = internTable.put("b");
		
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		final Value aVal = new SetEnumValue(vec, false);
		final Value bVal = new SetEnumValue(vec, false);
		
		final RecordValue rcdv = new RecordValue(new UniqueString[] {b, a}, new Value[] {bVal, aVal}, false);

		assertFalse(rcdv.isNormalized());
		for (Value v : rcdv.values) {
			assertFalse(v.isNormalized());
		}
		
		rcdv.fingerPrint(0L);
		
		for (Value v : rcdv.values) {
			assertTrue(v.isNormalized());
		}
		assertTrue(rcdv.isNormalized());
	}

	@Test
	public void fcnrecord() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		final Value aVal = new SetEnumValue(vec, false);
		final Value bVal = new SetEnumValue(vec, false);

		final FcnRcdValue rcdv = new FcnRcdValue(new Value[] { new StringValue("B"), new StringValue("A") },
				new Value[] { bVal, aVal }, false);

		assertFalse(rcdv.isNormalized());
		for (Value v : rcdv.values) {
			assertFalse(v.isNormalized());
		}
		
		rcdv.fingerPrint(0L);
		
		for (Value v : rcdv.values) {
			assertTrue(v.isNormalized());
		}
		assertTrue(rcdv.isNormalized());
	}
	
	@Test
	public void tuple() {
		final ValueVec vec = new ValueVec();
		vec.addElement(IntValue.gen(42));
		vec.addElement(IntValue.gen(23));
		vec.addElement(IntValue.gen(4711));
		vec.addElement(IntValue.gen(1));
		final Value aVal = new SetEnumValue(vec, false);
		
		final TupleValue tuple = new TupleValue(aVal);

		tuple.deepNormalize();
		tuple.fingerPrint(0L);
		
		assertTrue(tuple.isNormalized());
		for (Value v : tuple.elems) {
			assertTrue(v.isNormalized());
		}
	}
	
	@Test
	public void setOfTuple() {
		final IntervalValue intVal = new IntervalValue(1, 2);
		final SetOfTuplesValue inner = new SetOfTuplesValue(intVal, intVal);
		final SetOfTuplesValue tuples = new SetOfTuplesValue(inner, inner);
	
		tuples.deepNormalize();
		tuples.fingerPrint(0L);
		
		assertTrue(tuples.isNormalized());
		for (Value v : tuples.sets) {
			assertTrue(v.isNormalized());
		}
	}
	
	@Test
	public void setOfRcds() {

		final Value[] values = new Value[3];
		values[0] = new SetEnumValue(getValue(7, "a"), true);
		values[1] = new IntervalValue(1, 2);
		values[2] = new IntervalValue(1, 4);

		final SetOfRcdsValue setOfRcrds = new SetOfRcdsValue(getNames(3), values, true);

		setOfRcrds.deepNormalize();
		setOfRcrds.fingerPrint(0L);

		assertTrue(setOfRcrds.isNormalized());
		for (Value v : setOfRcrds.values) {
			assertTrue(v.isNormalized());
		}
	}

	private static final Value[] getValue(final int n, String str) {
		final Value[] values = new Value[n];
		for (int i = 0; i < n; i++) {
			values[i] = new StringValue(str + i);
		}
		return values;
	}

	private static final UniqueString[] getNames(final int n) {
		final UniqueString[] names = new UniqueString[n];
		for (int i = 0; i < names.length; i++) {
			names[i] = UniqueString.uniqueStringOf("N" + i);
		}
		return names;
	}
}

@lemmy lemmy added this to the 1.6.1 milestone Sep 18, 2019
@lemmy lemmy changed the title UnionValue#deepNormalize omits to normalize set leading to parallel TLC nondeterministically reporting spurious safety violation UnionValue#deepNormalize omits to normalize set leading to bogus behavior with parallel TLC Sep 18, 2019
lemmy added a commit that referenced this issue Sep 18, 2019
found in overwrites of Value#deepNormalize.


This omission surfaced through a race condition that led to a spurious
safety violation (#361):

1) A TLA+ spec defines a (zero-arity) operator such as "Foo == UNION {...}"
that appears in an invariant.

2) SpecProcessor#processConstantDefns eagerly evaluates the operator Foo
at startup and inserts its Value result UV into the corresponding node
of the semantic graph.

3) Two workers check if two states violate the invariant, which triggers
UnionValue#member and internally causes this.set to be normalized.
Since Value instances are not thread-safe (they are expected to already
be fully normalized during state space exploration), the two workers race
to normalize this.set.

4) Worker A gets ahead and loops over the elements in UV#member while
worker B still normalizes UV. Worker A reads inconsistent data and thus
reports the invariant to be violated.

Thanks to Calvin Loncaric for suggesting this fix.

Fixes Github issue #361: UnionValue#deepNormalize omits to normalize set
leading to parallel TLC nondeterministically reporting spurious safety
violation
#361

[Bug][TLC]
@lemmy
Member

lemmy commented Sep 18, 2019

3014894

@lemmy lemmy closed this as completed Sep 18, 2019
mku automation moved this from Doing (Engineering) to Done Sep 18, 2019
@Calvin-L
Collaborator Author

Fantastic, thank you for the quick resolution on this. :)

@lemmy lemmy reopened this Oct 17, 2019
@lemmy
Member

lemmy commented Oct 17, 2019

For some specs, such as the ones below, the current fix of eagerly evaluating constant definitions causes TLC to take forever to start.

---------------------------- MODULE CarTalkPuzzle ---------------------------
(***************************************************************************)
(* Car Talk is a U.S.  radio program about car repair.  Each show includes *)
(* a puzzle, which is often a little mathematical problem.  Usually, those *)
(* problems are easy for a mathematically sophisticated listener to solve. *)
(* However, I was not able immediately to see how to solve the puzzle from *)
(* the 22 October 2011 program.  I decided to specify the problem in TLA+  *)
(* and let TLC (the TLA+ model checker) compute the solution.  This is the *)
(* specification I wrote.  (I have tried to explain in comments all TLA+   *)
(* notation that is not standard mathematical notation.) Once TLC had      *)
(* found the solution, it was not hard to understand why it worked.        *)
(*                                                                         *)
(* Here is the problem.  A farmer has a 40 pound stone and a balance       *)
(* scale.  How can he break the stone into 4 pieces so that, using those   *)
(* pieces and the balance scale, he can weigh out any integral number of   *)
(* pounds of corn from 1 pound through 40 pounds.                          *)
(***************************************************************************)

(***************************************************************************)
(* The following statement imports the standard operators of arithmetic    *)
(* such as + and =< (less than or equals).  It also defines the operator   *)
(* ..  so that i..j is the set of all integers k with i =< k =< j.         *)
(***************************************************************************)
EXTENDS Integers 

(***************************************************************************)
(* For generality, I solve the problem of breaking an N pound stone into P *)
(* pieces.  The following statement declares N and P to be unspecified     *)
(* constant values.                                                        *)
(***************************************************************************)
CONSTANTS N, P

(***************************************************************************)
(* I define the operator Sum so that if f is any integer-valued function,  *)
(* and S any finite subset of its domain, then Sum(f, S) is the sum of     *)
(* f[x] for all x in S.  (In TLA+, function application is indicated by    *)
(* square brackets instead of parentheses, as it is in ordinary math.)     *)
(*                                                                         *)
(* A RECURSIVE declaration must precede a recursively defined operator.    *)
(* The operator CHOOSE is known to logicians as Hilbert's Epsilon.  It is  *)
(* defined so that CHOOSE x \in S : P(x) equals some unspecified value v   *)
(* such that P(v) is true, if such a value exists.                         *)
(***************************************************************************)
RECURSIVE Sum(_,_)
Sum(f,S) == IF S = {} THEN 0
                      ELSE LET x == CHOOSE x \in S : TRUE
                           IN  f[x] + Sum(f, S \ {x})

(***************************************************************************)
(* I now define the set Break of all "breaks", where a break represents a  *)
(* method of breaking the stone into P (integer-weight) pieces.  The       *)
(* obvious definition of a break would be a set of weights.  However, that *)
(* doesn't work because it doesn't handle the situation in which two of    *)
(* pieces have the same weight.  Instead, I define a break of the N pound  *)
(* stone into P pieces to be a function B from 1..P (the integers from 1   *)
(* through P) into 1..N such that B[i] is the weight of piece number i.    *)
(* To avoid solutions that differ only by how the pieces are numbered, I   *)
(* consider only breaks in which the pieces are numbered in non-decreasing *)
(* order of their weight.  This leads to the following definition of the   *)
(* set Break of all breaks.                                                *)
(*                                                                         *)
(* In TLA+, [S -> T] is the set of all functions with domain S and range a *)
(* subset of T.  \A and \E are the universal and existential quantifiers.  *)
(***************************************************************************)
Break == {B \in [1..P -> 1..N] :    Sum(B, 1..P) = N
                                 /\ \A i \in 1..P : \A j \in (i+1)..P : B[i] =< B[j]}

(***************************************************************************)
(* To weigh a quantity of corn, we can put some of the weights on the same *)
(* side of the balance scale as the corn and other weights on the other    *)
(* side of the balance.  The following operator is true for a weight w, a  *)
(* break B, and sets S and T of pieces if w plus the weight of the pieces  *)
(* in S equals the weight of the pieces in T.  The elements of S and T are *)
(* piece numbers (numbers in 1..P), so Sum(B, S) is the weight of the      *)
(* pieces in S.                                                            *)
(***************************************************************************)
IsRepresentation(w, B, S, T) ==    S \cap T = {}
                                 /\ w + Sum(B,S) = Sum(B,T)
(***************************************************************************)
(* I now define IsSolution(B) to be true iff break B solves the problem,   *)
(* meaning that it can be used to balance any weight in 1..N.              *)
(*                                                                         *)
(* SUBSET S is the set of all subsets of S (the power set of S).           *)
(***************************************************************************)
IsSolution(B) ==  \A w \in 1..N : 
                     \E S, T \in SUBSET (1..P) : IsRepresentation(w, B, S, T) 

(***************************************************************************)
(* I define AllSolutions to be the set of all breaks B that solve the      *)
(* problem.                                                                *)
(***************************************************************************)
AllSolutions == { B \in Break : IsSolution(B) }
=========
Created by Leslie Lamport on 26 October 2011

With N = 121 and P = 5.


------------------------- MODULE ConfluentAlgorithm -------------------------
\* This is an excerpt of the actual spec.
EXTENDS TLC

Nodes == {1, 2, 3, 4, 5}
InitNodes == {1}
Arcs == {<<1, 2>>, <<2, 1>>, <<2, 3>>, <<3, 3>>, <<3, 4>>, <<4, 5>>, <<5, 3>>}
null == 6

Reachable(n, m) ==
  LET NbrsNotIn(x, S) == {y \in Nodes \ S : <<x, y>> \in Arcs}
      RECURSIVE RF(_)
      RF(S) == LET Nxt == UNION { NbrsNotIn(x, S) : x \in S }
               IN  IF m \in S THEN TRUE
                              ELSE IF Nxt = {} THEN FALSE
                                               ELSE RF(S \cup Nxt)                               
  IN  (n = m) \/ RF({n})

MCC ==
  LET RECURSIVE M(_, _)
      M(Partial, Rest) == 
        IF Rest = {} THEN Partial
                     ELSE LET x == CHOOSE x \in Rest : TRUE
                              CC == {y \in Rest : /\ Reachable(x, y)
                                                  /\ Reachable(y, x)}
                          IN M(Partial \cup {CC}, Rest \ CC)
  IN  M({}, Nodes)

Outgoing(node) == {a \in Arcs : a[1] = node}

Vertices == [nodes : SUBSET Nodes, succ : Nodes \cup {null}, dead : BOOLEAN]

Partitions == {P \in SUBSET Vertices :
                 /\ \A v1, v2 \in P : 
                       (v1 # v2) => ((v1.nodes \cap v2.nodes) = {})
                 /\ \A v \in P : /\ v.nodes # {}
                                 /\ \E w \in P : \/ v.succ = null
                                                 \/ v.succ \in w.nodes}     

ASSUME(PrintT(Partitions))
\* [...]
================                  

@lemmy lemmy moved this from Done to Doing (Engineering) in mku Oct 17, 2019
@lemmy
Copy link
Member

lemmy commented Oct 18, 2019

Root Cause:

  1. Values can be (deeply) nested.

  2. Values transition through the following sequence of states: new (instantiated), normalized, and fingerprinted. Transitioning from one state to another is not thread-safe. Non-technically speaking, a value is both the symbolic and the explicit entity.

  3. Normalization of an (enumerable) value causes its elements to be sorted and duplicates to be removed. It does not cause new elements to be generated/enumerated. For example, normalizing the SetPredValue corresponding to the expression { s \in SUBSET S : s # {} } causes S to be sorted and any duplicate elements to be removed.

  4. Fingerprinting an enumerable value causes its elements to be explicitly enumerated. For example, the SetPredValue from before causes all subsets s to be enumerated.

  5. At startup, TLC eagerly evaluates zero-arity constant definitions and operators, which involves creating values. These values are normalized but not fingerprinted! For example, the operator NonEmptySubsS == { s \in SUBSET S : s # {} } will be evaluated to a SetPredValue that is stored in the NonEmptySubsS node of the semantic graph. We denote the set of values in the semantic graph with VSG.

  6. Generating a TLCState during the evaluation of the next-state relation assigns newly created values to variables. Before a TLCState is enqueued in the StateQueue, its values are fingerprinted. Thus, values of TLCStates can safely be shared across Workers.

Let's assume a spec with one variable x whose value - in all states - is some set. Furthermore, let P be the property [](x \in NonEmptySubsS). For simplicity, we assume:

  1. P is not checked on the initial states
  2. Two Workers check next states simultaneously

Under these two assumptions, both Workers will race to enumerate (fingerprint) NonEmptySubsS.

Note that the fact that we had to assume 1) and 2) explains why this race condition didn't surface earlier. However, both assumptions do hold in real-world executions of TLC. For example, the evaluation of a more complex NonEmptySubsS, with a deeply nested hierarchy of values, may not require the evaluation of nested values in the initial states.


@xxyzzn came up with another scenario that might lead to the calculation of bogus fingerprints:

EXTENDS Integers, Sequences
VARIABLE x, y

Foo == { s \in SUBSET {1,2,3,4} : s # {} }

Init == x = << >> /\ y \in 1..2

Next == x' \in {<<i, Foo>> : i \in 1..2} /\ UNCHANGED y

Variable y only exists so that the state-graph has two initial states (sequentially generated). A single initial state would restrict the first evaluation of Next to a single Worker (only a single initial state to dequeue from the StateQueue) and thus fingerprinting could not occur concurrently.
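
A minimal configuration for this reproduction might look like the following (assumed here; the comment does not include one), run with at least two workers, e.g. tlc2 -workers 2 <spec>:

INIT Init
NEXT Next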

Attempted Fix A:

At startup, (eagerly) fingerprint all values of VSG.

Problem: Not all values in VSG need to be fingerprinted to completely check a spec. For some specs, such as CarTalkPuzzle, this is even prohibitive because explicit enumeration is infeasible: TLC will hang at startup.

Deferring fingerprinting of VSG until after the assumptions have been checked, such as when the initial states are (atomically) generated, is not a general solution either.
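
For a sense of scale (arithmetic added for illustration, not a measurement from the runs above): with N = 121 and P = 5, fingerprinting a constant like CarTalkPuzzle's Break means filtering the function set [1..5 -> 1..121], that is 121^5 = 25,937,424,601 (roughly 2.6 * 10^10) candidate functions, before the first state is even generated.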

Proposed Fix A:

Properly synchronize values such that no race conditions are possible if two or more Workers enumerate them concurrently.

Disadvantages:

  1. It creates a scalability bottleneck: Workers will contend for the synchronized NonEmptySubsS on every state that is generated.
  2. Engineering effort to correctly synchronize the Value type hierarchy without introducing deadlocks. It does not suffice to synchronize only fingerprinting, because other workers can also read inconsistent values.

Proposed Fix B:

For each worker, create a copy of VSG (technically borrows from the concept of a ThreadLocal; see the sketch after the list of disadvantages below).

Disadvantages:

  1. Requires a type check (and potentially a cast) on every lookup that occurs during the evaluation of the next-state relation. An optimization would be to eventually garbage-collect the copies of VSG to at least skip the type cast once one copy is completely fingerprinted.
  2. Value#deepCopy - required to create the copies of VSG - is broken for most value types (can probably be fixed).
    2b) Instead of through IValue#deepCopy, SpecProcessor can simply create values by repeatedly evaluating the OpDefNode.
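
A conceptual sketch of Proposed Fix B (hypothetical helper class, not TLC code): each worker lazily obtains its own copy of a constant-level value, produced by re-evaluating the defining expression as suggested in 2b, so no two workers ever fingerprint the same instance concurrently.

import java.util.function.Supplier;

final class PerWorkerConstant<V> {
    private final ThreadLocal<V> copy;

    // "reevaluate" stands in for re-evaluating the OpDefNode (option 2b above)
    // rather than relying on the currently broken Value#deepCopy.
    PerWorkerConstant(final Supplier<V> reevaluate) {
        this.copy = ThreadLocal.withInitial(reevaluate);
    }

    V get() {
        return copy.get();   // each worker thread sees its own private instance
    }
}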

@lemmy lemmy changed the title UnionValue#deepNormalize omits to normalize set leading to bogus behavior with parallel TLC Race condition can lead to non-exhaustive state space exploration or bogus counterexamples with parallel TLC Oct 31, 2019
@lemmy lemmy changed the title Race condition can lead to non-exhaustive state space exploration or bogus counterexamples with parallel TLC Race condition in parallel TLC can lead to non-exhaustive state space exploration or bogus counterexample Oct 31, 2019
lemmy added a commit that referenced this issue Nov 1, 2019
bogus counterexamples with parallel TLC.

Github issue #361.
#361

[Bug][TLC]
lemmy added a commit that referenced this issue Nov 1, 2019
bogus counterexamples with parallel TLC.

Optimize by only wrapping mutable values.

Github issue #361.
#361

[Bug][TLC]
@lemmy
Member

lemmy commented Nov 1, 2019

@quaeler Can you please review the changes in f1a089e and 56740fd and let me know if anything needs clarification? Please reopen if you find anything.

@lemmy lemmy closed this as completed Nov 1, 2019
mku automation moved this from Doing (Engineering) to Done Nov 1, 2019
@quaeler
Contributor

quaeler commented Nov 2, 2019

No, I think this is clearly written. Do we have any idea how performance would change if we put time into making sure that all IValue (or at least all Enumerable) implementors had a well-implemented deepCopy()?

@lemmy
Member

lemmy commented Nov 2, 2019

@quaeler The reason why I did not change deepCopy was to make sure I do not introduce subtle bugs. There is probably (hopefully) a reason why it is implemented the way it is.
