Race condition in parallel TLC can lead to non-exhaustive state space exploration or bogus counterexample #361
Comments
In case it is helpful, the problem still occurs with this more permissive definition. This version does not use ...
Out of N runs on a 40-core machine, how many produce the spurious invariant violation? Is this limited to the most recent TLC release or does it also occur with older versions?
On my 2-core laptop, roughly 1 in 4 runs fail. Precisely, 24 of 100 total runs failed while I compiled a large project in the background to generate some scheduler noise. A few more details:
I haven't tried other TLC versions, but I will try to find time to do so in the next day or two.
I did some digging and tried various older versions. Every version I tested seems to have the bug. The oldest Git commit I was able to compile was 488e847 from January 2009, more than a decade ago. The bug was present in that version. (I was not able to test older versions because that commit fixed compilation for Java 1.6, and I do not have a pre-1.6 Java compiler installed.)
Thanks for checking older revisions. Since the bug exists in a rev from 2009, we might as well assume it existed from the very beginning. I can successfully reproduce this issue with:

while true; do java -Dutil.FileUtil.milliseconds=true -jar /opt/TLA+Toolbox/tla2tools.jar -deadlock -workers 2 MC >> foo.out; done

on
Pending further investigation: replacing

TypeOK == log \in Seq(LogEntry)

with

MySeq(S) == UNION {[1..n -> S] : n \in 0..5}
TypeOK == log \in MySeq(LogEntry)

makes the spurious invariant failure disappear. This indicates that this is either related to the Java impl of the ...
I did some poking around over the weekend and I have a different hypothesis. (The alternate TypeOK in my first comment above does not use ...) Is it possible for one instance of ...? I was able to make the bug go away by marking the ... (Of course, this find might be a red herring. Adding additional synchronization might be masking a different concurrency bug somewhere else. Another explanation is that if values are supposed to be normalized before they are shared between threads, then there might be some place where normalization isn't happening before sharing.)
(I should also point out that adding the ...
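The unsynchronized lazy-normalization pattern hypothesized above can be sketched as follows. This is a minimal model with invented names, not TLC's actual classes:

```java
import java.util.Arrays;

// Minimal model (invented names, not TLC's actual value classes) of the
// hypothesized bug: lazy normalization mutates shared state behind an
// unsynchronized check-then-act.
public class RaceSketch {
    static class LazyValue {
        int[] elems;
        boolean isNorm;
        LazyValue(int[] elems) { this.elems = elems; }

        // Two threads can both observe isNorm == false and sort
        // concurrently, or one can read elems while the other is mid-sort.
        int[] normalize() {
            if (!isNorm) {
                Arrays.sort(elems); // mutates state visible to other threads
                isNorm = true;
            }
            return elems;
        }
    }

    public static void main(String[] args) {
        LazyValue v = new LazyValue(new int[] {42, 23, 4711, 1});
        // Single-threaded the result is deterministic; with two workers
        // calling normalize() concurrently there is no such guarantee.
        System.out.println(Arrays.toString(v.normalize()));
    }
}
```

Run single-threaded this prints the sorted set; the point is that nothing in the class prevents a second thread from observing the array in a half-sorted state.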
Normalization of Value instances happens as part of state fingerprinting, which happens before a state is enqueued in the state queue and thus shared with another thread. Additionally, tlc2.tool.ModelChecker.doNextCheckInvariants(TLCState, TLCState, boolean, boolean) checks whether the successor state is valid (the check that spuriously fails) right after the successor has been created, before it is even shared.
Two successive runs with identical worker schedules, only one of which shows the spurious violation.

TLC2 Version 2.15 of Day Month 20??
Warning: Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the "-XX:+UseParallelGC" property.
(Use the -nowarning option to disable this warning.)
Running breadth-first search Model-Checking with fp 1 and seed -7081275985609514574 with 2 workers on 4 cores with 5964MB heap and 64MB offheap memory [pid: 20737] (Linux 4.18.0-16-generic amd64, Ubuntu 11.0.4 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/MC.tla
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/Github361.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/TLC.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Integers.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Sequences.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Naturals.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/FiniteSets.tla
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module Github361
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module MC
Starting... (2019-09-16 12:20:41)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2019-09-16 12:20:42.
0 -> 0
0 -> 0
0 -> 1
0 -> 0
Model checking completed. No error has been found.
Estimates of the probability that TLC did not check all reachable states
because two distinct states had the same fingerprint:
calculated (optimistic): val = 0.0
5 states generated, 5 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 3.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 2 and the 95th percentile is 2).
Finished in 00s at (2019-09-16 12:20:42)
TLC2 Version 2.15 of Day Month 20??
Warning: Please run the Java VM which executes TLC with a throughput optimized garbage collector by passing the "-XX:+UseParallelGC" property.
(Use the -nowarning option to disable this warning.)
Running breadth-first search Model-Checking with fp 1 and seed 7412956290917719493 with 2 workers on 4 cores with 5964MB heap and 64MB offheap memory [pid: 20771] (Linux 4.18.0-16-generic amd64, Ubuntu 11.0.4 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/MC.tla
Parsing file /home/markus/src/TLA/_specs/models/tlcbugs/Github361/Github361.toolbox/Model_1/Github361.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/TLC.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Integers.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Sequences.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/Naturals.tla
Parsing file /home/markus/src/TLA/tla/tlatools/class/tla2sany/StandardModules/FiniteSets.tla
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module Github361
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module MC
Starting... (2019-09-16 12:20:43)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2019-09-16 12:20:43.
0 -> 0
0 -> 0
0 -> 1
0 -> 0
HERE WE ARE
Error: Invariant TypeOK is violated.
Error: The behavior up to this point is:
State 1: <Initial predicate>
log = <<>>
State 2: <Begin line 48, col 5 to line 51, col 25 of module Github361>
log = <<[type |-> "BEGIN", client |-> "Alice", request |-> [type |-> "T1"]]>>
State 3: <End line 54, col 5 to line 57, col 25 of module Github361>
log = << [type |-> "BEGIN", client |-> "Alice", request |-> [type |-> "T1"]],
[type |-> "END", client |-> "Alice", output |-> [status |-> "OK"]] >>
5 states generated, 5 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 3.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 2 and the 95th percentile is 2).
Finished in 01s at (2019-09-16 12:20:43)
Overall the bug seems highly sensitive to timing changes. Just adding a couple of System.err.println here and there reduces its probability of showing up.
Yes, but
Awesome, good to know. Working under the assumption that the problem is a failure to normalize some value before it is enqueued, I was able to fix the bug by adding the line ...
I can confirm that (even) calling "normalize" on UnionValue#set appears to make the spurious violation disappear. Note that UV#deepNormalize is called as part of constant processing at TLC startup, but not during state space exploration (SSE). In other words, it executes when 0-arity operators such as ... What I can see happening is that workers A and B race to normalize such a Value: worker A starts to normalize e.g. a SetEnumValue instance while worker B reads an inconsistent SetEnumValue.
Double checking; the following change to SetEnumValue also causes the spurious violation to disappear:

public final Value normalize() {
    try {
        if (!this.isNorm) {
            synchronized (this) {
                if (!this.isNorm) {
                    this.elems.sort(true); // duplicates eliminated
                    this.isNorm = true;
                }
            }
        }
        return this;
    }
    catch (RuntimeException | OutOfMemoryError e) {
        if (hasSource()) { throw FingerprintException.getNewHead(this, e); }
        else { throw e; }
    }
}
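For reference, the change above is the classic double-checked-locking idiom. Under the Java memory model the lock-free fast path is only safe if the flag it reads is volatile (otherwise a thread may observe the flag as true before the sorted elements are visible). A self-contained sketch with invented names, not TLC's actual classes:

```java
import java.util.Arrays;

// Self-contained sketch of safe double-checked locking with invented
// names. The volatile flag is what makes the lock-free fast path safe
// under the Java memory model.
public class DclSketch {
    static class LazySet {
        private final int[] elems;
        // volatile: the write of isNorm = true happens-before any later
        // unsynchronized read that observes true, so such a reader also
        // sees the fully sorted elems.
        private volatile boolean isNorm;

        LazySet(int... elems) { this.elems = elems; }

        int[] normalize() {
            if (!isNorm) {                 // fast path, no lock
                synchronized (this) {
                    if (!isNorm) {         // re-check under the lock
                        Arrays.sort(elems);
                        isNorm = true;     // publish only after sorting
                    }
                }
            }
            return elems;
        }
    }

    public static void main(String[] args) {
        LazySet s = new LazySet(42, 23, 4711, 1);
        System.out.println(Arrays.toString(s.normalize()));
    }
}
```

Whether TLC's actual isNorm field is volatile is not shown in this thread; without that (or an equivalent happens-before edge), the synchronized block alone does not fully close the visibility gap.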
I managed to manually cause two workers to race to normalize a SetEnumValue that represents a constant expression. The obvious fix would be to eagerly normalize constants. |
Yes, with your proposed implementation of ... However, in case you are considering committing it, I advise against treating the proposed implementation as a complete fix, since:
Eagerly normalizing constants seems like a more complete fix to me since it (hopefully?) prevents workers from modifying shared objects. We could also consider a more invasive set of changes that would make ...
I have yet to be convinced that we need thread-safe value instances, because state values (as opposed to constant values) will be normalized prior to sharing due to fingerprinting. Constant values, on the other hand, can be eagerly normalized before SSE.
That makes sense to me!
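As a sketch, the eager approach discussed above (normalize constant-level values once, single-threaded, before any worker starts) might look like the following. The stand-in classes are invented for illustration and are not TLC's actual SpecProcessor API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of eager normalization at startup (invented stand-in classes):
// constants are normalized once, single-threaded, so worker threads
// only ever read them afterwards.
public class EagerConstants {
    static class ConstVal {
        int[] elems;
        boolean isNorm;
        ConstVal(int... elems) { this.elems = elems; }
        void deepNormalize() {
            if (!isNorm) { Arrays.sort(elems); isNorm = true; }
        }
    }

    public static void main(String[] args) {
        List<ConstVal> constants = new ArrayList<>();
        constants.add(new ConstVal(42, 23, 4711, 1));
        constants.add(new ConstVal(3, 1, 2));
        // Startup pass, before any worker thread exists: all mutation
        // happens here, so no synchronization is needed during SSE.
        for (ConstVal c : constants) { c.deepNormalize(); }
        for (ConstVal c : constants) {
            System.out.println(Arrays.toString(c.elems));
        }
    }
}
```

The design appeal is that thread-safety comes from the life-cycle (mutate-then-freeze) rather than from locks on every value operation.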
After spending more quality time with Value#deepNormalize I noticed that sub-classes show inconsistent behavior, i.e. Value#isNormalized returns false after Value#deepNormalize has been called:

package tlc2.value.impl;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;
import util.InternTable;
import util.UniqueString;
public class DeepNormalizeValueTest {
/**
* Test method for {@link tlc2.value.impl.UnionValue#deepNormalize()}.
*/
@Test
public void union() {
final ValueVec vec = new ValueVec();
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(42)}, false));
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(23)}, false));
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(4711)}, false));
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(1)}, false));
final UnionValue uv = new UnionValue(new SetEnumValue(vec, false));
assertFalse(uv.isNormalized());
assertFalse(uv.set.isNormalized());
uv.deepNormalize();
assertTrue(uv.set.isNormalized());
assertTrue(uv.isNormalized());
}
/**
* Test method for {@link tlc2.value.impl.SetCapValue#deepNormalize()}.
*/
@Test
public void setcap() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SetCapValue scv = new SetCapValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
assertFalse(scv.isNormalized());
assertFalse(scv.set1.isNormalized());
assertFalse(scv.set2.isNormalized());
scv.deepNormalize();
assertTrue(scv.set1.isNormalized());
assertTrue(scv.set2.isNormalized());
assertTrue(scv.isNormalized());
}
@Test
public void setcup() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SetCupValue scv = new SetCupValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
assertFalse(scv.isNormalized());
assertFalse(scv.set1.isNormalized());
assertFalse(scv.set2.isNormalized());
scv.deepNormalize();
assertTrue(scv.set1.isNormalized());
assertTrue(scv.set2.isNormalized());
assertTrue(scv.isNormalized());
}
@Test
public void setdiff() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SetDiffValue sdv = new SetDiffValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
assertFalse(sdv.isNormalized());
assertFalse(sdv.set1.isNormalized());
assertFalse(sdv.set2.isNormalized());
sdv.deepNormalize();
assertTrue(sdv.isNormalized());
assertTrue(sdv.set1.isNormalized());
assertTrue(sdv.set2.isNormalized());
}
@Test
public void subset() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SubsetValue sub = new SubsetValue(new SetEnumValue(vec, false));
assertFalse(sub.isNormalized());
assertFalse(sub.set.isNormalized());
sub.deepNormalize();
assertTrue(sub.set.isNormalized());
assertTrue(sub.isNormalized());
}
@Test
public void record() {
final InternTable internTable = new InternTable(2);
final UniqueString a = internTable.put("a");
final UniqueString b = internTable.put("b");
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
final Value aVal = new SetEnumValue(vec, false);
final Value bVal = new SetEnumValue(vec, false);
final RecordValue rcdv = new RecordValue(new UniqueString[] {b, a}, new Value[] {bVal, aVal}, false);
assertFalse(rcdv.isNormalized());
for (Value v : rcdv.values) {
assertFalse(v.isNormalized());
}
rcdv.deepNormalize();
for (Value v : rcdv.values) {
assertTrue(v.isNormalized());
}
assertTrue(rcdv.isNormalized());
}
@Test
public void fcnrecord() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
final Value aVal = new SetEnumValue(vec, false);
final Value bVal = new SetEnumValue(vec, false);
final FcnRcdValue rcdv = new FcnRcdValue(new Value[] { new StringValue("B"), new StringValue("A") },
new Value[] { bVal, aVal }, false);
assertFalse(rcdv.isNormalized());
for (Value v : rcdv.values) {
assertFalse(v.isNormalized());
}
rcdv.deepNormalize();
for (Value v : rcdv.values) {
assertTrue(v.isNormalized());
}
assertTrue(rcdv.isNormalized());
}
@Test
public void tuple() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
final Value aVal = new SetEnumValue(vec, false);
final TupleValue tuple = new TupleValue(aVal);
for (Value v : tuple.elems) {
assertFalse(v.isNormalized());
}
assertFalse(tuple.isNormalized());
tuple.deepNormalize();
assertTrue(tuple.isNormalized());
for (Value v : tuple.elems) {
assertTrue(v.isNormalized());
}
}
@Test
public void setOfTuple() {
final IntervalValue intVal = new IntervalValue(1, 2);
final SetOfTuplesValue inner = new SetOfTuplesValue(intVal, intVal);
final SetOfTuplesValue tuples = new SetOfTuplesValue(inner, inner);
for (Value v : tuples.sets) {
assertFalse(v.isNormalized());
}
assertFalse(tuples.isNormalized());
tuples.deepNormalize();
assertTrue(tuples.isNormalized());
for (Value v : tuples.sets) {
assertTrue(v.isNormalized());
}
}
@Test
public void setOfRcds() {
final Value[] values = new Value[3];
values[0] = new SetEnumValue(getValue(7, "a"), true);
values[1] = new IntervalValue(1, 2);
values[2] = new IntervalValue(1, 4);
final SetOfRcdsValue setOfRcrds= new SetOfRcdsValue(getNames(3), values, true);
for (Value v : setOfRcrds.values) {
assertFalse(v.isNormalized());
}
assertFalse(setOfRcrds.isNormalized());
setOfRcrds.deepNormalize();
assertTrue(setOfRcrds.isNormalized());
for (Value v : setOfRcrds.values) {
assertTrue(v.isNormalized());
}
}
private static final Value[] getValue(final int n, String str) {
final Value[] values = new Value[n];
for (int i = 0; i < n; i++) {
values[i] = new StringValue(str + i);
}
return values;
}
private static final UniqueString[] getNames(final int n) {
final UniqueString[] names = new UniqueString[n];
for (int i = 0; i < names.length; i++) {
names[i] = UniqueString.uniqueStringOf("N" + i);
}
return names;
}
}

This indicates that it is best to call Value#fingerprint in addition to Value#deepNormalize in SpecProcessor to guarantee fully initialized Value instances (fingerprint internally triggers convertAndCache when defined in a Value sub-class). This way, SpecProcessor-created Value instances follow the same life-cycle as Value instances created during SSE:

package tlc2.value.impl;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.BeforeClass;
import org.junit.Test;
import tlc2.util.FP64;
import util.InternTable;
import util.UniqueString;
public class DeepNormalizeValueTest {
@BeforeClass
public static void setup() {
FP64.Init();
}
/**
* Test method for {@link tlc2.value.impl.UnionValue#deepNormalize()}.
*/
@Test
public void union() {
final ValueVec vec = new ValueVec();
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(42)}, false));
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(23)}, false));
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(4711)}, false));
vec.addElement(new SetEnumValue(new IntValue[] {IntValue.gen(1)}, false));
final UnionValue uv = new UnionValue(new SetEnumValue(vec, false));
assertFalse(uv.isNormalized());
assertFalse(uv.set.isNormalized());
uv.fingerPrint(0L);
assertTrue(uv.set.isNormalized());
assertTrue(uv.isNormalized());
}
/**
* Test method for {@link tlc2.value.impl.SetCapValue#deepNormalize()}.
*/
@Test
public void setcap() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SetCapValue scv = new SetCapValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
assertFalse(scv.isNormalized());
assertFalse(scv.set1.isNormalized());
assertFalse(scv.set2.isNormalized());
scv.deepNormalize();
scv.fingerPrint(0L);
assertTrue(scv.set1.isNormalized());
assertTrue(scv.set2.isNormalized());
assertTrue(scv.isNormalized());
}
@Test
public void setcup() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SetCupValue scv = new SetCupValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
assertFalse(scv.isNormalized());
assertFalse(scv.set1.isNormalized());
assertFalse(scv.set2.isNormalized());
scv.fingerPrint(0L);
assertTrue(scv.set1.isNormalized());
assertTrue(scv.set2.isNormalized());
assertTrue(scv.isNormalized());
}
@Test
public void setdiff() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SetDiffValue sdv = new SetDiffValue(new SetEnumValue(vec, false), new SetEnumValue(vec, false));
assertFalse(sdv.isNormalized());
assertFalse(sdv.set1.isNormalized());
assertFalse(sdv.set2.isNormalized());
sdv.deepNormalize();
sdv.fingerPrint(0L);
assertTrue(sdv.isNormalized());
assertTrue(sdv.set1.isNormalized());
assertTrue(sdv.set2.isNormalized());
}
@Test
public void subset() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
SubsetValue sub = new SubsetValue(new SetEnumValue(vec, false));
assertFalse(sub.isNormalized());
assertFalse(sub.set.isNormalized());
sub.fingerPrint(0L);
assertTrue(sub.set.isNormalized());
assertTrue(sub.isNormalized());
}
@Test
public void record() {
final InternTable internTable = new InternTable(2);
final UniqueString a = internTable.put("a");
final UniqueString b = internTable.put("b");
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
final Value aVal = new SetEnumValue(vec, false);
final Value bVal = new SetEnumValue(vec, false);
final RecordValue rcdv = new RecordValue(new UniqueString[] {b, a}, new Value[] {bVal, aVal}, false);
assertFalse(rcdv.isNormalized());
for (Value v : rcdv.values) {
assertFalse(v.isNormalized());
}
rcdv.fingerPrint(0L);
for (Value v : rcdv.values) {
assertTrue(v.isNormalized());
}
assertTrue(rcdv.isNormalized());
}
@Test
public void fcnrecord() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
final Value aVal = new SetEnumValue(vec, false);
final Value bVal = new SetEnumValue(vec, false);
final FcnRcdValue rcdv = new FcnRcdValue(new Value[] { new StringValue("B"), new StringValue("A") },
new Value[] { bVal, aVal }, false);
assertFalse(rcdv.isNormalized());
for (Value v : rcdv.values) {
assertFalse(v.isNormalized());
}
rcdv.fingerPrint(0L);
for (Value v : rcdv.values) {
assertTrue(v.isNormalized());
}
assertTrue(rcdv.isNormalized());
}
@Test
public void tuple() {
final ValueVec vec = new ValueVec();
vec.addElement(IntValue.gen(42));
vec.addElement(IntValue.gen(23));
vec.addElement(IntValue.gen(4711));
vec.addElement(IntValue.gen(1));
final Value aVal = new SetEnumValue(vec, false);
final TupleValue tuple = new TupleValue(aVal);
tuple.deepNormalize();
tuple.fingerPrint(0L);
assertTrue(tuple.isNormalized());
for (Value v : tuple.elems) {
assertTrue(v.isNormalized());
}
}
@Test
public void setOfTuple() {
final IntervalValue intVal = new IntervalValue(1, 2);
final SetOfTuplesValue inner = new SetOfTuplesValue(intVal, intVal);
final SetOfTuplesValue tuples = new SetOfTuplesValue(inner, inner);
tuples.deepNormalize();
tuples.fingerPrint(0L);
assertTrue(tuples.isNormalized());
for (Value v : tuples.sets) {
assertTrue(v.isNormalized());
}
}
@Test
public void setOfRcds() {
final Value[] values = new Value[3];
values[0] = new SetEnumValue(getValue(7, "a"), true);
values[1] = new IntervalValue(1, 2);
values[2] = new IntervalValue(1, 4);
final SetOfRcdsValue setOfRcrds= new SetOfRcdsValue(getNames(3), values, true);
setOfRcrds.deepNormalize();
setOfRcrds.fingerPrint(0L);
assertTrue(setOfRcrds.isNormalized());
for (Value v : setOfRcrds.values) {
assertTrue(v.isNormalized());
}
}
private static final Value[] getValue(final int n, String str) {
final Value[] values = new Value[n];
for (int i = 0; i < n; i++) {
values[i] = new StringValue(str + i);
}
return values;
}
private static final UniqueString[] getNames(final int n) {
final UniqueString[] names = new UniqueString[n];
for (int i = 0; i < names.length; i++) {
names[i] = UniqueString.uniqueStringOf("N" + i);
}
return names;
}
}
found in overwrites of Value#deepNormalize. This omission surfaced through a race condition that led to a spurious safety violation (#361):

1) A TLA+ spec defines a (zero-arity) operator such as "Foo == UNION {...}" that appears in an invariant.
2) SpecProcessor#processConstantDefns eagerly evaluates the operator Foo at startup and inserts its Value result UV into the corresponding node of the semantic graph.
3) Two workers check whether two states violate the invariant, which triggers UnionValue#member, which internally causes this.set to be normalized. Since Value instances are not thread-safe (they are expected to be fully normalized during state space exploration), the two workers race to normalize this.set.
4) Worker A gets ahead and loops over the elements in UV#member while worker B is still normalizing UV. Worker A reads inconsistent data and thus reports the invariant to be violated.

Thanks to Calvin Loncaric for suggesting this fix. Fixes Github issue #361: UnionValue#deepNormalize omits to normalize set, leading to parallel TLC nondeterministically reporting a spurious safety violation [Bug][TLC]
Fantastic, thank you for the quick resolution on this. :)
For some specs, such as the ones below, the current fix to eagerly evaluate constant definitions causes TLC to take forever to start.

---------------------------- MODULE CarTalkPuzzle ---------------------------
(***************************************************************************)
(* Car Talk is a U.S. radio program about car repair. Each show includes *)
(* a puzzle, which is often a little mathematical problem. Usually, those *)
(* problems are easy for a mathematically sophisticated listener to solve. *)
(* However, I was not able immediately to see how to solve the puzzle from *)
(* the 22 October 2011 program. I decided to specify the problem in TLA+ *)
(* and let TLC (the TLA+ model checker) compute the solution. This is the *)
(* specification I wrote. (I have tried to explain in comments all TLA+ *)
(* notation that is not standard mathematical notation.) Once TLC had *)
(* found the solution, it was not hard to understand why it worked. *)
(* *)
(* Here is the problem. A farmer has a 40 pound stone and a balance *)
(* scale. How can he break the stone into 4 pieces so that, using those *)
(* pieces and the balance scale, he can weigh out any integral number of *)
(* pounds of corn from 1 pound through 40 pounds. *)
(***************************************************************************)
(***************************************************************************)
(* The following statement imports the standard operators of arithmetic *)
(* such as + and =< (less than or equals). It also defines the operator *)
(* .. so that i..j is the set of all integers k with i =< k =< j. *)
(***************************************************************************)
EXTENDS Integers
(***************************************************************************)
(* For generality, I solve the problem of breaking an N pound stone into P *)
(* pieces. The following statement declares N and P to be unspecified *)
(* constant values. *)
(***************************************************************************)
CONSTANTS N, P
(***************************************************************************)
(* I define the operator Sum so that if f is any integer-valued function, *)
(* and S any finite subset of its domain, then Sum(f, S) is the sum of *)
(* f[x] for all x in S. (In TLA+, function application is indicated by *)
(* square brackets instead of parentheses, as it is in ordinary math.) *)
(* *)
(* A RECURSIVE declaration must precede a recursively defined operator. *)
(* The operator CHOOSE is known to logicians as Hilbert's Epsilon. It is *)
(* defined so that CHOOSE x \in S : P(x) equals some unspecified value v *)
(* such that P(v) is true, if such a value exists. *)
(***************************************************************************)
RECURSIVE Sum(_,_)
Sum(f,S) == IF S = {} THEN 0
ELSE LET x == CHOOSE x \in S : TRUE
IN f[x] + Sum(f, S \ {x})
(***************************************************************************)
(* I now define the set Break of all "breaks", where a break represents a *)
(* method of breaking the stone into P (integer-weight) pieces. The *)
(* obvious definition of a break would be a set of weights. However, that *)
(* doesn't work because it doesn't handle the situation in which two of *)
(* pieces have the same weight. Instead, I define a break of the N pound *)
(* stone into P pieces to be a function B from 1..P (the integers from 1 *)
(* through P) into 1..N such that B[i] is the weight of piece number i. *)
(* To avoid solutions that differ only by how the pieces are numbered, I *)
(* consider only breaks in which the pieces are numbered in non-decreasing *)
(* order of their weight. This leads to the following definition of the *)
(* set Break of all breaks. *)
(* *)
(* In TLA+, [S -> T] is the set of all functions with domain S and range a *)
(* subset of T. \A and \E are the universal and existential quantifiers. *)
(***************************************************************************)
Break == {B \in [1..P -> 1..N] : Sum(B, 1..P) = N
/\ \A i \in 1..P : \A j \in (i+1)..P : B[i] =< B[j]}
(***************************************************************************)
(* To weigh a quantity of corn, we can put some of the weights on the same *)
(* side of the balance scale as the corn and other weights on the other *)
(* side of the balance. The following operator is true for a weight w, a *)
(* break B, and sets S and T of pieces if w plus the weight of the pieces *)
(* in S equals the weight of the pieces in T. The elements of S and T are *)
(* piece numbers (numbers in 1..P), so Sum(B, S) is the weight of the *)
(* pieces in S. *)
(***************************************************************************)
IsRepresentation(w, B, S, T) == S \cap T = {}
/\ w + Sum(B,S) = Sum(B,T)
(***************************************************************************)
(* I now define IsSolution(B) to be true iff break B solves the problem, *)
(* meaning that it can be used to balance any weight in 1..N. *)
(* *)
(* SUBSET S is the set of all subsets of S (the power set of S). *)
(***************************************************************************)
IsSolution(B) == \A w \in 1..N :
\E S, T \in SUBSET (1..P) : IsRepresentation(w, B, S, T)
(***************************************************************************)
(* I define AllSolutions to be the set of all breaks B that solve the *)
(* problem. *)
(***************************************************************************)
AllSolutions == { B \in Break : IsSolution(B) }
=========
Created by Leslie Lamport on 26 October 2011

With N = 121 and P = 5.

------------------------- MODULE ConfluentAlgorithm -------------------------
\* This is an excerpt of the actual spec.
EXTENDS TLC
Nodes == {1, 2, 3, 4, 5}
InitNodes == {1}
Arcs == {<<1, 2>>, <<2, 1>>, <<2, 3>>, <<3, 3>>, <<3, 4>>, <<4, 5>>, <<5, 3>>}
null == 6
Reachable(n, m) ==
LET NbrsNotIn(x, S) == {y \in Nodes \ S : <<x, y>> \in Arcs}
RECURSIVE RF(_)
RF(S) == LET Nxt == UNION { NbrsNotIn(x, S) : x \in S }
IN IF m \in S THEN TRUE
ELSE IF Nxt = {} THEN FALSE
ELSE RF(S \cup Nxt)
IN (n = m) \/ RF({n})
MCC ==
LET RECURSIVE M(_, _)
M(Partial, Rest) ==
IF Rest = {} THEN Partial
ELSE LET x == CHOOSE x \in Rest : TRUE
CC == {y \in Rest : /\ Reachable(x, y)
/\ Reachable(y, x)}
IN M(Partial \cup {CC}, Rest \ CC)
IN M({}, Nodes)
Outgoing(node) == {a \in Arcs : a[1] = node}
Vertices == [nodes : SUBSET Nodes, succ : Nodes \cup {null}, dead : BOOLEAN]
Partitions == {P \in SUBSET Vertices :
/\ \A v1, v2 \in P :
(v1 # v2) => ((v1.nodes \cap v2.nodes) = {})
/\ \A v \in P : /\ v.nodes # {}
/\ \E w \in P : \/ v.succ = null
\/ v.succ \in w.nodes}
ASSUME(PrintT(Partitions))
\* [...]
================
Root Cause:

Let's assume a spec with one variable ...

Under these two assumptions, both Workers will race to enumerate (fingerprint) ... Note that the fact that we had to assume 1) and 2) explains why this race condition didn't surface earlier. However, the assumptions are true in real-world executions of TLC. For example, the evaluation of a more complex NonEmptySubsS, with a deeply nested hierarchy of values, may not require the evaluation of nested values in the initial states.

@xxyzzn came up with another scenario that might lead to the calculation of bogus fingerprints:

EXTENDS Integers, Sequences
VARIABLE x, y
Foo == { s \in SUBSET {1,2,3,4} : s # {} }
Init == x = << >> /\ y \in 1..2
Next == x' \in {<<i, Foo>> : i \in 1..2} /\ UNCHANGED y

Variable ...

Attempted Fix A: At startup, (eagerly) fingerprint all values of VSG.

Problem: Not all values in VSG need to be fingerprinted to completely check a spec. For some specs, such as CarTalkPuzzle, this is even prohibitive because explicit enumeration is infeasible: TLC will hang at startup. Deferring fingerprinting of VSG until after the assumptions have been checked, such as when the initial states are (atomically) generated, is no general solution.

Proposed Fix A: Properly synchronize values such that no race conditions are possible if two or more Workers enumerate them concurrently.

Disadvantages:
Proposed Fix B: For each worker, create a copy of VSG (technically borrows from the concept of a ...)

Disadvantages:
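A minimal sketch of the per-worker-copy idea in Proposed Fix B, using invented stand-in classes (not TLC's actual value hierarchy): each worker normalizes only its private copy, so lazy mutation can never be observed across threads:

```java
import java.util.Arrays;

// Sketch of "give each worker its own copy" (invented names): lazy
// normalization still mutates, but only objects no other thread can see.
public class PerWorkerCopy {
    static class SetVal {
        int[] elems;
        boolean isNorm;
        SetVal(int[] elems) { this.elems = elems; }
        // Deep copy handed to each worker before it starts.
        SetVal copyForWorker() { return new SetVal(elems.clone()); }
        int[] normalize() {
            if (!isNorm) { Arrays.sort(elems); isNorm = true; }
            return elems;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SetVal shared = new SetVal(new int[] {42, 23, 4711, 1});
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            final SetVal mine = shared.copyForWorker(); // private copy
            workers[i] = new Thread(
                () -> System.out.println(Arrays.toString(mine.normalize())));
        }
        for (Thread w : workers) { w.start(); }
        for (Thread w : workers) { w.join(); }
    }
}
```

The listed disadvantage follows directly from the sketch: the copies multiply memory use by the worker count, which is why the thread discusses it only as an option.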
No, I think this is clearly written. Do we have any idea of how performance would change were we to put time into making sure that all ...
@quaeler The reason why I did not change ...
I have encountered a situation where command-line TLC outputs a spurious invariant failure when I run it with more than one worker.
System information:

- TLC2 Version 2.14 of 10 July 2019 (compiled from source myself from the v1.6.0 tag)
- Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode) (MacOS), OpenJDK 64-Bit Server VM 18.9 (build 11.0.3+7-LTS, mixed mode, sharing) (Linux)
- -XX:+UseParallelGC (I did not experiment with other flags)

Apologies for the long example. I have done my best to minimize it, but the bug is slippery and disappears due to seemingly irrelevant changes to the specification.
Foo.tla:
Foo.cfg:
Single-threaded invocation (always succeeds):
Multi-threaded invocation (sometimes succeeds, sometimes fails):
The failure in the second multi-threaded request looks wrong to me; the invariant check should pass for the displayed state.