-
-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TLC with multiple workers in DFID mode intermittently fails to find safety invariant violation #548
Comments
TLC's test suite has only two basic tests for DFID mode (checked by adding Below is a patch that fixes this bug at the expense of breaking TLC's continue flag. However, the DFID implementation appears generally broken when it comes to multi-threading, why it's safer to disable multi-threaded DFID (see second patch). diff --git a/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java b/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java
index b8efccf91..5a541c2e9 100644
--- a/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java
+++ b/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java
@@ -4,6 +4,7 @@ package tlc2.tool;
import java.io.IOException;
import java.util.Set;
+import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
@@ -629,21 +630,36 @@ public class DFIDModelChecker extends AbstractChecker
}
}
+ private final AtomicBoolean foundError = new AtomicBoolean(false);
+
/**
* Set the error state.
* <strong>Note:</note> this method must be protected by lock
*/
public boolean setErrState(TLCState curState, TLCState succState, boolean keep, int errorCode)
{
- boolean result = super.setErrState(curState, succState, keep, errorCode);
- if (!result)
- {
- return false;
- } else
- {
- this.setStop(2);
- return true;
+ assert Thread.holdsLock(this) : "Caller thread has to hold monitor!";
+ //MAK 12/28/2020: Instead of branching on the return value of super.setErrState, which
+ // is determined on the value of this.done, branch on a dedicated variable foundError.
+ // The variable done is set to true under *two* condition: (i) the first worker from the set
+ // of workers found an (invariant) violation and (ii) when a subset of the workers do not find
+ // unexplored initial states (see tlc2.tool.DFIDWorker.getInit()). Case (ii) is what
+ // interferes with this code and can cause the worker that attempts to report a violation
+ // to read done=true and erroneously skip reporting the violation. What this change breaks
+ // is TLC's continue functionality because foundError is never set back to false.
+ // The variable foundError wouldn't have to be an AtomicBoolean, because foundError is only
+ // read and written from synchronized blocks on this instance.
+ if (foundError.get()) {
+ return false;
}
+ IdThread.resetCurrentState();
+ this.predErrState = curState;
+ this.errState = (succState == null) ? curState : succState;
+ this.errorCode = errorCode;
+ this.done = true;
+ this.keepCallStack = keep;
+ this.setStop(2);
+ return true;
}
/** Disable multi-threaded DFID: diff --git a/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java b/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java
index b8efccf91..847af4a54 100644
--- a/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java
+++ b/tlatools/org.lamport.tlatools/src/tlc2/tool/DFIDModelChecker.java
@@ -20,6 +20,7 @@ import tlc2.util.IStateWriter;
import tlc2.util.IdThread;
import tlc2.util.LongVec;
import tlc2.util.SetOfStates;
+import util.Assert;
import util.FileUtil;
import util.UniqueString;
@@ -61,10 +62,15 @@ public class DFIDModelChecker extends AbstractChecker
this.theInitStates = null;
this.theInitFPs = null;
this.theFPSet = new MemFPIntSet(); // init the state set
- this.theFPSet.init(TLCGlobals.getNumWorkers(), this.metadir, this.tool.getRootFile());
+ final int numWorkers = TLCGlobals.getNumWorkers();
+ if (numWorkers > 1) {
+ Assert.fail(EC.GENERAL,
+ "Multi-threaded DFID is broken (https://github.com/tlaplus/tlaplus/issues/548). Please remove the '-workers' flag from your command-line.");
+ }
+ this.theFPSet.init(numWorkers, this.metadir, this.tool.getRootFile());
// Initialize all the workers:
- this.workers = new DFIDWorker[TLCGlobals.getNumWorkers()];
+ this.workers = new DFIDWorker[numWorkers];
this.numOfGenStates = new AtomicLong(0);
} Given the popularity of DFID mode and the existence of a workaround ( |
Related: #544 |
safety invariant violation. Addresses Github issue tlaplus#548 tlaplus#548 [Bug][TLC]
Version Info
TLC version: 1.7.0 release
JRE version: AdoptOpenJDK build 15.0.1+9
OS version: Windows 10
What happened?
I continually ran TLC in DFID mode against a simple test spec which produces a safety invariant violation. I noticed a rare failure where TLC did not find a safety violation, seemingly only when a
-workers 4
command line parameter was present. Here's an example where I ran the same command twice in a row on the same files and got different results:Further testing shows this occurs about 1/50-100 runs of the tool.
What did you expect to happen?
I expected TLC to find the safety invariant violation every time. I'm pretty sure DFID should fully explore the state space and find the safety invariant violation.
Steps to reproduce
java tlc2.TLC -config .\TESpecSafetyTest.cfg -workers 4 -dfid 10 .\TESpecTest.tla
It helps to use a loop on the command line.
The text was updated successfully, but these errors were encountered: