New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixup timeouts #20321
Fixup timeouts #20321
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % lint errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % some nits (mostly requesting code comments)
executor = threading.Thread(target=self.run_func) | ||
executor.start() | ||
|
||
flag = self.result_flag.wait(self.timeout + 2 * self.extra_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: perhaps call this finished
or something instead of the generic flag
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe worth explaining here why we have a multiplier of 2 (I assume it's because the inner TestExecutor
uses single extra_timeout
).
@@ -559,7 +561,19 @@ def run_test(self): | |||
self.logger.info("Run %d/%d" % (self.run_count, self.rerun)) | |||
self.send_message("reset") | |||
self.run_count += 1 | |||
wait_timeout = (self.state.test.timeout * self.executor_kwargs['timeout_multiplier'] + | |||
3 * self.executor_cls.extra_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth documenting here why the multiplier is 3.
test = self.state.test | ||
self.test_ended(test, | ||
(test.result_cls("EXTERNAL-TIMEOUT", | ||
"Executor hit external timeout " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we have a different message here to differentiate this from the EXTERNAL-TIMEOUT
from executors? Maybe "TestRunner hit external timeout"?
timer = threading.Timer(wait_timeout, self._timeout) | ||
timer.start() | ||
|
||
self._run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is the root cause? We are stuck here forever even though the timer thread has set the flag...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There's a few unused And please take the Chromium export commit out before landing this PR. Thanks for the quick (and proper!) fix, James! |
25b2843
to
1521212
Compare
Most executors were implementing very common variations on "run a function for up to some timeout and then mark the result as EXTERNAL-TIMEOUT". But the logic for these weren't shared and so fixes weren't shared between implementations. Move these into a common subclass such that the implementations can be shared, and adopt the logic from the WebDriverExecutor which runs the function in a thread and waits a given amount of time for the flag to be set. In contrast to the MarionetteExecutor implementation this actually works in the case where there's a browser hang.
In the case where the test just doesn't complete after the expected amount of time (plus some slack), forcibly end the test in the parent process. This eventually causes the child process to be restarted.
1521212
to
78fc8a5
Compare
After #20321, we now sometimes see an error from mozlog when the new external timeout is triggered: "ERROR test_end for ... logged while not in progress" A theory is that we call test_ended from a different thread which might not have all the correct states. This attempted fix switches to sending a test_ended message to the queue instead, similar to what TestRunner does normally.
After #20321, we now sometimes see an error from mozlog when the new external timeout is triggered: "ERROR test_end for ... logged while not in progress" A theory is that we call test_ended from a different thread which might not have all the correct states. This attempted fix switches to sending a test_ended message to the queue instead, similar to what TestRunner does normally. In doing so, we also need to treat duplicate test_ended messages as non-fatal errors.
After #20321, we now sometimes see an error from mozlog when the new external timeout is triggered: "ERROR test_end for ... logged while not in progress" A theory is that we call test_ended from a different thread which might not have all the correct states. This attempted fix switches to sending a test_ended message to the queue instead, similar to what TestRunner does normally. In doing so, we also need to treat duplicate test_ended messages as non-fatal errors. Hopefully this fixes #20607 .
No description provided.