Fixup timeouts #20321

jgraham · 2019-11-19T15:14:08Z

No description provided.

jgraham · 2019-11-19T15:17:42Z

https://community-tc.services.mozilla.com/tasks/groups/YPwHVFogTsawmjOeMl6Kpw for a Fx run and https://community-tc.services.mozilla.com/tasks/groups/YPwHVFogTsawmjOeMl6Kpw for Chrome

LukeZielinski

LGTM % lint errors

Hexcles

LGTM % some nits (mostly requesting code comments)

Hexcles · 2019-11-20T23:14:38Z

tools/wptrunner/wptrunner/executors/base.py

+        executor = threading.Thread(target=self.run_func)
+        executor.start()
+
+        flag = self.result_flag.wait(self.timeout + 2 * self.extra_timeout)


Nit: perhaps call this finished or something instead of the generic flag.

Also maybe worth explaining here why we have a multiplier of 2 (I assume it's because the inner TestExecutor uses a single extra_timeout).
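Here is a minimal sketch of the nesting the reviewer assumes. If the inner executor gives up after `timeout + extra_timeout`, an outer wait of `timeout + 2 * extra_timeout` always leaves one more `extra_timeout` of slack, so the inner EXTERNAL-TIMEOUT gets reported first. The function names are hypothetical. Only `timeout` and `extra_timeout` come from the diff.

```python
def inner_deadline(timeout: float, extra_timeout: float) -> float:
    """Budget the inner executor is assumed to use: one extra_timeout of slack."""
    return timeout + extra_timeout


def outer_deadline(timeout: float, extra_timeout: float) -> float:
    """Budget for the outer wait on the result flag, matching the diff."""
    return timeout + 2 * extra_timeout


# With a 10 s test timeout and 5 s of slack, the inner executor gives up
# at 15 s while the outer wait only gives up at 20 s.
print(inner_deadline(10.0, 5.0), outer_deadline(10.0, 5.0))
```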

Hexcles · 2019-11-20T23:20:15Z

tools/wptrunner/wptrunner/testrunner.py

@@ -559,7 +561,19 @@ def run_test(self):
            self.logger.info("Run %d/%d" % (self.run_count, self.rerun))
            self.send_message("reset")
        self.run_count += 1
+        wait_timeout = (self.state.test.timeout * self.executor_kwargs['timeout_multiplier'] +
+                        3 * self.executor_cls.extra_timeout)


Maybe worth documenting here why the multiplier is 3.
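This sketch shows how the TestRunner budget from the diff relates to the executor-side wait. The runner scales the test timeout by `timeout_multiplier` and adds `3 * extra_timeout`, which is one `extra_timeout` more than the executor's `2 * extra_timeout`. Presumably this is so the executor always gets the first chance to report the timeout. Two things are assumptions here: the helper names, and the idea that the executor's timeout already includes the multiplier.

```python
def executor_wait(test_timeout, timeout_multiplier, extra_timeout):
    # Assumption: the executor's timeout is the test timeout scaled by the multiplier.
    return test_timeout * timeout_multiplier + 2 * extra_timeout


def runner_wait(test_timeout, timeout_multiplier, extra_timeout):
    # Mirrors wait_timeout in TestRunner.run_test from the diff.
    return test_timeout * timeout_multiplier + 3 * extra_timeout


# A 10 s test with multiplier 2 and extra_timeout 5:
print(executor_wait(10, 2, 5), runner_wait(10, 2, 5))  # prints: 30 35
```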

Hexcles · 2019-11-20T23:21:20Z

tools/wptrunner/wptrunner/testrunner.py

+        test = self.state.test
+        self.test_ended(test,
+                        (test.result_cls("EXTERNAL-TIMEOUT",
+                                         "Executor hit external timeout "


Shall we have a different message here to differentiate this from the EXTERNAL-TIMEOUT from executors? Maybe "TestRunner hit external timeout"?

Hexcles · 2019-11-20T23:29:28Z

tools/wptrunner/wptrunner/executors/executormarionette.py

-        timer = threading.Timer(wait_timeout, self._timeout)
-        timer.start()
-
-        self._run()


I guess this is the root cause? We are stuck here forever even though the timer thread has set the flag...
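Below is a minimal standard-library reproduction of the hang described above. A `threading.Timer` sets the flag on its own thread, but nothing on the calling thread checks that flag until `_run()` returns. In this sketch `_run()` is a hypothetical stand-in that sleeps for 0.5 s. A genuinely hung browser would block indefinitely.

```python
import threading
import time

result_flag = threading.Event()


def _timeout():
    # Runs on the Timer's own thread: it can set the flag, but it cannot
    # interrupt the thread that is blocked inside _run().
    result_flag.set()


def _run():
    # Hypothetical stand-in for a call into an unresponsive browser.
    time.sleep(0.5)


def old_style(wait_timeout):
    timer = threading.Timer(wait_timeout, _timeout)
    timer.start()
    start = time.monotonic()
    _run()  # blocks the calling thread until the "browser" responds
    timer.cancel()
    return time.monotonic() - start


elapsed = old_style(0.1)
# The flag was set after about 0.1 s, yet old_style only returned after _run() finished.
print(result_flag.is_set(), round(elapsed, 1))
```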

Hexcles · 2019-11-20T23:30:51Z

There are a few unused `import sys` statements in various executors.

And please take the Chromium export commit out before landing this PR.

Thanks for the quick (and proper!) fix, James!

Most executors were implementing close variations on "run a function for up to some timeout and then mark the result as EXTERNAL-TIMEOUT", but the logic wasn't shared, so a fix in one implementation didn't reach the others. Move this logic into a common class that the executors subclass so the implementation is shared, and adopt the logic from the WebDriverExecutor, which runs the function in a thread and waits a given amount of time for the flag to be set. Unlike the MarionetteExecutor implementation, this actually works when the browser hangs.
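This is a minimal sketch of the shared pattern, using a plain `threading.Event` as the result flag. The class names `ThreadedTimeoutRunner`, `QuickRunner` and `HungRunner` are hypothetical. Reporting exceptions as `INTERNAL-ERROR` is an assumption. The wait result is named `finished`, as suggested in review.

```python
import threading
import time


class ThreadedTimeoutRunner:
    """Hypothetical base class: run run_func() for up to a timeout.

    Subclasses override run_func(). If the flag isn't set in time, the
    result is EXTERNAL-TIMEOUT, even if the worker thread is still stuck.
    """

    def __init__(self, timeout, extra_timeout):
        self.timeout = timeout
        self.extra_timeout = extra_timeout
        self.result = None
        self.result_flag = threading.Event()

    def _target(self):
        try:
            self.result = self.run_func()
        except Exception as exc:
            # Assumed status name for an error inside the harness itself.
            self.result = ("INTERNAL-ERROR", str(exc))
        finally:
            # Set the flag on every path so the waiter wakes promptly.
            self.result_flag.set()

    def run(self):
        # Daemon thread, so a hung call cannot keep the process alive.
        executor = threading.Thread(target=self._target, daemon=True)
        executor.start()
        finished = self.result_flag.wait(self.timeout + 2 * self.extra_timeout)
        if not finished:
            # Stop waiting. The worker may still be blocked in run_func().
            return ("EXTERNAL-TIMEOUT", None)
        return self.result

    def run_func(self):
        raise NotImplementedError


class QuickRunner(ThreadedTimeoutRunner):
    def run_func(self):
        return ("OK", "done")


class HungRunner(ThreadedTimeoutRunner):
    def run_func(self):
        time.sleep(5)  # stands in for a browser that never answers
        return ("OK", "too late")


print(QuickRunner(0.1, 0.05).run())  # ('OK', 'done')
print(HungRunner(0.1, 0.05).run())   # ('EXTERNAL-TIMEOUT', None)
```

The key difference from the timer-based approach is that the calling thread blocks on the flag rather than on the browser call, so it regains control when the deadline passes.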

In the case where the test just doesn't complete after the expected amount of time (plus some slack), forcibly end the test in the parent process. This eventually causes the child process to be restarted.
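The next sketch shows a parent-side timer of last resort. It assumes the parent arms a `threading.Timer` for each test, and the class `RunnerManager` and its method names are hypothetical. In this shape, `test_ended` runs on the timer's thread, and nothing stops both the normal result and the timeout from ending the same test.

```python
import threading


class RunnerManager:
    """Hypothetical parent-side manager with a timer of last resort."""

    def __init__(self):
        self.ended = []            # (test_id, status) pairs, in order
        self.restart_child = False
        self.timer = None

    def start_test(self, test_id, wait_timeout):
        # Arm the last-resort timer. It only matters if the child never reports.
        self.timer = threading.Timer(wait_timeout, self.on_timeout, args=(test_id,))
        self.timer.start()

    def on_result(self, test_id, status):
        # Normal path: the child reported, so disarm the timer.
        self.timer.cancel()
        self.test_ended(test_id, status)

    def on_timeout(self, test_id):
        # Runs on the timer thread. Forcibly end the test and restart the child.
        self.restart_child = True
        self.test_ended(test_id, "EXTERNAL-TIMEOUT")

    def test_ended(self, test_id, status):
        self.ended.append((test_id, status))


manager = RunnerManager()
manager.start_test("/a.html", 0.05)
manager.timer.join()  # simulate a child that never answers
print(manager.ended, manager.restart_child)
```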

After #20321, we now sometimes see an error from mozlog when the new external timeout is triggered: "ERROR test_end for ... logged while not in progress". One theory is that we call test_ended from a different thread, which might not have all the correct state. This attempted fix instead sends a test_ended message to the queue, as TestRunner normally does. In doing so, we also need to treat duplicate test_ended messages as non-fatal errors. Hopefully this fixes #20607.
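The approach described above can be sketched with hypothetical names. The timer thread only enqueues a `test_ended` message, and the loop thread makes every state change. If a second `test_ended` arrives for a test that has already ended, it is recorded as a warning instead of raising. A `queue.Queue` stands in for whatever queue TestRunner actually uses.

```python
import queue
import threading


class MessageLoopManager:
    """Hypothetical sketch: all test_ended handling happens on one thread."""

    def __init__(self):
        self.commands = queue.Queue()
        self.current_test = None
        self.ended = []
        self.warnings = []

    def start_test(self, test_id, wait_timeout):
        self.current_test = test_id
        # The timer thread only posts a message. It never touches shared state.
        timer = threading.Timer(
            wait_timeout,
            self.commands.put,
            args=(("test_ended", test_id, "EXTERNAL-TIMEOUT"),),
        )
        timer.start()
        return timer

    def handle(self, command, test_id, status):
        if command != "test_ended":
            return
        if test_id != self.current_test:
            # Duplicate or stale test_ended: log it, don't treat it as fatal.
            self.warnings.append("duplicate test_ended for %s" % test_id)
            return
        self.ended.append((test_id, status))
        self.current_test = None

    def run_once(self, timeout):
        # Process exactly one message on the calling (loop) thread.
        self.handle(*self.commands.get(timeout=timeout))


manager = MessageLoopManager()
timer = manager.start_test("/a.html", 0.2)
manager.commands.put(("test_ended", "/a.html", "PASS"))  # the child's result
manager.run_once(1)  # handles PASS
timer.join()         # the last-resort timer fires anyway
manager.run_once(1)  # handles the late EXTERNAL-TIMEOUT as a duplicate
print(manager.ended, manager.warnings)
```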

wpt-pr-bot added the html, infra, and wptrunner labels Nov 19, 2019

wpt-pr-bot assigned zcorpan Nov 19, 2019

wpt-pr-bot requested review from annevk, domenic, foolip, gsnedders, jdm, jugglinmike, tkent-google, zcorpan and zqzhang November 19, 2019 15:14

jgraham mentioned this pull request Nov 19, 2019

Limit backtracking on regexp called from blink. #20245

Merged

LukeZielinski approved these changes Nov 20, 2019


Hexcles approved these changes Nov 20, 2019


jgraham force-pushed the fixup_timeouts branch 2 times, most recently from 25b2843 to 1521212 November 26, 2019 18:54

jgraham added 2 commits November 26, 2019 19:09

Add a timer of last resort to the TestRunner

78fc8a5


jgraham force-pushed the fixup_timeouts branch from 1521212 to 78fc8a5 November 26, 2019 19:09

jgraham merged commit 35951c3 into master Nov 26, 2019

jgraham deleted the fixup_timeouts branch November 26, 2019 19:24

foolip mentioned this pull request Dec 4, 2019

Safari stable runs produce duplicate, conflicting results for some tests #20607

Closed

Hexcles mentioned this pull request Dec 6, 2019

[wptrunner] Attempt to fix EXTERNAL-TIMEOUT #20664

Merged
