WebkitGTK runs missing from wpt.fyi since 25th of Feb #33186
It seems that the test runs are timing out on TC. This may be related to #30834. I'm trying to reproduce the issue locally (on the wpt docker) to see what is going on.
I think I found the problem that causes the hang, and it is a tricky one. It seems the problem is a bug in […] You can reproduce the issue with this command:
You will see that the test crashes (WebKit crash), but then the WPT runner hangs forever. That should not happen; the WPT runner should end in a defined amount of time. What is happening is that after the WPT runner detects that the WebKit WebDriver has hung, it tries to kill it, and for that it uses […] From the WPT/WebDriver side, this is how the corner case that triggers the bug happens:
Killing WebKitWebDriver (or sending it the TERM signal) doesn't propagate that signal to the MiniBrowser or to the WebKitProcess, and all those child processes hold a descriptor to the main stdout. When a test ends normally, WebKitWebDriver is asked to shut down the browser, and that makes all those child processes end as expected. But when WebKitWebDriver has crashed (or the MiniBrowser is not responding), WebKitWebDriver is unable to shut down its child processes. Those children still hold a descriptor to stdout, and then a hang happens when […]
Proposed solution?
Until the underlying issue in mozprocess is fixed (and a new release of mozprocess is made and we update to it), we can work around the issue by passing the timeout parameter to the […]
Passing the timeout parameter seems a good idea in any case. I can't imagine any case where we want to wait more than X seconds for a process to be killed. Waiting means that wptrunner will be blocked and the test suite won't continue running. That is bad.
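To illustrate the mechanism described above, here is a minimal sketch that uses only the Python standard library `subprocess` module as a stand-in. It is not the mozprocess or wptrunner code, and the names `PARENT_CODE` and `kill_and_drain` are hypothetical. A parent process spawns a child that inherits its stdout pipe; after the parent is killed, draining that pipe without a deadline would block until the child exits, whereas passing a timeout lets the caller return and carry on.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for the driver: it spawns a child that inherits the
# same stdout pipe, reports that it is ready, and then idles.
PARENT_CODE = """
import subprocess, sys, time
subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(5)'])
print('ready', flush=True)
time.sleep(60)
"""


def kill_and_drain(timeout):
    """Kill the parent, then try to drain its stdout within `timeout` seconds.

    Returns (timed_out, elapsed_seconds).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", PARENT_CODE], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # wait until the child has been spawned
    proc.kill()  # only the parent is signalled; the child keeps running

    start = time.monotonic()
    try:
        # The pipe reaches EOF only when every holder of the write end exits,
        # so without a timeout this would wait for the orphaned child.
        proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    elapsed = time.monotonic() - start

    proc.wait()  # reap the already-killed parent
    return timed_out, elapsed


if __name__ == "__main__":
    timed_out, elapsed = kill_and_drain(timeout=1.0)
    print(f"timed out: {timed_out}, returned after {elapsed:.1f}s")
```

The point of the sketch is only that the deadline bounds how long the caller is blocked; the orphaned child still has to exit (or be cleaned up) on its own.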
…adline There is a bug on mozprocess ProcessHandler.kill that may cause it to hang when the process to be killed has daemonized childs. To avoid this situation, pass the timeout parameter to ProcessHandler.kill to ensure it always returns within it. So wptrunner can continue executing other tests and the whole test suite doesn't hang. Fixes: web-platform-tests#33186
Proposed a fix in #33230.
…adline There is a bug on mozprocess ProcessHandler.kill that may cause it to hang when the process to be killed has daemonized childs. To avoid this situation, pass the timeout parameter to ProcessHandler.kill to ensure it always returns within it. So wptrunner can continue executing other tests and the whole test suite doesn't hang. Fixes: #33186
Just checked that after landing ce18772, WebKitGTK runs seem to be working again: https://wpt.fyi/results/?run_id=5665473596227584
…er always returns within a deadline, a=testonly Automatic update from web-platform-tests Ensure that the call to kill the WebDriver always returns within a deadline There is a bug on mozprocess ProcessHandler.kill that may cause it to hang when the process to be killed has daemonized childs. To avoid this situation, pass the timeout parameter to ProcessHandler.kill to ensure it always returns within it. So wptrunner can continue executing other tests and the whole test suite doesn't hang. Fixes: web-platform-tests/wpt#33186 -- wpt-commits: ce18772029ded0d458d0cd600f1a58bb0c422d72 wpt-pr: 33230
Thanks for the thorough investigation and the work you've put into this, @clopez. Unfortunately, it seems there might be another (possibly related?) issue, as we haven't had a WebKitGTK run complete since Mar 30th.
Thanks for the notice. This new issue was caused by this change of mine to the WebKit tooling: it introduces support for improved WebKitGTK bundles that are consumed by the WPT CI, but it also introduced a few regressions :(
I think the issue is fixed now; runs are working again: https://wpt.fyi/runs?label=master&max-count=100&product=webkitgtk
Thanks a lot @clopez! Your work is much appreciated.
The last successful experimental run for WebKitGTK is showing as Feb 25th.
cc @clopez