Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebkitGTK runs missing from wpt.fyi since 25th of Feb #33186

Closed
DanielRyanSmith opened this issue Mar 15, 2022 · 8 comments · Fixed by #33230
Closed

WebkitGTK runs missing from wpt.fyi since 25th of Feb #33186

DanielRyanSmith opened this issue Mar 15, 2022 · 8 comments · Fixed by #33230
Labels

Comments

@DanielRyanSmith
Copy link
Contributor

The last successful experimental run for WebKitGTK is showing as Feb25th.

cc @clopez

@clopez
Copy link
Contributor

clopez commented Mar 16, 2022

It seems that the test runs are timing out on TC.

This may be related to #30834

I'm trying to reproduce the issue locally (on the wpt docker) to see what is going on.

@clopez clopez changed the title WebkitGTK builds missing from wpt.fyi WebkitGTK runs missing from wpt.fyi since 25th of Feb Mar 16, 2022
@clopez
Copy link
Contributor

clopez commented Mar 17, 2022

I think I found the problem that causes the hang, and is a tricky one.

It seems the problem is a bug on mozprocess processhandler() which is used to start and manage the WebDriver process.

You can reproduce the issue with this command:

  1. Install the docker image of WPT
    $ docker pull webplatformtests/wpt:0.48

  2. Enter into it
    $ docker run -it ... webplatformtests/wpt:0.48

  3. Execute a test that causes a crash. As of today (with today's nightly this does the trick):

xvfb-run -a python3 ./wpt run \
  -y --no-fail-on-unexpected --no-pause --no-restart-on-unexpected \   
  --log-mach-level=debug --log-mach=- --test-type=testharness \
  --install-fonts --install-browser --channel=nightly \
  webkitgtk_minibrowser /measure-memory/detached.https.window.html

You will see that the tests crashes (WebKit crash) but then the WPT runner hangs forever.

That should not happen, the WPT runner should end in a defined aumount of time.

What is happening is that after the WPT runner detects that the WebKit WebDriver has hanged it tries to kill it and for that it uses mozprocess processhandler.kill() but there is a bug there that causes the kill() call to hang forever. Read this linked bugreport for more details.

From the WPT/WebDriver side this is how the corner case that triggers the bug happens:

  1. mozprocess processhandler() executes WebKitWebDriver
  2. WebKitWebDriver forks and executes MiniBrowser
  3. MiniBrowser forks two times and executes two processes: WebKitWebProcess, WebKitNetworkProcess

Killing (or sending the TERM) signal to WebKitWebDriver don't propagates that signal to the MiniBrowser or to the WebKitProcess and all those child process hold a descriptor to the main stdout.

When a test ends normally WebKitWebDriver is asked to shut down the browser and that makes all those child process to end as expected. But when WebKitWebDriver is crashed (or the MiniBrowser not responding) then WebKitWebDriver is unable to shut-down its childs process.
So when wptrunner ends killing the WebKitWebDriver main process via mozprocess processhandler.kill() the childs remain alive.

Those childs still hold a descriptor to stdout and then a hang happens when processhandler.kill() is blocked waiting for stdout to be closed.

Proposed solution?

Meanwhile the underlying issue on mozprocess is not fixed (and a new release of mozprocess is done and we update to it) we can workaround the issue by passing the timeout paremeter to the processhandler.kill() call to ensure it will always return from that call even if the stdout of the killed process is still not closed.

Passing the timeout paremeter seems a good idea in any case. I can't imagine any case where we want to wait for a process to be killed more than X seconds. Waiting means that wptrunner will be blocked and that the test suite won't continue running. That is bad.

clopez added a commit to clopez/wpt that referenced this issue Mar 17, 2022
…adline

There is a bug on mozprocess ProcessHandler.kill that may cause it to
hang when the process to be killed has daemonized childs.

To avoid this situation, pass the timeout parameter to ProcessHandler.kill
to ensure it always returns within it.
So wptrunner can continue executing other tests and the whole test suite
doesn't hang.

Fixes: web-platform-tests#33186
@clopez
Copy link
Contributor

clopez commented Mar 17, 2022

Proposed a fix on #33230

jgraham pushed a commit that referenced this issue Mar 22, 2022
…adline

There is a bug on mozprocess ProcessHandler.kill that may cause it to
hang when the process to be killed has daemonized childs.

To avoid this situation, pass the timeout parameter to ProcessHandler.kill
to ensure it always returns within it.
So wptrunner can continue executing other tests and the whole test suite
doesn't hang.

Fixes: #33186
@clopez
Copy link
Contributor

clopez commented Mar 24, 2022

Just checked that after landing ce18772 WebKitGTK runs seems to be working back: https://wpt.fyi/results/?run_id=5665473596227584
So this issue seems fixed

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Mar 26, 2022
…er always returns within a deadline, a=testonly

Automatic update from web-platform-tests
Ensure that the call to kill the WebDriver always returns within a deadline

There is a bug on mozprocess ProcessHandler.kill that may cause it to
hang when the process to be killed has daemonized childs.

To avoid this situation, pass the timeout parameter to ProcessHandler.kill
to ensure it always returns within it.
So wptrunner can continue executing other tests and the whole test suite
doesn't hang.

Fixes: web-platform-tests/wpt#33186

--

wpt-commits: ce18772029ded0d458d0cd600f1a58bb0c422d72
wpt-pr: 33230
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Mar 27, 2022
…er always returns within a deadline, a=testonly

Automatic update from web-platform-tests
Ensure that the call to kill the WebDriver always returns within a deadline

There is a bug on mozprocess ProcessHandler.kill that may cause it to
hang when the process to be killed has daemonized childs.

To avoid this situation, pass the timeout parameter to ProcessHandler.kill
to ensure it always returns within it.
So wptrunner can continue executing other tests and the whole test suite
doesn't hang.

Fixes: web-platform-tests/wpt#33186

--

wpt-commits: ce18772029ded0d458d0cd600f1a58bb0c422d72
wpt-pr: 33230
aosmond pushed a commit to aosmond/gecko that referenced this issue Mar 28, 2022
…er always returns within a deadline, a=testonly

Automatic update from web-platform-tests
Ensure that the call to kill the WebDriver always returns within a deadline

There is a bug on mozprocess ProcessHandler.kill that may cause it to
hang when the process to be killed has daemonized childs.

To avoid this situation, pass the timeout parameter to ProcessHandler.kill
to ensure it always returns within it.
So wptrunner can continue executing other tests and the whole test suite
doesn't hang.

Fixes: web-platform-tests/wpt#33186

--

wpt-commits: ce18772029ded0d458d0cd600f1a58bb0c422d72
wpt-pr: 33230
aosmond pushed a commit to aosmond/gecko that referenced this issue Mar 28, 2022
…er always returns within a deadline, a=testonly

Automatic update from web-platform-tests
Ensure that the call to kill the WebDriver always returns within a deadline

There is a bug on mozprocess ProcessHandler.kill that may cause it to
hang when the process to be killed has daemonized childs.

To avoid this situation, pass the timeout parameter to ProcessHandler.kill
to ensure it always returns within it.
So wptrunner can continue executing other tests and the whole test suite
doesn't hang.

Fixes: web-platform-tests/wpt#33186

--

wpt-commits: ce18772029ded0d458d0cd600f1a58bb0c422d72
wpt-pr: 33230
@DanielRyanSmith
Copy link
Contributor Author

DanielRyanSmith commented Apr 6, 2022

Thanks for the thorough investigation and work you've put into this @clopez

Unfortunately, it seems there might be another (possibly related?) issue as we haven't had WebKitGTK run completed since Mar 30th.
Edit: It seems we did have a successful run appear on April 7th

@clopez
Copy link
Contributor

clopez commented Apr 18, 2022

Thanks for the notice.

This new issue has been caused by this change by me on the WebKit tooling: it introduces support for improved WebKitGTK bundles that are consumed by the WPT CI, but it also introduced a few regressions :(

@clopez
Copy link
Contributor

clopez commented Apr 25, 2022

I think the issue is fixed now, runs are working back: https://wpt.fyi/runs?label=master&max-count=100&product=webkitgtk

@DanielRyanSmith
Copy link
Contributor Author

Thanks a lot @clopez! Your work is much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants