Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.2 EGL fixes backports #24018

Merged
merged 7 commits into from Nov 6, 2023
Merged

3.2 EGL fixes backports #24018

merged 7 commits into from Nov 6, 2023

Conversation

vadz
Copy link
Contributor

@vadz vadz commented Oct 29, 2023

This collects all the changes done in master and backports them to 3.2.

cc @joanbm

vadz and others added 7 commits October 30, 2023 00:49
No real changes, just clean up the code a bit to make sure we get a
-Wswitch if any elements are added to wxDisplayType enum in the future.

(cherry picked from commit d128afe)
Don't use m_readyToDraw in it, as the canvas is still shown even when
it's not ready to draw yet, and use m_wlSubsurface instead as it is
destroyed when the window is unmapped.

See wxWidgets#23899.

(cherry picked from commit cf44a87)
This is useful when there is more than one window using EGL.

(cherry picked from commit 547fb15)
Calling it from the frame callback might change the swap interval for
the wrong context if the active context got changed in the meanwhile,
but by the time SwapBuffers() is called the correct context must be set.

Doing it there also allows to unify X11 and Wayland code branches.

See wxWidgets#24014.

(cherry picked from commit 64afa9d)
If the wxGLCanvas is destroyed immediately (without hiding it first),
the GTKs widget's `unmap` signal which usually destroys the Wayland
resources is not emitted. Thus, we need to ensure they are destroyed
on the destructor instead.

This fixes an use-after-free issue, sometimes causing a crash, because
one of the leaked resources is the canvas's Wayland frame callback.

See wxWidgets#24013, wxWidgets#24016.

(cherry picked from commit 22ae7a5)
In the recent changes for handling map/unmap events on Wayland's
wxGLCanvasEGL, we use the following GTK widget signals:
* The "map-event" signal to create the canvas's subsurface.
  The earlier "map" signal can not be used, as the associated toplevel
  window's Wayland surface may not yet exist by the time we receive it.
* The "unmap" signal to destroy the canvas's subsurface.
  Using the later "unmap-event" signal is problematic, due to other
  resources we build upon being already destroyed by then.

Usually, the "map-event" signal comes before "unmap" and resources are
created and destroyed appropriately. However, there's an edge case:
If a canvas is shown and then immediately hidden (before wxWidgets can
pump from the event loop), "unmap" will come before "map-event".
This happens because signals like "map" and "unmap" are delivered
immediately (when calling e.g. `gtk_widget_hide`), while signals like
"map-event" and "unmap-event" are delivered later on the event loop.

For the same reason, showing a canvas, then immediately hiding it, then
immediately showing it again, will cause two "map-event"s to get
delivered enqueued without a "unmap" in between.

This condition can be hit quite easily when setting up a complex UIs,
and in particular it is triggered by Aegisub during startup, leading to
a crash (Wayland protocol error) when opening a video later, or when
specifying a video directly on the startup command line.

To avoid this breaking our resource management, add some checks to detect
those "map-event"s we shouldn't handle - either the ones that happen
after "unmap", or the duplicate ones without an "unmap" in between.

See wxWidgets#23961, wxWidgets#23968.

(cherry picked from commit 133f773)
This reverts the frame synchronization logic that was recently added
with the purpose of avoiding performance issues due to `eglSwapBuffers`
blocking when the canvas is hidden or occluded.

This logic should be unnecessary after `eglSwapInterval(display, 0)`
is called, since `eglSwapBuffers` should never block anymore.
Furthermore, as it stands now, it causes the canvas to continuously
repaint itself at the refresh rate of the display, which is wasteful
for applications which do not need to continuously refresh.

See wxWidgets#24012, wxWidgets#24017.

(cherry picked from commit abfdd18)
@Manolo-ES
Copy link
Contributor

After re-reading those previous threads, I still stand with not calling SwapInterval.

When you call it for a hidden or obscured window it is a workaround for Wayland and its 1 Hz weird decision (and not many GPU vendors agree; that's why such MR took so long to merge). It seems to me that WL's team see all us programmers as stupid, we must be corrected by them.

OK, drawing only at vertical blank seems the right thing to do, just because we can be saved from tearing and faster frames will not arrive to the display any how. OTOH the deactivation of vsync by SwapInterval(...,0) is used, for example, to benchmark the shaders. Or to get frames into some memory, even they are not shown. Or special tight rendering. In these cases SwapBuffers is called, and should not be dismissed.

But, if you deactivate vsync for a not-shown window, why not activate again at show-event?
And, what value do you "recover"? Perhaps the user set 1 (vsync activated), or perhaps not.
MSW (WGL) and GLX (and Apple?) have extensions (search for "swap control" at Khronos registry) to query the current value of SwapInterval. Some vendors provide "swap control tear" extension to allow a negative value (the driver takes control of the interval). Unfortunately, EGL has not a function nor an extension to query this value.

The thing is that playing with SwapInterval may well go against those users that do know what they're doing. And if it's the GPU driver who controls it, our call has no effect.

At least, WL team now provides "surface suspended/resumed" events. That's what the SDL_commit that Vadim found handles.
IMHO that's not enough, because the event may arrive too late, when the app has already called SwapBufferes, stalling the CPU during a whole second because of the 1Hz thing. But less is nothing.

The IsShownOnScreen() function for GTK relies on gtk_widget_get_child_visible which GTK docs say "This function does not check if the widget is obscured in any way."

What are the alternatives? Looking into Mesa code perhaps we can query some member of the surface struct or from the EGLDisplay.

I hope WL team to get back from this madness. In the while, I propose to tell wx users to take this issue in consideration, and let they play with SwapInterval instead of wx.

Lastly, I still think that it would be a good having bool wxGLCanvas::[Set][Get]SwapInterval(int& val).

@vadz
Copy link
Contributor Author

vadz commented Oct 30, 2023

I don't think there is a realistic probability of Wayland developers changing their mind about this and in any case we don't want to fatally break all the existing wxWidgets applications using wxGLCanvas until they do. I.e. today it seems painfully obvious to me that we must set the swap interval to 0 ourselves.

Of course, it would be better if we could reliably detect if the window is occluded and only do it in this case and I'd welcome any PRs doing this. And I don't see anything wrong with providing the possibility to set it from the application code (although I don't see any advantage in wrapping it in a wx function when you can already just call eglSwapInterval() directly). But this doesn't change for the situation right now, sorry.

@joanbm
Copy link
Contributor

joanbm commented Oct 30, 2023

@Manolo-ES As I see it, yes, eglSwapInterval(display, 0) is a hack, and we'd rather not do it, but it's a hack that allows us to fix existing applications that can't cope with eglSwapBuffers blocking for a long time, which is something that didn't use to happen.

And it's something that is reasonably simple to implement and offers a solution for the applications people want to use, now.

So far, I haven't seen any proposed solution to avoid those long eglSwapBuffers blocks that works everywhere eglSwapInterval(display, 0) does, is reasonably simple to implement, does not have tricky edge cases and is not subject to race conditions.

For example, the SDL thing you link, appears to depend on a version of the XDG Shell protocol that many Wayland compositors do not yet support. And it's not evident to me that the concept of a surface being "suspended" perfectly matches the situations under which eglSwapBuffers blocks for a long time. There's still the race condition where maybe the surface is suspended the instant before you call eglSwapBuffers. And even if it works, it's still a lot of work to understand, implement and test the entire thing.

@joanbm
Copy link
Contributor

joanbm commented Nov 5, 2023

I have run some tests on both GNOME Shell and Sway with KiCad / SLADE / Aegisub plus the various test applications / reproducers I have gathered from the various issues and haven't been able to reproduce any of the bugs/regressions, so LGTM.

@vadz vadz merged commit a87c857 into wxWidgets:3.2 Nov 6, 2023
27 checks passed
@vadz vadz deleted the 3.2-egl-fixes branch November 6, 2023 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants