Protocol error with tooltips in menus (race condition with GTK internals) #207

@dkondor

Hi,

I've been trying to track down an issue where GTK tries to map a tooltip whose parent has already been unmapped. Note: I originally noticed this in Cairo-Dock (see e.g. here for a workaround that I added, although my understanding now is that it only makes the problem less likely). I have finally found a way to reproduce it more reliably in a simple test case based on gtk-layer-demo.

To reproduce:

  1. Apply this commit: dkondor@31c2c33
  2. Run gtk-layer-demo
  3. Open the popup menu and navigate to the innermost nested menu
  4. Point the mouse on "Final item" and wait until the tooltip appears
  5. Click on the item while moving the mouse slightly (i.e. (1) press mouse button; (2) move mouse a bit; (3) release the mouse button -- it needs to be fast enough so that the tooltip does not reappear)
  6. Crash (Wayland error) with the following output:
** (gtk-layer-demo:127987): CRITICAL **: 17:30:09.777: xdg_popup_surface_get_popup () called when the xdg surface wayland object has not yet been created
** (gtk-layer-demo:127987): CRITICAL **: 17:30:09.777: xdg_popup_surface_map: assertion 'self->xdg_popup' failed

Note: the above error comes from GTK trying to show the tooltip again.

What I think happens here (note: I haven't actually debugged GTK, but this is my informed guess after looking at the gtktooltip.c source code):

  1. When the mouse button is pressed, GTK hides the tooltip (likely here).
  2. When the mouse moves (with the button still pressed), GTK starts a timer to re-show the tooltip after the appropriate delay (likely here; the delay is 60 ms, BROWSE_TIMEOUT).
  3. When the button is released, GTK tears down the whole nested popup hierarchy of the menus and emits the "activate" signal. This means that the corresponding xdg_popup objects are destroyed.
  4. After the delay expires, GTK decides to show the tooltip again (likely here), triggering the critical warnings above and the Wayland error (since gtk-layer-shell does not set a parent).
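
To make the suspected ordering easier to reason about, here is a minimal timing model of the four steps above. It is only a sketch of the hypothesis, not GTK code. The function name, the single-threaded "loop is busy while the handler blocks" model, and the `cleanup_delay_ms` parameter (how long after the "activate" handler returns the parent is finally torn down) are all assumptions made for illustration. Only the 60 ms value comes from the report.

```python
BROWSE_TIMEOUT_MS = 60  # re-show delay quoted in the report


def tooltip_timer_wins(motion_ms, release_ms, handler_block_ms, cleanup_delay_ms):
    """Return True if the re-show timer runs before the parent cleanup.

    All times are milliseconds on one hypothetical clock:
      motion_ms         pointer motion that arms the re-show timer
      release_ms        button release that tears down the menus
      handler_block_ms  time the "activate" handler blocks the main loop
      cleanup_delay_ms  assumed gap between the handler returning and the
                        parent window actually being destroyed
    """
    timer_due = motion_ms + BROWSE_TIMEOUT_MS
    if release_ms >= timer_due:
        # The tooltip already reappeared while the button was held,
        # which is not the scenario described in the report.
        return False

    # A blocked main loop cannot dispatch the timer until the handler returns.
    loop_free = release_ms + handler_block_ms
    timer_runs = max(timer_due, loop_free)
    cleanup_runs = loop_free + cleanup_delay_ms

    # Assumption: on a tie, the cleanup is treated as running first.
    return timer_runs < cleanup_runs


if __name__ == "__main__":
    # Long blocking handler: the timer is already overdue when the loop resumes.
    print(tooltip_timer_wins(10, 40, 100, 5))
    # No blocking handler, early release: cleanup finishes before the timer.
    print(tooltip_timer_wins(10, 40, 0, 5))
    # No blocking handler, release just before the timer is due.
    print(tooltip_timer_wins(10, 68, 0, 5))
```

Under these assumptions, a blocking handler makes the bad ordering the common case, while without it the bad ordering needs a release that lands within a few milliseconds of the timer expiry.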

Notes:

I cannot reproduce the issue without layer shell (--no-layer-shell option). If I remove the usleep() call I've added on line 29, it still happens, but very rarely (maybe 1 out of 20 times, I guess when I press the mouse button for exactly the right time).

In gtk_tooltip_show_tooltip() there is a check whether the parent surface is still alive here -- at least I believe that tooltip->last_window stores a reference to the GdkWindow the tooltip was made transient for; it is actually a weak reference that should be set to NULL here when the corresponding surface is destroyed. Based on this, I think what is happening is:

  • when running without layer-shell, tooltip->last_window gets destroyed earlier, so it is always reliably NULL when the timer fires
  • even with layer shell, this happens, but not in a deterministic way (at least it does not happen when unmapping the surfaces or anytime before the "activate" signal); the key seems to be adding a long enough delay to the "activate" handler, which makes the tooltip delay timer fire earlier than any logic that finishes destroying the parent's GdkWindow
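
The weak-reference guess above can be illustrated with a small standalone model. This is not GTK's implementation: `Window`, `Tooltip`, `track`, `show` and `pending_cleanup` are hypothetical names, and Python's `weakref` only stands in for the weak pointer that `tooltip->last_window` is believed to be. The point is that a liveness check on a weak reference says nothing about whether the parent is still mapped.

```python
import gc
import weakref


class Window:
    """Hypothetical stand-in for the parent GdkWindow."""

    def __init__(self, name):
        self.name = name
        self.mapped = True


class Tooltip:
    """Hypothetical stand-in for the tooltip state that tracks its parent."""

    def __init__(self):
        self._last_window = None

    def track(self, window):
        # Weak reference: does not keep the parent alive.
        self._last_window = weakref.ref(window)

    def show(self):
        parent = self._last_window() if self._last_window is not None else None
        if parent is None:
            return "skipped: parent gone"
        if not parent.mapped:
            # The case from the report: the reference is still set,
            # but the surface it points to was already unmapped.
            return "error: parent alive but unmapped"
        return "shown"


def run(release_parent_before_timer):
    menu = Window("innermost menu")
    tooltip = Tooltip()
    tooltip.track(menu)

    # Something else still holds the window after the menu is torn down.
    pending_cleanup = [menu]
    del menu
    pending_cleanup[0].mapped = False  # menus unmapped on button release

    if release_parent_before_timer:
        pending_cleanup.clear()  # last strong reference dropped
        gc.collect()

    return tooltip.show()  # the re-show timer fires here


if __name__ == "__main__":
    print(run(True))
    print(run(False))
```

In this model the first call corresponds to the behaviour described without layer-shell (the reference is already cleared when the timer fires), and the second to the failing case with layer shell.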

I'm attaching a log with wayland-debug -- it starts from the mouse button press that opens the menu, and to keep it short, I've filtered out all wl_pointer.motion and wl_pointer.frame events. The button press that triggers the error happens on line 484 (note: between this and the corresponding release event on line 495, there would be at least one wl_pointer.motion event).

layer-demo1.log
