Nvidia display freezing issue since F40 update #1202

Open
EccTM opened this issue Apr 25, 2024 · 9 comments

@EccTM

EccTM commented Apr 25, 2024

Describe the bug

Since the latest tag moved to bluefin-dx-nvidia:40, this issue has occurred maybe 4 times, and I never saw it on bluefin-dx-nvidia:39. I'm not sure if this is actually a Bluefin issue, an upstream issue, or simply an aging GPU.

Output to my left monitor will just randomly hang on a frame and stop updating.

I tried turning the monitor physically off and on, and also tried going into GNOME Settings and switching the screen off and on under the Display section, but both attempts just locked up the right monitor too and led to a hard reboot.

What did you expect to happen?

My screen(s) to not become unimpressive photo frames at random.

Output of rpm-ostree status

ecctm@athena ~ ❯ rpm-ostree status

State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest
                   Digest: sha256:54804a33df44857592b9639a2da8c6d3a9553bb96d78fa7ff6ea21677efecca5
                  Version: 40.20240423.0 (2024-04-24T14:54:46Z)
          LayeredPackages: ckb-next

  ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest
                   Digest: sha256:98c22f15e6a6034fecc5620204b40b8757fcb6b47da4866e6274e7e059cf6b05
                  Version: 40.20240423.0 (2024-04-23T16:58:18Z)
          LayeredPackages: ckb-next

Extra information or context

GTX 1080, Wayland
Flatpaks of Firefox and mpv open at time of crash usually.

I've skimmed the last hour or so of my journalctl output, and around the time of the freeze (~23:49) gnome-shell complains about a null pointer issue. TBH, I don't know what I'm looking at in this log, though.

last hour of journalctl

@m2Giles
Member

m2Giles commented Apr 27, 2024

@EccTM
Author

EccTM commented Apr 27, 2024

I'm not setting nvidia-drm.fbdev=1 anywhere, and I don't see any Flip event timeout on head 0 messages in my journalctl output, but it could easily be related.

I'll capture a full journalctl and dmesg next time it happens and dig through them for any signs of this.

@EccTM
Author

EccTM commented Apr 27, 2024

After looking into that link, I do have fbdev=1 set - /sys/module/nvidia_drm/parameters/fbdev contains 'Y' - apparently the Fedora package enables it by default.
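
For anyone following along, here is a minimal sketch of that check as a shell function. The function name `fbdev_state` and its optional path argument are made up for illustration; only the sysfs path comes from the comment above, and the sketch assumes the file holds `Y` or `N` as described there.

```shell
# Report whether the nvidia-drm fbdev option is active.
# Defaults to the sysfs file mentioned above; the optional argument
# (hypothetical, for illustration) lets the function read another file.
fbdev_state() {
    local param_file="${1:-/sys/module/nvidia_drm/parameters/fbdev}"
    if [ ! -r "$param_file" ]; then
        # Module not loaded, or the parameter does not exist on this driver.
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        N|0) echo "disabled" ;;
        *)   echo "unknown"; return 1 ;;
    esac
}

# Usage:
#   fbdev_state
```

If the nvidia_drm module is not loaded, the file will not exist and the function prints "unknown".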

I'm going to add nvidia-drm.fbdev=0 to /etc/default/grub and regenerate grub, then see if I still have the issue occur.

@citrixscu
Contributor

Ah so this is the same error I've been getting when powering on an AVR that is connected to the HDMI port of my 3070 Ti. The system locks up and will only unfreeze if I turn off the AVR.

kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Flip event timeout on head 1 over and over in the log.
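
A rough way to check whether a freeze matches this error is to count the message in the kernel log of the boot that froze. The sketch below assumes the journal is persistent, so the previous boot is still available after a hard reboot; the helper name `count_flip_timeouts` is hypothetical.

```shell
# Count "Flip event timeout" lines in kernel log text read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status clean while still printing the count.
count_flip_timeouts() {
    grep -c 'Flip event timeout' || true
}

# Usage (kernel messages from the previous boot, i.e. the one that froze):
#   journalctl -k -b -1 --no-pager | count_flip_timeouts
```

A non-zero count suggests the same nvidia-drm flip timeout; a count of zero does not rule out a related cause.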

@EccTM
Author

EccTM commented Apr 29, 2024

I haven't seen the issue since I used sudo rpm-ostree kargs --editor and added nvidia-drm.fbdev=0 as a kernel arg, which seems to be the temporary solution for that issue.
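
As a non-interactive alternative to the editor, the same argument can probably be added with `rpm-ostree kargs --append`. The sketch below is untested on Bluefin itself; the helper `has_karg` is a made-up name that only checks whether an argument is present on a kernel command line after the reboot.

```shell
# Return success if the given kernel argument appears on the command line.
# The second argument (hypothetical, for illustration) defaults to the
# running kernel's /proc/cmdline.
has_karg() {
    local karg="$1"
    local cmdline_file="${2:-/proc/cmdline}"
    # Split the command line into one argument per line, then match exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$karg"
}

# Usage (not run here; needs an rpm-ostree system and a reboot):
#   sudo rpm-ostree kargs --append=nvidia-drm.fbdev=0
#   systemctl reboot
#   has_karg nvidia-drm.fbdev=0 && echo "karg is active"
```

The new argument only takes effect on the next boot, since rpm-ostree stages it in a new deployment.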

ujust regenerate-grub didn't use the arguments from /etc/default/grub when I tried adding it there.

I'm not ready to call this solved yet, because I never hit this issue before updating to F40 - my previous Arch Linux install had fbdev=1 explicitly enabled without issue, and bluefin-dx-nvidia:39 would've defaulted to fbdev=1 as well.

@citrixscu
Contributor

citrixscu commented Apr 29, 2024

Will give this a shot and report back. I don't think adding the args to grub will do anything; the arg should be added with rpm-ostree kargs, as you mentioned.

Added nvidia-drm.fbdev=0 kargs and it seems to be working. cat /sys/module/nvidia_drm/parameters/fbdev returns N.

@m2Giles
Member

m2Giles commented May 1, 2024

I don't have the nvidia-drm.fbdev=0 set on my desktop. But I had a similar issue when on 39 and prior to us doing early loading of the nvidia drivers. After doing early loading I haven't seen the issue. However, since it didn't happen consistently (normally would only notice if gdm failed) I couldn't pin it down to anything but fbdev being flaky.

I'm on 40 now and haven't seen the issue. But I think this is partly down to Nvidia behaving a bit differently across generations and cards.

@EccTM
Author

EccTM commented May 24, 2024

Any word on where the Nvidia 555 beta driver sits in the fedora/bluefin packaging pipeline?
I'd like to re-test nvidia-drm.fbdev=1 with the new driver and see if it still causes issues.

@p5
Member

p5 commented May 24, 2024

> Any word on where the Nvidia 555 beta driver sits in the fedora/bluefin packaging pipeline? I'd like to re-test nvidia-drm.fbdev=1 with the new driver and see if it still causes issues.

We will be shipping Nvidia 555 once the stable driver has been released and is being packaged by RPMFusion. We believe it's too risky to ship a beta driver since Bluefin is meant to be a reliable desktop experience, and there have already been complaints about the desktop experience using the new driver from users on other distros.
