screen frozen linux6.6 and 6.7 #48473

Open
levdopa opened this issue Feb 1, 2024 · 11 comments
Labels
bug (Something isn't working), needs-testing (Testing a PR or reproducing an issue needed)

Comments

@levdopa

levdopa commented Feb 1, 2024

Is this a new report?

Yes

System Info

Void 6.6.11_1 x86_64 musl

Package(s) Affected

linux6.6 linux6.7

Does a report exist for this bug with the project's home (upstream) and/or another distro?

No response

Expected behaviour

I turn on my computer (Alienware m15 R4, i7-10870H, RTX 3060) and Void boots normally.

Actual behaviour

IMG_8873

I am permanently stuck on this screen after grub.

Booting from 6.5.13 works normally.

Booting from 6.7.2 is broken.

Steps to reproduce

Install void using either chroot or void-installer (I did both)
sign in as root
xbps-install -Suy xbps
xbps-install -Suy
reboot

After that I am stuck. (No other commands were run, just those.)

@levdopa levdopa added bug Something isn't working needs-testing Testing a PR or reproducing an issue needed labels Feb 1, 2024
@abenson
Contributor

abenson commented Feb 1, 2024

What troubleshooting steps have you taken? From the description, it sounds like you're using nouveau?

What happens if you boot with nomodeset=1?

@levdopa
Author

levdopa commented Feb 1, 2024

The problem happens whether or not nouveau is installed.

I tried blacklisting nouveau in /etc/modprobe.d/blacklist.conf; the screen is still frozen.

I tried booting with nomodeset=1, with nouveau both installed and uninstalled; both are still frozen.

@classabbyamp
Member

nouveau can't be "not installed"; it's a kernel module.

@levdopa
Author

levdopa commented Feb 1, 2024

I meant xf86-video-nouveau; I just shortened it to nouveau.

@nau5ea

nau5ea commented Feb 6, 2024

I experienced this issue on my Core 2 Duo machine with no GPU on linux6.6.11 as well

@loukamb

loukamb commented Feb 6, 2024

I had this problem a month-ish ago and it was due to nouveau. Blacklisting the module correctly prior to a system upgrade completely fixes the problem, but you will have to use proprietary drivers as a replacement until this is fixed (unless you don't care about not having proper drivers). You can see the relevant discussion here: https://old.reddit.com/r/voidlinux/comments/18w0mq9/upgrade_to_kernel_668_hangs_the_os_at_boot/

I was able to solve this by booting from the latest ISO, installing the system from local packages, blacklisting nouveau through dracut and the other initramfs mechanism (you can find instructions for both in the handbook), then performing a full system upgrade. The upgrade automatically rebuilds the initramfs, so there's no need to do anything manually. That worked across multiple installations as well. As for why blacklisting didn't work for you earlier: make sure to blacklist before installing the new kernel and doing a system upgrade; otherwise you will have to regenerate the initramfs manually.
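
For anyone following along, here is a minimal sketch of the two blacklisting steps, assuming the initramfs is built by dracut. The helper name `write_nouveau_blacklist`, its `root` argument, and the file name `nouveau.conf` are illustrative choices, not something taken from this thread; any `.conf` name in those directories should do. Passing a scratch directory lets you inspect the result before touching the real `/etc`.

```shell
#!/bin/sh
# Sketch: write blacklist entries for nouveau under a chosen root directory.
# write_nouveau_blacklist is a hypothetical helper; the file names are assumptions.
write_nouveau_blacklist() {
    root="${1:-}"   # empty means the real system (needs root privileges)
    mkdir -p "$root/etc/modprobe.d" "$root/etc/dracut.conf.d" || return 1
    # Stops modprobe from auto-loading the module on the installed system.
    printf 'blacklist nouveau\n' > "$root/etc/modprobe.d/nouveau.conf" || return 1
    # Keeps dracut from packing the module into the initramfs.
    printf 'omit_drivers+=" nouveau "\n' > "$root/etc/dracut.conf.d/nouveau.conf" || return 1
}

# Preview in a scratch directory first.
scratch="$(mktemp -d)"
write_nouveau_blacklist "$scratch"
cat "$scratch/etc/modprobe.d/nouveau.conf" "$scratch/etc/dracut.conf.d/nouveau.conf"
```

On a real system you would run it as root with no argument. If the new kernel is already installed, the initramfs then has to be regenerated by hand, for example with `xbps-reconfigure -f linux6.6`, so that the change actually reaches the boot image.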

@Sapein

Sapein commented Feb 9, 2024

I'm having a similar -- if not the same -- issue on my install as well.

This issue does not occur on kernels 6.5.5_2, 6.5.12_1, and 6.1.29_1. It does occur on 6.6.8, 6.6.11, and 6.6.16, at least in my testing.

Booting with nomodeset=1 does allow the system to boot, but it breaks anything graphical: my two monitors are treated as one, X does not let me set a resolution, and sway still refuses to start (whereas on the 6.5 kernels it does start; it also fails to start on the 6.1.29 kernel, but with a different error).

My system information is as follows:

  • Void Linux x86_64 glibc
  • AMD Ryzen 7 5800X
  • Nvidia GeForce RTX 3060

I included the CPU as it does not include integrated graphics, IIRC.

The proprietary Nvidia drivers do work, but sway does not seem to start with them, so I ran into this issue when I switched to nouveau to try out sway.

@Brixy

Brixy commented Mar 9, 2024

Hi guys,

Experienced the same issue.

I ran the live image, mounted my partitions and installed an older kernel series (6.5)

Then ran

xbps-reconfigure -fa
update-grub

This solved the problem by chance. Maybe the kernel installation did some clever trick?! Anyway, void now also boots with new kernels; tested with 6.6.21.
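
A rough sketch of that rescue sequence, in case it helps someone retrace it. It only prints the commands instead of running them. The helper name `rescue_plan`, the device `/dev/sda2`, and the mount point `/mnt` are placeholders, and it assumes the steps were done from a chroot, that a `linux6.5` package is still available in the repositories, and that the chroot has working network access. A separate /boot or EFI partition would need to be mounted as well.

```shell
#!/bin/sh
# Sketch: print (not run) the rescue steps for a given root partition.
# rescue_plan, the device and the mount point are illustrative assumptions.
rescue_plan() {
    dev="$1"
    mnt="${2:-/mnt}"
    cat <<EOF
mount $dev $mnt
for d in dev proc sys; do mount --rbind /\$d $mnt/\$d; done
chroot $mnt xbps-install -Sy linux6.5
chroot $mnt xbps-reconfigure -fa
chroot $mnt update-grub
EOF
}

# Review the plan, then run the lines by hand as root from the live image.
rescue_plan /dev/sda2
```

One possible explanation for why this helps is that `xbps-reconfigure -fa` regenerates the initramfs for every installed kernel, but that is a guess rather than something confirmed in this thread.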

@thomasxg

I have the same issue when installing/running Void, so I'm stuck on the 6.5.13 kernel for now (NVIDIA GTX 1060 3G).

I did manage to log in to the "frozen" machine via ssh and take a look at the log. I'll post the trace from booting the 6.8.1 kernel below; I get the same error from the 6.6.22 kernel:

[    2.231417] ------------[ cut here ]------------
[    2.231418] kernel BUG at include/linux/scatterlist.h:187!
[    2.231422] fbcon: Taking over console
[    2.231429] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[    2.231432] CPU: 3 PID: 333 Comm: systemd-udevd Not tainted 6.8.1_1 #1
[    2.231436] Hardware name: System manufacturer System Product Name/H170I-PRO, BIOS 3805 05/16/2018
[    2.231440] RIP: 0010:sg_init_one+0x77/0x80
[    2.231446] Code: 00 01 83 e1 03 a8 03 75 23 83 e2 01 75 20 48 09 c8 41 89 6c 24 08 49 89 04 24 41 89 5c 24 0c 5b 5d 41 5c 41 5d c3 cc cc cc cc <0f> 0b 0f 0b 0f 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90
[    2.231452] RSP: 0018:ffffb221804c78d0 EFLAGS: 00010246
[    2.231456] RAX: 0000000000000000 RBX: 0000000000005000 RCX: 0000000000000027
[    2.231459] RDX: 0000000000000036 RSI: 0000000000000000 RDI: ffffb22200599000
[    2.231462] RBP: 0000000000005000 R08: 0000000000000000 R09: 0000000000000000
[    2.231466] R10: ffff96bf02610058 R11: 0000000000000000 R12: ffff96bf02610058
[    2.231469] R13: ffffb22180599000 R14: ffffb22180265000 R15: ffffb22180265100
[    2.231472] FS:  00007fa35e14e740(0000) GS:ffff96c226d80000(0000) knlGS:0000000000000000
[    2.231476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.231479] CR2: 00007ffe8930eff8 CR3: 0000000100d5e005 CR4: 00000000003706f0
[    2.231482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.231485] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    2.231488] Call Trace:
[    2.231491]  <TASK>
[    2.231493]  ? die+0x36/0x90
[    2.231498]  ? do_trap+0xda/0x100
[    2.231501]  ? sg_init_one+0x77/0x80
[    2.231505]  ? do_error_trap+0x6a/0x90
[    2.231508]  ? sg_init_one+0x77/0x80
[    2.231511]  ? exc_invalid_op+0x50/0x70
[    2.231515]  ? sg_init_one+0x77/0x80
[    2.231518]  ? asm_exc_invalid_op+0x1a/0x20
[    2.231524]  ? sg_init_one+0x77/0x80
[    2.231529]  nvkm_firmware_ctor+0x1fd/0x260 [nouveau]
[    2.231657]  nvkm_falcon_fw_ctor_hs+0x113/0x360 [nouveau]
[    2.231768]  gm200_acr_hsfw_ctor+0xce/0xf0 [nouveau]
[    2.231878]  gp102_acr_load+0x206/0x370 [nouveau]
[    2.231989]  nvkm_acr_new_+0x208/0x2f0 [nouveau]
[    2.232098]  nvkm_device_ctor+0xd74/0x4610 [nouveau]
[    2.232239]  nvkm_device_pci_new+0x101/0x2c0 [nouveau]
[    2.232379]  nouveau_drm_probe+0xd5/0x280 [nouveau]
[    2.232513]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[    2.232518]  local_pci_probe+0x42/0xa0
[    2.232522]  pci_device_probe+0xc1/0x220
[    2.232527]  really_probe+0x19b/0x3e0
[    2.232532]  ? __pfx___driver_attach+0x10/0x10
[    2.232536]  __driver_probe_device+0x78/0x160
[    2.232540]  driver_probe_device+0x1f/0x90
[    2.232545]  __driver_attach+0xd2/0x1c0
[    2.232549]  bus_for_each_dev+0x85/0xd0
[    2.232553]  bus_add_driver+0x116/0x220
[    2.232557]  driver_register+0x59/0x100
[    2.232561]  ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
[    2.232662]  do_one_initcall+0x58/0x320
[    2.232668]  do_init_module+0x60/0x240
[    2.232672]  __do_sys_init_module+0x17f/0x1b0
[    2.232677]  do_syscall_64+0x88/0x180
[    2.232681]  ? fpregs_assert_state_consistent+0x26/0x50
[    2.232687]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[    2.232691] RIP: 0033:0x7fa35e365c9a
[    2.232695] Code: 48 8b 0d 91 21 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5e 21 0d 00 f7 d8 64 89 01 48
[    2.232701] RSP: 002b:00007ffe89323e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    2.232706] RAX: ffffffffffffffda RBX: 00007fa35ce00010 RCX: 00007fa35e365c9a
[    2.232709] RDX: 00007fa35e45aafd RSI: 0000000000730769 RDI: 00007fa35ce00010
[    2.232712] RBP: 0000555f37eb4570 R08: 0000000000007b80 R09: 0000000000000000
[    2.232715] R10: 00007fa35e438b20 R11: 0000000000000246 R12: 00007fa35e45aafd
[    2.232718] R13: 0000000000020000 R14: 0000555f37eaab00 R15: 0000000000000001
[    2.232723]  </TASK>
[    2.232725] Modules linked in: sd_mod nouveau(+) drm_gpuvm drm_exec gpu_sched i2c_algo_bit drm_display_helper cec ahci crct10dif_pclmul libahci rc_core crc32_pclmul polyval_clmulni xhci_pci polyval_generic libata drm_kms_helper gf128mul ghash_clmulni_intel xhci_pci_renesas sha512_ssse3 drm_ttm_helper sha256_ssse3 ttm sha1_ssse3 mxm_wmi aesni_intel agpgart xhci_hcd crypto_simd drm scsi_mod cryptd usbcore usb_common scsi_common video wmi button dm_mirror dm_region_hash dm_log dm_mod btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic crc32c_intel
[    2.232773] ---[ end trace 0000000000000000 ]---

Perhaps someone else can confirm the same error on their machine? It seems to be related to firmware loading.
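
To make comparing traces easier, here is a small sketch that pulls the `kernel BUG at` location out of a log and counts the call-trace frames per module. The helper name `summarize_oops` is made up for this example. It skips frames marked with `?`, since the kernel flags those as unreliable, and it assumes the usual dmesg layout where a frame line ends in the module name in brackets.

```shell
#!/bin/sh
# Sketch: summarise a kernel log read from stdin.
# summarize_oops is a hypothetical helper, not part of any existing tool.
summarize_oops() {
    awk '
        # Keep only the file:line part of the BUG report.
        /kernel BUG at/ {
            sub(/^.*kernel BUG at /, "")
            sub(/!$/, "")
            bug = $0
            next
        }
        # Reliable call-trace frames look like "func+0x1fd/0x260 [module]".
        NF >= 3 && $NF ~ /^\[.*\]$/ && $(NF-1) ~ /\+0x/ && $(NF-2) != "?" {
            mod = substr($NF, 2, length($NF) - 2)
            count[mod]++
        }
        END {
            if (bug != "") print "BUG site: " bug
            for (m in count) print "frames in " m ": " count[m]
        }
    '
}

# Usage, on the affected machine over ssh:
#   dmesg | summarize_oops
# or against a saved log:
#   summarize_oops < 6.8.1.log
```

If other affected machines report the same BUG site with frames in nouveau, that would support the idea that this is one bug rather than several.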

6.8.1.log

@loukamb

loukamb commented Mar 23, 2024

@thomasxg I no longer have it but when I faced this issue months ago it had a very similar trace to what you just posted. The issue is definitely GPU-related, and doesn't happen on proprietary drivers, only nouveau.

With the release of the new live image on March 14, however, the problem has become worse and could impact adoption. With the previous live image, you could boot into live Void without any issues, install the OS to disk using local packages, reboot, then blacklist nouveau right before performing an update. That's what I did to get Void working on my Nvidia machine (2080 Ti). However, the new live image rolls in the regression, meaning you cannot even reach a terminal after GRUB without explicitly blacklisting the nouveau driver and modeset through kernel flags when launching the live image. I had to do this recently when I reinstalled Void. It is simple in practice but difficult to figure out when you don't even have logs or any output telling you what's failing.
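
A quick way to confirm the flags took effect once you do reach a shell, sketched under the assumption that the flags in question are `nomodeset` and `modprobe.blacklist=nouveau` (the exact spellings were not given above; they are typically added by pressing `e` in the GRUB menu and appending them to the `linux` line). The helper name `has_flag` is made up for this example.

```shell
#!/bin/sh
# Sketch: check whether a kernel command line carries the workaround flags.
# has_flag is a hypothetical helper; the flag spellings are assumptions.
has_flag() {
    # $1 = flag to look for, $2 = command line (defaults to the running kernel's)
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

for flag in nomodeset modprobe.blacklist=nouveau; do
    if has_flag "$flag"; then
        echo "$flag: present"
    else
        echo "$flag: missing"
    fi
done
```

The match is on whole space-separated words, so `nomodeset=1` as used earlier in this thread would need to be passed to `has_flag` in that exact form.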

@nezos
Contributor

nezos commented Apr 16, 2024

Had the same problem: the boot hangs at "loading initial ramdisk".

My solution was to install the mainline kernel and the nvidia non-free driver. With only the mainline kernel installed, the problem came down to nouveau, so the nvidia driver fixed it completely.
