Unexpected Linux kernel upgrade when installing the CUDA driver on GPU VMs

**Describe the bug**
We have an AKS 1.32.3 cluster running Azure Linux 3.0. After a node reboot, the nvidia-device-plugin pods on all GPU nodes go into CrashLoopBackOff, and we noticed that the OS kernel on the GPU nodes had been upgraded from 6.6.78.1-3.azl3 to 6.6.82.1-1.azl3.
The AKS node log /var/log/azure/cluster-provision.log (see Screenshots) shows that installing cuda-0:560.35.03-1_6.6.78.1.3.azl3.x86_64 unexpectedly upgraded the kernel to 6.6.82.1-1.azl3.
This CUDA package is not compatible with kernel 6.6.82, so the GPU driver cannot be loaded and the nvidia-device-plugin pods crash.
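To confirm the mismatch on an affected node, one can compare the kernel build that the cuda package name appears to target with the running kernel (`uname -r`). The sketch below assumes that the suffix after `_` in the cuda release (`6.6.78.1.3.azl3`) encodes the kernel `6.6.78.1-3.azl3`; that mapping is inferred from the package name, not from Azure Linux documentation, and the helper names are hypothetical.

```python
import platform
import re
from typing import Optional

# Assumption (inferred from the package name, not documented): the part of the
# cuda release after "_" encodes the target kernel, e.g. "_6.6.78.1.3.azl3"
# is taken to mean kernel "6.6.78.1-3.azl3".
_CUDA_KERNEL_SUFFIX = re.compile(r"_(\d+\.\d+\.\d+\.\d+)\.(\d+)\.(azl\d+)")


def kernel_for_cuda_package(package: str) -> str:
    """Return the kernel version a cuda package name appears to target (hypothetical helper)."""
    match = _CUDA_KERNEL_SUFFIX.search(package)
    if match is None:
        raise ValueError(f"no kernel suffix found in {package!r}")
    version, release, dist = match.groups()
    return f"{version}-{release}.{dist}"


def normalize_kernel(release: str) -> str:
    """Strip a trailing architecture, in case `uname -r` includes one."""
    return re.sub(r"\.(x86_64|aarch64)$", "", release)


def cuda_matches_kernel(package: str, running: Optional[str] = None) -> bool:
    """Compare the cuda package's target kernel with the running kernel (uname -r by default)."""
    if running is None:
        running = platform.release()
    return kernel_for_cuda_package(package) == normalize_kernel(running)


if __name__ == "__main__":
    pkg = "cuda-0:560.35.03-1_6.6.78.1.3.azl3.x86_64"
    print(kernel_for_cuda_package(pkg))                 # 6.6.78.1-3.azl3
    print(cuda_matches_kernel(pkg, "6.6.78.1-3.azl3"))  # True
    print(cuda_matches_kernel(pkg, "6.6.82.1-1.azl3"))  # False
```

On an affected node, calling `cuda_matches_kernel(pkg)` with no second argument would compare against the live kernel reported by `platform.release()`.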

**To Reproduce**
Steps to reproduce the behavior:
1. Create an AKS 1.32 cluster with Azure Linux 3.0 and add a GPU node pool.
2. Reboot a node in the GPU node pool.

**Expected behavior**
The Linux kernel version stays at 6.6.78.1-3.azl3.

**Screenshots**

Excerpt from `/var/log/azure/cluster-provision.log`:

```
+ dnf install -y cuda-0:560.35.03-1_6.6.78.1.3.azl3.x86_64
Last metadata expiration check: 0:00:01 ago on Tue Apr 15 07:01:25 2025.
Dependencies resolved.
================================================================================================
 Package                   Arch    Version                     Repository                   Size
================================================================================================
Installing:
 cuda                      x86_64  560.35.03-1_6.6.78.1.3.azl3 azurelinux-official-nvidia  246 M
 kernel                    x86_64  6.6.82.1-1.azl3             azurelinux-official-base     41 M
Installing dependencies:
 kernel-drivers-gpu        x86_64  6.6.78.1-3.azl3             azurelinux-official-base    3.0 M
 libunwind                 x86_64  1.6.2-2.azl3                azurelinux-official-base     75 k
 mlnx-ofa_kernel           x86_64  24.10-13.azl3               azurelinux-official-base     44 k
 mlnx-ofa_kernel-modules   x86_64  24.10-13.azl3               azurelinux-official-base    1.6 M
 mlnx-tools                x86_64  24.10-1.azl3                azurelinux-official-base     80 k
 ofed-scripts              x86_64  24.10-1.azl3                azurelinux-official-base     71 k
 pciutils                  x86_64  3.11.1-1.azl3               azurelinux-official-base    449 k
 pciutils-libs             x86_64  3.11.1-1.azl3               azurelinux-official-base     57 k

Transaction Summary
================================================================================================
Install  10 Packages

Total download size: 293 M
Installed size: 792 M
Downloading Packages:
(1/10): kernel-drivers-gpu-6.6.78.1-3.azl3.x86_  11 MB/s | 3.0 MB     00:00    
(2/10): libunwind-1.6.2-2.azl3.x86_64.rpm       1.7 MB/s |  75 kB     00:00    
(3/10): mlnx-ofa_kernel-24.10-13.azl3.x86_64.rp 2.1 MB/s |  44 kB     00:00    
(4/10): mlnx-ofa_kernel-modules-24.10-13.azl3.x  36 MB/s | 1.6 MB     00:00    
(5/10): mlnx-tools-24.10-1.azl3.x86_64.rpm      2.3 MB/s |  80 kB     00:00    
(6/10): ofed-scripts-24.10-1.azl3.x86_64.rpm    2.0 MB/s |  71 kB     00:00    
(7/10): pciutils-3.11.1-1.azl3.x86_64.rpm        12 MB/s | 449 kB     00:00    
(8/10): pciutils-libs-3.11.1-1.azl3.x86_64.rpm  2.7 MB/s |  57 kB     00:00    
(9/10): kernel-6.6.82.1-1.azl3.x86_64.rpm        23 MB/s |  41 MB     00:01    
(10/10): cuda-560.35.03-1_6.6.78.1.3.azl3.x86_6  65 MB/s | 246 MB     00:03    
--------------------------------------------------------------------------------
Total                                            78 MB/s | 293 MB     00:03    
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1
  Installing       : ofed-scripts-24.10-1.azl3.x86_64                      1/10
  Running scriptlet: ofed-scripts-24.10-1.azl3.x86_64                      1/10
  Installing       : mlnx-tools-24.10-1.azl3.x86_64                        2/10
  Installing       : libunwind-1.6.2-2.azl3.x86_64                         3/10
  Installing       : kernel-6.6.82.1-1.azl3.x86_64                         4/10
  Running scriptlet: kernel-6.6.82.1-1.azl3.x86_64                         4/10
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.6.82.1-1.azl3
fgrep: warning: fgrep is obsolescent; using grep -F
Found linux image: /boot/vmlinuz-6.6.78.1-3.azl3
Found initrd image: /boot/initramfs-6.6.78.1-3.azl3.img
fgrep: warning: fgrep is obsolescent; using grep -F
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done

```
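To spot the same condition in other nodes' provisioning logs, a small parser can read the dnf transaction table and compare the planned `kernel` version with `kernel-drivers-gpu`, which in the log above are 6.6.82.1-1.azl3 and 6.6.78.1-3.azl3 respectively. This is a sketch that assumes the table layout shown in the log and assumes `kernel-drivers-gpu` should come from the same build as the kernel; the function names are hypothetical.

```python
from __future__ import annotations

import re

# A package row in a dnf transaction table, for example:
#  kernel   x86_64  6.6.82.1-1.azl3   azurelinux-official-base   41 M
_ROW = re.compile(
    r"^\s+(?P<name>\S+)\s+(?P<arch>x86_64|aarch64|noarch)\s+"
    r"(?P<version>\S+)\s+(?P<repo>\S+)\s+[\d.]+\s+[kMG]$"
)


def planned_packages(log: str) -> dict[str, str]:
    """Map package name -> version for every row in the transaction table."""
    packages: dict[str, str] = {}
    for line in log.splitlines():
        match = _ROW.match(line.rstrip())
        if match:
            packages[match["name"]] = match["version"]
    return packages


def kernel_mismatch(log: str) -> tuple[str, str] | None:
    """Return (kernel, kernel-drivers-gpu) versions if both are planned and differ.

    Assumption: kernel-drivers-gpu must come from the same build as the kernel.
    """
    packages = planned_packages(log)
    kernel = packages.get("kernel")
    drivers = packages.get("kernel-drivers-gpu")
    if kernel and drivers and kernel != drivers:
        return kernel, drivers
    return None


# Rows copied from the transaction table in the log above.
SAMPLE = """\
Installing:
 cuda                      x86_64  560.35.03-1_6.6.78.1.3.azl3 azurelinux-official-nvidia  246 M
 kernel                    x86_64  6.6.82.1-1.azl3             azurelinux-official-base     41 M
Installing dependencies:
 kernel-drivers-gpu        x86_64  6.6.78.1-3.azl3             azurelinux-official-base    3.0 M
"""

if __name__ == "__main__":
    print(kernel_mismatch(SAMPLE))  # ('6.6.82.1-1.azl3', '6.6.78.1-3.azl3')
```

Matching on the fixed architecture column keeps the header row and the `Installing  : ...` progress lines from being mistaken for packages.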


Issue #13433