Pin nvidia driver version #691

Draft
wants to merge 5 commits into
base: main
Collaborator

@sjpb sjpb commented May 28, 2025

TODO

Collaborator Author

sjpb commented May 29, 2025

Checked this is idempotent.

@sjpb sjpb force-pushed the fix/cuda-12.8 branch from f82ea35 to c514556 May 29, 2025 08:24
  ansible.builtin.shell:
    cmd: >-
      dnf module info nvidia-driver:{{ cuda_nvidia_driver_stream }} |
      grep -F {{ cuda_nvidia_driver_version }}.el{{ ansible_distribution_major_version }}.{{ ansible_architecture }}
Collaborator

I think some packages don't have an el9 suffix (they're distro-independent), so does that mean this isn't a complete list?

Collaborator Author

@sjpb sjpb May 29, 2025

Oh yeah, like nvidia-imex-570-0:570.124.06-1.x86_64

So dnf module info nvidia-driver:570-open only appears to show packages ending in .el9.x86_64, .el9.noarch and .x86_64. So maybe it's enough to just suffix the version with a . when grepping - I was trying to avoid the case where you are after e.g. 570-0:570.124.06-1 and grepping for just that also gets you 570-0:570.124.06-10.

edit: I'd missed the fact it's an el9 repo, so I think this should be ok

What do you think?
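
For illustration, a minimal sketch of the prefix-match concern described above. Only the nvidia-imex line comes from this thread; the `example-pkg` lines (including the `-10` release) are hypothetical sample data, not real module contents.

```shell
#!/bin/sh
# Sample package list. The first line is quoted in this thread; the
# example-pkg lines are hypothetical and exist only to show the matching.
packages='nvidia-imex-570-0:570.124.06-1.x86_64
example-pkg-0:570.124.06-1.el9.x86_64
example-pkg-0:570.124.06-10.el9.x86_64'

version='570.124.06-1'

# Bare version: a fixed-string match that also hits the hypothetical -10 release.
loose=$(printf '%s\n' "$packages" | grep -F -- "$version")

# Version plus a trailing dot: matches release -1 only, with or without
# the .el9 suffix.
strict=$(printf '%s\n' "$packages" | grep -F -- "${version}.")

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

With this sample input, the trailing dot excludes the `-10` line while keeping both the `.x86_64` and `.el9.x86_64` entries.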

Collaborator

Annoyingly, there are a few i686 packages too (which I guess we don't need).

Collaborator Author

Oh good spot, I'd missed those 🤦

Collaborator

So we might need to filter for x86_64 and noarch on top of what you suggested? 🤔
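
A rough sketch of what that two-stage filter could look like, assuming the package names end in the architecture as they do in the examples earlier in this thread. The package names below are hypothetical placeholders, and `arch` stands in for `{{ ansible_architecture }}`.

```shell
#!/bin/sh
# Hypothetical sample lines; the names are placeholders, not real module contents.
packages='example-lib-0:570.124.06-1.el9.x86_64
example-lib-0:570.124.06-1.el9.i686
example-data-0:570.124.06-1.el9.noarch'

version='570.124.06-1'
arch='x86_64'   # stands in for {{ ansible_architecture }}

# Stage 1: fixed-string match on the pinned version, with a trailing dot
#          so a release like -10 would not match.
# Stage 2: keep only lines ending in the native architecture or noarch.
wanted=$(printf '%s\n' "$packages" |
  grep -F -- "${version}." |
  grep -E -- "\.(${arch}|noarch)\$")

printf '%s\n' "$wanted"
```

With this sample input the i686 line is dropped and the x86_64 and noarch lines are kept.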

Collaborator Author

I guess we can just | unique the list?

Collaborator Author

Doh, no: grepping for just the version + . obviously returns the entire package name

Collaborator

Or the other option is to just install the i686 packages too (not sure how much bloat that adds).

Collaborator Author

Hmm, even with this we get:

Depsolve Error occurred: 
 Problem 1: conflicting requests
  - nothing provides cuda-drivers-570 = 570.133.20 needed by cuda-drivers-fabricmanager-570-570.133.20-1.x86_64 from cuda-rhel9-x86_64
 Problem 2: package cuda-drivers-fabricmanager-570.133.20-1.x86_64 from cuda-rhel9-x86_64 requires cuda-drivers-fabricmanager-570 = 570.133.20, but none of the providers can be installed
  - conflicting requests
  - nothing provides cuda-drivers-570 = 570.133.20 needed by cuda-drivers-fabricmanager-570-570.133.20-1.x86_64 from cuda-rhel9-x86_64

Collaborator

Strange. I do note that one isn't installed with the standard dnf module install nvidia-driver.
