Minikube start fails for nvidia gpus in compute only/ headless mode #20934

Open
@srampal

Description

What Happened?

I have a RHEL 9.4 VM with NVIDIA driver 5.70-open, the NVIDIA Container Toolkit, and two NVIDIA GPUs. Both GPUs are visible with `nvidia-smi` directly on the VM and also inside a plain Docker or Podman container running on the VM. However, when starting minikube with the Docker driver and Docker container runtime, the start fails because it appears to expect `/dev/nvidia-modeset` (see the error log below), which (a) was not needed by plain Docker/Podman containers and (b) I believe should not be needed in compute-only/headless mode anyway. See this link for more information on compute-only/headless mode:

https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/index.html#compute-only-headless-and-desktop-only-no-compute-installation
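A quick host-side check can show which NVIDIA device nodes the driver actually created. The sketch below is illustrative only: the function name `check_nvidia_devices` is hypothetical, and the list of paths is an assumption (only `/dev/nvidia-modeset` appears in the error output; `/dev/nvidiactl` and `/dev/nvidia-uvm` are included for comparison).

```shell
# Hypothetical helper: report which NVIDIA device nodes exist on the host.
# The path list is an assumption; only /dev/nvidia-modeset is named in the error.
check_nvidia_devices() {
  for dev in /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-modeset; do
    if [ -e "$dev" ]; then
      echo "present: $dev"
    else
      echo "missing: $dev"
    fi
  done
}

check_nvidia_devices
```

On a compute-only/headless install, `/dev/nvidia-modeset` would be expected to show as missing while the other nodes are present, which would match the `stat failed` error reported by `nvidia-container-cli`.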

Attach the log file

Logs ..

```
$ minikube start --driver docker --container-runtime docker --gpus all --memory no-limit --cpus no-limit
😄 minikube v1.36.0 on Redhat 9.4 (kvm/amd64)
✨ Using the docker driver based on user configuration
📌 Using Docker driver with root privileges
👍 Starting "minikube" primary control-plane node in "minikube" cluster
🚜 Pulling base image v0.0.47 ...
🔥 Creating docker container (CPUs=no-limit, Memory=no-limit) ...
🤦 StartHost failed, but will try again: creating host: create: creating: create kic node: create container: docker run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname minikube --name minikube --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=minikube --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=minikube --network minikube --ip 192.168.49.2 --gpus all --env NVIDIA_DRIVER_CAPABILITIES=all --volume minikube:/var --security-opt apparmor=unconfined -e container=docker --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 --publish=127.0.0.1::32443 gcr.io/k8s-minikube/kicbase:v0.0.47@sha256:6ed579c9292b4370177b7ef3c42cc4b4a6dcd0735a1814916cbc22c8bf38412b: exit status 127
stdout:
45071bf2be654bdf56db4fbdb60f1f29ae38bb220af313f4f8589ec7d383b502

stderr:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: stat failed: /dev/nvidia-modeset: no such file or directory: unknown
```
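The log shows minikube passing `--env NVIDIA_DRIVER_CAPABILITIES=all` to `docker run`. One possible way to isolate the failure outside minikube is to run a plain container with different capability sets and compare the results. This is a minimal sketch under stated assumptions: the helper `try_caps`, the `IMAGE` default, and the `DRY_RUN` switch are all hypothetical, and it is only a guess that the `all` capability set is what triggers the `/dev/nvidia-modeset` mount.

```shell
# Hypothetical helper: run nvidia-smi in a throwaway container with a given
# NVIDIA_DRIVER_CAPABILITIES value. IMAGE is an assumption; substitute any
# image available locally.
IMAGE="${IMAGE:-ubuntu:22.04}"

try_caps() {
  caps="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command so the sketch is safe to run anywhere.
    echo "docker run --rm --gpus all --env NVIDIA_DRIVER_CAPABILITIES=$caps $IMAGE nvidia-smi -L"
  else
    docker run --rm --gpus all --env "NVIDIA_DRIVER_CAPABILITIES=$caps" "$IMAGE" nvidia-smi -L
  fi
}

try_caps compute,utility   # compute-only capabilities
try_caps all               # the value minikube passes in the log above
```

Setting `DRY_RUN=0` executes the commands instead of printing them. If the `compute,utility` run lists both GPUs while the `all` run fails with the same `stat failed` error, that would suggest the capability set, rather than minikube itself, is what requires the modeset device.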

Operating System

Redhat/Fedora

Driver

Docker

Labels

- help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
- kind/bug: Categorizes issue or PR as related to a bug.
- priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.