Description
What Happened?
I have a RHEL 9.4 VM with two NVIDIA GPUs, NVIDIA driver 570-open and the NVIDIA Container Toolkit installed. Both GPUs show up in `nvidia-smi` when I run it directly on the VM, and also inside a plain Docker or Podman container on the VM. However, `minikube start` with the Docker driver and Docker container runtime fails, because it appears to require `/dev/nvidia-modeset` (see the error log below). That device node (a) was not needed by the plain Docker/Podman containers, and (b) as far as I understand should not be needed in compute-only (headless) mode anyway. See this link for more information on compute-only/headless mode.
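To support points (a) and (b), it can help to show which NVIDIA device nodes actually exist on the host. The script below is a minimal sketch: it lists the `nvidia*` entries under `/dev` and reports which of `nvidiactl`, `nvidia-uvm` and `nvidia-modeset` are missing. Assumptions: that three-node list is my choice (the first two are the usual compute nodes, the third is the one named in the error), and `NVIDIA_DEV_DIR` is a hypothetical override used only for testing.

```shell
#!/usr/bin/env bash
# Sketch: report which NVIDIA device nodes exist under a directory (default /dev).
# Assumption: nvidiactl and nvidia-uvm are the usual compute nodes;
# nvidia-modeset is the node named in the minikube error.
set -u

# Print each expected node name that is missing from directory $1.
missing_nvidia_nodes() {
  local dir="$1" node
  for node in nvidiactl nvidia-uvm nvidia-modeset; do
    if [ ! -e "$dir/$node" ]; then
      echo "$node"
    fi
  done
}

# NVIDIA_DEV_DIR is a hypothetical override, only useful for testing.
dev_dir="${NVIDIA_DEV_DIR:-/dev}"
echo "NVIDIA device nodes present under $dev_dir:"
ls -1 "$dev_dir" | grep '^nvidia' || echo "  (none)"
echo "Expected nodes missing:"
missing_nvidia_nodes "$dev_dir"
```

On a compute-only host like this one, the expectation is that only `nvidia-modeset` shows up as missing.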
Attach the log file
Logs:
```
$ minikube start --driver docker --container-runtime docker --gpus all --memory no-limit --cpus no-limit
😄 minikube v1.36.0 on Redhat 9.4 (kvm/amd64)
✨ Using the docker driver based on user configuration
📌 Using Docker driver with root privileges
👍 Starting "minikube" primary control-plane node in "minikube" cluster
🚜 Pulling base image v0.0.47 ...
🔥 Creating docker container (CPUs=no-limit, Memory=no-limit) ...
🤦 StartHost failed, but will try again: creating host: create: creating: create kic node: create container: docker run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname minikube --name minikube --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=minikube --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=minikube --network minikube --ip 192.168.49.2 --gpus all --env NVIDIA_DRIVER_CAPABILITIES=all --volume minikube:/var --security-opt apparmor=unconfined -e container=docker --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 --publish=127.0.0.1::32443 gcr.io/k8s-minikube/kicbase:v0.0.47@sha256:6ed579c9292b4370177b7ef3c42cc4b4a6dcd0735a1814916cbc22c8bf38412b: exit status 127
stdout:
45071bf2be654bdf56db4fbdb60f1f29ae38bb220af313f4f8589ec7d383b502
stderr:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: stat failed: /dev/nvidia-modeset: no such file or directory: unknown
```
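One visible difference between the failing `docker run` in the log and a plain `docker run --gpus all` is that minikube passes `--env NVIDIA_DRIVER_CAPABILITIES=all`. When that variable is unset, the NVIDIA Container Toolkit uses the `compute,utility` capabilities by default. `all` also includes `display`, and my untested guess is that this is what makes `nvidia-container-cli` try to mount `/dev/nvidia-modeset`. The sketch below tries to isolate that by starting a throwaway container with each capability list. Assumptions: `ubuntu:22.04` is only an example image, and the `DOCKER`/`IMAGE` overrides are hypothetical knobs added for testing.

```shell
#!/usr/bin/env bash
# Sketch: compare NVIDIA_DRIVER_CAPABILITIES values to see which one triggers
# the /dev/nvidia-modeset mount error. Hypothesis only, not a confirmed cause.
# Assumption: ubuntu:22.04 is just an example image; with the "utility"
# capability the NVIDIA Container Toolkit injects nvidia-smi into the container.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-ubuntu:22.04}"

# Run `nvidia-smi -L` in a container started with the capability list in $1.
probe_caps() {
  "$DOCKER" run --rm --gpus all --env "NVIDIA_DRIVER_CAPABILITIES=$1" "$IMAGE" nvidia-smi -L
}

if command -v "$DOCKER" >/dev/null 2>&1; then
  for caps in compute,utility all; do
    echo "== NVIDIA_DRIVER_CAPABILITIES=$caps"
    # $? here is the exit status of probe_caps.
    probe_caps "$caps" || echo "   failed (exit $?)"
  done
else
  echo "$DOCKER not found; skipping probe"
fi
```

If `compute,utility` lists both GPUs but `all` fails with the same `stat failed: /dev/nvidia-modeset` error, that would point at the capability list minikube sets rather than at the driver installation.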
Operating System
Redhat/Fedora
Driver
Docker