Getting Torch is not able to use GPU error #6

Open
juh9870 opened this issue Aug 31, 2023 · 16 comments

Comments

@juh9870

juh9870 commented Aug 31, 2023

I pulled the latest https://github.com/AUTOMATIC1111/stable-diffusion-webui, added all the nix files from this repo to the SD folder, ran nix-shell, and waited for it to finish. Then I ran ./webui.sh; it installed the SD UI's dependencies, and then I got this error:

################################################################
Launching launch.py...
################################################################
ldconfig: Can't open cache file /nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8/etc/ld.so.cache
: No such file or directory
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.12 (main, Jun  6 2023, 22:43:10) [GCC 12.3.0]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Traceback (most recent call last):
  File "/home/juh9870/games/StableDiffusion/launch.py", line 48, in <module>
    main()
  File "/home/juh9870/games/StableDiffusion/launch.py", line 39, in main
    prepare_environment()
  File "/home/juh9870/games/StableDiffusion/modules/launch_utils.py", line 356, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

I have an RTX 3080 in this machine, and I was able to run the Auto1111 UI before switching to NixOS.

@RikudouSage

It seems you don't have the drivers for your GPU installed. You need to install them.

@juh9870
Author

juh9870 commented Sep 20, 2023

I have the drivers installed via configuration.nix, I'm on the 23.05 channel, and other apps (like Blender) work fine with my GPU.

@RikudouSage

Are you sure you're using NVIDIA's own driver?

@juh9870
Author

juh9870 commented Sep 20, 2023

I'm on NixOS and I followed the wiki guide for NVIDIA drivers: https://nixos.wiki/wiki/Nvidia
I'm not using a laptop, and I don't have any other GPUs installed.

@JeremyKennedy

I resolved this by making sure my NixOS install was using the latest NVIDIA driver; I had to run sudo nixos-rebuild switch --upgrade. You can check the version with nvidia-smi. It should be the same whether you run it inside or outside the nix develop shell.

@Itrekr

Itrekr commented Oct 7, 2023

I'm having the same issue with the same GPU. Inside the develop shell I get a library version mismatch, while outside of it everything appears to be fine. I'm extremely new to NixOS, so I'm not quite sure how to solve this. Did you manage to fix it, @juh9870?

@juh9870
Author

juh9870 commented Oct 7, 2023

I'm using this for now: https://github.com/AbdBarho/stable-diffusion-webui-docker
I had to enable GPU support in Docker, but otherwise it's pretty straightforward.

@LiquidZulu

LiquidZulu commented Oct 22, 2023

I am also getting this problem, and updating did not help. I am installing the NVIDIA drivers in my system flake like so:

{ config, lib, pkgs, ... }:

{
  # See https://nixos.wiki/wiki/Nvidia
  services.xserver.videoDrivers = [
    "nvidia" # https://github.com/NixOS/nixpkgs/issues/80936#issuecomment-1003784682
  ];

  hardware = {
    opengl = {
      enable = true;
      driSupport = true;
      driSupport32Bit = true;
    };

    nvidia = {

      # Modesetting is needed for most wayland compositors
      modesetting.enable = true;

      # Use the open source version of the kernel module
      open = true;

      # Enable the nvidia settings menu
      nvidiaSettings = true;

      # Optionally, you may need to select the appropriate driver version for your specific GPU.
      package = config.boot.kernelPackages.nvidiaPackages.stable;
    };
  };
}

Quoting @JeremyKennedy: "You can check the version with nvidia-smi."

@JeremyKennedy, is there a specific package that provides this? I don't have it on my system and can't find it on https://search.nixos.org/packages?channel=23.05&from=0&size=50&sort=relevance&type=packages&query=nvidia-smi
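For reference, a minimal sketch, assuming the NixOS NVIDIA module of the 23.05 era: nvidia-smi does not show up as its own package because it ships inside the driver package (the `bin` output of whatever `hardware.nvidia.package` points to). The module should normally put it on the PATH already after a rebuild; if it is missing, referencing it explicitly would look like this:

```nix
{ config, ... }:
{
  # nvidia-smi lives in the `bin` output of the NVIDIA driver package, so it
  # comes from the same driver build selected by hardware.nvidia.package.
  environment.systemPackages = [ config.hardware.nvidia.package.bin ];
}
```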

@jpentland

jpentland commented Jan 11, 2024

I have the same error. This is what nvidia-smi reports:

Without the flake enabled:

Thu Jan 11 16:06:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:09:00.0  On |                  N/A |
| 53%   42C    P3              37W / 170W |   2568MiB / 12288MiB |     35%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+

With the flake enabled:

Failed to initialize NVML: Driver/library version mismatch

I also tried copying the nixpkgs revision from the flake.lock of my system config flake into the flake.lock used by this project, to try to make the versions match, but I still get the same error.

edit: And yes, I have rebooted since last running "nixos-rebuild switch".

@wyndon

wyndon commented Feb 19, 2024

This error occurs because the driver version in your system config differs from the one pulled in by the devShell. NVIDIA is very strict about versions: they must match exactly.

It means either that you're using a different variant (for example the beta drivers; by default, the devShell from this flake pulls in the regular version), or that the flake.lock from the repo is outdated. The latter shouldn't happen, though, because the README says not to copy the flake.lock, so Nix generates a new one with the latest versions available. It could also be that you're using a release channel (e.g. 23.11) instead of unstable, which can also cause a version mismatch.

To fix it, either use the drivers from unstable on your system, or edit the flake to use the same release channel as your system. And if you're using a variant (e.g. the beta drivers), you also need to edit the devShell to use that variant, e.g.:

linuxPackages.nvidia_x11_beta
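A sketch of the devShell side of that change, assuming the shell lists the driver userspace as `linuxPackages.nvidia_x11` (the regular branch). The file below is hypothetical; the actual impl.nix in this repo may be structured differently. The NVIDIA driver is unfree, hence `allowUnfree`:

```nix
# Hypothetical devShell excerpt; the real impl.nix may differ.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:
pkgs.mkShell {
  buildInputs = [
    # Beta-branch userspace libraries, to match a system configured with
    #   hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.beta;
    # This replaces pkgs.linuxPackages.nvidia_x11 (the regular branch).
    pkgs.linuxPackages.nvidia_x11_beta
  ];
}
```

The exact version still has to match, so the devShell's nixpkgs must provide the same beta version as the system's nixpkgs.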

@jpentland

jpentland commented Feb 19, 2024

I have this in my system configuration:

  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };

I'm not sure which underlying package that corresponds to.

edit: I managed to solve the issue by doing what I mentioned above: copying the nixpkgs section of my system's flake.lock into the project's flake.lock. Not sure why it didn't work before.
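The same idea can be expressed in flake.nix instead of hand-editing flake.lock, as a sketch: pin the project's nixpkgs input to the exact commit the system uses. `REV` is a placeholder, and this assumes the system flake's input is also called `nixpkgs`; the rest of the project's flake.nix stays as it is.

```nix
{
  # REV is a placeholder: copy the "rev" value of the "nixpkgs" node from the
  # system flake's flake.lock so both sides get the same driver version.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/REV";

  # ...the project's existing outputs stay unchanged...
}
```

After changing the URL, Nix should re-lock that input the next time the shell is entered (or when running nix flake lock).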

@UlyssesZh

I solved this by replacing nixos-unstable with nixos-23.11 in flake.nix.
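For comparison, that change amounts to roughly the following input in flake.nix. This is a sketch that assumes the system itself is on the nixos-23.11 channel and that both sides are updated to a similar point, since the driver versions still have to match exactly:

```nix
{
  # Follow the same release branch as the system instead of nixos-unstable.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";

  # ...the project's existing outputs stay unchanged...
}
```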

@jasper-clarke

jasper-clarke commented Apr 13, 2024

Can anyone help me with this? I'm on unstable (24.05) and I use a custom NVIDIA package, but I'm not sure how to port it to the flake.nix.

hardware.nvidia.package = let
  rcu_patch = pkgs.fetchpatch {
    url = "https://github.com/gentoo/gentoo/raw/c64caf53/x11-drivers/nvidia-drivers/files/nvidia-drivers-470.223.02-gpl-pfn_valid.patch";
    hash = "sha256-eZiQQp2S/asE7MfGvfe6dA/kdCvek9SYa/FFGp24dVg=";
  };
  in config.boot.kernelPackages.nvidiaPackages.mkDriver {
    version = "535.154.05";
    sha256_64bit = "sha256-fpUGXKprgt6SYRDxSCemGXLrEsIA6GOinp+0eGbqqJg=";
    sha256_aarch64 = "sha256-G0/GiObf/BZMkzzET8HQjdIcvCSqB1uhsinro2HLK9k=";
    openSha256 = "sha256-wvRdHguGLxS0mR06P5Qi++pDJBCF8pJ8hr4T8O6TJIo=";
    settingsSha256 = "sha256-9wqoDEWY4I7weWW05F4igj1Gj9wjHsREFMztfEmqm10=";
    persistencedSha256 = "sha256-d0Q3Lk80JqkS1B54Mahu2yY/WocOqFFbZVBh+ToGhaE=";
    patches = [ rcu_patch ];
  };

Edit: I ended up just using the Docker version with virtualisation.docker.enableNvidia.
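A minimal sketch of the host-side NixOS configuration for the Docker route, assuming a 23.05/23.11-era release (later releases renamed or replaced some of these options, e.g. in favour of `hardware.nvidia-container-toolkit.enable`):

```nix
{ ... }:
{
  virtualisation.docker.enable = true;

  # Let containers use the host's NVIDIA GPU via the NVIDIA container runtime.
  virtualisation.docker.enableNvidia = true;

  # On x86_64, enableNvidia requires the 32-bit driver libraries to be enabled.
  hardware.opengl.driSupport32Bit = true;
}
```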

@BenMac31

Since I already had cudatoolkit on the host system, simply removing the cudatoolkit package from the impl.nix CUDA variant fixed the issue for me.

@Rexcrazy804

Quoting @BenMac31: "Since I already had cudatoolkit on the host system, simply removing the cudatoolkit package from the impl.nix CUDA variant fixed the issue for me."

This worked for me as well. Additionally, if you run nvidia-smi inside the shell, it reports a CUDA version error.
From the looks of it, the current CUDA version is 12.4, but the latest manifest in nixpkgs is 12.3, so it's an upstream issue.
For the time being, this workaround works like a charm 😄

@natervader

Quoting @UlyssesZh: "I solved this by replacing nixos-unstable with nixos-23.11 in flake.nix."

This did the trick for me. Thanks!
