CUFFT_INTERNAL_ERROR on RTX 4090 #200

Closed
dhstsw opened this issue Apr 1, 2023 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments


dhstsw commented Apr 1, 2023

Describe the bug
PyTorch built with cu117 raises CUFFT_INTERNAL_ERROR on an RTX 4090 (and probably on an RTX 4080 too, untested).

To Reproduce
Just run svc train on an RTX 4090.
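
A full training run is a heavy way to confirm the problem. As a minimal sketch, assuming the failure can be triggered by any cuFFT call rather than something specific to svc train (this is not confirmed in this thread), a single FFT on the GPU may be enough to reproduce it. The function name cufft_smoke_test is hypothetical; the torch calls are standard PyTorch APIs.

```python
def cufft_smoke_test():
    """Run one small FFT on the GPU and describe what happened."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device is available"
    try:
        # torch.fft.rfft on a CUDA tensor is dispatched to cuFFT.
        signal = torch.randn(1024, device="cuda")
        torch.fft.rfft(signal)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as error:
        return f"FFT failed: {error}"
    capability = torch.cuda.get_device_capability(0)
    return (
        f"FFT ok (torch {torch.__version__}, "
        f"CUDA {torch.version.cuda}, compute capability {capability})"
    )


if __name__ == "__main__":
    print(cufft_smoke_test())
```

If this prints a failure on the cu117 build and succeeds on a cu118 build, that would point at the PyTorch build rather than at so-vits-svc-fork.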

Additional context
The problem was reported (for cu117) at the end of October in the pytorch/pytorch GitHub repository.
A fix (for other applications) is to uninstall torch and install the latest torch 2.0.0 nightly with cu118 instead, but that doesn't work with so-vits-svc-fork.

More info here:
pytorch/pytorch#88038

Would it be possible to fix so-vits-svc-fork so that it works with cu118?

Thanks.

dhstsw added the bug label Apr 1, 2023
Collaborator

34j commented Apr 1, 2023

Thanks for your report. Does this package not work with cu118? What errors do you get?

Author

dhstsw commented Apr 1, 2023

> Thanks for your report. Does this package not work with cu118? What errors do you get?

Hi, as I said, shortly after training starts it quits with:

CUFFT_INTERNAL_ERROR

I don't own a 4090 myself; I tried this on a cloud-rented GPU.
When I asked on the provider's Discord support, they pointed me to that issue and its solution in the PyTorch repository (the one I linked above).

If necessary, I could rent that GPU again, install everything, and give you a better report.

Collaborator

34j commented Apr 1, 2023

> A fix (for other applications) is to uninstall torch and install the latest torch 2.0.0 nightly with cu118 instead, but that doesn't work with so-vits-svc-fork.

> CUFFT_INTERNAL_ERROR

Do you get this error with the cu118 version?

Author

dhstsw commented Apr 1, 2023

So, this is what I do:

Connect to the rented machine's terminal, then:

apt update
apt install python3.10-dev
python -m pip install -U pip setuptools wheel
pip install -U so-vits-svc-fork

pip uninstall torch -y
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118

svc pre-resample
svc pre-config
svc pre-hubert
svc train -t

I immediately get this:

root@989f522c2bd2:/workspace# svc train -t
[10:31:08] INFO     [10:31:08] Version: 2.1.5                                                                                                                                   __main__.py:20
Traceback (most recent call last):
  File "/usr/local/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/so_vits_svc_fork/__main__.py", line 99, in train
    from .train import train
  File "/usr/local/lib/python3.10/dist-packages/so_vits_svc_fork/train.py", line 22, in <module>
    from .data_utils import TextAudioCollate, TextAudioSpeakerLoader
  File "/usr/local/lib/python3.10/dist-packages/so_vits_svc_fork/data_utils.py", line 7, in <module>
    import torchaudio
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_extension.py", line 135, in <module>
    _init_extension()
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_extension.py", line 105, in _init_extension
    _load_lib("libtorchaudio")
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 783, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
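
The traceback shows torchaudio failing to load a shared library that belongs to torch. That can happen when torchaudio was built against a different torch build than the one now installed, and the steps above replace only torch. As a hedged sketch (the helper names cuda_tag, installed_versions and describe are hypothetical, and this reading of the error is an assumption that is not confirmed in this thread), the installed versions can be compared without importing either package:

```python
import re
from importlib import metadata


def cuda_tag(version):
    """Return the CUDA tag (e.g. 'cu118') from a wheel version string, or None."""
    match = re.search(r"\+(cu\d+)", version)
    return match.group(1) if match else None


def installed_versions(packages=("torch", "torchaudio")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def describe(versions):
    """Report whether the installed packages carry the same CUDA tag."""
    tags = {name: cuda_tag(v) for name, v in versions.items() if v}
    if len(set(tags.values())) > 1:
        return f"mismatch: {tags}"
    return f"consistent: {tags}"


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    print(describe(found))
```

Matching tags are necessary but not sufficient: torchaudio also has to be the release built for that exact torch version, and wheels installed from PyPI may carry no CUDA tag at all, so a "mismatch" result is only a hint to look closer.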

Collaborator

34j commented Apr 1, 2023

Have you installed CUDA Toolkit 11.8?

34j added the documentation label and removed the bug label Apr 1, 2023
Author

dhstsw commented Apr 1, 2023

> Have you installed CUDA Toolkit 11.8?

Mmm... nope.

Is there any way to check on Linux which toolkits are currently installed?

Thanks.

34j closed this as not planned Apr 2, 2023
34j reopened this Apr 2, 2023
Collaborator

34j commented Apr 2, 2023

> > Have you installed CUDA Toolkit 11.8?
>
> Mmm... nope.
>
> Is there any way to check on Linux which toolkits are currently installed?
>
> Thanks.

Please search for that yourself.
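
For anyone landing here with the same question, a minimal sketch of one way to look for installed CUDA toolkits on Linux follows. It assumes the toolkit either puts nvcc on PATH or lives under the conventional /usr/local/cuda* location; other install layouts would not be found. The function names parse_nvcc_release and find_cuda_toolkits are hypothetical.

```python
import glob
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the release number (e.g. '11.8') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def find_cuda_toolkits():
    """Collect the nvcc release on PATH and any /usr/local/cuda* directories."""
    found = {
        "nvcc_release": None,
        "install_dirs": sorted(glob.glob("/usr/local/cuda*")),
    }
    nvcc = shutil.which("nvcc")
    if nvcc:
        result = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True
        )
        found["nvcc_release"] = parse_nvcc_release(result.stdout)
    return found


if __name__ == "__main__":
    print(find_cuda_toolkits())
```

An empty result does not by itself explain the errors above: the PyTorch pip wheels generally ship their own CUDA runtime libraries, so a system-wide toolkit may not be what is missing.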

34j closed this as not planned Apr 13, 2023