Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory #364

apoorvagnihotri · 2023-04-17T08:02:41Z

Describe the bug
I am not able to train the model after successfully running the following commands.

svc pre-resample
svc pre-config
svc pre-hubert -fm crepe
svc train -t # <- this gives me the following error.

❯ svc train -t
[07:44:07] INFO     [07:44:07] Version: 3.9.3                                                                                                                                                   __main__.py:22
[07:44:12] INFO     [07:44:12] Created a temporary directory at /tmp/tmpmhzyrk2f                                                                                                            instantiator.py:21
           INFO     [07:44:12] Writing /tmp/tmpmhzyrk2f/_remote_module_non_scriptable.py                                                                                                    instantiator.py:76
[07:44:13] INFO     [07:44:13] Server binary (from Python package v0.7.0):                                                                                                              server_ingester.py:290
                    /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/tensorboard_data_server/bin/server
[07:44:16] WARNING  [07:44:16] Failed to communicate with data server at localhost:36617: <_InactiveRpcError of RPC that terminated with:                                               server_ingester.py:187
                            status = StatusCode.UNAVAILABLE
                            details = "DNS resolution failed for localhost:36617: C-ares status is not ARES_SUCCESS qtype=AAAA name=localhost is_balancer=0: Could not contact DNS
                    servers"
                            debug_error_string = "UNKNOWN:DNS resolution failed for localhost:36617: C-ares status is not ARES_SUCCESS qtype=AAAA name=localhost is_balancer=0: Could
                    not contact DNS servers {created_time:"2023-04-17T07:44:16.0903992+00:00", grpc_status:14}"
                    >
[07:44:19] INFO     [07:44:19] Using strategy: auto                                                                                                                                                train.py:82
INFO: GPU available: True (cuda), used: True
           INFO     [07:44:19] GPU available: True (cuda), used: True                                                                                                                          rank_zero.py:48
INFO: TPU available: False, using: 0 TPU cores
           INFO     [07:44:19] TPU available: False, using: 0 TPU cores                                                                                                                        rank_zero.py:48
INFO: IPU available: False, using: 0 IPUs
           INFO     [07:44:19] IPU available: False, using: 0 IPUs                                                                                                                             rank_zero.py:48
INFO: HPU available: False, using: 0 HPUs
           INFO     [07:44:19] HPU available: False, using: 0 HPUs                                                                                                                             rank_zero.py:48
[07:44:20] WARNING  [07:44:20] /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/so_vits_svc_fork/modules/synthesizers.py:81: UserWarning: Unused arguments:          warnings.py:109
                    {'n_layers_q': 3, 'use_spectral_norm': False}
                      warnings.warn(f"Unused arguments: {kwargs}")

           INFO     [07:44:20] Decoder type: hifi-gan                                                                                                                                      synthesizers.py:100
[07:44:21] WARNING  [07:44:21] /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/so_vits_svc_fork/utils.py:190: UserWarning: Keys not found in checkpoint state       warnings.py:109
                    dict:['emb_g.weight']
                      warnings.warn(f"Keys not found in checkpoint state dict:" f"{not_in_from}")

           INFO     [07:44:21] Loaded checkpoint 'logs/44k/G_0.pth' (iteration 0)                                                                                                                 utils.py:247
           INFO     [07:44:21] Loaded checkpoint 'logs/44k/D_0.pth' (iteration 0)                                                                                                                 utils.py:247
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[07:44:23] INFO     [07:44:23] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]                                                                                                                            cuda.py:57
┏━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name  ┃ Type                     ┃ Params ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ net_g │ SynthesizerTrn           │ 45.2 M │
│ 1 │ net_d │ MultiPeriodDiscriminator │ 46.7 M │
└───┴───────┴──────────────────────────┴────────┘
Trainable params: 91.9 M
Non-trainable params: 0
Total params: 91.9 M
Total estimated model params size (MB): 367
           WARNING  [07:44:23] /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:430: PossibleUserWarning: The warnings.py:109
                    dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 4 which is the number
                    of cpus on this machine) in the `DataLoader` init to improve performance.
                      rank_zero_warn(

Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
[1]    18763 IOT instruction (core dumped)  svc train -t

To Reproduce
I am on the latest version of Arch Linux with the latest NVIDIA drivers installed. When running nvidia-smi, I get the following output.

Additional context
Images of the error I am getting.

Further, I see semaphore-related warnings after the training script fails. They are pasted below.

╰─ /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/joblib/externals/loky/backend/resource_tracker.py:310: UserWarning: resource_tracker: There appear to be 8 leaked semlock objects to clean up at shutdown
  warnings.warn(
/home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/joblib/externals/loky/backend/resource_tracker.py:310: UserWarning: resource_tracker: There appear to be 1 leaked folder objects to clean up at shutdown
  warnings.warn(


34j · 2023-04-18T10:16:04Z

Sorry, but it seems there is nothing we can do about it. If you can verify that other applications using cuDNN work, and you are confident that this is a problem with this repository, please reopen it.
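One way to check whether cuDNN works outside this repository is to run a single convolution directly through PyTorch. The sketch below is only an illustration: the function name `cudnn_smoke_test` and the tensor sizes are made up for this example, and it assumes the same conda environment that `svc` runs in. If the library still cannot be loaded, the process may abort (as in the log above) instead of raising a Python exception.

```python
import importlib.util


def cudnn_smoke_test() -> str:
    """Run one small convolution on the GPU and report what happened.

    Hypothetical helper for illustration; not part of so-vits-svc-fork.
    """
    # Avoid an ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch
    import torch.nn.functional as F

    if not torch.cuda.is_available():
        return "CUDA is not available"
    if not torch.backends.cudnn.is_available():
        return "cuDNN is not available"

    # A tiny convolution on the GPU; arbitrary sizes chosen for the example.
    x = torch.randn(1, 3, 32, 32, device="cuda")
    w = torch.randn(8, 3, 3, 3, device="cuda")
    y = F.conv2d(x, w, padding=1)
    torch.cuda.synchronize()  # make sure the kernel has actually run

    version = torch.backends.cudnn.version()
    return f"ok: output shape {tuple(y.shape)}, cuDNN version {version}"


if __name__ == "__main__":
    print(cudnn_smoke_test())
```

If this script crashes with the same "Could not load library" message, the problem is likely in the PyTorch/CUDA installation rather than in this repository.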

apoorvagnihotri · 2023-04-18T11:01:54Z

People might find this useful. I followed the instructions as given here and I am able to train the model.

pytorch/pytorch#97041 (comment)

I think this is an issue with PyTorch 2.0.
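For anyone hitting the same message, a quick diagnostic is to ask the dynamic loader whether it can resolve `libnvrtc.so` at all, and to look at which `libnvrtc*` files ship inside the installed `torch` package. This is a rough sketch, not the fix from the linked comment: the helper names `can_load` and `bundled_nvrtc_files` are hypothetical, and `ctypes.CDLL` only approximates the lookup that cuDNN performs internally.

```python
import ctypes
import glob
import importlib.util
import os


def can_load(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def bundled_nvrtc_files() -> list[str]:
    """List libnvrtc* files in the installed torch package's lib directory.

    Returns an empty list if torch is not installed. torch itself is not
    imported, so this is safe to run even when CUDA libraries are broken.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return []
    lib_dir = os.path.join(list(spec.submodule_search_locations)[0], "lib")
    return sorted(glob.glob(os.path.join(lib_dir, "libnvrtc*")))


if __name__ == "__main__":
    # The exact name from the error message above.
    print("libnvrtc.so loadable:", can_load("libnvrtc.so"))
    for path in bundled_nvrtc_files():
        print("found in torch/lib:", path)
```

If `libnvrtc.so` is not loadable but versioned `libnvrtc*` files are listed, the loader is probably failing on the unversioned name, which would be consistent with the error in the title.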

apoorvagnihotri added the bug label Apr 17, 2023

34j changed the title ~~CUDA Issues while training.~~ Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory [1] 18763 IOT instruction (core dumped) svc train -t Apr 18, 2023

34j closed this as not planned Apr 18, 2023

pablogsal mentioned this issue Dec 9, 2023

Could not load library libcudnn_ops_infer.so.8. bloomberg/memray#212

Closed

1 task
