Describe the bug
I am not able to train the model after successfully running the following commands.
svc pre-resample
svc pre-config
svc pre-hubert -fm crepe
svc train -t # <- this gives me the following error.
❯ svc train -t
[07:44:07] INFO [07:44:07] Version: 3.9.3 __main__.py:22
[07:44:12] INFO [07:44:12] Created a temporary directory at /tmp/tmpmhzyrk2f instantiator.py:21
INFO [07:44:12] Writing /tmp/tmpmhzyrk2f/_remote_module_non_scriptable.py instantiator.py:76
[07:44:13] INFO [07:44:13] Server binary (from Python package v0.7.0): server_ingester.py:290
/home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/tensorboard_data_server/bin/server
[07:44:16] WARNING [07:44:16] Failed to communicate with data server at localhost:36617: <_InactiveRpcError of RPC that terminated with: server_ingester.py:187
status = StatusCode.UNAVAILABLE
details = "DNS resolution failed for localhost:36617: C-ares status is not ARES_SUCCESS qtype=AAAA name=localhost is_balancer=0: Could not contact DNS servers"
debug_error_string = "UNKNOWN:DNS resolution failed for localhost:36617: C-ares status is not ARES_SUCCESS qtype=AAAA name=localhost is_balancer=0: Could not contact DNS servers {created_time:"2023-04-17T07:44:16.0903992+00:00", grpc_status:14}">
[07:44:19] INFO [07:44:19] Using strategy: auto train.py:82
INFO: GPU available: True (cuda), used: True
INFO [07:44:19] GPU available: True (cuda), used: True rank_zero.py:48
INFO: TPU available: False, using: 0 TPU cores
INFO [07:44:19] TPU available: False, using: 0 TPU cores rank_zero.py:48
INFO: IPU available: False, using: 0 IPUs
INFO [07:44:19] IPU available: False, using: 0 IPUs rank_zero.py:48
INFO: HPU available: False, using: 0 HPUs
INFO [07:44:19] HPU available: False, using: 0 HPUs rank_zero.py:48
[07:44:20] WARNING [07:44:20] /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/so_vits_svc_fork/modules/synthesizers.py:81: UserWarning: Unused arguments: warnings.py:109
{'n_layers_q': 3, 'use_spectral_norm': False}
warnings.warn(f"Unused arguments: {kwargs}")
INFO [07:44:20] Decoder type: hifi-gan synthesizers.py:100
[07:44:21] WARNING [07:44:21] /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/so_vits_svc_fork/utils.py:190: UserWarning: Keys not found in checkpoint state warnings.py:109
dict:['emb_g.weight']
warnings.warn(f"Keys not found in checkpoint state dict:" f"{not_in_from}")
INFO [07:44:21] Loaded checkpoint 'logs/44k/G_0.pth' (iteration 0) utils.py:247
INFO [07:44:21] Loaded checkpoint 'logs/44k/D_0.pth' (iteration 0) utils.py:247
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[07:44:23] INFO [07:44:23] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] cuda.py:57
┏━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ ┃ Name ┃ Type ┃ Params ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ net_g │ SynthesizerTrn │ 45.2 M │
│ 1 │ net_d │ MultiPeriodDiscriminator │ 46.7 M │
└───┴───────┴──────────────────────────┴────────┘
Trainable params: 91.9 M
Non-trainable params: 0
Total params: 91.9 M
Total estimated model params size (MB): 367
WARNING [07:44:23] /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:430: PossibleUserWarning: The warnings.py:109
dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 4 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
[1] 18763 IOT instruction (core dumped) svc train -t
To Reproduce
I am on the latest version of Arch Linux with the latest NVIDIA drivers installed. When running nvidia-smi, I get the following output.
Additional context
Images of the error I am getting.
Further, I have observed some semaphore errors after the training script fails. The errors are pasted below.
╰─ /home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/joblib/externals/loky/backend/resource_tracker.py:310: UserWarning: resource_tracker: There appear to be 8 leaked semlock objects to clean up at shutdown
warnings.warn(
/home/apoorvagnihotri/miniconda3/envs/so-vits/lib/python3.10/site-packages/joblib/externals/loky/backend/resource_tracker.py:310: UserWarning: resource_tracker: There appear to be 1 leaked folder objects to clean up at shutdown
warnings.warn(
34j changed the title from "CUDA Issues while training." to "Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory [1] 18763 IOT instruction (core dumped) svc train -t" on Apr 18, 2023
34j changed the title from "Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory [1] 18763 IOT instruction (core dumped) svc train -t" to "Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory" on Apr 18, 2023
Sorry, but it seems there is nothing we can do about it. If you can verify that other applications using cuDNN work, and you are confident that this is a problem with this repository, please reopen it.
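One way to check this outside of the training script is to ask the dynamic loader directly whether it can open the libraries named in the error message. The following is only a rough sketch: the library names are copied from the error above and may differ on other systems (for example, a versioned name such as libnvrtc.so.11.2), and `try_load` is a hypothetical helper, not part of so-vits-svc-fork.

```python
import ctypes

# Names taken from the error message in this issue. These are assumptions:
# the exact sonames depend on the installed CUDA / cuDNN versions.
CANDIDATES = ["libnvrtc.so", "libcudnn_cnn_infer.so.8"]


def try_load(name):
    """Return (True, "") if the dynamic loader can open `name`, else (False, reason)."""
    try:
        ctypes.CDLL(name)
        return True, ""
    except OSError as exc:
        # Raised when the library, or one of its dependencies, cannot be found.
        return False, str(exc)


if __name__ == "__main__":
    for lib in CANDIDATES:
        ok, reason = try_load(lib)
        status = "found" if ok else f"NOT found ({reason})"
        print(f"{lib}: {status}")
```

If libnvrtc.so is reported as not found here as well, that would suggest the problem lies in the CUDA installation or the library search path rather than in this repository.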