
Training problem #14

Open

JokeCorleone opened this issue Jun 7, 2020 · 6 comments

@JokeCorleone

First of all, thank you for open-sourcing Multi-Tacotron-Voice-Cloning. I have only just started learning about natural language processing and Python programming.
- I put the software in the directory D:\SV2TTS
- I put the datasets in the directory D:\Datasets, so I have D:\Datasets\book and D:\Datasets\LibriSpeech

When using the code you provided, I had some training issues:

  1. I have finished the steps
     • Run python encoder_preprocess.py D:\Datasets
       and the result is
       Arguments:
       datasets_root: D:\Datasets
       out_dir: D:\Datasets\SV2TTS\encoder
       datasets: ['preprocess_voxforge']
       skip_existing: False
       Done preprocessing book.
  2. Run visdom
  3. But I could not continue
     • Run python encoder_train.py my_run D:\Datasets
       because this notice appeared:
       C:\Users\Admin\anaconda3\envs\[Test_Voice]\lib\site-packages\umap\spectral.py:4: NumbaDeprecationWarning: No direct replacement for 'numba.targets' available. Visit https://gitter.im/numba/numba-dev to request help. Thanks!
       import numba.targets
       usage: encoder_train.py [-h] [--clean_data_root CLEAN_DATA_ROOT]
                               [-m MODELS_DIR] [-v VIS_EVERY] [-u UMAP_EVERY]
                               [-s SAVE_EVERY] [-b BACKUP_EVERY] [-f]
                               [--visdom_server VISDOM_SERVER] [--no_visdom]
                               run_id
       encoder_train.py: error: unrecognized arguments: D:\Datasets

My question: How can I fix this problem?

Thanks again for sharing!

@vlomme
Owner

vlomme commented Jun 7, 2020

Hello. Use:
python encoder_train.py my_run --clean_data_root D:\Datasets\SV2TTS\encoder
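
For context on why the positional form failed: the usage string in the error above shows that run_id is the only positional argument and that the data directory is passed via a flag. A minimal sketch of such a parser, reconstructed from the help text rather than taken from the repo's actual code:

    from argparse import ArgumentParser
    from pathlib import Path

    parser = ArgumentParser()
    parser.add_argument("run_id")                        # the only positional argument
    parser.add_argument("--clean_data_root", type=Path)  # preprocessed data dir is a flag
    # ... plus -m/--models_dir, -v/--vis_every, -u/--umap_every, and so on
    args = parser.parse_args()

    # "python encoder_train.py my_run D:\Datasets" fails because D:\Datasets is
    # parsed as a second positional argument, which this parser does not define.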

@JokeCorleone
Author

Hello @vlomme
Thanks for your support.
When I used python encoder_train.py my_run --clean_data_root D:\Datasets\SV2TTS\encoder, the result was:

File "encoder_train.py", line 46, in
train(**vars(args))
File "D:\SV2TTS\encoder\train.py", line 87, in train
model.do_gradient_ops()
File "D:\SV2TTS\encoder\model.py", line 39, in do_gradient_ops
clip_grad_norm_(self.parameters(), 3, norm_type=2)
File "C:\Users\Admin\anaconda3\envs[Test_Voice]\lib\site-packages\torch\nn\utils\clip_grad.py", line 30, in clip_grad_norm_
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: All input tensors must be on the same device. Received cpu and cuda:0

@ramanova

ramanova commented Jun 8, 2020

Hello, I'm getting the same error with torch==1.5.0.
I see that we have

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # FIXME: currently, the gradient is None if loss_device is cuda
    loss_device = torch.device("cpu")

After that, when clip_grad_norm_ from torch is called, it performs the operation on all of the parameters at once, two of which are on cpu while the rest are on cuda:0:

    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)

which throws the error.
Could it be that the torch version is incorrect? I'm using 1.5.0.

[UPDATE]
Reinstalled torch and it started training!

    pip uninstall torch   # you might need to run it twice; check with
    pip list | grep torch # that you don't have any torch left
    pip install torch     # or pip install torch==1.5.0 to pin the version
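
For anyone who can't reinstall or upgrade torch: one possible workaround (a sketch, not code from this repo) is to clip the gradients per device, so torch.stack never mixes cpu and cuda tensors. Note that this clips each device group to max_norm independently rather than enforcing one global norm:

    import torch
    from torch.nn.utils import clip_grad_norm_

    def clip_grad_norm_per_device(parameters, max_norm, norm_type=2):
        # Group parameters by the device their gradient lives on, then clip
        # each group separately so norms are never stacked across cpu and cuda:0.
        groups = {}
        for p in parameters:
            if p.grad is not None:
                groups.setdefault(p.grad.device, []).append(p)
        for params in groups.values():
            clip_grad_norm_(params, max_norm, norm_type=norm_type)

    # In model.py's do_gradient_ops, the call would then become:
    # clip_grad_norm_per_device(self.parameters(), 3, norm_type=2)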

@JokeCorleone
Author

Hello,
When I trained the vocoder (run python vocoder_train.py my_run D:\Datasets), I encountered an error:

+------------+--------+--------------+
| Batch size |   LR   | Sequence Len |
+------------+--------+--------------+
|     60     | 0.0001 |     1000     |
+------------+--------+--------------+

RuntimeError: CUDA out of memory. Tried to allocate 118.00 MiB (GPU 0; 4.00 GiB total capacity; 2.87 GiB already allocated; 10.61 MiB free; 32.29 MiB cached)

How can I solve this error?

@vlomme
Owner

vlomme commented Jun 16, 2020

Not enough video memory. Reduce the batch size.
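
The batch size here comes from the hyperparameter file rather than the command line. A minimal sketch of the change, assuming a WaveRNN-style vocoder/hparams.py with a voc_batch_size field (the file path and variable name may differ in this fork):

    # vocoder/hparams.py -- location and name assumed, adjust to this fork
    voc_batch_size = 30  # was 60; halving the batch roughly halves activation memory

With 4 GiB of GPU memory, halving the batch size is often enough; if training still runs out of memory, reduce it further.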

@JokeCorleone
Author

Thanks @vlomme
