
How do I train this? #2

Open
dugduy opened this issue Aug 16, 2022 · 29 comments


@dugduy

dugduy commented Aug 16, 2022

Could you give me detailed instructions on how to train?
If possible, could you also share your pretrained model for me to try!

@v-nhandt21
Owner

Let me put together a write-up when I have some free time; I've since moved on to researching Voice Cloning with a different model, so I'm not using this one anymore.

@dugduy
Author

dugduy commented Aug 19, 2022

Let me put together a write-up when I have some free time; I've since moved on to researching Voice Cloning with a different model, so I'm not using this one anymore.

Which model are you using now? Could you share it so I can take a look?

@UncleBob2

Could you share it so I can take a look?

@v-nhandt21
Owner

Could you share it so I can take a look?

Try this one, I just updated it. Test with VIVOS first, then extend it to your own dataset.

I have updated the pipeline for training: https://github.com/v-nhandt21/ViSV2TTS/blob/master/README.md

Try to test the pipeline first with VIVOS, then configure it to run with your data

@kingkong135

Could you share it so I can take a look?

Try this one, I just updated it. Test with VIVOS first, then extend it to your own dataset.

I have updated the pipeline for training: https://github.com/v-nhandt21/ViSV2TTS/blob/master/README.md

Try to test the pipeline first with VIVOS, then configure it to run with your data

May I ask: can the vi2IPA_split preprocessing step be applied to TTS algorithms like VITS and VITS2?

@v-nhandt21
Owner

Could you share it so I can take a look?

Try this one, I just updated it. Test with VIVOS first, then extend it to your own dataset.
I have updated the pipeline for training: https://github.com/v-nhandt21/ViSV2TTS/blob/master/README.md
Try to test the pipeline first with VIVOS, then configure it to run with your data

May I ask: can the vi2IPA_split preprocessing step be applied to TTS algorithms like VITS and VITS2?

All of them work. vi2IPA converts raw text into IPA graphemes; you can also try the ARPAbet form:

https://github.com/v-nhandt21/ViMFA/blob/main/phoneme_dict/viARPAbet.txt
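
For reference, a minimal usage sketch of this preprocessing step, assuming the viphoneme package (which appears to provide vi2IPA_split, installable with python -m pip install viphoneme); the sample sentence is only an illustration:

from viphoneme import vi2IPA_split

text = "xin chào các bạn"           # any raw Vietnamese text
phonemes = vi2IPA_split(text, "/")  # "/" delimits the phonemes
print(phonemes)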

@UncleBob2

UncleBob2 commented Nov 1, 2023

Thank you.
It seems I have to convert the files from UTF-16 to UTF-8.

For training the model: where is the train_ms.py file?
python train_ms.py -c configs/vivos.json -m vivos
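
For the UTF-16 issue, a minimal re-encoding sketch (the path below is only a placeholder, and it assumes the source file really is UTF-16):

# Hypothetical example file; adjust the path and source encoding as needed.
path = "DATA/train.txt"
with open(path, "r", encoding="utf-16") as src:
    text = src.read()
with open(path, "w", encoding="utf-8") as dst:
    dst.write(text)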

@UncleBob2

There is a mistake here:

cat vivos/test/prompts.txt > DATA/val.txt
cat vivos/test/prompts.txt > DATA/train.txt
cat vivos/train/prompts.txt >> DATA/train.txt

Shouldn't it be test to val and train to train? Why are we putting the test set into the training data?

cat vivos/test/prompts.txt > DATA/val.txt
cat vivos/train/prompts.txt >> DATA/train.txt

@v-nhandt21
Owner

There is a mistake here:

cat vivos/test/prompts.txt > DATA/val.txt
cat vivos/test/prompts.txt > DATA/train.txt
cat vivos/train/prompts.txt >> DATA/train.txt

Shouldn't it be test to val and train to train? Why are we putting the test set into the training data?

cat vivos/test/prompts.txt > DATA/val.txt
cat vivos/train/prompts.txt >> DATA/train.txt

No, I did it intentionally; I merge them:

  • val.txt = test set
  • train.txt = test set + train set

That way the model trains on more data, since the test set is not too important in speech synthesis.

P/S: VIVOS is for checking the source code only; we really need more data for this task.
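
For clarity, the same split written as a small Python sketch (paths follow the VIVOS layout used above; this only illustrates the intent of the cat commands):

# val.txt   = test prompts only
# train.txt = test prompts + train prompts (more training data)
with open("vivos/test/prompts.txt", encoding="utf-8") as f:
    test_prompts = f.read()
with open("vivos/train/prompts.txt", encoding="utf-8") as f:
    train_prompts = f.read()
with open("DATA/val.txt", "w", encoding="utf-8") as f:
    f.write(test_prompts)
with open("DATA/train.txt", "w", encoding="utf-8") as f:
    f.write(test_prompts + train_prompts)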

@UncleBob2

UncleBob2 commented Nov 2, 2023

Thanks a lot. I have been fighting with this thing all the way, i.e. running on Windows 10 instead of Linux.

(viclone) C:\Users\aiwinsor\Documents\dev\ViSV2TTS>python app.py
Traceback (most recent call last):
File "app.py", line 84, in
object = VoiceClone("vits/logs/vivos/G_7700000.pth")
File "app.py", line 58, in init
_ = utils.load_checkpoint(checkpoint_path, self.net_g, None)
File "C:\Users\aiwinsor\Documents\dev\ViSV2TTS\vits\utils.py", line 19, in load_checkpoint
assert os.path.isfile(checkpoint_path)
AssertionError

(viclone) C:\Users\aiwinsor\Documents\dev\ViSV2TTS>

@v-nhandt21
Owner

File "app.py", line 84, in
object = VoiceClone("vits/logs/vivos/G_7700000.pth")

You can try to use the absolute path like "C:\Users\aiwinsor\Documents\dev\ViSV2TTS\vits\logs\vivos\G_7700000.pth"
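
A minimal sketch of that idea, resolving the checkpoint relative to app.py instead of relying on the working directory (the file name is taken from the traceback above):

import os

# Build the checkpoint path from app.py's own location, so the
# os.path.isfile assert passes regardless of the working directory.
base_dir = os.path.dirname(os.path.abspath(__file__))
checkpoint_path = os.path.join(base_dir, "vits", "logs", "vivos", "G_7700000.pth")
assert os.path.isfile(checkpoint_path), f"checkpoint not found: {checkpoint_path}"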

@UncleBob2

I gave up on running the code on Windows 10 and am now running it on Ubuntu.

Working with VIVOS:
wget http://ailab.hcmus.edu.vn/assets/vivos.tar.gz
tar xzf vivos.tar.gz

I was able to set up the whole environment without any issues.

python Step1_data_processing.py OK
python Step2_extract_feature.py OK

But I am getting an error here:

python train_ms.py -c configs/vivos.json -m vivos

Below are my errors:

/home/aiwinsor/miniconda3/envs/viclone/lib/python3.8/site-packages/torch/functional.py:606: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:800.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
(the warning above is repeated several times)
Traceback (most recent call last):
File "train_ms.py", line 294, in
main()
File "train_ms.py", line 50, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/home/aiwinsor/miniconda3/envs/viclone/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/aiwinsor/miniconda3/envs/viclone/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/aiwinsor/miniconda3/envs/viclone/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/aiwinsor/miniconda3/envs/viclone/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/aiwinsor/vits/train_ms.py", line 118, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/home/aiwinsor/vits/train_ms.py", line 148, in train_and_evaluate
mel = spec_to_mel_torch(
File "/home/aiwinsor/vits/mel_processing.py", line 78, in spec_to_mel_torch
mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
TypeError: mel() takes 0 positional arguments but 5 were given

@v-nhandt21
Owner

But I am getting an error here:

python train_ms.py -c configs/vivos.json -m vivos

...
TypeError: mel() takes 0 positional arguments but 5 were given

I think this error may be caused by the librosa version: https://librosa.org/doc/main/generated/librosa.filters.mel.html

My librosa version is 0.8.0; could you try:

conda install librosa=0.8.0

or

python -m pip install librosa==0.8.0
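
Alternatively, a sketch of patching mel_processing.py for newer librosa, where librosa.filters.mel() accepts keyword arguments only, which is exactly the "mel() takes 0 positional arguments" error (the values below are typical VITS settings, not the repo's exact config):

from librosa.filters import mel as librosa_mel_fn

# Example values only; in mel_processing.py these come from the config.
sampling_rate, n_fft, num_mels, fmin, fmax = 22050, 1024, 80, 0.0, None

# librosa 0.8 accepted positionals: librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
# librosa >= 0.10 is keyword-only:
mel_basis = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
print(mel_basis.shape)  # (80, 513) for n_fft = 1024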

@UncleBob2

Thank you very much. I have just started training. May I ask whether you have looked at https://github.com/Plachtaa/VITS-fast-fine-tuning?

@ppthanhtn

File "app.py", line 84, in object = VoiceClone("vits/logs/vivos/G_7700000.pth")

You can try to use the absolute path like "C:\Users\aiwinsor\Documents\dev\ViSV2TTS\vits\logs\vivos\G_7700000.pth"

I don't see any logs folder in the vits folder. Where is it?

@UncleBob2

File "app.py", line 84, in object = VoiceClone("vits/logs/vivos/G_7700000.pth")
You can try to use the absolute path like "C:\Users\aiwinsor\Documents\dev\ViSV2TTS\vits\logs\vivos\G_7700000.pth"

Trong folder vits không thấy có folder logs nào vậy bạn?

It is a big file; that may be why he did not upload it.

@ppthanhtn

ppthanhtn commented Nov 7, 2023 via email

@kingkong135

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

@ppthanhtn

ppthanhtn commented Nov 8, 2023

@kingkong135 it seems this source code no longer works. Could you share your working source with me?

Thank you!

@kingkong135

@kingkong135 it seems this source code no longer works. Could you share your working source with me?

Thank you!

It still runs fine for me; the only changes were these two statements in mel_processing.py, due to the library versions I'm using:

   spec = torch.stft(y, n_fft=n_fft, hop_length=hop_size, win_length=win_size,
                     window=hann_window[wnsize_dtype_device], center=center,
                     pad_mode='reflect', normalized=False, onesided=True)

   mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)

@UncleBob2

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

Thank you, it is working for me on Ubuntu.

@UncleBob2

@kingkong135 it seems this source code no longer works. Could you share your working source with me?

Thank you!

Follow the instructions here:
conda create -y -n viclone python=3.8
conda activate viclone
conda install cudatoolkit=11.3.1 cudnn=8.2.1

python -m pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu116
cd vits
python -m pip install -r requirements.txt

Make sure that you downgrade to librosa==0.8.0.

You will also need to downgrade gradio and httpx.

@UncleBob2

Have any of you used this website: https://ttsmaker.com/? It can change Voice Speed, Pitch Adjustment, etc. I would like to build software like that.

@v-nhandt21
Owner

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

Have you tried training on more data? :))

The VIVOS data is for validating the source code and environment only; I don't think it is enough for the model to perform cloning. The data I used ranged from 200 to 1000 hours of audio.

@UncleBob2

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

Have you tried training on more data? :))

The VIVOS data is for validating the source code and environment only; I don't think it is enough for the model to perform cloning. The data I used ranged from 200 to 1000 hours of audio.

I think 200-1000 hours of audio is too much. In the past, I was able to clone a voice using an hour or less of the target voice. BTW, I am currently testing this model and it is working quite well.

https://github.com/rhasspy/piper-phonemize

@kingkong135

kingkong135 commented Nov 15, 2023

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

Have you tried training on more data? :))

The VIVOS data is for validating the source code and environment only; I don't think it is enough for the model to perform cloning. The data I used ranged from 200 to 1000 hours of audio.

Not yet, partly because my resources don't allow it; I usually test with datasets under 25 hours. For voice cloning, I think the less data it needs while keeping accuracy high, the better; some models need under 10 minutes, like RVC or so-vits-svc (though of course their input is audio =))

@v-nhandt21
Owner

v-nhandt21 commented Nov 15, 2023

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

Have you tried training on more data? :))
The VIVOS data is for validating the source code and environment only; I don't think it is enough for the model to perform cloning. The data I used ranged from 200 to 1000 hours of audio.

Not yet, partly because my resources don't allow it; I usually test with datasets under 25 hours. For voice cloning, I think the less data it needs while keeping accuracy high, the better; some models need under 10 minutes, like RVC or so-vits-svc (though of course their input is audio =))

Hmm, so-vits is voice conversion though, i.e. speech2speech :))

@thanhlong1997

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

Have you tried training on more data? :))

The VIVOS data is for validating the source code and environment only; I don't think it is enough for the model to perform cloning. The data I used ranged from 200 to 1000 hours of audio.

Hello bro, how many training steps did it take for your dataset to converge and reach the quality of your demo file (vits/audio/sontung_clone2.wav)?

@v-nhandt21
Owner

Everyone can use the model here: vivos_ViSV2TTS. I trained it to 150k steps and it sounds decent.

Have you tried training on more data? :))
The VIVOS data is for validating the source code and environment only; I don't think it is enough for the model to perform cloning. The data I used ranged from 200 to 1000 hours of audio.

Hello bro, how many training steps did it take for your dataset to converge and reach the quality of your demo file (vits/audio/sontung_clone2.wav)?

I got this audio at 1M iterations: https://github.com/v-nhandt21/ViSV2TTS/blob/master/vits/audio/sontung_clone.wav
