Failed to load Model + Config on a fresh venv install #91

Closed
ThrowawayAccount01 opened this issue Mar 24, 2023 · 6 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@ThrowawayAccount01

I created a fresh venv and ran:

pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install -U so-vits-svc-fork
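A report is easier to compare against a working setup when it records exactly what those commands installed. Below is a minimal stdlib-only sketch that is not part of so-vits-svc-fork. The helper name installed_versions and the package list are illustrative assumptions. The script only reads installed package metadata, so it runs without a GPU.

```python
"""Print the versions of the packages involved in this report (a sketch)."""
from __future__ import annotations

import platform
import sys
from importlib import metadata

# Distribution names as passed to pip; this list is an illustrative choice.
PACKAGES = ("torch", "torchaudio", "so-vits-svc-fork", "tqdm", "pebble")


def installed_versions(names: tuple[str, ...] = PACKAGES) -> dict[str, str | None]:
    """Return {distribution name: version}, with None for anything not installed."""
    versions: dict[str, str | None] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} on {platform.platform()}")
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run this with the venv's own Python. CUDA builds of torch from the cu117 index usually report a local version suffix such as +cu117, which confirms that pip installed the GPU wheel.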

In the GUI, I set the Model path and Config file to a pre-trained model (G_21600.pth and config.json).

Trying either Infer or Start (real-time inference) gives the following error:

(venv) C:\Users\LXC PC\Desktop\testvits\venv\Scripts>svcg
[09:07:13] INFO     [09:07:13] Version: 1.3.0                                                             __main__.py:47
[09:07:19] INFO     [09:07:19] Event model_path, values {'model_path': 'C:/Users/LXC                          gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path': '',
                    'config_path_browse': '', 'cluster_model_path': '', 'cluster_model_path_browse': '',
                    'speaker': '', 'silence_threshold': -35.0, 'transpose': 12.0, 'auto_predict_f0': False,
                    'f0_method': 'dio', 'cluster_infer_ratio': 0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1,
                    'chunk_seconds': 0.5, 'absolute_thresh': True, 'input_path': '', 'input_path_browse': '',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Primary Sound Driver', 'passthrough_original': False,
                    'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000020621B09760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC2430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC25B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:19] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFCA160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[09:07:22] INFO     [09:07:22] Event config_path, values {'model_path': 'C:/Users/LXC                         gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': '', 'silence_threshold': -35.0, 'transpose':
                    12.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio': 0.0,
                    'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh': True,
                    'input_path': '', 'input_path_browse': '', 'auto_play': False, 'crossfade_seconds': 0.05,
                    'block_seconds': 0.35, 'additional_infer_before_seconds': 0.15,
                    'additional_infer_after_seconds': 0.1, 'realtime_algorithm': '1 (Divide constantly)',
                    'input_device': 'Microsoft Sound Mapper - Input', 'output_device': 'Primary Sound
                    Driver', 'passthrough_original': False, 'presets': 'Default VC (GPU, GTX 1060)',
                    'preset_name': '', 'use_gpu': True}
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x0000020621B09760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC2430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFC25B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:07:22] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000002065FFCA160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[09:07:50] INFO     [09:07:50] Event start_vc, values {'model_path': 'C:/Users/LXC                            gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': 'hapiraw', 'silence_threshold': -35.0,
                    'transpose': 0.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio':
                    0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh':
                    True, 'input_path': '', 'input_path_browse': '', 'auto_play': False, 'crossfade_seconds':
                    0.05, 'block_seconds': 0.35, 'additional_infer_before_seconds': 0.15,
                    'additional_infer_after_seconds': 0.1, 'realtime_algorithm': '1 (Divide constantly)',
                    'input_device': 'CABLE Output (VB-Audio Virtual ', 'output_device': 'Realtek HD Audio 2nd
                    output (Realtek(R) Audio)', 'passthrough_original': False, 'presets': 'Default VC (GPU,
                    GTX 1060)', 'preset_name': '', 'use_gpu': True}
[09:07:54] ERROR    [09:07:54] Error in realtime:                                                             gui.py:598
           ERROR    [09:07:54] [WinError 6] The handle is invalid                                             gui.py:602
                    pebble.common.RemoteTraceback: Traceback (most recent call last):
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\pebble\common.py", line
                    174, in process_execute
                        return function(*args, **kwargs)
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\inference_main.py", line 109,
                    in realtime
                        svc_model = Svc(
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\inference\infer_tool.py",
                    line 108, in __init__
                        self.hubert_model = utils.get_hubert_model().to(self.dev)
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 333, in
                    get_hubert_model
                        vec_path = ensure_hubert_model()
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 328, in
                    ensure_hubert_model
                        download_file(url, vec_path, desc="Downloading Hubert model")
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 295, in
                    download_file
                        with temppath.open("wb") as f, tqdm(
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1095,
                    in __init__
                        self.refresh(lock_args=self.lock_args)
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1344,
                    in refresh
                        self.display()
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1492,
                    in display
                        self.sp(self.__str__() if msg is None else msg)
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 347,
                    in print_status
                        fp_write('\r' + s + (' ' * max(last_len[0] - len_s, 0)))
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 340,
                    in fp_write
                        fp.write(str(s))
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\utils.py", line 127,
                    in inner
                        return func(*args, **kwargs)
                    OSError: [WinError 6] The handle is invalid


                    The above exception was the direct cause of the following exception:

                    Traceback (most recent call last):
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\gui.py",
                    line 600, in main
                        future.result()
                      File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 439, in result
                        return self.__get_result()
                      File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 391, in
                    __get_result
                        raise self._exception
                    OSError: [WinError 6] The handle is invalid

34j added the bug (Something isn't working) and help wanted (Extra attention is needed) labels on Mar 24, 2023
@34j
Collaborator

34j commented Mar 24, 2023

Can you also post the output for file inference?

@ThrowawayAccount01
Author

Sure, here is the stack trace for file inference:

(venv) C:\Users\LXC PC\Desktop\testvits\venv\Scripts>svcg
[09:47:19] INFO     [09:47:19] Version: 1.3.0                                                             __main__.py:47
[09:47:23] INFO     [09:47:23] Event model_path, values {'model_path': 'C:/Users/LXC                          gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path': '',
                    'config_path_browse': '', 'cluster_model_path': '', 'cluster_model_path_browse': '',
                    'speaker': '', 'silence_threshold': -35.0, 'transpose': 12.0, 'auto_predict_f0': False,
                    'f0_method': 'dio', 'cluster_infer_ratio': 0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1,
                    'chunk_seconds': 0.5, 'absolute_thresh': True, 'input_path': '', 'input_path_browse': '',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Primary Sound Driver', 'passthrough_original': False,
                    'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
           INFO     [09:47:23] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208EF109760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:47:23] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208AD6C2430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:47:23] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208AD6C25B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:47:23] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208AD6CA160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[09:47:26] INFO     [09:47:26] Event config_path, values {'model_path': 'C:/Users/LXC                         gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': '', 'silence_threshold': -35.0, 'transpose':
                    12.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio': 0.0,
                    'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh': True,
                    'input_path': '', 'input_path_browse': '', 'auto_play': False, 'crossfade_seconds': 0.05,
                    'block_seconds': 0.35, 'additional_infer_before_seconds': 0.15,
                    'additional_infer_after_seconds': 0.1, 'realtime_algorithm': '1 (Divide constantly)',
                    'input_device': 'Microsoft Sound Mapper - Input', 'output_device': 'Primary Sound
                    Driver', 'passthrough_original': False, 'presets': 'Default VC (GPU, GTX 1060)',
                    'preset_name': '', 'use_gpu': True}
           INFO     [09:47:26] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208EF109760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:47:26] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208AD6C2430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:47:26] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208AD6C25B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [09:47:26] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x00000208AD6CA160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[09:47:39] INFO     [09:47:39] Event infer, values {'model_path': 'C:/Users/LXC                               gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': 'hapiraw', 'silence_threshold': -35.0,
                    'transpose': 0.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio':
                    0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh':
                    True, 'input_path': 'C:/Users/LXC PC/Desktop/sovitsinout/gunjou/gunjouraw.wav',
                    'input_path_browse': 'C:/Users/LXC PC/Desktop/sovitsinout/gunjou/gunjouraw.wav',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Primary Sound Driver', 'passthrough_original': False,
                    'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
[09:47:41] ERROR    [09:47:41] [WinError 6] The handle is invalid                                             gui.py:539
                    Traceback (most recent call last):
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\gui.py",
                    line 509, in main
                        infer(
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\inference_main.py", line 47,
                    in infer
                        svc_model = Svc(
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\inference\infer_tool.py",
                    line 108, in __init__
                        self.hubert_model = utils.get_hubert_model().to(self.dev)
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 333, in
                    get_hubert_model
                        vec_path = ensure_hubert_model()
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 328, in
                    ensure_hubert_model
                        download_file(url, vec_path, desc="Downloading Hubert model")
                      File "C:\Users\LXC
                    PC\Desktop\testvits\venv\lib\site-packages\so_vits_svc_fork\utils.py", line 295, in
                    download_file
                        with temppath.open("wb") as f, tqdm(
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1095,
                    in __init__
                        self.refresh(lock_args=self.lock_args)
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1344,
                    in refresh
                        self.display()
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 1492,
                    in display
                        self.sp(self.__str__() if msg is None else msg)
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 347,
                    in print_status
                        fp_write('\r' + s + (' ' * max(last_len[0] - len_s, 0)))
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\std.py", line 340,
                    in fp_write
                        fp.write(str(s))
                      File "C:\Users\LXC PC\Desktop\testvits\venv\lib\site-packages\tqdm\utils.py", line 127,
                    in inner
                        return func(*args, **kwargs)
                    OSError: [WinError 6] The handle is invalid
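Both tracebacks (real-time and file inference) follow the same path: get_hubert_model calls ensure_hubert_model, which calls download_file, and download_file writes to a temporary path first. Later in this thread the CLI downloads the model successfully, and after that the GUI works. That fits the download being skipped once the model file exists. The skip is an assumption about ensure_hubert_model; the log does not show it.

Below is a minimal stdlib sketch of that ensure-then-download pattern:

- If the destination file already exists, it returns early.
- Otherwise it streams the download into a temporary file in the same directory.
- Only after the copy finishes does it move that file into place with os.replace.

As a result, an aborted download cannot leave a partial file that a later run would mistake for a complete one. ensure_file and opener are hypothetical names, and the sketch draws no progress bar.

```python
"""Download a file once, atomically (a sketch of the ensure-then-download pattern)."""
from __future__ import annotations

import os
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import BinaryIO, Callable

Opener = Callable[[str], BinaryIO]


def _urlopen(url: str) -> BinaryIO:
    """Default opener; assumes the URL is trusted."""
    return urllib.request.urlopen(url)


def ensure_file(url: str, dest: Path | str, opener: Opener = _urlopen) -> Path:
    """Return dest, downloading url into it first if it does not exist yet."""
    dest = Path(dest)
    if dest.is_file():
        return dest  # already cached: no download, so no progress output either
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Temporary file in the same directory, so os.replace stays on one
    # filesystem and the final file appears all at once.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, opener(url) as src:
            shutil.copyfileobj(src, out)
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the half-written file so the next is_file() check stays honest.
        try:
            os.remove(tmp_name)
        except OSError:
            pass
        raise
    return dest
```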

@34j
Collaborator

34j commented Mar 24, 2023

I have tried both virtualenv and conda and cannot reproduce. However, I have received other reports besides yours. I missed that you were using venv: ......

@34j
Collaborator

34j commented Mar 24, 2023

Does the CLI work?

@ThrowawayAccount01
Author

With the CLI, it downloads the HuBERT model first and then runs inference:

(venv) C:\Users\LXC PC\Desktop\testvits\venv\Scripts>svc infer -m "C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi\G_21600.pth" -c "C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi\config.json" "C:\Users\LXC PC\Desktop\sovitsinout\gunjou\gunjouraw.wav"
[10:50:05] INFO     [10:50:05] Version: 1.3.0                                                             __main__.py:47
Downloading Hubert model: 100%|██████████████████████████████████████████████████| 1.24G/1.24G [00:59<00:00, 22.4MiB/s]
[10:51:08] INFO     [10:51:08] current directory is C:\Users\LXC                               hubert_pretraining.py:116
                    PC\Desktop\testvits\venv\Scripts
           INFO     [10:51:08] HubertPretrainingTask Config {'_name': 'hubert_pretraining',    hubert_pretraining.py:117
                    'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir':
                    'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False,
                    'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000,
                    'min_sample_size': 32000, 'single_target': False, 'random_crop': True,
                    'pad_audio': False}
           INFO     [10:51:08] HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0,                 hubert.py:250
                    'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768,
                    'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu,
                    'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1,
                    'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1,
                    'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True,
                    'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 +
                    [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False,
                    'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection':
                    static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1,
                    'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static,
                    'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space':
                    1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995],
                    'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False,
                    'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '',
                    'pos_enc_type': 'abs', 'fp16': False}
[10:51:10] INFO     [10:51:10] Loaded checkpoint 'C:/Users/LXC                                              utils.py:416
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth' (iteration 1801)
[10:51:15] INFO     [10:51:15] Chunk: Chunk(Speech: False, 61740.0)                                    infer_tool.py:259
           INFO     [10:51:15] Chunk: Chunk(Speech: True, 264600.0)                                    infer_tool.py:259
           WARNING  [10:51:15] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:21] INFO     [10:51:21] F0 inference time:       6.136s, RTF: 0.877                                  utils.py:265
           INFO     [10:51:21] HuBERT inference time  : 0.057s, RTF: 0.008                                  utils.py:363
[10:51:22] INFO     [10:51:22] Inferece time: 0.71s, RTF: 0.10                                         infer_tool.py:213
           INFO     [10:51:22] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:51:22] Chunk: Chunk(Speech: True, 326340.0)                                    infer_tool.py:259
           WARNING  [10:51:22] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:23] INFO     [10:51:23] F0 inference time:       0.676s, RTF: 0.080                                  utils.py:265
           INFO     [10:51:23] HuBERT inference time  : 0.025s, RTF: 0.003                                  utils.py:363
           INFO     [10:51:23] Inferece time: 0.14s, RTF: 0.02                                         infer_tool.py:213
           INFO     [10:51:23] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:51:23] Chunk: Chunk(Speech: True, 476280.0)                                    infer_tool.py:259
           WARNING  [10:51:23] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:24] INFO     [10:51:24] F0 inference time:       0.874s, RTF: 0.074                                  utils.py:265
           INFO     [10:51:24] HuBERT inference time  : 0.023s, RTF: 0.002                                  utils.py:363
[10:51:25] INFO     [10:51:25] Inferece time: 0.13s, RTF: 0.01                                         infer_tool.py:213
           INFO     [10:51:25] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:51:25] Chunk: Chunk(Speech: True, 564480.0)                                    infer_tool.py:259
           WARNING  [10:51:25] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:28] INFO     [10:51:28] F0 inference time:       3.422s, RTF: 0.248                                  utils.py:265
[10:51:29] INFO     [10:51:29] HuBERT inference time  : 0.025s, RTF: 0.002                                  utils.py:363
           INFO     [10:51:29] Inferece time: 0.15s, RTF: 0.01                                         infer_tool.py:213
           INFO     [10:51:29] Chunk: Chunk(Speech: False, 26460.0)                                    infer_tool.py:259
           INFO     [10:51:29] Chunk: Chunk(Speech: True, 1146600.0)                                   infer_tool.py:259
           WARNING  [10:51:29] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:31] INFO     [10:51:31] F0 inference time:       2.193s, RTF: 0.081                                  utils.py:265
[10:51:32] INFO     [10:51:32] HuBERT inference time  : 0.025s, RTF: 0.001                                  utils.py:363
           INFO     [10:51:32] Inferece time: 0.17s, RTF: 0.01                                         infer_tool.py:213
[10:51:33] INFO     [10:51:33] Chunk: Chunk(Speech: False, 167580.0)                                   infer_tool.py:259
           INFO     [10:51:33] Chunk: Chunk(Speech: True, 264600.0)                                    infer_tool.py:259
           WARNING  [10:51:33] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
           INFO     [10:51:33] F0 inference time:       0.546s, RTF: 0.078                                  utils.py:265
[10:51:34] INFO     [10:51:34] HuBERT inference time  : 0.009s, RTF: 0.001                                  utils.py:363
           INFO     [10:51:34] Inferece time: 0.03s, RTF: 0.00                                         infer_tool.py:213
           INFO     [10:51:34] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:51:34] Chunk: Chunk(Speech: True, 326340.0)                                    infer_tool.py:259
           WARNING  [10:51:34] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
           INFO     [10:51:34] F0 inference time:       0.657s, RTF: 0.078                                  utils.py:265
[10:51:35] INFO     [10:51:35] HuBERT inference time  : 0.010s, RTF: 0.001                                  utils.py:363
           INFO     [10:51:35] Inferece time: 0.03s, RTF: 0.00                                         infer_tool.py:213
           INFO     [10:51:35] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:51:35] Chunk: Chunk(Speech: True, 529200.0)                                    infer_tool.py:259
           WARNING  [10:51:35] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:36] INFO     [10:51:36] F0 inference time:       1.022s, RTF: 0.079                                  utils.py:265
           INFO     [10:51:36] HuBERT inference time  : 0.023s, RTF: 0.002                                  utils.py:363
           INFO     [10:51:36] Inferece time: 0.14s, RTF: 0.01                                         infer_tool.py:213
[10:51:37] INFO     [10:51:37] Chunk: Chunk(Speech: False, 61740.0)                                    infer_tool.py:259
           INFO     [10:51:37] Chunk: Chunk(Speech: True, 2302020.0)                                   infer_tool.py:259
           WARNING  [10:51:37] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:41] INFO     [10:51:41] F0 inference time:       4.037s, RTF: 0.076                                  utils.py:265
[10:51:42] INFO     [10:51:42] HuBERT inference time  : 0.028s, RTF: 0.001                                  utils.py:363
           INFO     [10:51:42] Inferece time: 0.35s, RTF: 0.01                                         infer_tool.py:213
[10:51:44] INFO     [10:51:44] Chunk: Chunk(Speech: False, 26460.0)                                    infer_tool.py:259
           INFO     [10:51:44] Chunk: Chunk(Speech: True, 255780.0)                                    infer_tool.py:259
           WARNING  [10:51:44] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
           INFO     [10:51:44] F0 inference time:       0.468s, RTF: 0.069                                  utils.py:265
[10:51:45] INFO     [10:51:45] HuBERT inference time  : 0.022s, RTF: 0.003                                  utils.py:363
           INFO     [10:51:45] Inferece time: 0.14s, RTF: 0.02                                         infer_tool.py:213
           INFO     [10:51:45] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:51:45] Chunk: Chunk(Speech: True, 837900.0)                                    infer_tool.py:259
           WARNING  [10:51:45] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:46] INFO     [10:51:46] F0 inference time:       1.560s, RTF: 0.078                                  utils.py:265
[10:51:47] INFO     [10:51:47] HuBERT inference time  : 0.025s, RTF: 0.001                                  utils.py:363
           INFO     [10:51:47] Inferece time: 0.14s, RTF: 0.01                                         infer_tool.py:213
[10:51:48] INFO     [10:51:48] Chunk: Chunk(Speech: False, 97020.0)                                    infer_tool.py:259
           INFO     [10:51:48] Chunk: Chunk(Speech: True, 1199520.0)                                   infer_tool.py:259
           WARNING  [10:51:48] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:50] INFO     [10:51:50] F0 inference time:       2.190s, RTF: 0.078                                  utils.py:265
           INFO     [10:51:50] HuBERT inference time  : 0.025s, RTF: 0.001                                  utils.py:363
           INFO     [10:51:50] Inferece time: 0.17s, RTF: 0.01                                         infer_tool.py:213
[10:51:51] INFO     [10:51:51] Chunk: Chunk(Speech: False, 17640.0)                                    infer_tool.py:259
           INFO     [10:51:51] Chunk: Chunk(Speech: True, 1481760.0)                                   infer_tool.py:259
           WARNING  [10:51:51] Speaker None is not found. Use speaker 0 instead.                       infer_tool.py:190
[10:51:54] INFO     [10:51:54] F0 inference time:       2.670s, RTF: 0.077                                  utils.py:265
[10:51:55] INFO     [10:51:55] HuBERT inference time  : 0.029s, RTF: 0.001                                  utils.py:363
           INFO     [10:51:55] Inferece time: 0.20s, RTF: 0.01                                         infer_tool.py:213
[10:51:56] INFO     [10:51:56] Chunk: Chunk(Speech: False, 36157.0)                                    infer_tool.py:259

After that, I tried the GUI again and it worked:

(venv) C:\Users\LXC PC\Desktop\testvits\venv\Scripts>svcg
[10:52:36] INFO     [10:52:36] Version: 1.3.0                                                             __main__.py:47
[10:53:38] INFO     [10:53:38] Event model_path, values {'model_path': 'C:/Users/LXC                          gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path': '',
                    'config_path_browse': '', 'cluster_model_path': '', 'cluster_model_path_browse': '',
                    'speaker': '', 'silence_threshold': -35.0, 'transpose': 12.0, 'auto_predict_f0': False,
                    'f0_method': 'dio', 'cluster_infer_ratio': 0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1,
                    'chunk_seconds': 0.5, 'absolute_thresh': True, 'input_path': '', 'input_path_browse': '',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Primary Sound Driver', 'passthrough_original': False,
                    'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
           INFO     [10:53:38] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CA9AC29760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [10:53:38] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CAD9082430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [10:53:38] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CAD90825B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [10:53:38] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CAD9089160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[10:53:40] INFO     [10:53:40] Event config_path, values {'model_path': 'C:/Users/LXC                         gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': '', 'silence_threshold': -35.0, 'transpose':
                    12.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio': 0.0,
                    'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh': True,
                    'input_path': '', 'input_path_browse': '', 'auto_play': False, 'crossfade_seconds': 0.05,
                    'block_seconds': 0.35, 'additional_infer_before_seconds': 0.15,
                    'additional_infer_after_seconds': 0.1, 'realtime_algorithm': '1 (Divide constantly)',
                    'input_device': 'Microsoft Sound Mapper - Input', 'output_device': 'Primary Sound
                    Driver', 'passthrough_original': False, 'presets': 'Default VC (GPU, GTX 1060)',
                    'preset_name': '', 'use_gpu': True}
           INFO     [10:53:40] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CA9AC29760> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [10:53:40] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CAD9082430> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [10:53:40] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CAD90825B0> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
           INFO     [10:53:40] Updating browser <PySimpleGUI.PySimpleGUI.Button object at 0x000001CAD9089160> gui.py:474
                    to C:\Users\LXC PC\Desktop\testvits\venv\Scripts\testhapi
[10:53:54] INFO     [10:53:54] Event infer, values {'model_path': 'C:/Users/LXC                               gui.py:467
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'model_path_browse':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth', 'config_path':
                    'C:/Users/LXC PC/Desktop/testvits/venv/Scripts/testhapi/config.json',
                    'config_path_browse': 'C:/Users/LXC
                    PC/Desktop/testvits/venv/Scripts/testhapi/config.json', 'cluster_model_path': '',
                    'cluster_model_path_browse': '', 'speaker': 'hapiraw', 'silence_threshold': -35.0,
                    'transpose': 0.0, 'auto_predict_f0': False, 'f0_method': 'dio', 'cluster_infer_ratio':
                    0.0, 'noise_scale': 0.4, 'pad_seconds': 0.1, 'chunk_seconds': 0.5, 'absolute_thresh':
                    True, 'input_path': 'C:/Users/LXC PC/Desktop/sovitsinout/gunjou/gunjouraw.wav',
                    'input_path_browse': 'C:/Users/LXC PC/Desktop/sovitsinout/gunjou/gunjouraw.wav',
                    'auto_play': False, 'crossfade_seconds': 0.05, 'block_seconds': 0.35,
                    'additional_infer_before_seconds': 0.15, 'additional_infer_after_seconds': 0.1,
                    'realtime_algorithm': '1 (Divide constantly)', 'input_device': 'Microsoft Sound Mapper -
                    Input', 'output_device': 'Primary Sound Driver', 'passthrough_original': False,
                    'presets': 'Default VC (GPU, GTX 1060)', 'preset_name': '', 'use_gpu': True}
[10:53:55] INFO     [10:53:55] current directory is C:\Users\LXC                               hubert_pretraining.py:116
                    PC\Desktop\testvits\venv\Scripts
           INFO     [10:53:55] HubertPretrainingTask Config {'_name': 'hubert_pretraining',    hubert_pretraining.py:117
                    'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir':
                    'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False,
                    'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000,
                    'min_sample_size': 32000, 'single_target': False, 'random_crop': True,
                    'pad_audio': False}
           INFO     [10:53:55] HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0,                 hubert.py:250
                    'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768,
                    'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu,
                    'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1,
                    'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1,
                    'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True,
                    'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 +
                    [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False,
                    'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection':
                    static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1,
                    'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static,
                    'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space':
                    1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995],
                    'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False,
                    'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '',
                    'pos_enc_type': 'abs', 'fp16': False}
[10:53:58] INFO     [10:53:58] Loaded checkpoint 'C:/Users/LXC                                              utils.py:416
                    PC/Desktop/testvits/venv/Scripts/testhapi/G_21600.pth' (iteration 1801)
[10:54:03] INFO     [10:54:03] Chunk: Chunk(Speech: False, 44100.0)                                    infer_tool.py:259
           INFO     [10:54:03] Chunk: Chunk(Speech: True, 282240.0)                                    infer_tool.py:259
[10:54:07] INFO     [10:54:07] F0 inference time:       4.016s, RTF: 0.608                                  utils.py:265
           INFO     [10:54:07] HuBERT inference time  : 0.023s, RTF: 0.003                                  utils.py:363
           INFO     [10:54:07] Inferece time: 0.48s, RTF: 0.07                                         infer_tool.py:213
[10:54:08] INFO     [10:54:08] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:54:08] Chunk: Chunk(Speech: True, 811440.0)                                    infer_tool.py:259
[10:54:09] INFO     [10:54:09] F0 inference time:       1.487s, RTF: 0.080                                  utils.py:265
           INFO     [10:54:09] HuBERT inference time  : 0.030s, RTF: 0.002                                  utils.py:363
[10:54:10] INFO     [10:54:10] Inferece time: 0.12s, RTF: 0.01                                         infer_tool.py:213
           INFO     [10:54:10] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:54:10] Chunk: Chunk(Speech: True, 564480.0)                                    infer_tool.py:259
[10:54:11] INFO     [10:54:11] F0 inference time:       1.015s, RTF: 0.078                                  utils.py:265
           INFO     [10:54:11] HuBERT inference time  : 0.025s, RTF: 0.002                                  utils.py:363
           INFO     [10:54:11] Inferece time: 0.13s, RTF: 0.01                                         infer_tool.py:213
[10:54:12] INFO     [10:54:12] Chunk: Chunk(Speech: False, 26460.0)                                    infer_tool.py:259
           INFO     [10:54:12] Chunk: Chunk(Speech: True, 1146600.0)                                   infer_tool.py:259
[10:54:14] INFO     [10:54:14] F0 inference time:       1.986s, RTF: 0.076                                  utils.py:265
           INFO     [10:54:14] HuBERT inference time  : 0.025s, RTF: 0.001                                  utils.py:363
[10:54:15] INFO     [10:54:15] Inferece time: 0.14s, RTF: 0.01                                         infer_tool.py:213
           INFO     [10:54:15] Chunk: Chunk(Speech: False, 149940.0)                                   infer_tool.py:259
           INFO     [10:54:15] Chunk: Chunk(Speech: True, 1164240.0)                                   infer_tool.py:259
[10:54:17] INFO     [10:54:17] F0 inference time:       2.027s, RTF: 0.076                                  utils.py:265
[10:54:18] INFO     [10:54:18] HuBERT inference time  : 0.031s, RTF: 0.001                                  utils.py:363
           INFO     [10:54:18] Inferece time: 0.15s, RTF: 0.01                                         infer_tool.py:213
[10:54:19] INFO     [10:54:19] Chunk: Chunk(Speech: False, 44100.0)                                    infer_tool.py:259
           INFO     [10:54:19] Chunk: Chunk(Speech: True, 2310840.0)                                   infer_tool.py:259
[10:54:23] INFO     [10:54:23] F0 inference time:       4.109s, RTF: 0.078                                  utils.py:265
[10:54:24] INFO     [10:54:24] HuBERT inference time  : 0.034s, RTF: 0.001                                  utils.py:363
           INFO     [10:54:24] Inferece time: 0.24s, RTF: 0.00                                         infer_tool.py:213
[10:54:26] INFO     [10:54:26] Chunk: Chunk(Speech: False, 26460.0)                                    infer_tool.py:259
           INFO     [10:54:26] Chunk: Chunk(Speech: True, 255780.0)                                    infer_tool.py:259
           INFO     [10:54:26] F0 inference time:       0.478s, RTF: 0.080                                  utils.py:265
[10:54:27] INFO     [10:54:27] HuBERT inference time  : 0.028s, RTF: 0.005                                  utils.py:363
           INFO     [10:54:27] Inferece time: 0.14s, RTF: 0.02                                         infer_tool.py:213
           INFO     [10:54:27] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:54:27] Chunk: Chunk(Speech: True, 837900.0)                                    infer_tool.py:259
[10:54:28] INFO     [10:54:28] F0 inference time:       1.389s, RTF: 0.072                                  utils.py:265
[10:54:29] INFO     [10:54:29] HuBERT inference time  : 0.028s, RTF: 0.001                                  utils.py:363
           INFO     [10:54:29] Inferece time: 0.14s, RTF: 0.01                                         infer_tool.py:213
           INFO     [10:54:29] Chunk: Chunk(Speech: False, 97020.0)                                    infer_tool.py:259
           INFO     [10:54:29] Chunk: Chunk(Speech: True, 1199520.0)                                   infer_tool.py:259
[10:54:31] INFO     [10:54:31] F0 inference time:       2.141s, RTF: 0.078                                  utils.py:265
[10:54:32] INFO     [10:54:32] HuBERT inference time  : 0.026s, RTF: 0.001                                  utils.py:363
           INFO     [10:54:32] Inferece time: 0.16s, RTF: 0.01                                         infer_tool.py:213
[10:54:33] INFO     [10:54:33] Chunk: Chunk(Speech: False, 8820.0)                                     infer_tool.py:259
           INFO     [10:54:33] Chunk: Chunk(Speech: True, 1490580.0)                                   infer_tool.py:259
[10:54:35] INFO     [10:54:35] F0 inference time:       2.478s, RTF: 0.073                                  utils.py:265
[10:54:36] INFO     [10:54:36] HuBERT inference time  : 0.030s, RTF: 0.001                                  utils.py:363
           INFO     [10:54:36] Inferece time: 0.16s, RTF: 0.00                                         infer_tool.py:213
[10:54:37] INFO     [10:54:37] Chunk: Chunk(Speech: False, 36157.0)                                    infer_tool.py:259

I believe it initially failed because the GUI did not automatically download the HuBERT model when running inference. Once the HuBERT model was present, the problem resolved itself.
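The failure described above (inference starting before the HuBERT model is on disk) suggests checking for the model up front and downloading it if it is missing. Below is a minimal sketch of that check, assuming a generic `fetch` callable that writes the file. `ensure_model`, the `.part` suffix, and any path or URL you pass in are hypothetical illustrations, not so-vits-svc-fork's actual download code.

```python
import os
from pathlib import Path
from typing import Callable


# Hypothetical helper (not part of so-vits-svc-fork): make sure a model file
# exists on disk before inference starts, fetching it only if it is missing.
def ensure_model(path: Path, fetch: Callable[[Path], None]) -> Path:
    """Return `path`, first downloading it via `fetch` if it does not exist.

    `fetch` must write the model to the temporary path it receives, e.g. a
    wrapper around urllib.request.urlretrieve with the real (unknown here) URL.
    """
    path = Path(path)
    if path.is_file():
        return path  # already downloaded, nothing to do
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    try:
        fetch(tmp)
        # Atomic rename: an interrupted download never leaves a truncated
        # file at `path` that a later run would mistake for a complete model.
        os.replace(tmp, path)
    finally:
        if tmp.exists():
            tmp.unlink()  # remove a partial download if fetch failed
    return path


# Example usage (placeholder URL and path, assumptions only):
#   from urllib.request import urlretrieve
#   ensure_model(Path("models/hubert.pt"),
#                lambda p: urlretrieve("https://example.com/hubert.pt", p))
```

Writing to a temporary name and renaming afterwards means a half-finished download is discarded instead of being treated as a valid model on the next launch.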

@34j
Collaborator

34j commented Mar 24, 2023

The GUI is designed to download the model exactly the same way the CLI does, but unfortunately that doesn't seem to work in some environments, for reasons we haven't identified.
We need to change the design so that the model is downloaded before the GUI launches, but I suspect we then can't display a separate GUI window for the download. tqdm.tk (tqdm's Tkinter progress bar) should be the alternative.
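The ordering proposed above (download first, launch the GUI second) can be sketched as follows, assuming the model is fetched with the standard library's `urllib.request.urlretrieve`. `make_reporthook` and `download_then_launch` are hypothetical names, not so-vits-svc-fork APIs; the `on_progress` callback is the point where a tqdm.tk progress bar could be hooked in.

```python
import urllib.request
from typing import Callable, Optional

ProgressFn = Callable[[int, Optional[int]], None]


# Hypothetical sketch: adapt urlretrieve's reporthook(block_num, block_size,
# total_size) into (bytes_downloaded, total_bytes) so that any front end
# (console print, tqdm, or a tqdm.tk window) can display download progress.
def make_reporthook(on_progress: ProgressFn):
    def hook(block_num: int, block_size: int, total_size: int) -> None:
        downloaded = block_num * block_size
        if total_size > 0:
            # The last block usually overshoots the real size; clamp it.
            on_progress(min(downloaded, total_size), total_size)
        else:
            on_progress(downloaded, None)  # size unknown (total_size == -1)
    return hook


def download_then_launch(url: str, dest: str,
                         launch_gui: Callable[[], None],
                         on_progress: ProgressFn) -> None:
    # Download while no GUI event loop is running yet...
    urllib.request.urlretrieve(url, dest, reporthook=make_reporthook(on_progress))
    # ...and only then start the GUI, which can assume the model file exists.
    launch_gui()
```

Keeping the progress reporting behind a plain callback avoids tying the download step to any particular GUI toolkit.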
