Whisper unstability #45

Closed
chrisk414 opened this issue Jul 29, 2023 · 2 comments

Comments

chrisk414 commented Jul 29, 2023

Hi, STT (Whisper) is the biggest use case for me. I think it's probably the most important feature for now, at least until I can use it reliably.
Hopefully it's the same for everyone, since it's the starting point for using TTSVoiceWizard.

Anyway, here is what I found using the latest v1.5.0 from the GitHub main branch.

In the Log View, I see the new "Whisper Debug: ..." output. When STT mode is on, it always shows one of the following, seemingly at random. I think it's clear what each one means.
(A) "Listening" (listening and there is no sound input)
(B) "Listening, Voice" (listening and sound input is detected)
(C) "Listening, Transcribing" (processing recorded voice)
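
To make the three labels concrete, here is a minimal sketch of how such a status string could be built from independent flags. This is only an illustration: the names `WhisperState` and `debug_label` are hypothetical and are not taken from the TTS-Voice-Wizard source, which may derive the label quite differently.

```python
from enum import Flag, auto


class WhisperState(Flag):
    """Hypothetical flags; the real app may track state differently."""
    LISTENING = auto()
    VOICE = auto()
    TRANSCRIBING = auto()


# Display text for each flag, in the order it appears in the log line.
_LABELS = [
    (WhisperState.LISTENING, "Listening"),
    (WhisperState.VOICE, "Voice"),
    (WhisperState.TRANSCRIBING, "Transcribing"),
]


def debug_label(state: WhisperState) -> str:
    """Join the text of every flag that is currently set."""
    return ", ".join(text for flag, text in _LABELS if flag in state)


print(debug_label(WhisperState.LISTENING))                              # (A)
print(debug_label(WhisperState.LISTENING | WhisperState.VOICE))         # (B)
print(debug_label(WhisperState.LISTENING | WhisperState.TRANSCRIBING))  # (C)
```

With a model like this, the label is only as accurate as the flags behind it, so a label that looks random points at whatever sets the flags rather than at the display code.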

But the problem is that they do not accurately represent what's really happening, and the behavior is a bit random.

Here are my observations. (I always launch it from VS Debug, but I think the behavior is the same from the .exe.)

  1. When STT is first activated, it always starts at (B) even though there is no voice input, and it stays stuck at (B) until I speak several times. (Yes, I waited long enough for the ggml model to load.) When it unfreezes from (B), it outputs several of the strings I spoke, all bunched together.
  2. After the initial hiccups, it becomes more responsive. However, sometimes it cycles through (A), (B), (C) on its own without any sound input.
  3. I then wait until it stabilizes at (A) and start speaking again. Sometimes it goes to (B) immediately, and sometimes it doesn't. Then it starts cycling through (A), (B), (C) on its own again.

Here is another observation/question.
I see the following logs in VS Console Output.
image
It seems to recreate the same threads indefinitely.
Can you please tell me what these threads are for?
Perhaps the instability is related to these threads constantly being recreated?

Many thanks.

Owner

VRCWizard commented Aug 1, 2023

  1. The log will now let you know when Whisper is starting up and when it is actually ready for audio. However, audio recorded during startup will still be processed once Whisper has fully started, as you observed. https://github.com/VRCWizard/TTS-Voice-Wizard/releases/tag/v1.5.1

  2. The states (A, B, C) should definitely not be random. I mentioned to you on Discord that you can turn on Filtered Text Appears in Log to see when Whisper heard sounds that were not voices. For the state display, though, any sound picked up shows as "Listening, Voice". When "Listening, Transcribing" appears and you have Filtered Text Appears in Log enabled, something should always appear in the log.

  3. ^^^

  4. Threads exiting
    What does it mean:
    https://stackoverflow.com/a/12410591
    How to remove the spam:
    https://stackoverflow.com/a/19199801
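
As a rough illustration of point 1, the sketch below shows why text can arrive "bunched up" when audio captured during startup is held and only processed once the model is ready. Everything here is an assumption made for illustration: `StartupBuffer`, `feed`, `mark_ready`, and the `transcribe` callable are hypothetical names, not the app's actual code.

```python
from collections import deque


class StartupBuffer:
    """Holds audio chunks until the (hypothetical) model reports ready."""

    def __init__(self, transcribe):
        self._transcribe = transcribe  # assumed callable: chunk -> text
        self._pending = deque()        # chunks captured during startup
        self._ready = False
        self.log = []                  # stands in for the Log View

    def feed(self, chunk):
        if not self._ready:
            # Model still loading: keep the chunk for later.
            self._pending.append(chunk)
            return
        self.log.append(self._transcribe(chunk))

    def mark_ready(self):
        # Model finished loading: drain everything captured so far, in order.
        self._ready = True
        while self._pending:
            self.log.append(self._transcribe(self._pending.popleft()))


buf = StartupBuffer(transcribe=str.upper)  # toy stand-in for Whisper
buf.feed("hello")
buf.feed("world")
buf.mark_ready()   # both startup chunks are logged here, back to back
buf.feed("again")  # logged immediately from now on
```

If startup audio should not be transcribed at all, the alternative would be to discard `_pending` in `mark_ready` instead of draining it.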

@chrisk414
Author

Thanks for the info.
I'll let you know if I find ways to improve it.

BTW, regarding item 4, I understand that the messages mean a thread is exiting. But what is the thread for? The message doesn't tell me anything about the thread itself. If you can point me to where the thread is in the source, I'll take a look to understand it better.
I was thinking that, if the thread is going to be recreated indefinitely anyway, it might be better for it to stay in a loop instead of exiting.
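
To show what "stay in a loop" could look like, here is a minimal sketch of one long-lived worker thread that pulls jobs from a queue instead of a new thread being started per job. It is written in Python purely for illustration; `LoopingWorker` and its methods are hypothetical, and nothing here is known about how the app actually creates its threads.

```python
import queue
import threading


class LoopingWorker:
    """One long-lived thread that handles jobs until told to stop."""

    _STOP = object()  # sentinel that ends the loop

    def __init__(self, handler):
        self._handler = handler
        self._jobs = queue.Queue()
        self.results = []
        self.thread_ids = set()  # records which threads ran jobs
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()  # blocks while idle, no busy-waiting
            if job is self._STOP:
                break
            self.thread_ids.add(threading.get_ident())
            self.results.append(self._handler(job))

    def submit(self, job):
        self._jobs.put(job)

    def stop(self):
        self._jobs.put(self._STOP)
        self._thread.join()


worker = LoopingWorker(handler=str.upper)
for chunk in ["a", "b", "c"]:
    worker.submit(chunk)
worker.stop()
print(worker.results, len(worker.thread_ids))
```

The trade-off is that a looping worker keeps one thread alive for the whole session, whereas short-lived threads free their resources between jobs but show up as repeated "thread exited" lines in a debugger.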
