-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VAD capabilities #19
Conversation
rnnoise can return the probability of a chunk being voice when processed. We can use this probability to define a threshold under which we return silence, giving us VAD (Voice Activity Detection) capabilities. This is an intended usecase of rnnoise. Currently this threshold is hardcoded in a #define. The different plugins (LADSPA, VST, LV2) will need to be expanded with the respective configuration APIs and then this value can be passed to the CommonPlugin.
This adds a grace period to the voice activity detection. After a frame was identified as speech, we allow N samples after it to pass through the VAD clamp, irregardless of rnnoise' classification. Without this we can have cut outs in the middle of words which, while intelligible, make the voice quality sound bad.
Thanks, and what are the practical implications of this change? Instead of highly muffled noises - there will be just silence because it doesn't pass threshold? |
Yeah, so my usecase for this is using it for things like teamspeak/mumble/discord where I usually use voice activation. But that usually just measures signal noise ratio (if even that). Pushing the voice activity detection into this plugin means that things like sneezes, coughs or other noise can often be completely filtered out when without this patch the reduced noise would sometimes still be enough to trigger the VAD in mumble etc. So if there is only noise and no speech: this patch can often filter it out completely, because if rnnoise thinks it's not speech, we just throw the microohone input away and send silence instead. When you speak, it will work as it always did. Happy to answer any questions if there's more. |
Could you allow me to modify the branch? I added threshold support to LADSPA plugin. |
Sure, should be done. ~~ Edit: Huh, seems like the checkbox is actually bugged.
Setting doesn't get saved. Seems like need to wait a while for github to push a fix for this, tried in two different browsers, actually can't set the checkbox, always gives a JS error and doesn't save. ~~ Edit 2: OK, worked around it. Actually set now. |
I love modern websites... I pushed the commit. I also slightly changed naming style. Could you take a look whether anything stands out? |
Haven't tested it yet, but looks good to me! |
Thanks, then I'll merge it tomorrow. |
Update README with the new parameter.
rnnoise can return the probability of a chunk being voice when processed. We can use this probability to define a threshold under which we return silence, giving us VAD (Voice Activity Detection) capabilities. This is an intended usecase of rnnoise.
I've been using this patch for a while for my VOIPing and it works great. @werman asked me to open a PR, which I've indeed been holding off for too long.
The issue here is that currently the threshold for which to activate the VAD is hardcoded. From my (limited) testing this cannot be shipped with a hard-coded value. The worse the microphone, the less certain rnnoise is that a sample is speech, and thus the threshold needs to be lowered.
As per commit message in b5285c9
However, I'm currently not interested in implementing this, so I'm opening this as a draft-PR, to see if anyone is interested in making this configurable and shippable, or discuss if there's another way to ship this.
Side note, there's another constant
VAD_GRACE_PERIOD
that I think is fine to just hard-code and doesn't need to be configurable.