Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VAD capabilities #19

Merged
merged 3 commits into from May 11, 2020
Merged

Add VAD capabilities #19

merged 3 commits into from May 11, 2020

Conversation

lawl
Copy link
Contributor

@lawl lawl commented May 5, 2020

rnnoise can return the probability of a chunk being voice when processed. We can use this probability to define a threshold under which we return silence, giving us VAD (Voice Activity Detection) capabilities. This is an intended usecase of rnnoise.

I've been using this patch for a while for my VOIPing and it works great. @werman asked me to open a PR, which I've indeed been holding off for too long.

The issue here is that currently the threshold for which to activate the VAD is hardcoded. From my (limited) testing this cannot be shipped with a hard-coded value. The worse the microphone, the less certain rnnoise is that a sample is speech, and thus the threshold needs to be lowered.

As per commit message in b5285c9

Currently this threshold is hardcoded in a #define. The different plugins (LADSPA, VST, LV2) will need to be expanded with the respective configuration APIs and then this value can be passed to the CommonPlugin.

However, I'm currently not interested in implementing this, so I'm opening this as a draft-PR, to see if anyone is interested in making this configurable and shippable, or discuss if there's another way to ship this.

Side note, there's another constant VAD_GRACE_PERIOD that I think is fine to just hard-code and doesn't need to be configurable.

rnnoise can return the probability of a chunk being voice when
processed. We can use this probability to define a threshold under which
we return silence, giving us VAD (Voice Activity Detection)
capabilities. This is an intended usecase of rnnoise.

Currently this threshold is hardcoded in a #define. The different
plugins (LADSPA, VST, LV2) will need to be expanded with the respective
configuration APIs and then this value can be passed to the
CommonPlugin.
This adds a grace period to the voice activity detection. After a frame
was identified as speech, we allow N samples after it to pass through
the VAD clamp, irregardless of rnnoise' classification.

Without this we can have cut outs in the middle of words which, while
intelligible, make the voice quality sound bad.
@werman
Copy link
Owner

werman commented May 5, 2020

Thanks, and what are the practical implications of this change? Instead of highly muffled noises - there will be just silence because it doesn't pass threshold?

@lawl
Copy link
Contributor Author

lawl commented May 5, 2020

Yeah, so my usecase for this is using it for things like teamspeak/mumble/discord where I usually use voice activation. But that usually just measures signal noise ratio (if even that). Pushing the voice activity detection into this plugin means that things like sneezes, coughs or other noise can often be completely filtered out when without this patch the reduced noise would sometimes still be enough to trigger the VAD in mumble etc.

So if there is only noise and no speech: this patch can often filter it out completely, because if rnnoise thinks it's not speech, we just throw the microohone input away and send silence instead. When you speak, it will work as it always did.

Happy to answer any questions if there's more.

@werman
Copy link
Owner

werman commented May 10, 2020

Could you allow me to modify the branch? I added threshold support to LADSPA plugin.

@lawl
Copy link
Contributor Author

lawl commented May 10, 2020

Sure, should be done.

~~ Edit: Huh, seems like the checkbox is actually bugged.

 Uncaught QueryError: Element not found: <HTMLElement> .js-status-indicator
    at b (https://github.githubassets.com/assets/vendor-df6ed70b.js:1:236)
    at HTMLInputElement.<anonymous> (https://github.githubassets.com/assets/github-bootstrap-6c3ff0a4.js:16:185508)
    at HTMLDocument.pt (https://github.githubassets.com/assets/vendor-df6ed70b.js:120:7531)

Setting doesn't get saved.

Seems like need to wait a while for github to push a fix for this, tried in two different browsers, actually can't set the checkbox, always gives a JS error and doesn't save. ~~

Edit 2: OK, worked around it. Actually set now.

@werman
Copy link
Owner

werman commented May 10, 2020

Edit 2: OK, worked around it. Actually set now.

I love modern websites...

I pushed the commit. I also slightly changed naming style.

Could you take a look whether anything stands out?

@lawl lawl marked this pull request as ready for review May 10, 2020 19:01
@lawl
Copy link
Contributor Author

lawl commented May 10, 2020

Haven't tested it yet, but looks good to me!

@werman
Copy link
Owner

werman commented May 10, 2020

Haven't tested it yet, but looks good to me!

Thanks, then I'll merge it tomorrow.

Update README with the new parameter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants