Add VAD capabilities #19

lawl · 2020-05-05T17:51:38Z

rnnoise can return the probability of a chunk being voice when processed. We can use this probability to define a threshold under which we return silence, giving us VAD (Voice Activity Detection) capabilities. This is an intended usecase of rnnoise.

I've been using this patch for a while for my VOIPing and it works great. @werman asked me to open a PR, which I've indeed been holding off for too long.

The issue here is that currently the threshold for which to activate the VAD is hardcoded. From my (limited) testing this cannot be shipped with a hard-coded value. The worse the microphone, the less certain rnnoise is that a sample is speech, and thus the threshold needs to be lowered.

As per commit message in b5285c9

Currently this threshold is hardcoded in a #define. The different plugins (LADSPA, VST, LV2) will need to be expanded with the respective configuration APIs and then this value can be passed to the CommonPlugin.

However, I'm currently not interested in implementing this, so I'm opening this as a draft-PR, to see if anyone is interested in making this configurable and shippable, or discuss if there's another way to ship this.

Side note, there's another constant VAD_GRACE_PERIOD that I think is fine to just hard-code and doesn't need to be configurable.

rnnoise can return the probability of a chunk being voice when processed. We can use this probability to define a threshold under which we return silence, giving us VAD (Voice Activity Detection) capabilities. This is an intended usecase of rnnoise. Currently this threshold is hardcoded in a #define. The different plugins (LADSPA, VST, LV2) will need to be expanded with the respective configuration APIs and then this value can be passed to the CommonPlugin.

This adds a grace period to the voice activity detection. After a frame was identified as speech, we allow N samples after it to pass through the VAD clamp, irregardless of rnnoise' classification. Without this we can have cut outs in the middle of words which, while intelligible, make the voice quality sound bad.

werman · 2020-05-05T19:45:39Z

Thanks, and what are the practical implications of this change? Instead of highly muffled noises - there will be just silence because it doesn't pass threshold?

lawl · 2020-05-05T20:29:33Z

Yeah, so my usecase for this is using it for things like teamspeak/mumble/discord where I usually use voice activation. But that usually just measures signal noise ratio (if even that). Pushing the voice activity detection into this plugin means that things like sneezes, coughs or other noise can often be completely filtered out when without this patch the reduced noise would sometimes still be enough to trigger the VAD in mumble etc.

So if there is only noise and no speech: this patch can often filter it out completely, because if rnnoise thinks it's not speech, we just throw the microohone input away and send silence instead. When you speak, it will work as it always did.

Happy to answer any questions if there's more.

werman · 2020-05-10T17:48:57Z

Could you allow me to modify the branch? I added threshold support to LADSPA plugin.

lawl · 2020-05-10T18:33:32Z

Sure, should be done.

~~ Edit: Huh, seems like the checkbox is actually bugged.

 Uncaught QueryError: Element not found: <HTMLElement> .js-status-indicator
    at b (https://github.githubassets.com/assets/vendor-df6ed70b.js:1:236)
    at HTMLInputElement.<anonymous> (https://github.githubassets.com/assets/github-bootstrap-6c3ff0a4.js:16:185508)
    at HTMLDocument.pt (https://github.githubassets.com/assets/vendor-df6ed70b.js:120:7531)

Setting doesn't get saved.

Seems like need to wait a while for github to push a fix for this, tried in two different browsers, actually can't set the checkbox, always gives a JS error and doesn't save. ~~

Edit 2: OK, worked around it. Actually set now.

werman · 2020-05-10T18:53:51Z

Edit 2: OK, worked around it. Actually set now.

I love modern websites...

I pushed the commit. I also slightly changed naming style.

Could you take a look whether anything stands out?

lawl · 2020-05-10T19:01:59Z

Haven't tested it yet, but looks good to me!

werman · 2020-05-10T19:04:41Z

Haven't tested it yet, but looks good to me!

Thanks, then I'll merge it tomorrow.

src/common/src/RnNoiseCommonPlugin.cpp

Update README with the new parameter.

lawl added 2 commits March 24, 2020 08:43

lawl marked this pull request as ready for review May 10, 2020 19:01

lawl commented May 10, 2020

View reviewed changes

src/common/src/RnNoiseCommonPlugin.cpp Outdated Show resolved Hide resolved

Add VAD threshold parameter to LADSPA

48952cc

Update README with the new parameter.

werman force-pushed the feature-vad branch from 1b10bbe to 48952cc Compare May 11, 2020 15:30

werman merged commit bfbf41d into werman:master May 11, 2020

siikamiika mentioned this pull request Jul 18, 2020

Configurable VAD grace period #47

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add VAD capabilities #19

Add VAD capabilities #19

lawl commented May 5, 2020

werman commented May 5, 2020

lawl commented May 5, 2020 •

edited

werman commented May 10, 2020

lawl commented May 10, 2020 •

edited

werman commented May 10, 2020

lawl commented May 10, 2020

werman commented May 10, 2020

Add VAD capabilities #19

Add VAD capabilities #19

Conversation

lawl commented May 5, 2020

werman commented May 5, 2020

lawl commented May 5, 2020 • edited

werman commented May 10, 2020

lawl commented May 10, 2020 • edited

werman commented May 10, 2020

lawl commented May 10, 2020

werman commented May 10, 2020

lawl commented May 5, 2020 •

edited

lawl commented May 10, 2020 •

edited