Voice activity detection using a threshold slider #562

fkwp · 2022-08-31T14:31:09Z

Your use case

#492

Have you considered any alternatives?

No response

Additional context

No response

fkwp · 2022-08-31T14:33:17Z

@hugohutri can you please update the issue and give some more context?

DashieTM · 2022-08-31T16:55:36Z

Hugo is enjoying some vacation right now, but since we did this together I can of course fill in.

The idea is to offer the same feature that Mumble, Discord, and TeamSpeak provide.
They only send the microphone stream to other users while the microphone level is above a threshold. The threshold is usually adjustable on a settings page with a slider, which also shows whether you are currently above it.
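
A minimal sketch of the threshold check described above, assuming the level is measured as the RMS of each audio frame in dBFS and the slider stores a dB value. The names `rmsDb` and `isAboveThreshold` and the -50 dB example value are hypothetical and not taken from the PR.

```typescript
// Hypothetical helper: RMS level of one frame of samples in [-1, 1], in dBFS.
function rmsDb(frame: Float32Array): number {
  if (frame.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  const rms = Math.sqrt(sumSquares / frame.length);
  // Pure silence has no finite dB value.
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// True when the frame is loud enough to be sent to other users.
// `thresholdDb` is the (assumed) slider value, e.g. -50.
function isAboveThreshold(frame: Float32Array, thresholdDb: number): boolean {
  return rmsDb(frame) >= thresholdDb;
}
```

The same boolean could drive both the gate and the slider's "currently above threshold" indicator.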

What does this solve?
It's an extremely fast solution to background noise, as you only send data when you talk. No more keyboard clacking, birds, or other noises unless you are talking.
In other words, it's a poor man's noise suppression that is cheap to include and effective against all kinds of noise.
In apps such as Teams and, currently, Jitsi, it's extremely annoying to hear people talking who aren't in the call.

You can already do this on your own system with tools like EasyEffects, but building it in would guarantee that every user has it.

Some info on the current PR.
We tried to use the existing Volume Looper as it seemed to be intuitive.
The only problem is that the activation always comes just a tad too late; we are talking a few ms here.
The solution I have thought of is using the createDelay() function, but I have not been able to find a place to hook that node up to what we send to other users.

In short, the idea is to add a small delay (e.g. 5 ms) to every stream we send to other users so that we can analyze and manipulate the stream. This delay could also be used in the future for more advanced audio processing.
In the end, the short delay should not matter to other users, as they will have a delay either way (ping). What matters is that all streams, voice and video, arrive at the same time.

The actual detection is done on a clone of the audio stream. Importantly, this clone should not have the delay.
Then, when we detect on the cloned stream that the user is loud enough, we can enable the tracks of the real stream. This ensures we enable the microphone before the start of a sentence.
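
The delay-plus-clone idea can be sketched as a look-ahead gate. This is only an illustration on plain sample arrays so it runs without a browser; in a Web Audio graph the delayed path would presumably be a `DelayNode` from `createDelay()` and the detection would read the undelayed clone. The class `LookaheadGate`, the helper `levelDb`, and the `holdFrames` parameter are hypothetical and not part of the PR.

```typescript
// Hypothetical helper: RMS level of a frame in dBFS.
function levelDb(frame: Float32Array): number {
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  const rms = frame.length > 0 ? Math.sqrt(sumSquares / frame.length) : 0;
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Detection sees each frame immediately (the undelayed clone), while the
// audio that is actually sent runs `delayFrames` frames behind.
class LookaheadGate {
  private queue: Float32Array[] = [];
  private openFramesLeft = 0;
  private thresholdDb: number;
  private delayFrames: number;
  private holdFrames: number;

  constructor(thresholdDb: number, delayFrames: number, holdFrames: number = 0) {
    this.thresholdDb = thresholdDb;
    this.delayFrames = delayFrames;
    this.holdFrames = holdFrames;
  }

  // Takes the newest microphone frame, returns the frame to send.
  process(frame: Float32Array): Float32Array {
    this.queue.push(frame);
    if (levelDb(frame) >= this.thresholdDb) {
      // Stay open long enough for this loud frame to leave the delay line.
      this.openFramesLeft = this.delayFrames + 1 + this.holdFrames;
    }
    if (this.queue.length <= this.delayFrames) {
      return new Float32Array(frame.length); // delay line still filling
    }
    const delayed = this.queue.shift()!;
    if (this.openFramesLeft > 0) {
      this.openFramesLeft--;
      return delayed;
    }
    return new Float32Array(delayed.length); // gate closed: send silence
  }
}
```

Because the gate opens as soon as the undelayed frame is loud, the frames still sitting in the delay line (the very start of the sentence) are sent as well.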

Links to the PRs:
#492
matrix-org/matrix-js-sdk#2556

fkwp added the T-Enhancement (New features, changes in functionality, performance boosts, user-facing improvements) and X-Needs-Product (More input needed from the Product team) labels Aug 31, 2022

fkwp mentioned this issue Aug 31, 2022

Feat: Voice activity threshold slider #492

Closed

4 tasks

robintown added the O-Occasional (Affects or can be seen by some users regularly or most users rarely) and A-Speech-Enhancement (Techniques to enhance the intelligibility of speech in calls) labels Aug 31, 2022

SimonBrandner changed the title ~~Voice activity auto unmute using a threshold slider~~ Voice activity detection using a threshold slider Aug 31, 2022

robintown moved this to Triaged in Element Call issue triage Oct 31, 2022

robintown added this to Element Call issue triage Oct 31, 2022
