Voice activity detection using a threshold slider #562

Open
fkwp opened this issue Aug 31, 2022 · 2 comments
Labels
A-Speech-Enhancement: Techniques to enhance the intelligibility of speech in calls
O-Occasional: Affects or can be seen by some users regularly or most users rarely
T-Enhancement: New features, changes in functionality, performance boosts, user-facing improvements
X-Needs-Product: More input needed from the Product team

Comments

@fkwp
Contributor

fkwp commented Aug 31, 2022

Your use case

#492

Have you considered any alternatives?

No response

Additional context

No response

@fkwp fkwp added the T-Enhancement and X-Needs-Product labels Aug 31, 2022
@fkwp
Contributor Author

fkwp commented Aug 31, 2022

@hugohutri can you pls update the issue and give some more context

@robintown robintown added the O-Occasional and A-Speech-Enhancement labels Aug 31, 2022
@SimonBrandner SimonBrandner changed the title from "Voice activity auto unmute using a threshold slider" to "Voice activity detection using a threshold slider" Aug 31, 2022
@DashieTM

DashieTM commented Aug 31, 2022

Hugo is enjoying some vacation right now, but since we did this together I can of course fill in.

The idea is to have the same feature that Mumble, Discord and TeamSpeak provide.
They only send the microphone stream to other users while the mic level is above a threshold. This threshold is usually adjustable on a settings page, with a slider that also indicates whether or not you are currently above it.
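
To make the threshold idea concrete, here is a minimal sketch of the check and of the value a slider indicator could show. All names (`frameLevelDb`, `isAboveThreshold`, `levelToSliderPosition`) and the -100..0 dB range are made up for illustration and are not taken from the PR; it also assumes samples are floats in [-1, 1].

```typescript
// Hypothetical helpers for illustration only; not the code from the PR.

/** RMS level of one frame in dB relative to full scale (samples in [-1, 1]). */
function frameLevelDb(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

/** True when the frame is loud enough that the mic should be sent. */
function isAboveThreshold(samples: Float32Array, thresholdDb: number): boolean {
  return frameLevelDb(samples) >= thresholdDb;
}

/** Map a level onto 0..1 so it can be drawn next to the threshold slider. */
function levelToSliderPosition(levelDb: number, minDb = -100, maxDb = 0): number {
  if (levelDb <= minDb) return 0; // also covers -Infinity (pure silence)
  if (levelDb >= maxDb) return 1;
  return (levelDb - minDb) / (maxDb - minDb);
}
```

The settings page would call `levelToSliderPosition` on every analysed frame to draw the live level, and compare against the slider value with `isAboveThreshold`.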

What does this solve?
It's an extremely cheap way to deal with background noise, as you only send data when you talk. No more keyboard clacking, birds, or other noises unless you talk.
In other words, it's a poor man's noise suppression that is cheap to include and works against any kind of noise while you are silent.
In apps such as Teams and currently Jitsi, it's extremely annoying to hear people talking who aren't in the call.

You can already do this on your own system with tools like EasyEffects, but building it in would guarantee that every user has it.

Some info on the current PR.
We tried to use the existing Volume Looper as it seemed to be intuitive.
The only problem is that the activation point is always just a tad too late; we are talking about a few ms here.
The solution I have thought of is using the createDelay() function, but I have not been able to find a place to hook that node up to what we send to other users.

In short, the idea is to create a small delay (e.g. 5 ms) for every stream we send to other users, so that we can analyze and manipulate the stream before it goes out. This delay could also be used in the future for more advanced audio processing.
In the end, the short delay should not matter for other users, as they will have a delay either way from network latency (ping). What matters is that all streams, voice and video, arrive at the same time.

The actual analysis is done on a clone of said audio stream. Importantly, this clone should not have the delay.
Then, when we detect that the user is loud enough (on the cloned stream), we can enable the tracks of the real, delayed stream. This ensures we are enabling the microphone before the start of a sentence reaches the stream we send.
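
As a rough illustration of why the delay helps, here is a frame-based simulation of the idea, not the Web Audio wiring from the PR. `createLookAheadGate`, the frame model, and the linear RMS threshold are all made up for this sketch. It also makes one assumption the description above does not spell out: the gate stays open until the last loud frame has left the delay line, so the end of a word is not cut off. For scale, at an assumed 48 kHz sample rate a 5 ms delay is 0.005 × 48000 = 240 samples.

```typescript
// Hypothetical simulation for illustration only; not the code from the PR.
type Frame = Float32Array;

/** Linear RMS of one frame. */
function rms(frame: Frame): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

/**
 * Returns a push() function. Each call takes the newest captured frame and
 * returns the frame to send now (captured delayFrames calls earlier), or
 * null while the gate is closed (i.e. the outgoing track would be disabled).
 */
function createLookAheadGate(delayFrames: number, threshold: number) {
  // Each entry keeps a frame plus the verdict made on its undelayed copy.
  const pending: { frame: Frame; loud: boolean }[] = [];

  return function push(frame: Frame): Frame | null {
    // Analysis happens immediately, on the undelayed "clone".
    pending.push({ frame, loud: rms(frame) >= threshold });
    if (pending.length <= delayFrames) return null; // delay line still filling

    // Open the gate if anything between the delayed frame and the newest
    // frame is loud, so the quiet onset just before a loud frame is sent too.
    const gateOpen = pending.some((p) => p.loud);
    const delayed = pending.shift()!;
    return gateOpen ? delayed.frame : null;
  };
}
```

With `delayFrames = 1`, a soft frame that directly precedes a loud one is still sent, which is exactly the "few ms" that get lost without the delay.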

Links to the PRs:
#492
matrix-org/matrix-js-sdk#2556


3 participants