Willows in earshot of one another should not all wake and stream on wake word #74

Closed
kristiankielhofner opened this issue May 20, 2023 · 0 comments · Fixed by #264

kristiankielhofner commented May 20, 2023

Willow's far-field audio and wake word detection are good enough that, in typical environments, multiple Willows will often wake on the same wake word. Nothing terrible happens; it's just annoying to get multiple confirmation beeps and to beat up your Willow Command Endpoint (such as Home Assistant).

Generally there are two main ways to go about addressing this:

  1. Multicast. When devices wake, they send a packet to a multicast group the other Willows have joined. It effectively turns into a race; without getting into the details now, one Willow would win and the other Willows would silently back off.

Pros:

  • All local. Works with any speech recognition mode (server, local).
  • We should be able to read the amplitude of the audio input signal, include that in the message, and ensure that the Willow "closest" to the speech wins the election.

Cons:

  • I've had really spotty experiences with multicast on Wi-Fi, and I'm a little scared of it in the real world with so many diverse environments. It's almost guaranteed to be problematic for some users.
  • Quite a bit of work.
  • A lot of extra CPU on device.
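
To make the multicast election a bit more concrete, here is a rough Python sketch of the decision each device could make after hearing the other announcements. Everything in it is an assumption for illustration: the `WakeAnnouncement` type, the 6-byte ID plus 16-bit amplitude wire format, and `should_back_off` are hypothetical names, not anything in Willow. It also leaves out the actual multicast socket handling and the timing window, and it assumes every device hears every announcement.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire format: 6-byte device ID followed by an unsigned
# 16-bit amplitude reading, both in network byte order.
PACKET = struct.Struct("!6sH")


@dataclass(frozen=True)
class WakeAnnouncement:
    device_id: bytes  # assumed unique per device, e.g. a MAC address
    amplitude: int    # input level measured at wake, 0..65535

    def encode(self) -> bytes:
        return PACKET.pack(self.device_id, self.amplitude)

    @classmethod
    def decode(cls, payload: bytes) -> "WakeAnnouncement":
        device_id, amplitude = PACKET.unpack(payload)
        return cls(device_id, amplitude)


def should_back_off(mine: WakeAnnouncement, heard: list) -> bool:
    """Return True if any announcement we heard beats our own.

    The loudest device wins; ties are broken by device ID so that
    every device reaches the same verdict from the same packets.
    """
    my_key = (mine.amplitude, mine.device_id)
    return any((o.amplitude, o.device_id) > my_key for o in heard)
```

Because every device compares the same `(amplitude, device_id)` pairs, exactly one of them decides to keep streaming, provided no announcement is lost. Lost packets are exactly where the Wi-Fi multicast concern above would bite.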
  2. Let the Willow Inference Server figure it out. All devices wake and start early streaming. I have a branch (referenced below) that includes the ability to generate and use an anonymous random identifier to group Willows within the same group/installation/proximity/etc. This identifier is provided to the Willow Inference Server. In this case the Willow Inference Server would essentially handle the election and drop all but the preferred source within the group; the dropped Willows would silently deactivate, just as in the multicast case above.

Pros:

  • Doesn't depend on anything special regarding Wi-Fi. Universally compatible.
  • We can still read and include the amplitude of the audio input signal to pick the closest Willow.
  • Using the Willow Inference Server's processing abilities opens up all kinds of other interesting possibilities like potentially being able to do speaker identification to allow multiple simultaneous activations within a group if the speakers (people) are different.

Cons:

  • Could be considered intrusive by those using the Tovera community hosted WIS. I insist again we don't log anything and we'll be documenting this formally soon. Of course this doesn't apply when you are hosting your own WIS (after we release next week you absolutely should)!
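
As a rough illustration of the server-side option, the election could look something like the Python sketch below: streams are bucketed by the anonymous group identifier and the loudest stream in each group is kept. The `Stream` type, its fields, and `elect` are hypothetical names made up for this example, not the WIS API, and a real implementation would also need a short collection window before deciding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    device_id: str  # hypothetical per-device identifier
    group_id: str   # the anonymous random identifier shared by a group
    amplitude: int  # input level reported by the device at wake


def elect(streams: list) -> dict:
    """Pick one winning stream per group: the loudest one.

    Ties are broken by device ID so the result is deterministic.
    Returns a mapping of group_id -> winning Stream.
    """
    winners = {}
    for s in streams:
        best = winners.get(s.group_id)
        if best is None or (s.amplitude, s.device_id) > (best.amplitude, best.device_id):
            winners[s.group_id] = s
    return winners


def to_drop(streams: list) -> list:
    """Streams that lost the election and should be told to deactivate."""
    winners = elect(streams)
    return [s for s in streams if winners[s.group_id] != s]
```

Devices in different groups never compete with each other, so two separate installations sharing one server would each still get a response.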