Skip to content

ai/live: Terminate stream on ICE disconnect. #3629

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

j0sh
Copy link
Collaborator

@j0sh j0sh commented Jun 18, 2025

While the disconnected state is not necessarily terminal - media may start flowing again before the ICE timeout [1] - this happens rarely enough [2] so let's just kill the peerconnection to avoid other timeouts later on in the process, eg segment copy.

[1] Easy way to test: plug in an Ethernet cable, disable WiFi,
unplug the cable, wait a little more than 5 seconds, re-plug.
Things should be fine in this case.

[2] Three occurrences of the state sequence "disconnected -> connected"
in prod and 1 on staging, both in the past 30 days.

While the `disconnected` state is not necessarily terminal -
media may start flowing again before the ICE timeout [1] - this
happens rarely enough [2] so let's just kill the peerconnection
to avoid other timeouts later on in the process, eg segment copy.

[1] Easy way to test: plug in an Ethernet cable, disable WiFi,
    unplug the cable, wait a little more than 5 seconds, re-plug.
    Things should be fine in this case.

[2] Three occurrences of the state sequence "disconnected  -> connected"
    in prod and 1 on staging, both in the past 30 days.
@j0sh j0sh requested review from victorges, leszko and mjh1 June 18, 2025 00:08
@github-actions github-actions bot added the go Pull requests that update Go code label Jun 18, 2025
Copy link
Contributor

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One question, so after this PR is merged, when the WHIP is disconnected, we'll kick the ingest. So what will happen now? The frontend broadcast compoenent will retry it, right?

@j0sh
Copy link
Collaborator Author

j0sh commented Jun 18, 2025

So what will happen now? The frontend broadcast compoenent will retry it, right?

If the app is still there, yes it should - which happens rarely. (See the dashboards that I shared in Discord; typically less than 1% of user connections exhibit this behavior) More often, the app just goes away which is why we just terminate the stream now.

Testing with the Daydream app does show that it reconnects on a disconnect so that seems okay.

On a mechanical level: the app enters the disconnected state because there is no connectivity between the app and the server. While the server will tear down the connection with this PR, the DTLS close_notify usually won't make it to the client because there is no connectivity. My tiny client app times out into a failed state a few seconds after disconnect.

I am not sure exactly what are the conditions that make Daydream retry, but that seems to be OK.

@leszko
Copy link
Contributor

leszko commented Jun 23, 2025

So what will happen now? The frontend broadcast compoenent will retry it, right?

If the app is still there, yes it should - which happens rarely. (See the dashboards that I shared in Discord; typically less than 1% of user connections exhibit this behavior) More often, the app just goes away which is why we just terminate the stream now.

Testing with the Daydream app does show that it reconnects on a disconnect so that seems okay.

On a mechanical level: the app enters the disconnected state because there is no connectivity between the app and the server. While the server will tear down the connection with this PR, the DTLS close_notify usually won't make it to the client because there is no connectivity. My tiny client app times out into a failed state a few seconds after disconnect.

I am not sure exactly what are the conditions that make Daydream retry, but that seems to be OK.

Ok, LGTM

@j0sh
Copy link
Collaborator Author

j0sh commented Jun 24, 2025

I have been re-reviewing the data - the rate of non-terminal disconnects is a bit higher when excluding e2e tests so I'm now leaning more towards #3642

@leszko leszko mentioned this pull request Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants