Participant disconnect event is fired too late #99

Closed
piyushtank opened this issue Mar 30, 2017 · 30 comments

@piyushtank
Contributor

If A and B are connected to a Room, then when A crashes or loses network connectivity, B's device does not receive the participant:didDisconnect event immediately. Our infrastructure can take up to 120 seconds to dispatch the event to B.

I'll update this ticket with the proposal we land on and with an ETA for a fix.

Related: twilio/video-quickstart-android#80

@dirtydanee

Hey guys, I am wondering when we can expect a fix for this bug?

@moksamedia

Any update on this?

@rfbrazier
Contributor

Hello @moksamedia, we do plan to address this, but we don't have a release timeline to share at this time.

Essentially, we need to make some improvements to our signaling transport to improve disconnect detection. That change is fairly large, but we'll begin to tackle it in January.

In the meantime, what cases are you having trouble handling? There may be workarounds we can recommend.

@moksamedia

Thanks for the replies. The biggest problem is detecting unexpected or abrupt endings of calls, for example when a user shuts off their phone or goes out of service. The best solution I can think of is a WebSocket-based server ping setup. Thoughts?

@piyushtank
Contributor Author

piyushtank commented Dec 26, 2017

Yes, I think the WebSocket-based server ping workaround would do the trick. Once your server detects that a participant has terminated abruptly, it can notify Twilio's infrastructure using the REST API. Upon receiving the REST request, Twilio's infrastructure will notify the other participants in the Room.
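As a rough illustration of the ping workaround, here is a minimal sketch of the bookkeeping a server could do: remember when each participant last answered a ping and flag anyone who has been silent too long. The `HeartbeatMonitor` type, its method names, and the 10-second timeout are all hypothetical, and the REST request that would follow for each flagged participant is left out.

```swift
import Foundation

/// Hypothetical helper: tracks when each participant last answered a ping.
struct HeartbeatMonitor {
    /// Seconds of silence after which a participant is presumed gone.
    let timeout: TimeInterval
    private var lastSeen: [String: Date] = [:]

    init(timeout: TimeInterval) {
        self.timeout = timeout
    }

    /// Call whenever a pong (or any other message) arrives from `identity`.
    mutating func recordPong(from identity: String, at time: Date = Date()) {
        lastSeen[identity] = time
    }

    /// Removes and returns (sorted) every participant silent for longer than `timeout`.
    mutating func removeStale(at now: Date = Date()) -> [String] {
        let stale = lastSeen
            .filter { now.timeIntervalSince($0.value) > timeout }
            .map { $0.key }
            .sorted()
        for identity in stale {
            lastSeen[identity] = nil
        }
        return stale
    }
}

// Usage with fixed timestamps so the outcome does not depend on the wall clock.
var monitor = HeartbeatMonitor(timeout: 10)
let start = Date(timeIntervalSince1970: 0)
monitor.recordPong(from: "alice", at: start)
monitor.recordPong(from: "bob", at: start.addingTimeInterval(8))

// At t = 12 s alice has been silent for 12 s and bob for 4 s.
let gone = monitor.removeStale(at: start.addingTimeInterval(12))
let goneAgain = monitor.removeStale(at: start.addingTimeInterval(12))
```

Each identity in `gone` is a candidate for the REST notification described above. Removing stale entries as they are reported keeps the same participant from being reported twice.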

Let me know if you have any questions.

@mikesholiu

Hi -- any updates on the fix?

@piyushtank
Contributor Author

@mikesholiu We are planning to make changes to our signaling layer later this quarter and in the next quarter. We are planning to move away from SIP and implement better signaling for the mobile SDKs. With the new signaling implementation, the participant disconnect event will no longer take up to 120 seconds; it may be invoked within 30 seconds. Please note that we are still discussing how soon the event should fire, because the tradeoff is the heartbeat traffic between the SDK and the infrastructure.

I will update the ticket when I have more information.

@embirico

embirico commented Jul 9, 2018

Thanks @piyushtank. Just to clarify: does this issue only affect disconnects "when A crashes or loses network connectivity", or does it also affect disconnects when A quits the app or locks the phone?

Also, it sounds like this ticket describes a delay in participant B receiving the event on mobile. Do you know if there is a similar delay for room status HTTP callbacks?

@piyushtank
Contributor Author

@embirico This issue applies when:

  1. A crashes or kills the app.
  2. A loses network connectivity.
  3. A's app is suspended in the background. This generally happens if A is neither playing nor recording any audio and moves the app to the background; iOS suspends a backgrounded app that is not playing or recording audio.

The root cause is that when our infrastructure loses the connection with a participant, it takes up to 120 seconds (session timer expiry) to detect the failure. So you should observe a similar delay for room status HTTP callbacks as well.

Let me know if you have any questions.

@koedal

koedal commented Sep 27, 2018

Any updates on this? This is causing issues for us too as 120 seconds is entirely too long to know you aren't talking to anyone.

@ceaglest
Contributor

ceaglest commented Oct 19, 2018

Hi @koedal,

> Any updates on this? This is causing issues for us too as 120 seconds is entirely too long to know you aren't talking to anyone.

I completely agree with you and we are working on a solution to this problem in the form of a new signaling transport. This is being actively developed by our C++, JavaScript and Java (backend) engineers this quarter. I can't give a precise time estimate, but I do promise that we will continue to update this ticket once the new signaling transport is available to try.

Regards,
Chris

@malavagile

Hello @piyushtank & @ceaglest ,

Is there any update on this issue?

@ceaglest
Contributor

ceaglest commented Feb 15, 2019

Hi @malavagile,

> Is there any update on this issue?

Yes, our JavaScript team is finishing up 2.0.0-beta6 which will allow you to select the new signaling transport in ConnectOptions. As for mobile SDKs, we are continuing with implementation and testing of our new C++ signaling Client.

Our backend engineers have also deployed the new signaling gateway in a few regions, and will continue to harden it as we expand globally. You should still expect that JavaScript will be the first SDK where you can try the new transport, followed by mobile platforms.

Regards,
Chris

@zbagley

zbagley commented Feb 21, 2019

@ceaglest
Appreciate the update on this issue. We're considering implementing the workaround mentioned above in order to better manage the upload of a pixel buffer.

We've noticed a lot of improvements with 2.5.6 / 2.6.0 as far as the ability to reconnect goes, but we're still seeing that, when using the ARKit code, the video feed will often not recover.

We're using Group Rooms on the latest release, and swapping between WiFi and LTE will often drop the video feed, which then never recovers. We've considered integrating a custom network detection strategy so that we know when to pause and restart the pixel buffer as needed.

The current plan for a fix is to run basic speed/ping tests against our private server. A hook that uses network diagnostics from the same data center as the hosted Group Room server would be very helpful for managing our sink and notifying users of connection stats.

@ceaglest
Contributor

ceaglest commented Feb 21, 2019

Hi @zbagley,

> We're using Group Rooms on the latest release, and swapping between WiFi and LTE will often drop the video feed, which then never recovers. We've considered integrating a custom network detection strategy so that we know when to pause and restart the pixel buffer as needed.

I think this is a separate issue from delayed Participant disconnected events, but I would be happy to investigate further if you have some Room SIDs to share. Would you mind filing a separate issue for this?

> The current plan for a fix is to run basic speed/ping tests against our private server. A hook that uses network diagnostics from the same data center as the hosted Group Room server would be very helpful for managing our sink and notifying users of connection stats.

You can use Room.getStats() to find out round trip time after you connect to a Group Room. We are also working on bringing the network quality API to mobile SDKs. As for pre-flight checks, this is also something that is on our roadmap for 2019, but we haven't set a firm date yet.

Best,
Chris

@zbagley

zbagley commented Feb 21, 2019

I'll see if we can get our QA to reproduce this on the example apps so we can provide some Room SIDs (is that really all you'll need?) ASAP, and open an issue accordingly! Thanks again.

@ceaglest
Contributor

ceaglest commented Jun 6, 2019

Hi @zbagley,

You are correct about the network handover issue in Group Rooms. We are close to deploying a fix and are tracking it in #388.

Thanks,
Chris

@ceaglest
Contributor

ceaglest commented Jun 7, 2019

Hello Video Developers,

After some runtime with JavaScript 2.0.0-betas, we are now ready with 3.0.0-beta1 on iOS. The 3.0 release provides several relevant features:

  • Global Low Latency (GLL) signaling Servers.
  • IPv4 and IPv6 network support.
  • Region selection APIs.
  • Faster detection and recovery from connection failures.
  • Improved Swift APIs for a more idiomatic Swift development experience.

The 3.0.0-beta1 release is timed with a rollout of our Signaling Servers to all regions. This means that you can now connect a Participant from any region where you could place a Media Server before.

The Client and Server use heartbeats to detect broken signaling connections within 15 seconds or less. If a failure is detected, the Participant then has up to 30 seconds to reconnect to the Room before being permanently disconnected. We will consider tuning the heartbeat and room session intervals based upon your feedback during the beta period.
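To make the two intervals concrete, the sketch below models the worst case only: 15 seconds to notice a broken connection plus 30 seconds to reconnect, giving 45 seconds before a permanent disconnect. The `SignalingTimeouts` and `SignalingPhase` names are hypothetical and not part of the SDK, and real detection may happen sooner than the model assumes.

```swift
/// Hypothetical phases of a broken signaling connection, worst case.
enum SignalingPhase: Equatable {
    case undetected     // the failure happened, heartbeats have not flagged it yet
    case reconnecting   // failure detected, the Participant may still rejoin
    case disconnected   // reconnect window expired
}

/// Worst-case timing model built from the two intervals quoted above.
struct SignalingTimeouts {
    var detection: Double = 15        // seconds until heartbeats flag the failure
    var reconnectWindow: Double = 30  // seconds allowed for reconnecting

    /// Longest possible wait before a permanent disconnect.
    var worstCaseDisconnectDelay: Double {
        return detection + reconnectWindow
    }

    /// Phase reached `seconds` after the connection actually broke.
    func phase(secondsSinceFailure seconds: Double) -> SignalingPhase {
        if seconds < detection { return .undetected }
        if seconds < worstCaseDisconnectDelay { return .reconnecting }
        return .disconnected
    }
}

let timeouts = SignalingTimeouts()
let total = timeouts.worstCaseDisconnectDelay
let early = timeouts.phase(secondsSinceFailure: 10)
let middle = timeouts.phase(secondsSinceFailure: 20)
let late = timeouts.phase(secondsSinceFailure: 45)
```

Under these assumptions a remote observer could wait up to 45 seconds for a disconnect event, which lines up with the grace period quoted later in this thread.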

Thank you for your patience. We know that detection and management of signaling connections has not been our strongest suit, but we look at 3.0 as an opportunity to increase reliability in this area. We have updated both our Objective-C and Swift sample code; please do give them a try.

Best,
Chris

@ceaglest
Contributor

ceaglest commented Sep 7, 2019

Hi Developers,

We have recently released 3.0.0-beta4, and plan to GA 3.0 by the end of September.

The team has been refining the heartbeat and session timeout logic for Participant signaling connections and we are pretty happy with the result. In beta4, the Room's media monitoring logic has also been modified to disconnect in scenarios where recovery is unlikely.

Looking out beyond GA, we would like to introduce a reconnecting state for the RemoteParticipant, and provide events regarding media interruptions for Tracks. If you have any feedback about the changes in the beta, especially as they relate to Participant disconnect events, please let us know.

Best,
Chris

@skwny

skwny commented Sep 26, 2019

Just came upon this issue after experiencing the same thing: participantDisconnected does not fire until about 40 or so seconds in. Is this currently the desired behavior based on the recent improvements?

My use case is to remove a participant's visual representation (i.e. video track) from a room when this event is emitted, but this currently takes too long. The video track freezes, which is not a good experience. Is there any other way to detect an interruption? I noticed a warning in my console after about 12 seconds:

2019-09-26 06:59:09.426Z | WARN in [PeerConnectionV2 #3: d5f03cc4-d5e9-4c96-8d8d-889ebfa2b74b]: ICE Connection Monitor detected inactivity; attempting to restart ICE

It would be great if I could receive near instant feedback on participant service interruption, so as to display a placeholder in the UI accordingly, until permanent disconnection.

@zbagley

zbagley commented Sep 26, 2019

@skwny

One method we tested that worked well was to simply use a Data Track handshake. Every second send an "Okay" message. If after 3 seconds no "Okay" messages are received you can assume the user disconnected. If "Okay" resumes, reconnect user.
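The handshake described above could be tracked with a small state machine like the sketch below. `DataTrackWatchdog` and `PeerLiveness` are hypothetical names, the 3-second limit is taken from the description, and sending or receiving the "Okay" messages over an actual Data Track is left out; the code only decides when a peer should be treated as gone or back.

```swift
import Foundation

/// Hypothetical liveness states for one remote peer.
enum PeerLiveness: Equatable {
    case connected
    case presumedDisconnected
}

/// Hypothetical watchdog fed by "Okay" messages from a single peer.
struct DataTrackWatchdog {
    let silenceLimit: TimeInterval
    private(set) var state: PeerLiveness = .connected
    private var lastMessage: Date

    init(silenceLimit: TimeInterval = 3, startedAt: Date = Date()) {
        self.silenceLimit = silenceLimit
        self.lastMessage = startedAt
    }

    /// Call when an "Okay" arrives. Returns true if the peer had been presumed gone.
    @discardableResult
    mutating func didReceiveOkay(at time: Date = Date()) -> Bool {
        lastMessage = time
        let revived = (state == .presumedDisconnected)
        state = .connected
        return revived
    }

    /// Call periodically, e.g. once a second. Returns true only on the tick
    /// where the peer is first declared disconnected.
    @discardableResult
    mutating func tick(at now: Date = Date()) -> Bool {
        guard state == .connected,
              now.timeIntervalSince(lastMessage) >= silenceLimit else {
            return false
        }
        state = .presumedDisconnected
        return true
    }
}

// Usage with fixed timestamps.
let t0 = Date(timeIntervalSince1970: 0)
var watchdog = DataTrackWatchdog(silenceLimit: 3, startedAt: t0)
watchdog.didReceiveOkay(at: t0.addingTimeInterval(1))
let droppedEarly = watchdog.tick(at: t0.addingTimeInterval(2))       // 1 s of silence
let dropped = watchdog.tick(at: t0.addingTimeInterval(4.5))          // 3.5 s of silence
let revived = watchdog.didReceiveOkay(at: t0.addingTimeInterval(6))  // peer is back
```

Returning a Bool from each call lets the UI react only on transitions, for example swapping in a placeholder once and restoring the video once.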

Their new API more or less has this feature implemented already, so I'd suggest just trying out their beta to see if it solves the issue first!

Good luck!

@skwny

skwny commented Sep 26, 2019

@zbagley that's a clever idea. Although, I'm not sure it will be feasible for me due to expected usage and cost. I will have rooms of 8 (and more in other cases) open for potentially 4-6 hours. This would be over 115,000 data track messages sent per room, and could amount to high costs rather quickly since several of these rooms will be open at a time.
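The 115,000 figure can be reproduced with a short calculation, assuming it counts messages sent (one per participant per second) rather than deliveries to every receiver. The function name is made up for the example.

```swift
/// Heartbeat messages sent in one room, assuming every participant
/// sends `ratePerSecond` messages each second for the whole session.
func heartbeatMessages(participants: Int, hours: Int, ratePerSecond: Int = 1) -> Int {
    return participants * ratePerSecond * hours * 3600
}

let fourHours = heartbeatMessages(participants: 8, hours: 4)  // 115200
let sixHours = heartbeatMessages(participants: 8, hours: 6)   // 172800
print(fourHours, sixHours)
```

Eight participants over four hours gives 115,200 messages, which matches the estimate, and a six-hour session raises it to 172,800.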

You mentioned the API has this feature, could you tell me what it is specifically that you are referring to? If I could do this technique for free I'd love to!

@andschdk

This is not part of the "Known issues" section for 3.x releases/changelog. I think it should still be there. However I experience some improvement compared to v. 2.x.

@ceaglest
Contributor

ceaglest commented May 2, 2020

Hi Developers,

Thanks for the feedback on this issue. We know there are still improvements to be made, but we feel like 3.x is a significant improvement over the behavior of 2.x when it comes to Participant events.

> This is not part of the "Known issues" section for 3.x releases/changelog. I think it should still be there. However I experience some improvement compared to v. 2.x.

You should know within 15 seconds if your LocalParticipant is live or "reconnecting" to a Room. For RemoteParticipants there is a grace period of up to 45 seconds to detect failed connections and attempt to recover them before they are disconnected from the Room.

We still plan to surface reconnecting events for RemoteParticipants in the mobile SDKs, on par with what is possible in our JavaScript SDK. This will allow you to know if a RemoteParticipant is reconnecting within 15 seconds or less.

> My use case is to remove a participant's visual representation (i.e. video track) from a room when this event is emitted, but this currently takes too long. The video track freezes, which is not a good experience.

We are also planning on adding Track interruption events for each of our SDKs. These will provide a fine grained representation of which audio/video/data tracks are usable at any given time.

I'm closing this issue since the original problem of significantly delayed Participant disconnected events has been solved.

Thanks,
Chris

@ceaglest ceaglest closed this as completed May 2, 2020
@ThoseGuysInTown

For anyone still referencing this, here is a workaround I used to make disconnecting instant.

Disclaimer: this only covers the case of the user closing the app.
It DOES NOT cover:

  • app crashing
  • losing network connectivity

Add to your AppDelegate.swift

func applicationWillTerminate(_ application: UIApplication) {
    print("Will Terminate")

    // Leave the Room explicitly so that remote participants are notified right away.
    VideoCallModel.currentRoom?.disconnect()
    // Block briefly so the disconnect has time to finish before the app is completely terminated.
    sleep(3)
}

@ceaglest
Contributor

Hey @ThoseGuysInTown,

I really like your suggestion! App termination is an area where we should provide better guidance and sample code. I will discuss your feedback with the team.

Thanks,
Chris

@andschdk

Hi @ceaglest

> We still plan to surface reconnecting events for RemoteParticipants in the mobile SDKs, on par with what is possible in our JavaScript SDK. This will allow you to know if a RemoteParticipant is reconnecting within 15 seconds or less.

Is this still in the pipeline?

I guess the isConnected: Bool on RemoteParticipant is not reflecting any reconnecting state...

@paynerc
Contributor

paynerc commented Nov 28, 2020

> Is this still in the pipeline?
>
> I guess the isConnected: Bool on RemoteParticipant is not reflecting any reconnecting state...

The Remote Participant Reconnecting functionality will be available in the next 4.0.0 Beta release.

Ryan

@SmitSonani

> Hey @ThoseGuysInTown,
>
> I really like your suggestion! App termination is an area where we should provide better guidance and sample code. I will discuss your feedback with the team.
>
> Thanks,
> Chris

Hi @ceaglest, can you point me to the updated documentation for handling disconnections while the app is terminating?
Making the main thread sleep for a few seconds to do cleanup work seems like a patch. I was wondering if there is a better alternative.

@ceaglest
Contributor

ceaglest commented May 6, 2021

Hi @SmitSonani,

There aren't any updated docs, unfortunately. I filed an internal escalation so that we could revisit this and provide better guidance. I suggest calling Room.disconnect() and waiting for the delegate callback while running the main thread's runloop so that your app does not terminate. Sleeping and blocking for a fixed number of seconds is not a great idea, as disconnect takes a variable amount of time.
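One way to read that suggestion is sketched below: spin the run loop in short slices until a flag is set or a deadline passes, instead of sleeping for a fixed time. This is only an illustration under assumptions. `spinRunLoop`, `DisconnectFlag`, and `simulateTermination` are hypothetical names, and a timer stands in for the Room delegate's disconnect callback, which in a real app would be the thing that sets the flag.

```swift
import Foundation

/// Spins the current run loop until `condition` is true or `timeout` elapses.
/// Returns true if the condition was met in time, false on timeout.
func spinRunLoop(timeout: TimeInterval, until condition: () -> Bool) -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    while !condition() {
        if Date() >= deadline { return false }
        // Run in short slices so timers and callbacks on this run loop keep firing.
        _ = RunLoop.current.run(mode: .default, before: Date().addingTimeInterval(0.05))
    }
    return true
}

/// Hypothetical stand-in for "the Room has finished disconnecting".
final class DisconnectFlag: @unchecked Sendable {
    var finished = false
}

/// Simulates a disconnect whose callback arrives after `delay` seconds.
func simulateTermination(callbackDelay delay: TimeInterval, timeout: TimeInterval) -> Bool {
    let flag = DisconnectFlag()
    // In a real app: call disconnect() here and set the flag from the
    // Room delegate's disconnect callback. The timer below only simulates that.
    let timer = Timer.scheduledTimer(withTimeInterval: delay, repeats: false) { _ in
        flag.finished = true
    }
    defer { timer.invalidate() }
    return spinRunLoop(timeout: timeout) { flag.finished }
}

let quick = simulateTermination(callbackDelay: 0.2, timeout: 2.0)
let slow = simulateTermination(callbackDelay: 5.0, timeout: 0.3)
```

The timeout still matters: the system gives a terminating app only a few seconds, so the wait should return as soon as the callback fires and give up shortly after if it never does.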

Best,
Chris
