When is an IceGatherer allowed to prune host, reflexive and relay candidates? #174

Closed
robin-raymond opened this Issue Feb 6, 2015 · 7 comments

Projects

None yet

3 participants

@robin-raymond
Contributor

Issue:
An IceGatherer is responsible for gathering potential host, reflexive and relay candidates but it's unclear when host, reflexive or relay candidates can/should be pruned because outstanding "offers" to connect might exist at any point in time.

Scenario A:

  1. Alice sends out her IceGatherer candidates where 1 to N remote sessions might reply back (i.e. forking)
  2. Alice begins connectivity checks with the first remote session where local candidates are eliminated when checks to remote candidates fail (or are deemed of less priority).

Scenario B:

  1. Alice had a single IceGatherer with multiple IceTransports connected with several remote IceTransports (i.e. forking)
  2. Another fork appears (or an existing fork goes down and needs a restart), the IceGatherer need to warm up again and gather a new set of host, reflexive and relay candidates to exchange with the new fork for an unknown period of time.

Concerns:
a) Alice cannot prune candidates from the IceGatherer based upon the elimination of candidates from the first session because a second session might be successful with those same local candidates;
b) Alice cannot even reasonably be sure new remote candidates might arrive a bit later on an existing IceTransport where those candidates (host, reflexive or relay candidates) might be required for connectivity against these new remote candidates (i.e. "end of remote candidates (for now)" is unclear);
c) Keeping all host, reflexive and relay candidates "hot" continuously is not desirable because it causes firewall pinholes to remain needlessly active, TURN resources to be consumed, and expensive interfaces like 3G/4G drain mobile batteries for the sake of STUN/TURN maintenance packets;
d) Creating a new IceGatherer per forked IceTransport is needlessly wasteful when a single IceGatherer would be sufficient;

Possible solutions:
(a) IceGatherer is told to "gather()" with a prune timeout, and candidates remain warm until the timeout has completed (and can be optionally extended by gather() again with another future timeout before the last timeout completes).
(b) IceGatherer keeps all host, reflexive and relay candidates "hot" until a "IceGatherer.prune()" method is called where it's now allowed to prune out any non-warm candidates (i.e. candidates that have not sent or received checks / data within a standard ICE consent timeout).
(c) IceGatherer keeps all host, reflexive and relay candidates "hot" so long as a single incomplete IceTransport is attached to the IceGatherer, i.e. keeping the IceGatherer warm could be done by keeping an unused IceTransport around until it's disposed.

I personally prefer solution (a) or (b). The good thing about (a) is that the IceGatherer will prune at some point if nothing is done and can also be extended and the remote side can be guarantee a timeframe for which received candidates are valid. But (a) also has a bad point of not being able to prune at absolute time points (although gather() with immediate timeout would accomplish this). The API surface would require IceGatherer.gather() to accept an optional timeout parameter.

Solution (b) allows for absolute control of when IceGatherer candidates can be pruned but pruning requires active intervention or candidates will remain needlessly "hot". The API surface would need an additional IceGatherer.prune() method.

Solution (c) is beneficial that the API surface is not expanded. The drawback is that it feels like a bit of a hack to keep an IceTransport alive for the sake of keeping candidates "hot" [although that is debatable]. The larger drawback is that it's unclear when an IceTransport is truly complete unless we add a "this is the final remote candidate expected [for now]" notification to the IceTransport so it can know that it's now safe to prune candidates. Another draw back would be that an IceTransport object might not be immediately disposed because of garbage collection nor would necessarily an IceTransport.stop() be called if a IceTransport.start() was never called in the first place.

[will be cross posted to ORTC CG email list so please reply there]

@robin-raymond robin-raymond added the 1.1 label Feb 6, 2015
@aboba
Contributor
aboba commented Feb 14, 2015

Proposed resolution:

In Section 5.3, add the timeout attribute to the constructor and interface as follows:

[Constructor(RTCIceGatherOptions options, optional unsigned long timeout)]
partial interface RTCIceGatherer {
readonly attribute unsigned long timeout;
};

timeout of type unsigned long, readonly
The time in ms after construction of the RTCIceGatherer object which the implementation must wait prior to expiring local candidates. After the timeout expires, the implementation may remove local candidates but is not required to do so. If not set in the constructor, the default timeout value is 300000 (5 minutes).

@robin-raymond robin-raymond pushed a commit that referenced this issue Mar 25, 2015
Robin Raymond - Added Section 8.3.1 on RTP matching rules, as noted in: Issue #48
- Revised the text relating to RTCDtlsTransport.start(), as noted in: Issue #168
- Clarified pruning of local candidates within the RTCIceGatherer, as noted in: Issue #174
- Clarified handling of incoming connectivity checks by the RTCIceGatherer, as noted in: Issue #170
- Added Section 9.3.2.1, defining DTMF capabilities and settings, as noted in: Issue #177
- Clarified pre-requisites for insertDTMF(), based on: Issue #178
- Added Section 8.3.2 and updated Section 9.5.1 to clarify aspects of RTCP sending and receiving, based on: Issue #180
b03c129
@pthatcherg

Two options:

Option A
IceGatherer.setKeepAlive(bool alive) // false = prune, defaults to false

Option B
// Doesn't do anything except keep the IceGatherer alive/warm
[Constructor(), Constructor(IceGatherer)]
IceTransport

I like Option A, but I could live with Option B.

@robin-raymond
Contributor

I don't think a reference counting solution (like B) will work properly but there is another option that could work. I know that a non-started Transport could hold a reference to all candidates so effectively Option B can behave like option A but the principle of reference counting on started transports seems problematic.

The local IceTransport may receive trickled remote host candidates that might all fail, thus the local IceTransport might assume there's no reason to keep warm any reflexive/relay candidates (since all known pairing fail). The local IceGatherer could therefor incorrectly assume that no reflexive/relay candidates are needed. But later some remote reflexive / relay candidates can trickle in causing valid pairings with local reflexive/relay candidates. Unfortunately the gather might have pruned those already due to not being referenced. The local IceTransport has no way to know if more remote candidates are still pending to keep local gatherer reflexive/relay candidates valid. The only way it could work is that the candidates are kept "warm" while an RTCIceTransport is not in the completed state and not related to the reference counting in the candidate pairing.

So option A works but the downside to A is that reflexive/relay candidates will be kept "warm" until the programmer does something.

Option B can only work if we base the candidates being kept alive on a transport being attached which is not in the "completed" state rather than doing candidate pair reference counting. We currently have no way to associate a non-started transport (i.e. absent remote parameters) with a gather so we would need to add that to the transport constructor.

Option C - have a timeout mechanism where the gatherer is kept warm for a period of time and unused candidates can be pruned after that timeout. Thus a programmer would not need to do anything and a reasonable (configurable) timeout will prune out non used candidates. The option to renew could be calling a "gather()" method again to extend the timeout longer or prune earlier (e.g. immediately).

@robin-raymond
Contributor

Now in further thought, I like option B best if we:

  1. Add an IceGatherer to the constructor
  2. Keep candidates "alive" until the attached IceTransport(s) go into the "completed" state (or "disconnected" after receiving the final candidate)

I think that would make the behaviour clear and there would be a clear point in which the candidates are to be pruned. This does not solve the "respond to incoming connectivity checks early" problem #170 .

@aboba
Contributor
aboba commented Apr 27, 2015

If the IceGatherer is added to the constructor, is this optional (as Peter suggested) or required? If required (which would seem to be needed to track all iceTransport(s) associated with an IceGatherer), do we need iceTransport.createAssociatedTransport(RTCIceGatherer gatherer)?

@robin-raymond
Contributor

I think optional is fine with me. The reason to associate is to allow a transport to be in a non-completed state to ensure candidates are kept warm, which is an optional thing people will need to do....

@aboba
Contributor
aboba commented Apr 27, 2015

Proposal:
In Section 3, add the following WebIDL:
[Constructor(optional RTCIceGatherer gatherer)]

In Section 3.3.1 modify the text to:
The iceGatherer attribute is set to the value of gatherer if passed in the constructor or in the latest call to start().

Add to Section 5.1:
The RTCIceGatherer does not prune local candidates until at least one RTCIceTransport object has become associated and all associated RTCIceTransport objects are in the "completed" or "disconnected" state.

@robin-raymond robin-raymond pushed a commit that referenced this issue May 7, 2015
Robin Raymond - sender.setTrack() updated to return a Promise, as noted in:
#148

- Clarified handling of incoming connectivity checks prior to calling iceTransport.start(), as noted in:
#170

- Clarified handling of incoming DTLS packets, as noted in:
#173

- Added RTCIceGatherer as an optional argument to the RTCIceTransport constructor, as noted in:
#174

- Clarified handling of contradictory RTP/RTCP multiplexing settings, as noted in:
#185

- Clarified error handling relating to RTCIceTransport, RTCDtlsTransport and RTCIceGatherer objects in the "closed" state, as noted in:
#186

- Added component method and createAssociatedGatherer() method to the RTCIceGatherer object, as noted in:
#188

- Added close() method to the RTCIceGatherer object as noted in:
#189

- Clarified behavior of TCP candidate types, as noted in:
#190

- Clarified behavior of iceGatherer.onlocalcandidate, as noted in:
#191

- Updated terminology in Section 1.1 as noted in:
#193

- Updated RTCDtlsTransportState definitions, as noted in:
#194

- Updated RTCIceTransportState definitions, as noted in:
#197
c006197
@aboba aboba closed this May 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment