A single RtpReceiver can be configured to receive a simulcast stream (i.e. it receives from potentially multiple stream sources, but only one is received and rendered at a time by the RtpReceiver, which outputs a single Media Stream Track).
When an RtpReceiver is configured for simulcast, certain configurations can lead to a rapid switch from one RtpSender's stream to another and then back, which can leave the receiver unsure how to demux the RTP packets properly.
The difference between scenarios A and B is difficult to determine from an engine perspective. Are the packets arriving on the original SSRC 5 because of a switch back to 5 or because they were late to arrive? Also, if SSRC 5 had some late packets arriving and the switch to SSRC 6 occurred, then the remaining SSRC 5 packets would get clipped (potentially cutting the end of an audio stream slightly short).
(a) Have a set of timing rules that can be applied to determine scenario A versus B to resolve the ambiguity.
(b) Render in two (or more) hidden RtpReceivers with individual tracks being output from each where the simulcast RtpReceiver is the rendered output of the combined audio for audio and the active video for video.
(c) Do not allow simulcasting and require separate RtpReceivers, where the Media Stream Tracks indicate their activity (active/inactive) state, allowing an application to switch between the streams (as well as sending all audio to render so that it doesn't matter which stream is output).
(d) Do a simple method of "last packet wins" and watch the jitter happen :)
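To make the concern with (d) concrete, here is a minimal sketch of "last packet wins". The `RtpPacket` shape and the `countSwitchesLastPacketWins` helper are hypothetical names made up for illustration, not part of any ORTC API. It only counts how often the active SSRC flips when a single late packet from the old stream arrives after the switch.

```typescript
// Hypothetical packet shape for illustration; not an ORTC API type.
interface RtpPacket {
  ssrc: number;
  seq: number;
}

// Option (d): the SSRC of the most recently arrived packet is always
// treated as the active stream, so every SSRC change counts as a switch.
function countSwitchesLastPacketWins(arrivals: RtpPacket[]): number {
  let active: number | undefined = undefined;
  let switches = 0;
  for (const p of arrivals) {
    if (active !== undefined && p.ssrc !== active) {
      switches++;
    }
    active = p.ssrc;
  }
  return switches;
}

// Arrival order at the receiver: one SSRC 5 packet shows up late,
// after the sender side has already moved to SSRC 6.
const arrivals: RtpPacket[] = [
  { ssrc: 5, seq: 100 },
  { ssrc: 5, seq: 101 },
  { ssrc: 6, seq: 900 },
  { ssrc: 5, seq: 102 }, // late straggler from the old stream
  { ssrc: 6, seq: 901 },
];

const switchCount = countSwitchesLastPacketWins(arrivals);
console.log(switchCount);
```

In this arrival order there was only one intended switch (5 to 6), but the late packet makes the receiver flip three times.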
Personally, I think (c) must always be an option for the programmer if they prefer to switch between active Media Stream Tracks manually in the application layer. Thus I don't see dropping support for simulcast in the RtpReceiver as advantageous, which eliminates (c) as the "solution" for me. I assume that Media Stream Track already has the ability to fire an "active / inactive" state event to know when a stream is actively receiving or has become inactive (the sending party has stopped transmitting), but I've not verified that such an event actually exists.
I don't like (d) because I think the user experience will be bad.
I think (b) is a lot of additional work for a marginal improvement in rendering where (c) could already be used if an application programmer cared about those marginal improvements.
That leaves (a) for me. The question is what set of rules / timings would need to be used?
[cross posting to ortc list so please reply on the list]
For WebRTC 1.0, simulcast functionality has been restricted to the situation where an RtpSender sends multiple streams, but the RtpReceiver only receives a single stream. The assumption is that the SFU receives multiple simulcast streams from participants and selects between the simulcast streams, providing the receiver with a single stream that switches resolutions and/or frame rates based on SFU switching decisions. To date, my understanding is that SFUs supporting the interoperable WebRTC video codecs (e.g. VP8, VP9, H.264/AVC) all support this model.
Non-support for simulcast reception and SVC codecs with MRST
Fix for Issue #175
@robin-raymond In practice, support for RFC 6051 is often used to align the timestamps of the received streams. This allows the receiver to distinguish Scenarios A and B. The other (codec-specific) way that the streams can be aligned is through the Decoding Order Number (DON) field in H.264/SVC.
Without RFC 6051 or DON, heuristics can be used, but they will be prone to error. For example, the implementation could switch from SSRC 5 to SSRC 6 and assume that the last received SSRC 5 timestamp represents the switching point. However, if delayed packets subsequently arrive on SSRC 5, they might need to be thrown away. With RFC 6051, by contrast, the implementation might realize that there is a gap between the last received packets in SSRC 5 and the newly received packets in SSRC 6, and wait for that gap to be filled in by late-arriving SSRC 5 packets.
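As a rough illustration of why aligned timestamps help, here is a minimal sketch. It assumes every packet already carries a timestamp mapped onto a timeline shared by all simulcast streams (the kind of alignment RFC 6051 makes possible); the mapping itself is not shown. `AlignedPacket`, `GapAwareSwitcher` and `classify` are hypothetical names, not taken from any implementation.

```typescript
// Hypothetical packet shape. `alignedTs` is assumed to already be mapped onto
// a timeline shared by all simulcast streams; how that mapping is obtained
// (e.g. via RFC 6051) is outside this sketch.
interface AlignedPacket {
  ssrc: number;
  alignedTs: number;
}

type Verdict = "in-stream" | "late-fill" | "switch";

class GapAwareSwitcher {
  private active: number | undefined = undefined;
  // Aligned timestamp of the first packet seen on the currently active SSRC.
  private switchPoint = 0;

  classify(p: AlignedPacket): Verdict {
    if (this.active === undefined) {
      this.active = p.ssrc;
      this.switchPoint = p.alignedTs;
      return "in-stream";
    }
    if (p.ssrc === this.active) {
      return "in-stream";
    }
    // A packet from another SSRC that belongs *before* the switch point is a
    // late arrival filling the gap, not a switch back.
    if (p.alignedTs < this.switchPoint) {
      return "late-fill";
    }
    // Otherwise the other SSRC carries newer media: treat it as a real switch.
    this.active = p.ssrc;
    this.switchPoint = p.alignedTs;
    return "switch";
  }
}

const switcher = new GapAwareSwitcher();
const verdicts: Verdict[] = [
  { ssrc: 5, alignedTs: 0 },
  { ssrc: 5, alignedTs: 20 },
  { ssrc: 6, alignedTs: 60 }, // switch; media at 40 is still missing
  { ssrc: 5, alignedTs: 40 }, // late SSRC 5 packet fills the gap
  { ssrc: 6, alignedTs: 80 },
  { ssrc: 5, alignedTs: 100 }, // genuinely newer: a switch back to SSRC 5
].map((p) => switcher.classify(p));

console.log(verdicts.join(", "));
```

The late SSRC 5 packet at 40 and the genuine switch back at 100 get different verdicts, which is exactly the distinction that cannot be made reliably without a shared timeline.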
ORTC Lib solved this internally: it locks onto the latest active stream, and obsolete packets do not cause switching.
ORTC Lib does a mixture of a) and b).
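For readers who want a feel for what "lock onto the latest stream and ignore obsolete packets" could look like, here is a minimal sketch of one possible timing rule in the spirit of (a). This is not ORTC Lib's actual code. The hold-down window, the `TimedPacket` shape and the `LockOnDemuxer` class are all assumptions made up for illustration.

```typescript
// Hypothetical packet shape; `arrivalMs` is the local receive time.
interface TimedPacket {
  ssrc: number;
  arrivalMs: number;
}

class LockOnDemuxer {
  private active: number | undefined = undefined;
  private previous: number | undefined = undefined;
  private lockedAtMs = 0;
  private holdDownMs: number;

  // `holdDownMs` is an assumed tuning knob: how long after a switch packets
  // from the previously active SSRC are treated as obsolete stragglers.
  constructor(holdDownMs: number) {
    this.holdDownMs = holdDownMs;
  }

  // Returns true if the packet is accepted as part of the active stream.
  onPacket(p: TimedPacket): boolean {
    if (this.active === undefined) {
      this.active = p.ssrc;
      this.lockedAtMs = p.arrivalMs;
      return true;
    }
    if (p.ssrc === this.active) {
      return true;
    }
    const withinHoldDown = p.arrivalMs - this.lockedAtMs < this.holdDownMs;
    if (p.ssrc === this.previous && withinHoldDown) {
      return false; // obsolete packet from the old stream: no switch
    }
    // Lock onto the new stream and remember the one just left.
    this.previous = this.active;
    this.active = p.ssrc;
    this.lockedAtMs = p.arrivalMs;
    return true;
  }

  activeSsrc(): number | undefined {
    return this.active;
  }
}

const demux = new LockOnDemuxer(200);
const accepted: boolean[] = [
  { ssrc: 5, arrivalMs: 0 },
  { ssrc: 5, arrivalMs: 20 },
  { ssrc: 6, arrivalMs: 40 }, // lock onto SSRC 6
  { ssrc: 5, arrivalMs: 60 }, // straggler inside the hold-down: ignored
  { ssrc: 6, arrivalMs: 80 },
  { ssrc: 5, arrivalMs: 500 }, // hold-down expired: treated as a switch back
].map((p) => demux.onPacket(p));

console.log(accepted, demux.activeSsrc());
```

The trade-off is the one described in the thread: stragglers inside the window are dropped (clipping the tail of the old stream), and a genuine switch back inside the window is delayed until it expires.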
TODO: add a note to the spec describing how a good implementation should handle this. RFC 6051 is great if possible, but a) and b) can be done in its absence.
Edge supports the H.264/SVC DON field to help line up the received multiple streams without support for RFC 6051 rapid sync, but that is a codec-specific approach (e.g. no equivalent of DON in VP8/VP9).
Simulcasting to RtpReceiver and switched stream rapid switches and cl…
Fix for Issue #175
Fixed with a note that RFC 6051 was not used in any implementation, and added a note on how ORTC Lib handles this particular case as an example.