Feature request: API to explicit MID ordering #44

Closed
ovelius opened this issue May 15, 2020 · 20 comments

@ovelius

ovelius commented May 15, 2020

With Unified Plan semantics, the offer and answer need to match in the following ways:

  1. Order
  2. Type
  3. MID itself.
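
A minimal sketch of what that matching rule means in practice, assuming only the `m=` and `a=mid:` lines are inspected. The names `MSection`, `listMSections` and `sectionsMatch` are hypothetical and not part of any WebRTC API; a real check would look at far more than these two line types.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface MSection {
  index: number;      // position of the m= line within the SDP
  kind: string;       // media type, e.g. "audio", "video" or "application"
  mid: string | null; // value of the a=mid: attribute, if present
}

// Collect one entry per m= line, in the order the lines appear.
function listMSections(sdp: string): MSection[] {
  const sections: MSection[] = [];
  let current: MSection | null = null;
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      const kind = line.slice(2).split(" ", 1)[0] ?? "";
      current = { index: sections.length, kind, mid: null };
      sections.push(current);
    } else if (line.startsWith("a=mid:") && current !== null) {
      current.mid = line.slice("a=mid:".length);
    }
  }
  return sections;
}

// True when both descriptions have the same number of m= sections and each
// position agrees on media type and MID (the three points listed above).
function sectionsMatch(offerSdp: string, answerSdp: string): boolean {
  const offer = listMSections(offerSdp);
  const answer = listMSections(answerSdp);
  return (
    offer.length === answer.length &&
    offer.every((o, i) => o.kind === answer[i]?.kind && o.mid === answer[i]?.mid)
  );
}
```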

This can create some tricky problems in SFU scenarios. Basically the challenge is that the SFU describing all the streams that can be received requires the SFU to know about the ordering of the MIDs - something that could otherwise be an implementation detail in the client.

Basically forcing the SFU to understand the SDP - something that isn't always desired.

The getTransceivers() API does expose the MID of each transceiver, but it does not say at which index that MID's m= section sits in the SDP. Perhaps a simple extension could be made there?

In addition, there is no way to get the index of the m=application MID.

@alvestrand
Collaborator

This seems easy to solve by adding an mLineIndex attribute on the transceiver, matching the mid/mLineIndex duality of candidates.
Having an mLineIndex on the SctpTransport is architecturally ugly, but seems hard to avoid if we want the list to be complete.

@alvestrand
Collaborator

Another option is to parse the SDP (grep for "m=" and "a=mid", and generate a table from the result).
Less invasive of the API.
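
A rough sketch of that grep-style approach, assuming the SDP uses one attribute per line and that every `a=mid:` line belongs to the most recent `m=` line. The function name `midToMLineIndex` and the sample MIDs (`a0`, `v0`, `d0`) are made up for illustration.

```typescript
// Hypothetical helper: build a MID -> m= line index table from an SDP string.
function midToMLineIndex(sdp: string): Map<string, number> {
  const table = new Map<string, number>();
  let mLineIndex = -1; // no m= line seen yet
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      mLineIndex += 1; // each m= line starts the next section
    } else if (line.startsWith("a=mid:") && mLineIndex >= 0) {
      table.set(line.slice("a=mid:".length), mLineIndex);
    }
  }
  return table;
}

// Usage with a heavily trimmed, made-up SDP.
const exampleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=mid:a0",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=mid:v0",
  "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",
  "a=mid:d0",
  "",
].join("\r\n");

const table = midToMLineIndex(exampleSdp);
console.log(table.get("d0")); // 2
```

Because the table is keyed by MID, it also covers the m=application section, which has no transceiver.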

@jan-ivar
Member

jan-ivar commented Jun 4, 2020

From editor's meeting: Doesn't this use case rely on generating SDP? If so it doesn't feel like a hardship to expect the same code to be able to read SDP.

It also feels a bit late to be adding new features to this spec, given how late in the process it is.

@ovelius
Author

ovelius commented Jun 4, 2020

Is there an SFU scenario that does not rely on generating an SDP somewhere, I wonder?

Indeed in this case it does - but using the creator inside WebRTC itself, so the SDP generator part is the actual WebRTC source code here - not the application or some other library.

Thus it becomes quite a step to make the application understand SDP anatomy for this bit of information - which seems like it could be trivially available in some API.

Another option is to simply tell this very same code that generates the SDP to parse the full local SDP to get the m sections. Heavier hammer for this simple problem.

@jan-ivar
Member

jan-ivar commented Jun 8, 2020

This spec does not cover all aspects of WebRTC, just the client API (it does not cover reception of simulcast for instance). It's not clear to me how an SFU would have access to transceivers, thus it seems out of scope.

jan-ivar transferred this issue from w3c/webrtc-pc Jul 9, 2020

@henbos
Collaborator

henbos commented Jul 10, 2020

> Basically forcing the SFU to understand the SDP - something that isn't always desired.

In WebRTC, SDP is the protocol used to establish the session, so what is the issue with WebRTC endpoints having to understand SDP?

@ovelius
Author

ovelius commented Jul 20, 2020

FWIW we've built and successfully launched the SFU product with Unified Plan now, and this remains the only hack where we had to add code to manually parse the SDP in the application because we can't get the MID ordering figured out otherwise.

Let me try and clarify this more, and maybe we can make progress in the right direction for the future regardless of what that means :)

The SFU isn't a traditional WebRTC endpoint; it is much more lightweight, and very few fields from the SDP are actually used in practice in such scenarios.
Let's reason about this from a simple user story: Someone joins a video conference, providing one more remote video stream for you to receive.

Your SFU now gives you an updated list of n ssrcs representing video streams (or a delta with the new video stream if you have that in your protocol - but the scenario is you now have one additional video stream)

Now you need to configure your WebRTC endpoint to receive the new video stream. If your solution is to have the SFU send an updated SDP over the network, you will be very inefficient - and your SFU then also needs to know about the existing MIDs and their ordering to construct a valid SDP for you - which will give you a lot of extra latency.

So the real problem is, someone joins the conference and provides one additional video stream to the conference - no other changes - how do you configure the client to start receiving that stream?

My naive proposal to address this is the subject line of this issue - give the client only the extra ssrc and have it update transceivers and build a remote SDP to match the local SDP - not doing this dance over the network with the SFU, as those round trips would be too slow.
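
A sketch of only the ordering step in that proposal, under the assumption that the client has already rendered the text of each remote m= section (ssrc lines and so on) and keyed it by MID. The name `orderRemoteSections` and the `remoteByMid` map are hypothetical; how each section's text is produced is outside this sketch.

```typescript
// Hypothetical helper: arrange pre-rendered remote m= sections in the same
// order as the MIDs appear in the local SDP.
function orderRemoteSections(
  localSdp: string,
  remoteByMid: Map<string, string>,
): string[] {
  const ordered: string[] = [];
  let sawMLine = false;
  for (const rawLine of localSdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      sawMLine = true;
    } else if (sawMLine && line.startsWith("a=mid:")) {
      const mid = line.slice("a=mid:".length);
      const section = remoteByMid.get(mid);
      if (section === undefined) {
        // The remote description must cover every local m= section.
        throw new Error(`no remote section for mid ${mid}`);
      }
      ordered.push(section);
    }
  }
  return ordered;
}
```

The returned array can then be joined after the session-level lines to form the remote description. The sketch reads the order from the local SDP, which is exactly the parse step this issue would like to avoid.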

@ovelius
Author

ovelius commented Jul 20, 2020

> This spec does not cover all aspects of WebRTC, just the client API (it does not cover reception of simulcast for instance). It's not clear to me how an SFU would have access to transceivers, thus it seems out of scope.

I don't understand this transfer, wasn't this the right component for the PC Web API?
I tried to clarify above that this has nothing to do with the SFU understanding transceivers, quite the opposite in fact :)

@henbos
Collaborator

henbos commented Jul 20, 2020

Forgive me if I'm still confused, I'm trying to reply based on my understanding.

Even if the SFU uses a custom protocol, the client (browser endpoint) needs to perform setRemoteDescription, which means either the SFU or the client is constructing SDP. The entity that is constructing the SDP needs to know the index. And if it is powerful enough to generate SDP, it needs to know the m= line index in order to do its job anyway.

  • Option A: If the SFU is constructing the SDP, it seems the m= line index is known by the SFU, and client and SFU can correlate transceivers by mid. So this would be a non issue, but also a case you want to avoid due to latency if I understand correctly?
  • Option B: If the client is constructing the SDP, it seems the m= line index is already known by the client, or else it couldn't have constructed the SDP in the first place.

> So the real problem is, someone joins the conference and provides one additional video stream to the conference - no other changes - how do you configure the client to start receiving that stream?

You need valid remote SDP, and it sounds to me like the client is generating it. Because SDP needs to be consistent between offer/answer rounds, you would need a "virtual remote endpoint" that is generating SDP consistently between O/As. You could generate SDP to add an additional m= line for the new stream, or it could be more intelligent and reuse old m= lines that have become inactive, I think it is up to this "virtual endpoint" that understands and generates SDP. So I don't quite understand why on top of this, you would then also need to parse the SDP after setting it?

@henbos
Collaborator

henbos commented Jul 20, 2020

Once an m= line is associated with a transceiver, the order in createAnswer() will match the offer SDP.

@ovelius
Author

ovelius commented Jul 20, 2020

> Forgive me if I'm still confused, I'm trying to reply based on my understanding.
>
> Even if the SFU uses a custom protocol, the client (browser endpoint) needs to perform setRemoteDescription, which means either the SFU or the client is constructing SDP. The entity that is constructing the SDP needs to know the index. And if it is powerful enough to generate SDP, it needs to know the m= line index in order to do its job anyway.
>
>   • Option A: If the SFU is constructing the SDP, it seems the m= line index is known by the SFU, and client and SFU can correlate transceivers by mid. So this would be a non issue, but also a case you want to avoid due to latency if I understand correctly?
>   • Option B: If the client is constructing the SDP, it seems the m= line index is already known by the client, or else it couldn't have constructed the SDP in the first place.

Yes exactly this, Option B :) You'll find all kinds of issues with Option A - that single ssrc you could send instead becomes a full SDP for instance.

> So the real problem is, someone joins the conference and provides one additional video stream to the conference - no other changes - how do you configure the client to start receiving that stream?
>
> You need valid remote SDP, and it sounds to me like the client is generating it. Because SDP needs to be consistent between offer/answer rounds, you would need a "virtual remote endpoint" that is generating SDP consistently between O/As. You could generate SDP to add an additional m= line for the new stream, or it could be more intelligent and reuse old m= lines that have become inactive, I think it is up to this "virtual endpoint" that understands and generates SDP. So I don't quite understand why on top of this, you would then also need to parse the SDP after setting it?

Here exactly is the problem when we think about Option B. There is no clean way of knowing how the virtual remote endpoint should build the remote SDP in regards to the order of the MIDs without parsing the local description of itself. Which is an OK solution, but pretty heavy weight for such a simple thing that could just be in some API that lists transceivers or something.

@henbos
Collaborator

henbos commented Jul 21, 2020

OK thanks for clarifying.

> Here exactly is the problem when we think about Option B. There is no clean way of knowing how the virtual remote endpoint should build the remote SDP in regards to the order of the MIDs without parsing the local description of itself. Which is an OK solution, but pretty heavy weight for such a simple thing that could just be in some API that lists transceivers or something.

Unless I am missing something, the remote endpoint is required to remember mappings between mid and index from previous offers, but that seems like a small feat for a virtual client that proclaims to speak SDP.

So assuming the virtual endpoint has a small amount of memory - here's my confusion: If the virtual endpoint is the one generating the SDP, then it already knows the m= lines, and if you want to see which transceiver got mapped to which m= line, you can correlate them by the mid. The transceiver knows the mid; the virtual endpoint knows the index. Voilà, it can generate the SDP correctly the next time as well, as long as it remembers which indices it used in the previous offers (which, if it understands SDP, it is required to do, otherwise it is not consistent with its own offers). Is there an issue here that I am missing?
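
As an illustration of that "small amount of memory", here is a minimal sketch of a stateful mid-to-index record, assuming indices are only ever appended and never reshuffled. The class name `MidIndexRegistry` and its methods are hypothetical.

```typescript
// Hypothetical helper: remembers which m= line index each MID was given in
// earlier offers so that later SDP keeps the same order.
class MidIndexRegistry {
  private readonly order: string[] = [];

  // Returns the index already assigned to `mid`, or assigns the next one.
  indexFor(mid: string): number {
    const known = this.order.indexOf(mid);
    if (known !== -1) {
      return known;
    }
    this.order.push(mid);
    return this.order.length - 1;
  }

  // MIDs in m= line order, ready to drive SDP generation.
  mids(): readonly string[] {
    return this.order;
  }
}
```

The state lives in the `order` array, so a library that has to stay stateless cannot hold an object like this between calls.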

Predicting which indices will be used for transceivers that have not been mapped yet is a different story. I'll go over the two scenarios I see for this as well for the sake of completeness:

  • Option A: The client PC is the one generating the offer. In this case, the RTCPeerConnection takes care of mapping them and making sure the offer is valid, so so-far we don't need the index. And the virtual endpoint, which does need to know indices for future O/As, will be able to tell which m= lines are new and on offer and which ones it has seen before. This might be a case where you have to parse the SDP, but if you are talking to a virtual endpoint that proclaims to understand SDP, this seems like a small thing to ask.
  • Option B: The remote virtual endpoint is the one generating the SDP. Again in this scenario, because the remote endpoint is the one generating the SDP, it is the one "offering to receive", and it is up to it whether it wants to reuse existing m= lines (which is pre-existing knowledge that it has) or it wants to add a new m= line for receiving (in which case it would add it at the end; also a known position).
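
Option B's choice between reusing an m= line and appending one can be sketched as below. This assumes the virtual endpoint tracks its sections as a simple list and that "reusable" is reduced to a boolean flag; the `Slot` type and `pickSlot` function are hypothetical, and the real rules for when an m= section may be reused are more involved.

```typescript
// Hypothetical model of one m= section as tracked by the virtual endpoint.
interface Slot {
  mid: string;
  kind: "audio" | "video";
  active: boolean; // false when the section is no longer in use
}

// Returns the m= line index to use for a new receive stream: the first
// inactive slot of the same kind, otherwise the position just past the end
// (meaning a new m= line is appended).
function pickSlot(slots: readonly Slot[], kind: Slot["kind"]): number {
  const reusable = slots.findIndex((s) => s.kind === kind && !s.active);
  return reusable !== -1 ? reusable : slots.length;
}
```

Either way the resulting index is known to the generator without consulting the peer connection.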

Based on this, are you able to pinpoint which step would be eased by having a mid ordering API?

@ovelius
Author

ovelius commented Jul 22, 2020

> OK thanks for clarifying.
>
> > Here exactly is the problem when we think about Option B. There is no clean way of knowing how the virtual remote endpoint should build the remote SDP in regards to the order of the MIDs without parsing the local description of itself. Which is an OK solution, but pretty heavy weight for such a simple thing that could just be in some API that lists transceivers or something.
>
> Unless I am missing something, the remote endpoint is required to remember mappings between mid and index from previous offers, but that seems like a small feat for a virtual client that proclaims to speak SDP.
>
> So assuming the virtual endpoint has a small amount of memory - here's my confusion: If the virtual endpoint is the one generating the SDP, then it already knows the m= lines, and if you want to see which transceiver got mapped to which m= line, you can correlate them by the mid. The transceiver knows the mid; the virtual endpoint knows the index. Voilà, it can generate the SDP correctly the next time as well, as long as it remembers which indices it used in the previous offers (which, if it understands SDP, it is required to do, otherwise it is not consistent with its own offers). Is there an issue here that I am missing?

It does in fact not have a small amount of memory - the reason being that the virtual endpoint is in a library shared between web/mobile/native and is stateless in nature. Stateless libraries are the best, especially if they are cross-compiled to different clients :) The part that parses the SDP is also separated from the part that generates the SDP (it is simply a different method).

I could still be stateless, but simply consume the full local SDP currently in the PC when it needs to generate the remote SDP.
This is a fair option, but you'll then see how annoyingly close the transceiver listing API is to giving you exact information you want - only the index is missing - and thus I filed this issue to discuss that - and also to explore what the other options are.

Mind you, this simple approach has been working very well with Plan B - so there is more complexity in the UP world here (with Plan B it is enough to use PRANSWER for instance).
I'm worried that our product call setup - which is incredibly latency sensitive - will show regressions when moving to UP if there is a lot of "extra dancing" to get the state correct which we didn't have in Plan B.

If there are indeed regressions, it will be a tough discussion to move to UP for the existing clients.

> Predicting which indices will be used for transceivers that have not been mapped yet is a different story. I'll go over the two scenarios I see for this as well for the sake of completeness:
>
>   • Option A: The client PC is the one generating the offer. In this case, the RTCPeerConnection takes care of mapping them and making sure the offer is valid, so so-far we don't need the index. And the virtual endpoint, which does need to know indices for future O/As, will be able to tell which m= lines are new and on offer and which ones it has seen before. This might be a case where you have to parse the SDP, but if you are talking to a virtual endpoint that proclaims to understand SDP, this seems like a small thing to ask.
>   • Option B: The remote virtual endpoint is the one generating the SDP. Again in this scenario, because the remote endpoint is the one generating the SDP, it is the one "offering to receive", and it is up to it whether it wants to reuse existing m= lines (which is pre-existing knowledge that it has) or it wants to add a new m= line for receiving (in which case it would add it at the end; also a known position).

We've solved this problem by doing an extra setLocalDescription at setup, this locks indices and mids in place in a nice predictable way.

IMHO the ordering requirement for mids is a silly one - mids are already required to be mapped 1:1 between local and remote, so the ordering constraint seems odd - is it both an ordered list and a map at the same time? I understand there is a historical reason for it though.

> Based on this, are you able to pinpoint which step would be eased by having a mid ordering API?

FWIW I'm very curious if there are more SFU examples out there that use UP and that do not rely on munging - and how they approach this problem.

@henbos
Collaborator

henbos commented Jul 22, 2020

Thank you for the productive discussion. I think making the library stateless is an interesting point I had not thought about.

Here's my position:

  • The "normal call setup" (not your use case): If SDP is the protocol used all the way, mLineIndex is likely not very useful, because the client side doesn't need to know how to generate SDP (it is handled by RTCPeerConnection), and the SFU needs to be fluent in SDP anyway so the mLineIndex would be the least of your concerns if you're parsing the SDP anyway.
  • The "virtual endpoint setup" (your use case) or otherwise translating SDP into an application-specific protocol: If you've taken on generating and exchanging SDP and doing an "O/A dance", I think you have already taken on more complexity than m= line index ordering - you're already translating a custom protocol into SDP. However, if simply exposing the mLineIndex allows your library to be stateless, that is a nice-to-have feature, and I wouldn't object to such an API.

Therefore: If other browser vendors also see value in this, I'm happy to add it to the spec; otherwise we risk cross-browser compatibility issues for the sake of avoiding a simple parse in the application code.

@henbos
Collaborator

henbos commented Jul 22, 2020

@jan-ivar Thoughts?

@henbos
Collaborator

henbos commented Jul 22, 2020

The PR to be proposed would be to add e.g:

RTCRtpTransceiver { readonly attribute DOMString? mLineIndex; }
RTCPeerConnection { readonly attribute DOMString? applicationMLineIndex; }

@youennf
Contributor

youennf commented Jul 23, 2020

This can be shimmed so I am not sure there is much value.
Also, the application might anyway need to handle the lack of these attributes for older browser versions.
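
A sketch of what such a shim might look like, assuming the application has the local SDP string at hand. `TransceiverLike` and `mLineIndexOf` are hypothetical names, and the optional `mLineIndex` field stands in for the attribute proposed in this thread, which is not part of the spec; it is treated as a number here purely for illustration.

```typescript
// Structural stand-in for RTCRtpTransceiver so the sketch runs outside a browser.
interface TransceiverLike {
  mid: string | null;
  mLineIndex?: number | null; // hypothetical native attribute, absent today
}

// Prefer a native value if one exists, otherwise derive it from the SDP.
function mLineIndexOf(t: TransceiverLike, localSdp: string): number | null {
  if (typeof t.mLineIndex === "number") {
    return t.mLineIndex;
  }
  if (t.mid === null) {
    return null; // transceiver not yet associated with an m= section
  }
  let index = -1;
  for (const rawLine of localSdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      index += 1;
    } else if (index >= 0 && line === `a=mid:${t.mid}`) {
      return index;
    }
  }
  return null; // mid not found in this description
}
```

The fallback path doubles as the handling for older browser versions mentioned above.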

@henbos
Collaborator

henbos commented Jul 23, 2020

There seems to be little interest from the working group, and there is a possible workaround/shim for the issue, so closing it for now.

henbos closed this as completed Jul 23, 2020

@ovelius
Author

ovelius commented Jul 23, 2020

> This can be shimmed so I am not sure there is much value.
> Also, the application might anyway need to handle the lack of these attributes for older browser versions.

Just pointing out that older browsers/clients can be an argument against all changes =) But agree we have a workaround.

I think, if we look at the bigger picture, that the real issue is this: for SFU applications in WebRTC you'll always end up with an SDP parser/creator in the application. IMHO in Unified Plan this actually took a small step backwards (at least for us) - as the removal of the PRANSWER concept requires more PeerConnection operations than before.

This is in sharp contrast with the one-to-one scenario, where all the operations are clearly defined and the SDP is completely opaque to the application.

So I'd put on my wishlist for the future that WebRTC APIs could get first class support for SFU applications - instead of the "simulated peer to peer" approach we now seem to have.

I hope this is somewhat of a fair conclusion, I would love to hear more about how this problem has been approached from other devs that have implemented an SFU application.

Thanks!

@henbos
Collaborator

henbos commented Jul 23, 2020

Thanks. If you're suggesting a more flexible call setup model, you may want to start a discussion over at https://github.com/w3c/webrtc-nv-use-cases/issues, which covers WebRTC "Next Version" use cases.
