Enabling opus stereo audio without SDP munging (stereo=1) #63
Not really sure we need to specify anything new. The opus stereo flag is used for signaling that the endpoint supports receiving stereo:
So the SDP mangling we currently need is a hack, and all endpoints should signal it. The question is whether we need an API to make the sender encode a stereo signal. In case the app wanted to use a mono source and upscale it to stereo, or a stereo source and downscale it to mono, I think the best way would be to do it via WebAudio before passing it to webrtc.
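The munging hack under discussion typically looks something like the sketch below: a pure string edit on the SDP before setLocalDescription(). Payload-type discovery and the exact fmtp layout vary by browser; this is an illustration of the hack, not a recommended API.

```javascript
// Sketch of the stereo=1 SDP munging hack discussed above.
// Assumes CRLF line endings and an "a=rtpmap:<pt> opus/48000/2" line.
function enableOpusStereo(sdp) {
  const match = sdp.match(/a=rtpmap:(\d+) opus\/48000\/2/);
  if (!match) return sdp; // no opus codec found; leave SDP untouched
  const pt = match[1];
  const fmtp = new RegExp(`a=fmtp:${pt} (.*)`);
  if (fmtp.test(sdp)) {
    // Append to the existing fmtp parameters for that payload type
    return sdp.replace(fmtp, (line, params) =>
      params.includes('stereo=1') ? line : `a=fmtp:${pt} ${params};stereo=1`);
  }
  // No fmtp line yet: add one right after the rtpmap line
  return sdp.replace(match[0], `${match[0]}\r\na=fmtp:${pt} stereo=1`);
}
```

An application would apply this to offer.sdp (or answer.sdp) before setLocalDescription(), which is exactly the kind of local-SDP munging this issue argues should become unnecessary.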
I wonder if this should be solved with...
Proposal:
channels is a term with meaning when it comes to codecs. But I do not think we need an API, just ship stereo=1
Also, it can't be at transceiver level, as you may want to send mono and receive stereo. I agree with @fippo that shipping
Sounds reasonable. In that case it can change on the fly depending on the track. Would it be OK to negotiate stereo if only mono is sent? Where is stereo=1 defined?
stereo=1 on the offer just means that the endpoint is able to receive stereo.
The opus payload spec, https://tools.ietf.org/html/rfc7587#section-7.1, says specifically
mostly because nobody wants to bother with negotiating the number of channels in SDP. Any opus decoder must be able to decode stereo, see https://tools.ietf.org/html/rfc7587#section-3.4
In general, the handling of declarative attributes (tagging @alvestrand) seems odd.
I'm just jumping into this conversation, so please forgive me if this isn't the right thread... but please do consider other channel counts and channel layouts. Currently, sending something like 6 channels of audio doesn't seem to be possible with WebRTC due to browser implementations (which may be fixed in Chromium as of yesterday?). Additionally, for my use case I need to specify an arbitrary channel layout/map, so that the encoder doesn't assume I'm sending 5.1 surround. What I need is 6 discrete channels. As changes to the implementations and specifications are made, please consider use cases such as this. Thank you!
standard opus rtp doesn't support multichannel (channels >2). Chrome implements it with a non-standard multiopus codec which, when enabled via sdp mangling, is automatically chosen based on the number of channels of the input track (6 for 5.1 or 8 for 7.1).
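The multiopus mangling mentioned here could be sketched roughly as below. The payload type (63) and the fmtp parameter values are assumptions based on commonly circulated 5.1 examples; none of this is defined by any standard and it may differ across Chrome versions. The sketch also assumes the audio m= section is the last section in the SDP.

```javascript
// Hedged sketch: advertising Chrome's non-standard multiopus codec for
// 5.1 audio by munging the offer SDP. Payload type and fmtp values are
// illustrative assumptions, not standardized.
function addMultiopus(sdp, pt = 63) {
  const mLine = sdp.match(/^m=audio .*$/m);
  if (!mLine) return sdp;
  // Advertise the extra payload type on the m= line...
  const munged = sdp.replace(mLine[0], `${mLine[0]} ${pt}`);
  // ...and describe it: 6 channels, with an assumed 5.1 channel mapping.
  return munged +
    `a=rtpmap:${pt} multiopus/48000/6\r\n` +
    `a=fmtp:${pt} channel_mapping=0,4,1,2,3,5;num_streams=4;coupled_streams=2\r\n`;
}
```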
Note that the SDP munging allowed by the current implementation is just a hack due to the lack of APIs and is in no way backed by any standard.
This is my understanding:
To me it sounds like no spec change is needed in order to achieve receiving stereo. @perahgren can you verify that WebRTC is able to receive stereo even if "stereo=1" is missing? However, it doesn't make sense to be sending stereo at the present moment since everybody is implicitly asking for mono by having "stereo=1" missing. To me it would make sense if "stereo=1" were the default in offers, and to decide what to send based on the MediaStreamTrack's number of channels. Questions:
FWIW I don't think it makes sense for the receiver to be the one who controls whether the sender sends stereo, other than in a restrictive sense. But assuming the receiver is OK with receiving stereo, it seems like it should be the sender's choice whether or not to do so. Otherwise we end up in a scenario where if the sender doesn't want to send stereo it has to ask the receiver to ask the sender to do mono... I don't want to encourage additional offer-answer dances.
For offer/answer, the relevant text is in section 7.1 of RFC 7587:
o The "stereo" parameter is a unidirectional receive-only parameter.
As for what a missing "stereo=1" attribute means, section 6.1 says:
stereo: specifies whether the decoder prefers receiving stereo or mono signals.
So receiving stereo is mandatory to support, but sending stereo when stereo=1 is missing is "allowed, but you really should have a strong reason to do so". I think that once stereo is fully supported in the pipeline, we should put "stereo=1" into the default FMTP line for Opus.
From what I know, stereo is fully supported in the pipeline.
If I've understood this correctly, setting stereo=1 by default won't really solve any of the above, apart from not requiring any action when stereo should be sent. If mono should be sent, however, the problems still persist. Also, if we set stereo=1 by default, won't then suddenly all stereo-capable clients that are not explicitly setting stereo=0 start sending stereo audio instead of mono?
While 1) would be solved if it becomes the default (stereo without munging), the problem of wanting mono is seemingly the same as the original problem, only with the default reversed, so now one might SDP munge to get mono instead.

I think the reason why stereo=1 is more attractive than stereo=0 is that even if you negotiate stereo=1, this only sends stereo if the MediaStreamTrack is a stereo source. So with stereo=1 you can choose whether you want stereo or mono, whereas with stereo=0 you have no choice, it's mono-only (assuming we want to respect the receiver's stereo=0).

So what I wonder is: how common are stereo sources, and is it easy to get mono from a stereo source? Can you simply do getUserMedia() with channelCount:1, or do you have to use WebAudio, and if so, how cumbersome and performance impactful is it to start using WebAudio? I do think the default should reflect what makes sense. But there is not just the default in SDP that comes into play here but also the default of MediaStreamTracks?
Yes. Do we have any idea how common this is? (I don't...)
That is up to the signalling layer which can legitimately strip out stereo=1 and/or replace it with stereo=0
No? The receiver can declare stereo=0 independently. That is the "declarative" syntax stuff which is very confusing
I do not think (as said quite a few comments up) there is any action for the WG here when it comes to chrome shipping stereo=1.
Haha, well there we go. And here's a modified version of @jan-ivar's fiddle that logs the channelCount: https://jsfiddle.net/wase5g6b/1/ @guidou Do you know what Chromium's default for channelCount is? I would assume that a normal microphone is 1, but I don't know what happens if you have some multichannel sound recording equipment. I think if we write a PR to make stereo=1 the default we just need to make sure that Chromium does up- and downsampling correctly, i.e. we don't want stereo=1 to cause us to encode stereo if our MediaStreamTrack has a mono source. @perahgren What would happen today?
I think we can just merge the PR (to be written) unless @perahgren or @guidou knows about any backwards compatibility issues that this would cause for cases where we do have multiple channels.
Assuming this is ready for PR in the meantime, please correct me if I'm wrong.
I think this may cause problems for Chrome when doing setRemoteDescription for Opus with stereo=1, for two reasons:
See https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 for the relevant code. Note also that stereo=1 has no effect on setLocalDescription.
@henbos MBP mics are mono (see Audio Midi Setup). FWIW my Logitech BRIO reports
To test Chrome, I used an in-content device picker to pick my BRIO: I get
@juberti Wouldn't that show up in a Chrome←→Firefox p2p call today then?
Sure, but the effect will be subtle enough that you won't notice it unless you know what to look for. Still, it should be visible in WebRTC-internals.
Then it sounds like the compatibility problem has existed ever since Chrome initially shipped stereo, so we need to fix this regardless of how we decide to control stereo going forward. To me it sounds like the proposed PR still makes sense as-is, but code like https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 needs to be updated so that the number of channels gets reconfigured to be the minimum of the track's channels and the negotiated channels, and to trigger the possibility of reconfiguring the encoder on replaceTrack() if the number of channels changes. Does this make sense to everybody? @juberti @perahgren
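The reconfiguration rule proposed above can be captured in a few lines. This is only a sketch; encoderChannels, trackChannels, and negotiatedStereo are illustrative names, not part of any WebRTC API.

```javascript
// Sketch of the proposed rule: the encoder channel count should be the
// minimum of what the track provides and what was negotiated in SDP.
// "negotiatedStereo" stands for whether the remote description carries
// stereo=1 for Opus; these names are hypothetical.
function encoderChannels(trackChannels, negotiatedStereo) {
  const negotiatedChannels = negotiatedStereo ? 2 : 1;
  return Math.min(trackChannels, negotiatedChannels);
}
```

Under this rule a mono track is never upsampled to stereo even when stereo=1 was negotiated, and a stereo track is downmixed when the receiver did not signal stereo=1.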
@guidou Do you know?
"The proposed PR" = https://docs.google.com/presentation/d/1-ZB5pjq0hTl4iNxGhIc3polH9wjCps2Uzqg7D4v1n14/edit#slide=id.gae53209dde_0_11 (slides 29-30)
A concern I have about the PR is that it does not really solve the underlying problem of the lack of an API and the need for munging. However, the problem of the lack of an API, and the resulting need for munging, is then instead imposed on the clients that actually would want to set stereo=0 (or omit the stereo flag). I may get this wrong, but I think what would be needed is an API for setting/clearing the stereo flag that avoids munging. Having stereo=1 would be handy, but I share the concerns of @juberti and also think there may be other unforeseen effects for clients that implicitly rely on stereo=1 not being set by default.
The proposed PR is "send stereo by default if the source is stereo". If the source is mono, you won't be sending stereo (up-sampling there would be a bug). The concern I see is for applications that don't specify channelCount when they do getUserMedia() and thus rely on the default. While most microphones are probably mono, as @jan-ivar pointed out, it looks as if the default channelCount changed from 1 to 2 in a recent version of Chrome? I don't know if that is a good or bad default, but if you actually have a stereo microphone and you are recording stereo, then why not send stereo? The question I would ask is: if we don't want stereo, why are we recording stereo? Whether stereo or mono should be the default seems like an interesting question. But if you really don't want stereo, then regardless of what we think the default should be, you can always restrict the source to mono with getUserMedia({audio:{channelCount:1}}). So when would you need to do SDP munging?
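Restricting capture to mono as suggested here could look like the following sketch. Whether channelCount is honored exactly, or merely treated as an ideal, depends on the browser and device; the helper name is illustrative.

```javascript
// Constraints asking the browser for a mono capture; under the proposed
// PR the Opus encoder would then send mono regardless of a stereo=1
// default in the SDP. A bare value like channelCount: 1 is treated as
// an ideal (non-required) constraint in getUserMedia.
function monoAudioConstraints() {
  return { audio: { channelCount: { ideal: 1 } } };
}

// In a browser:
//   const stream = await navigator.mediaDevices.getUserMedia(monoAudioConstraints());
//   stream.getAudioTracks()[0].getSettings().channelCount;
```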
looks like I was wrong about "this should be totally safe", but good that the channelCount change was caught that way.
As I see it, the issue is how to allow the receiving client to restrict the sending client from sending stereo. From what I can see, the only way to do that is to include stereo=0 in the answer SDP. And from what I've heard, the only way that can be achieved is via munging (maybe I'm missing something there though?).
Here's my take on this: Yes, the answer SDP would have to be munged if you want to turn this off via SDP in a browser client on the receiver side. However, because the receiver has to be prepared to receive stereo regardless of what this attribute says, and it is ultimately up to the sender whether to send mono or stereo, this modification to the SDP does not actually have any effect on the receiver side. It only has an effect on the sender side where, if the sender respects the receiver's wishes, it restricts it to mono. From this I conclude that you could munge the SDP after receiver.setLocalDescription(answer) but before sender.setRemoteDescription(answer). This is not disallowed by the spec; the spec only forbids munging local SDP, not remote SDP. So if you want to turn off stereo via SDP, you still can, without the SDP munging having to be illegal. That said, I would view stereo=0 as primarily interesting for the use case of "talking to a legacy endpoint or a non-browser endpoint (server)". For a real application, I would try to make sure the sender and receiver agree on stereo or mono outside of SDP. But if you want to do it in SDP, you can.
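The remote-answer munging described above could be sketched as follows. It assumes the answer carries an Opus rtpmap line with CRLF line endings; the helper name is illustrative.

```javascript
// Sketch: force stereo=0 in a received answer before handing it to the
// sender's setRemoteDescription(). Munging remote SDP is not forbidden
// by the spec, unlike munging local SDP.
function forceMonoInAnswer(answerSdp) {
  const match = answerSdp.match(/a=rtpmap:(\d+) opus\/48000\/2/);
  if (!match) return answerSdp; // no opus; nothing to do
  const pt = match[1];
  return answerSdp.replace(
    new RegExp(`(a=fmtp:${pt} [^\\r\\n]*)`),
    (line) => line.includes('stereo=')
      ? line.replace(/stereo=\d/, 'stereo=0') // overwrite existing flag
      : `${line};stereo=0`);                  // or append it
}

// Usage sketch (browser side, names illustrative):
//   await pc.setRemoteDescription(
//     { type: 'answer', sdp: forceMonoInAnswer(answer.sdp) });
```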
People might think that to be ugly, I don't know, but it would be nice to avoid extra negotiation logic or APIs on both sides to decide what stereo attribute to use, when ultimately it doesn't do anything on the receiver side since stereo has to be receivable regardless.
@perahgren found this yesterday:
If the browser is the sender, this would change the "stereo=0" case from SHOULD NOT to MUST NOT send stereo, since RTCPeerConnection only sends to a single destination. This does not affect what a sensible implementation does. I'm still in favor of stereo=1 being the default because then "you send what you have" and don't need to worry about additional APIs to control SDP. It's then up to the application to agree on the capture channel count, similar to how it is today up to the application to agree on capture resolution and frame rate. However, the channel count's default has been raised as a concern for moving forward. If we accidentally have two channels then we may accidentally cause regressions. We may want to discuss w3c/mediacapture-main#775 before merging a PR here?
There are a lot of moving parts here, so let me try to summarize. Summary of the summary: proposed PR is good, but we can't implement it yet without some refactoring in Chrome.
Excellent summary Justin. I think the relevant pieces of information are known/knowable at the relevant points in time to avoid wasting any bits, but not known to all the layers that need to know about them - until refactoring.
I wonder whether it is possible to achieve an SDP setup that supports stereo audio sessions (via opus) properly without using SDP munging?
Examples of scenarios to achieve include:
Symmetric stereo setup:
-The receiver has not flagged that it wants to receive mono
-The sender wants to send stereo
-The sender achieves a connection sending stereo to the receiver.
-The sender achieves a connection receiving stereo from the receiver.
Asymmetric stereo/mono setup:
-The receiver has not flagged that it wants to receive mono
-The sender wants to send stereo
-The sender achieves a connection sending stereo to the receiver.
-The sender achieves a connection receiving mono from the receiver.
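Under the stereo=1-by-default proposal discussed above, both scenarios would fall out of ordinary offer/answer without munging. Roughly (SDP fragments abbreviated; payload type 111 is an assumption):

```
Symmetric stereo: both sides carry stereo=1 and attach stereo capture
sources, so stereo flows in both directions.
  (offer)  a=fmtp:111 minptime=10;useinbandfec=1;stereo=1
  (answer) a=fmtp:111 minptime=10;useinbandfec=1;stereo=1

Asymmetric stereo/mono: the SDP is identical; the answering side simply
attaches a mono MediaStreamTrack, so it sends mono while receiving stereo.
```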