Add description of an API for controlling SDP codec negotiation #186

Open · wants to merge 43 commits into main

Conversation

alvestrand
Contributor

@alvestrand alvestrand commented Jun 14, 2023

Fixes #172


Preview | Diff

@alvestrand alvestrand marked this pull request as draft June 15, 2023 14:39
index.bs Outdated
#### AddSendCodecCapability #### {#add-send-codec-capability}

1. Let 'capability' be the "capability" argument
1. If 'capability' is a known codec, throw an IllegalModification error.

Collaborator

how do you define "known"? I would suggest forbidding the same mimeType so you can't have collisions between e.g. H264 and a newly added H264 with a different profile-level. But you seem to allow same-mimeType-different-fmtp in the explainer?
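
For concreteness, an illustration of the collision being described here; the values are made up, but a browser's getCapabilities() typically already contains an entry like the first one:

const builtinH264 = {
  mimeType: "video/H264",
  clockRate: 90000,
  sdpFmtpLine: "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f"
};
// same mimeType, different profile-level-id in the fmtp
const addedH264 = {
  mimeType: "video/H264",
  clockRate: 90000,
  sdpFmtpLine: "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=640c1f"
};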

Contributor

What test is used to determine whether a capability is 'known'? Do we check against what would be returned by RTCRtpSender.getCapabilities(), Media Capabilities or WebCodecs isSupported()?

Contributor Author

The purpose of throwing here is so that you can't ask to duplicate existing codecs. If you could duplicate, how could the other side know which one to choose for sending?
So it's important on the receive side, but exists on the send side mostly for symmetry.
I think "known" should be "a codec with exactly the same name and fmtp parameters appears in the list that is generated by default" - adding special codecs for special H.264 parameters is probably a valid use case.

Contributor Author

Trying to edit this in, I find that clock rates and channels are involved too. There can be 2 codecs that differ only in clock rate (CN codecs in particular), so I think we should throw in the kitchen sink and say that "equal if all the fields compare equal".
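
As a non-normative sketch of the "all fields compare equal" test being proposed here (the field names are those of RTCRtpCodec):

// Sketch: a candidate is "known" only if every field matches an existing capability.
function sameCodec(a, b) {
  return a.mimeType.toLowerCase() === b.mimeType.toLowerCase() &&
         a.clockRate === b.clockRate &&
         a.channels === b.channels &&
         (a.sdpFmtpLine || "") === (b.sdpFmtpLine || "");
}
const candidate = {mimeType: "audio/CN", clockRate: 16000, channels: 1};
const known = RTCRtpSender.getCapabilities("audio")
  .codecs.some(c => sameCodec(c, candidate));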

Collaborator

from discussion elsewhere: rtcp-fb might need consideration too (and * does not cut it sadly)

index.bs Outdated

1. Let 'capability' be the "capability" argument
1. If 'capability' is a known codec, throw an IllegalModification error.
1. If 'packetizationMode' is defined, and is not a known codec, throw a SyntaxError DOMException.


How would one specify they want to use a generic/codec agnostic packetizer/depacketizer?

Currently, the transform must preserve whatever bits the packetizers happen to read from (example libwebrtc reference) - however, if the RTCEncodedVideoFrameMetadata is correctly filled, the packetization process could use this instead, and not need to read any bits from the frame.

Is that something we need to distinguish between - the
"I am not modifying any necessary bits in the header, go ahead and use VP8/AV1/etc. packetizer"
vs the
"I am transforming the entire frame, don't read from it, but the frame metadata is there"
cases?

Should we even support both, or could we only allow generic RTCEncodedVideoFrameMetadata packetization in the first place?

also very minor nit: the name is slightly confusing due to the matching H264 parameter 'packetization-mode'

Contributor Author

Once that generic / codec agnostic packetizer/depacketizer has a name, we can use it.
Once a spec exists, it's likely to have a name in it.

Contributor Author

Added some more language about packetizers while moving to naming them explicitly. See if this conversation can be resolved, or whether comments should go elsewhere.

@dontcallmedom-bot

This issue had an associated resolution in WebRTC June 2023 meeting – 27 June 2023 (Encoded Transform Codec Negotiation):

RESOLUTION: Adopt PR #186 with details to be discussed in the PR

index.bs Outdated
<pre class="idl">
// New methods for RTCPeerConnection
partial interface RTCPeerConnection {
undefined AddSendCodecCapability(DOMString kind,

Member

kind seems redundant if we throw on !mimeType.startsWith("video/") && !mimeType.startsWith("audio/")

Member

Do I understand correctly that once e.g this is called

const acmeCodec = {
  mimeType: "video/acme-encrypted",
  clockRate: 90000,
  sdpFmtpLine: "encapsulated-codec=vp8",
  packetizationMode: "video/vp8",
};
pc.addSenderCodecCapability(acmeCodec);

...then every negotiation from here on will offer sending of this new codec on ALL transceivers until the pc is deleted? And in first place?

This seems more limiting than e.g. setCodecPreferences which allows negotiation of codecs per transceiver.

In fact, I was wondering, why can't we just reuse setCodecPreferences here instead?

transceiver.setCodecPreferences([acmeCodec]);

I.e. have this be how JS both adds new codecs and specifies their position among existing ones?

We already accept dictionaries there, but have prose to throw on unknown codecs. But if we're about to allow JS to make up codecs anyway, then that limitation suddenly seems superfluous.

Perhaps we then adopt a different limitation, which seems to be that setCodecPreferences sets the same codec preference for both send and receive, do I have that right? But maybe that's something we should solve anyway, or perhaps it's not a problem in practice since JS can just create one-way transceivers?

This might also sidestep the input validation question, since we could throw on partially unique codecs (since we're expecting either known codecs or new codecs that can't be mistaken for known ones).

Contributor Author

setCodecPreferences() has the limitation that it doesn't distinguish between capacity-to-send and capacity-to-receive.
It might actually make sense to add a direction field to RTCRtpCodecCapability - possible values "send", "receive" and "sendrecv". This could be useful for the receive-only codecs that are currently a design problem (HEVC is the biggest name in town on that front currently).

There are two derived dictionaries from RTCRtpCodec: RTCRtpCodecCapability (adds nothing) and RTCRtpCodecParameters (adds payloadType). Should the direction field be in both?
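
For illustration, a receive-only HEVC capability might look like this if such a direction member were added (the member and its values are only the proposal above, not an existing API):

const hevcReceiveOnly = {
  mimeType: "video/H265",
  clockRate: 90000,
  direction: "receive"  // proposed values: "send", "receive", "sendrecv"
};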

Contributor Author

One argument against using SetCodecPreferences is that it will turn unintended changes to the capabilities (on which we currently throw) into definitions for new codecs. Is that a Bad Thing?

Contributor Author

In reply to "All negotiations will offer..." - only if unmodified by SetCodecPreferences.
SetCodecPreferences should allow for selecting the additional codecs to offer / answer with as well as the "baseline" codecs as desired by the user.

Member

@jan-ivar jan-ivar left a comment

Overall I have some questions of scope here, and I'd like to understand the use cases beyond adding SDP negotiation to JS-e2ee.

I also have some concerns about the number of nuts and bolts an application would need to keep straight here, e.g. having to modify frame data.

Also in the spec, we have the RTCRtpScriptTransform interface. I wonder if it would simplify things if we put some options on that interface instead? (But perhaps not, since an app may switch transforms on the fly AFAICT. @youennf, is that right?)


index.bs Outdated
partial interface RTCRtpSender {
attribute RTCRtpTransform? transform;
undefined setEncodingCodec(RTCRtpCodecParameters parameters);

Member

This seems redundant now that we have set codec in setParameters. Mentioned in the slides.

Contributor Author

Yes, this will be removed, and replaced with a pointer to "set codec in setParameters" where appropriate.
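
For reference, a minimal sketch of the "set codec in setParameters" approach being pointed to, assuming the per-encoding codec member defined in webrtc-pc:

// Select the send codec via setParameters rather than setEncodingCodec().
const params = sender.getParameters();
const vp9 = RTCRtpSender.getCapabilities("video")
  .codecs.find(c => c.mimeType === "video/VP9");
params.encodings[0].codec = vp9;
await sender.setParameters(params);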

Contributor Author

I'm leaving this in with a note for now; when we separate the encoding and packetization steps, I want to make sure we're controlling both appropriately.

index.bs Outdated
Comment on lines 146 to 155
1. The {{RTCPeerConnection/[[CustomSendCodecs]]}} is included in the list of codecs which the user agent
is currently capable of sending; the {{RTCPeerConnection/[[CustomReceiveCodecs]]}} is included in the list of codecs which the user agent is currently prepared to receive, as referenced in "set a session description" steps.

Member

Again I wonder if these custom codecs could instead just live in transceiver.[[PreferredCodecs]]

Contributor Author

Step 8 in setCodecPreferences() filters against codecCapabilities (the static).
If we modify that, it would be possible - but would we entirely stop filtering, then?

Contributor Author

@alvestrand alvestrand left a comment

Apologies for the spam - I only thought to group these into a review late in the day.

This addresses comments on index.bs - I'll address comments on the explainer in a separate review.


Contributor Author

@alvestrand alvestrand left a comment

Explainer updated. PTAL!

@alvestrand
Contributor Author

Modified according to discussions at TPAC, following the solution presented there for #199.

PTAL.

Member

@dontcallmedom dontcallmedom left a comment

Fix bikeshed markup

@dontcallmedom
Member

(the remaining CI error is pending merge of #208)

@Philipel-WebRTC

Philipel-WebRTC commented Oct 9, 2023

I have read this proposal, and I can't quite understand how it is supposed to work. AFAICT the goal is to allow negotiation of custom codecs, but we still only allow standardized packetizers to be (ab)used.

This part of the spec:

It is up to the transform to ensure that all data and metadata conforms
to the format required by the packetizer; the sender may drop frames that
do not conform.

seems extremely complex. The user needs to know, at the lowest possible level, how the packetizers work, know exactly which bits they are allowed to touch for each codec, and AFAICT it would also be borderline impossible to debug if a packetizer decided to drop the frame for you.

Would something like this work instead?

customCodec = {
   // Type is an enum of {"VP8", "VP9", "H264", "custom"}
   type: "custom",
   // postfixName only has an effect when the type is "custom". Setting the postfixName
   // to "VP8" would make the codec show up as "custom-VP8" in the SDP.
   postfixName: "VP8",
   clockRate: 90000
};

When "custom" is selected as the type, the packetization format is implied to be raw.

Now when a sender and receiver need to negotiate a codec, they can do string matching on the codec names, and assuming they are aware of how the bits are being transformed, they know everything about the codec and the format.

@alvestrand
Contributor Author

to @Philipel-WebRTC - I don't understand your proposal.
In particular:

  • H.264 has two packetizers, distinguished by "packetization-mode=0/1" in the fmtp string. This is part of my argument for using a full codec description to identify the packetizer.
  • codec names have to be unique, and if standardized names are used, they need to be used for standardized codecs
  • there's no particular reason why the SDP exchange should have to reveal to a signaling eavesdropper what the inner content of a nonstandard encrypting codec is

There is no standard at the moment for a raw packetization format. So we can't refer to that format from a standard.
A goal of standardization is interoperability - that a depacketizer in one browser should be able to depacketize content that is packetized by another browser. So referring to an unspecified format is not a win here.

I thought about using a DOMString for "setPacketizer" rather than a codec description, and allowing implementations to put non-specified values in there. But I didn't find any particular advantage in doing so.

@alvestrand alvestrand marked this pull request as ready for review October 10, 2023 12:17
@Philipel-WebRTC

Philipel-WebRTC commented Oct 10, 2023

It could be that I fundamentally misunderstand the goal of this proposal, but AFAICT the goal is to enable custom codecs to be negotiated via SDP, or am I wrong?

As soon as we negotiate a non-standardized codec name then we don't have any standardized packetizer to rely on, and I get that the proposed API would allow the sender and receiver to map the payload type of a custom codec to some other known (de)packetizer, but to me that seems very limiting and complex.

The way it is proposed right now, would we expect anyone to ever use a packetizer other than VP9 (the only one that separates descriptor and bitstream data cleanly in the payload) for a custom codec? Again, this to me looks a bit more like the abuse side of things than a general solution (assuming I understand the goal of this PR).

@alvestrand
Contributor Author

At least one external WebRTC user has made this almost work by using the H.264 packetizer - he was advised that he would have an easier time if he appended the custom data as SEI (don't know the proper term for those). So no, VP9 is not the only option, but yes, VP9 is probably the best standard one at the moment.

It's always possible for implementations to define their own packetizers, but that's not useful for interoperability.

@alvestrand
Contributor Author

Actually, registering new codecs in the vnd. tree with IANA is a possibility that would allow a well defined way of using nonstandard packetizers.

@alvestrand
Contributor Author

@aboba you have a "request changes" marker. Can you re-review?

@youennf
Collaborator

youennf commented Oct 13, 2023

Following on yesterday's editor meeting, here are some thoughts related to this proposal.
I would tend to reduce the scope of the API to what is needed now, without precluding any future extension.

AIUI, we seem to get consensus on adding support to:

  1. Enable media agnostic RTP payload formats like SFrame.
  2. Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

For 1, we assume the UA implements the RTP payload format.
setCodecPreferences can be used if the UA does not negotiate it by default.

For 2, we could use payload types.
Another approach, potentially more convenient for web developers, is to reuse the name of the payload format directly.
This is similar to manipulating mimeType instead of payloadType.

partial interface RTCEncodedAudioFrame {
    attribute DOMString rtpPayloadFormat;
};
partial interface RTCEncodedVideoFrame {
    attribute DOMString rtpPayloadFormat;
};

On sender side, the application would set it to sframe, to make use of https://www.ietf.org/archive/id/draft-thatcher-avtcore-rtp-sframe-00.html. On receiver side, the application detects the use of sframe by checking rtpPayloadFormat.

We could also use that API internally to further define the interaction of SFrameTransform with the RTP layers.
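
A rough sketch of how a script transform might use the proposed attribute; the attribute, the "sframe" value, and the options member used here are assumptions based on the proposal above, not existing API:

// Worker-side sketch (sender): request SFrame packetization per frame.
onrtctransform = ({transformer}) => {
  transformer.readable.pipeThrough(new TransformStream({
    transform(frame, controller) {
      frame.rtpPayloadFormat = "sframe"; // hypothetical attribute from this proposal
      controller.enqueue(frame);
    }
  })).pipeTo(transformer.writable);
};
// On the receiver side, the transform would check frame.rtpPayloadFormat === "sframe".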

@alvestrand
Contributor Author

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

The proposal in #186 (comment) will not allow us to affect SDP negotiation for script transforms; SDP negotiation is only affected by things that happen before offer/answer time - there are no frames before offer/answer.

@jan-ivar
Member

  1. Enable media agnostic RTP payload formats like SFrame.

Do you mean any other types here, or just SFrame? Do you have an example?

For 1, we assume the UA implements the RTP payload format.
setCodecPreferences can be used if the UA does not negotiate it by default.

What would that look like? Would it multiply the payload types in RTCRtpSender.getCapabilities("video")?

  2. Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

Do you mean a RTCRtpScriptTransform here? What would that look like?

partial interface RTCEncodedVideoFrame {
   attribute DOMString rtpPayloadFormat;

What's the advantage or use case for choosing this per frame?

Apps can already specify SFrame ahead of negotiation, so doesn't the UA have all it needs to negotiate sframe?

const {sender} = pc.addTransceiver(track);
sender.transform = new SFrameTransform({role: "encrypt"});
// negotiate
const {codecs} = sender.getParameters();
if (!codecs.find(findSFrame)) sender.transform = null;

@alvestrand
Contributor Author

  1. Enable media agnostic RTP payload formats like SFrame.

Do you mean any other types here, or just SFrame? Do you have an example?

You heard the SCIP (NATO) format being mentioned on the call. That's one.
A payload format to replace Google's nonstandard a=packetization-mode:raw would be another.

For 1, we assume the UA implements the RTP payload format.
setCodecPreferences can be used if the UA does not negotiate it by default.

What would that look like? Would it multiply the payload types in RTCRtpSender.getCapabilities("video")?

It would look like the API described in https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpTransceiver/setCodecPreferences
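
A sketch, assuming the UA implemented such a payload format and listed it (hypothetically as "video/sframe") among its receiver capabilities:

const transceiver = pc.addTransceiver("video");
const {codecs} = RTCRtpReceiver.getCapabilities("video");
const sframe = codecs.filter(c => c.mimeType === "video/sframe"); // hypothetical entry
transceiver.setCodecPreferences([...sframe, ...codecs.filter(c => !sframe.includes(c))]);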

  2. Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

Do you mean a RTCRtpScriptTransform here? What would that look like?

From the explainer:

    metadata = frame.metadata();
    metadata.pt = options.payloadType;
    frame.setMetadata(metadata);
partial interface RTCEncodedVideoFrame {
   attribute DOMString rtpPayloadFormat;

What's the advantage or use case for choosing this per frame?

I believe Youenn described this case in detail on the call; I added this FAQ question specifically to address his use case.

From the explainer:

  1. Q: My application wants to send frames with multiple packetizers. How do I accomplish that?

    A: Use multiple payload types. Each will be assigned a payload type. Mark each frame with the payload type they need to be packetized as.

Apps can already specify SFrame ahead of negotiation, so doesn't the UA have all it needs to negotiate sframe?

I particularly mentioned on the call (and have said in every presentation that I've given on it) that I want an interface that is able to support the existing, deployed, JavaScript-based sframe format, which is not the same as the sframe transform that is supposed to use the (still not final) IETF SFrame format draft, or any other transform.
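
For illustration, a worker-side sketch of the per-frame payload-type marking the explainer FAQ describes; the options members and the decision helper are hypothetical, and metadata()/setMetadata() are the methods proposed in this PR:

onrtctransform = ({transformer}) => {
  const {customPt, passthroughPt} = transformer.options; // payload types the page looked up after negotiation
  const needsCustomPacketization = frame => frame.type === "key"; // placeholder decision
  transformer.readable.pipeThrough(new TransformStream({
    transform(frame, controller) {
      const metadata = frame.metadata();
      metadata.pt = needsCustomPacketization(frame) ? customPt : passthroughPt;
      frame.setMetadata(metadata);
      controller.enqueue(frame);
    }
  })).pipeTo(transformer.writable);
};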

@youennf
Collaborator

youennf commented Oct 16, 2023

What's the advantage or use case for choosing this per frame?

Fair question.

I believe Youenn described this case in details on the call; I added this FAQ question specifically to address his use case.

From the explainer:

  1. Q: My application wants to send frames with multiple packetizers. How do I accomplish that?

The use case I know of is enabling/disabling SFrame dynamically. Otherwise, we just want to stick to whatever packetization is associated with the encoded media content, and we can already change the media content via setParameters.

This change can be done either at the frame level or at the transform level (similarly to setParameters really).
If at the transform level, you need to call sender.transform = newTransform to change whether SFrame packetization would be used or not. This is slightly less flexible but might be more convenient/less error-prone for web developers and could allow some optimisations on the UA side.
You lose some flexibility, except if you plan to switch packetization at a very specific frame. I am unsure whether we have a strong use case here.

It is interesting to look at receiver side though, in case a UA implements the SFrame packetization format and processing happens in a script transform. The web application might want to know:

  • whether packets were processed by the SFrame depacketizer
  • which underlying media decoder will be used (info provided by the SFrame depacketizer from the RTP payload content)

It might be convenient to expose an API for both things, and it makes sense to expose this at the frame level, since this might potentially change for every frame.

I would tend to expose the same API on both receiver and sender side, which is why I would tend to go with the frame level.

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

That is probably where we have a disconnect.
Script transforms have several potential use cases:

  • Implement SFrame or a variant of SFrame. Once we have the packetization format in the UA, I do not see a need for a new generic mechanism to negotiate it. I am ok with SFrame packetization format spec to expose an extension point in the SDP, and we would expose this in our API (say setCodecPreferences for instance).
  • Add metadata to media content. In this case, I believe we want encoded content to stick to the same format (e.g. H264 would stay H264 and metadata would be put in SEI). We do not need to change the packetization format and I do not see the need to negotiate anything in the SDP. RTP header extensions might be a more meaningful extension point if need be.
  • Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms. For instance, a video encoder takes a VideoFrame as input and EncodedVideoChunk as output. A script transform takes an EncodedVideoChunk both for input and output. Also, these two should be usable together: it makes sense to use a SFrame transform on a plugged-in codec simply with sender.transform = sframeTransform.

This plug a new codec API would probably need to let the web application handle both the encoding/decoding (UA does not know the content format) and the associated packetization (UA does not know the format so does not know the associated packetization format), in addition to SDP negotiation. Relying on the SFrame packetization for new codecs is probably not something we want, as per IETF feedback.

For this plug a new codec work, I would tend to start with a plug your own encoder API that could be used without packetization/SDP handling, for instance to let web apps fine tune a WebCodecs encoder setup (set QP per frame/macro-block for instance). We could then address new formats on top of this API to handle SDP negotiation/RTP packetization.
It could also be used for the metadata to media content use case since web applications may often carry VideoFrame and associated metadata together.

@alvestrand
Contributor Author

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

That is probably where we have a disconnect. Script transforms have several potential use cases:

  • Implement SFrame or a variant of SFrame. Once we have the packetization format in the UA, I do not see a need for a new generic mechanism to negotiate it. I am ok with SFrame packetization format spec to expose an extension point in the SDP, and we would expose this in our API (say setCodecPreferences for instance).

Specific interfaces that can only be used for the SFrame transform should go into the SFrame transform specification, not here. setCodecPreferences expresses the receiver's preference about what the sender should send; it does not enforce either what the sender sends or what the receiver will have to handle.

setDepacketizer(PT, depacketizer) is an API that addresses the use case on the receiver side.

  • Add metadata to media content. In this case, I believe we want encoded content to stick to the same format (e.g. H264 would stay H264 and metadata would be put in SEI). We do not need to change the packetization format and I do not see the need to negotiate anything in the SDP. RTP header extensions might be a more meaningful extension point if need be.

We don't agree here. There is no generic metadata mechanism, and a number of codecs lack functionality for adding metadata.

  • Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms.

I don't agree here. I think we want to allow for experimentation of this kind; we should actively ensure that we make it possible to try out new codecs without changing the browser.

This plug a new codec API would probably need to let the web application handle both the encoding/decoding (UA does not know the content format) and the associated packetization (UA does not know the format so does not know the associated packetization format), in addition to SDP negotiation. Relying on the SFrame packetization for new codecs is probably not something we want, as per IETF feedback.

For this plug a new codec work, I would tend to start with a plug your own encoder API that could be used without packetization/SDP handling, for instance to let web apps fine tune a WebCodecs encoder setup (set QP per frame/macro-block for instance).

Doing this without SDP handling would make the current situation, where people send content that does not conform to the SDP describing what they claim to be sending, strictly worse. I would strongly oppose adding such an API to the PeerConnection family of APIs without solving the SDP issue first.

We could then address new formats on top of this API to handle SDP negotiation/RTP packetization. It could also be used for the metadata to media content use case since web applications may often carry VideoFrame and associated metadata together.

If we have to handle SDP negotiation and RTP packetization for this use case, and we know that we have other use cases we need to handle it for, why not accept the current proposal into the WD?

@jan-ivar
Member

  • Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms. For instance, a video encoder takes a VideoFrame as input and EncodedVideoChunk as output. A script transform takes an EncodedVideoChunk both for input and output.

I agree this suggests an API mismatch. I see evidence of this in the Lyra use case as well, which had to:

  1. Use a special "L16" null-codec in Chrome to bypass encoding and pass raw audio directly to the transform
  2. SDP-munge ptime back to 20, to counter Chrome's drop to 10 ms from the large amount of (uncompressed) input
  3. Reverse the 16-bit endian of the input, which is already in network order, back to platform order for lyra-js

It's an impressive feat, but I imagine these problems would be amplified with video. It might need its own API.

Also, these two should be usable together: it makes sense to use a SFrame transform on a plugged-in codec simply with sender.transform = sframeTransform.

This makes sense to me as well. E.g. spitballing:

sender.encoder = new RTCRtpScriptEncoder(worker, options);
sender.transform = new SFrameTransform({role: "encrypt"});

?

@alvestrand
Contributor Author

My preferential API for chained transforms is readable.pipeThrough(transform1).pipeThrough(transform2).pipeTo(writable).

It is possible to write chain links that permit this syntax when using RTCRtpScriptTransformer.
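
For example, a worker-side sketch of such a chain over the existing RTCRtpScriptTransformer readable/writable (the two transforms are placeholders):

const encrypt = new TransformStream({
  transform(frame, controller) { /* placeholder: transform frame.data */ controller.enqueue(frame); }
});
const annotate = new TransformStream({
  transform(frame, controller) { /* placeholder: adjust metadata */ controller.enqueue(frame); }
});
onrtctransform = ({transformer}) => {
  transformer.readable
    .pipeThrough(encrypt)
    .pipeThrough(annotate)
    .pipeTo(transformer.writable);
};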

@alvestrand
Contributor Author

My point was really that if we want to add more attributes to addTransceiver(), we should not add more arguments.
Actually addTransceiver() already takes an RTCRtpTransceiverInit with 3 existing members - adding a transform here would be consistent with the existing interface.

@jan-ivar
Member

jan-ivar commented Jan 8, 2024

Actually addTransceiver() already takes an RTCRtpTransceiverInit with 3 existing members - adding a transform here would be consistent with the existing interface.

I've filed #221. My point here was merely to highlight that a transceiver is primarily the sender and receiver.

@jan-ivar
Member

jan-ivar commented Jan 8, 2024

... That way, in some wonderful future where one can construct RtpSender and RtpReceiver objects without SDP, you don't need RtpTransceivers to exist.

I agree, so we should stop adding new methods to RtpTransceiver if we can avoid it.

I favor SDP-agnostic options on new RTCRtpScriptTransform over the more SDP-specific transceiver.addCodec.

Is it SDP-specific for JS to declare the input/output codec of the transform it performs? — Seems useful to the worker regardless of SDP. E.g. based on the transform-based part of what @alvestrand and I have been working on:

    transceiver.sender.transform = new RTCRtpScriptTransform(worker, {
      inputCodecs: [{mimeType: "video/vp8"}],
      outputCodecs: [{mimeType: "video/custom-encrypted"}]
    });
    transceiver.receiver.transform = new RTCRtpScriptTransform(worker, {
      inputCodecs: [{mimeType: "video/custom-encrypted"}],
      outputCodecs: [{mimeType: "video/vp8"}]
    });

This nicely communicates the task to the (reused) worker (removing the need to invent a side property in my fiddle).

Member

@jan-ivar jan-ivar left a comment

@alvestrand Here's the transform example you asked for. Hope it looks OK and thanks for your patience!

I was able to reuse the same worker function. I didn't touch on acceptOnlyInputCodecs since the JS does that (but it might be a nice optimization).

alvestrand and others added 2 commits January 12, 2024 11:06
Separate the two proposals better

Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>
Separate the two proposals better

Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>
@alvestrand
Contributor Author

After writing up some considerations for how codecs are described in w3c/webrtc-pc#2925, I am starting to feel that the transform proposal fits better - but it would be described as modifying the codec list when a transform is added.

Comments?

alvestrand and others added 6 commits January 25, 2024 11:56
Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>
Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>
Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>
Co-authored-by: Jan-Ivar Bruaroey <jan-ivar@users.noreply.github.com>
Co-authored-by: Philipp Hancke <fippo@goodadvice.pages.de>
@dontcallmedom-bot

This issue was mentioned in WEBRTCWG-2024-02-20 (Page 14)

@alvestrand
Contributor Author

@jan-ivar this is incomplete, but does it so far conform with what you think we've agreed on?

@alvestrand
Contributor Author

I think this is ready for an editors' review now - I believe the spec changes are in a state where they are good to merge, and reflect the API used in the explainer.

Please take a look.

@dontcallmedom-bot

This issue had an associated resolution in WebRTC April 23 2024 meeting – 23 April 2024 (Custom Codecs):

RESOLUTION: Consensus on #186, discussion to continue on #202

sequence&lt;RTCRtpCodec&gt; inputCodecs;
sequence&lt;RTCRtpCodec&gt; outputCodecs;
boolean acceptOnlyInputCodecs = false;
};

Contributor Author

There are specs that take an `any` and convert it to a dictionary. @jan-ivar will find an example.

Member

@jan-ivar jan-ivar May 20, 2024

From streams:

In that spec, underlyingSource is an object, vs ours which is any, so we'd probably want to check against object before doing this to avoid a TypeError here. (more below)

@alvestrand
Contributor Author

Aim is to get this into a state editors can integrate next Thursday (May 2, 2024).

Member

@jan-ivar jan-ivar left a comment

Sorry for the delay in reviewing this. Some questions still around how this all fits together with custom codecs, sframe etc.


@@ -130,6 +130,7 @@ The <dfn abstract-op>readEncodedData</dfn> algorithm is given a |rtcObject| as p
1. Let |frame| be the newly produced frame.
1. Set |frame|.`[[owner]]` to |rtcObject|.
1. Set |frame|.`[[counter]]` to |rtcObject|.`[[lastEnqueuedFrameCounter]]`.
1. If |rtcObject|'s |transform|'s |options| contains {{RTCRtpScriptTransformParameters/acceptOnlyInputCodecs}} with the value True, and the {{RTCEncodedVideoFrameMetadata/mimeType}} of |frame| does not match any codec in |transform|'s |options|' |inputCodecs|, execute the [$writeEncodedData$] algorithm given |rtcObject| and |frame|.

Member

First a nit: Maybe set an [[acceptOnlyInputCodecs]] internal slot to avoid referencing the options object which gets transferred? This is meant to be an invariant, and would avoid implying cross-thread access.

Also, while reviewing this, I realized the spec is confusing atm wrt what threads the various objects are on (rtcObject is always on main-thread), so I've filed #229 which should help. I hope to do a PR there soon. — To not block on that, I think we can still merge this with the understanding that #229 might move things in the right place later, if that's OK with everyone.

But on to the substance of this line:

The writeEncodedData is the transformer.writable's writeAlgorithm, i.e. its underlying sink. The streams spec normally manages when this is called. This says to invoke that algorithm directly, which I worry might interfere with the streams spec's view of what is happening since it owns when to call its underlying sink. E.g. this might bypass any frames queued on the writable's internal queue that the streams spec maintains.

I'd be more comfortable if we could find a more WebRTC-specific shortcut that doesn't interact with the streams spec here. If I understand correctly, the goal was to bypass the streams machinery here anyway. Otherwise this optimization doesn't seem worthwhile.

Shouldn't we also abort the remaining steps in that case? I.e. don't enqueue the frame on the readable?

@@ -146,7 +147,7 @@ The <dfn abstract-op>writeEncodedData</dfn> algorithm is given a |rtcObject| as

On sender side, as part of [$readEncodedData$], frames produced by |rtcObject|'s encoder MUST be enqueued in |rtcObject|.`[[readable]]` in the encoder's output order.
As [$writeEncodedData$] ensures that the transform cannot reorder frames, the encoder's output order is also the order followed by packetizers to generate RTP packets and assign RTP packet sequence numbers.
The packetizer may expect the transformed data to still conform to the original format, e.g. a series of NAL units separated by Annex B start codes.
The packetizer will expect the transformed data to conform to the format indicated by its {{RTCEncodedVideoFrameMetadata/mimeType}}.

Member

I think we need to define requirements on the packetizer here, e.g. what MUST happen if it doesn't or can't conform.

Comment on lines +365 to +383
<dt>
<dfn dict-member>frameId</dfn> <span class="idlMemberType">unsigned long long</span>
</dt>
<dd>
<p>
An identifier for the encoded frame, monotonically increasing in decode order. Its lower
16 bits match the frame_number of the AV1 Dependency Descriptor Header Extension defined in Appendix A of [[?AV1-RTP-SPEC]], if present.
Only present for received frames if the Dependency Descriptor Header Extension is present.
</p>
</dd>
<dt>
<dfn dict-member>dependencies</dfn> <span class="idlMemberType">sequence&lt;unsigned long long&gt;</span>
</dt>
<dd>
<p>
List of frameIds of frames this frame references.
Only present for received frames if the AV1 Dependency Descriptor Header Extension defined in Appendix A of [[?AV1-RTP-SPEC]] is present.
</p>
</dd>

Member

Maybe a separate PR for this or explain its relationship? I don't get it.

dictionary RTCRtpScriptTransformParameters {
sequence&lt;RTCRtpCodec&gt; inputCodecs;
sequence&lt;RTCRtpCodec&gt; outputCodecs;
boolean acceptOnlyInputCodecs = false;

Member

Bikeshed: With "accept" I worry it's not super clear what happens to frames that are not "accepted". Especially for E2EE, we'd want to make it super clear that frames NOT accepted are sent through untransformed.

In the explainer you refer to it as a "filter". How about filterOnInputCodecs or transformInputCodecsOnly?

Also, I forget: why can't we default this to true or even remove the option to set it to false?

Comment on lines +806 to +811
The member "packetizationMode" is added to the {{RTCRtpCodec}} definition; its value is a MIME type. When this is present, the rules for packetizing and depacketizing are those of the named MIME type. For codecs not supported by the platform, this MUST be present.

<pre class='idl'>
partial dictionary RTCRtpCodec {
DOMString packetizationMode;
};

Member

Wasn't the idea with having both inputCodecs and outputCodecs that we could infer the packetizationMode from the inputCodecs? e.g. from #186 (comment).

RTCRtpCodec is used all over the place, including setCodecPreferences, set/getParameters, and the codec match algorithm, and it's not super-clear to me when the user agent is supposed to populate this member or not, or even why it needs to be exposed when the application needs to know this a priori.

The only example uses packetizationMode: video/sframe which is a bit confusing to me. Sorry if I'm a bit rusty on this.

It's also not used in any algorithm, so it's not clear what its function is.

## Operations ## {#RTCRtpScriptTransform-operations}

The <dfn constructor for="RTCRtpScriptTransform" lt="RTCRtpScriptTransform(worker, options)"><code>new RTCRtpScriptTransform(|worker|, |options|, |transfer|)</code></dfn> constructor steps are:
1. Set |t1| to an [=identity transform stream=].
2. Set |t2| to an [=identity transform stream=].
3. Set |this|.`[[writable]]` to |t1|.`[[writable]]`.
4. Set |this|.`[[readable]]` to |t2|.`[[readable]]`.
4. Set |this|.`[[options]]?` to the result of converting |options| to an RTCRtpScriptTransformParameters dictionary.

Member

Based on comment above, something like:

Suggested change
4. Set |this|.`[[options]]?` to the result of converting |options| to an RTCRtpScriptTransformParameters dictionary.
4. Set |this|.`[[options]]` to |options|.
5. Let |optionsObject| be |options| if |options| is of type [=object=], otherwise `{}` .
6. Set |this|.`[[optionsDict]]` to the result of converting |optionsObject| to an RTCRtpScriptTransformParameters dictionary.

Development

Successfully merging this pull request may close these issues.

Tell SDP negotiation to negotiate support for non-standard codecs