
Does recording of remote a/v streams always imply re-encoding? #139

Open
ashimokawa opened this issue Oct 22, 2017 · 12 comments · May be fixed by #190

Comments

@ashimokawa commented Oct 22, 2017

I tried recording a remote stream from a WebRTC conference via MediaRecorder.
Even when the remote stream was h264/opus, the stream recorded locally was vp8/opus.
I am aware that I can specify a desired codec and force encoding of h264/opus locally. But the question is whether this always involves re-encoding inside the client browser even when the codecs match, or whether there is a way to simply dump or remux the stream, keeping the remotely encoded stream without costly re-encoding?

I think this should be noted somewhere in the spec.
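The workaround alluded to above would look something like the following sketch. `pickMimeType` is a helper of my own, not part of the API; `MediaRecorder.isTypeSupported` is the real entry point, but note that support for a type says nothing about whether re-encoding is avoided.

```javascript
// Return the first candidate mime type the recorder claims to support,
// or null to fall back to the browser's default (e.g. vp8/opus).
function pickMimeType(candidates, isSupported) {
  for (const type of candidates) {
    if (isSupported(type)) return type; // first supported candidate wins
  }
  return null;
}

// In a browser:
// const type = pickMimeType(
//   ['video/webm;codecs=h264,opus', 'video/webm;codecs=vp8,opus'],
//   t => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(remoteStream,
//                                    type ? { mimeType: type } : {});
```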

@alvestrand (Contributor) commented Oct 23, 2017

I'm not clear about whether the spec should enforce one or another way of doing this.
Recording involves making choices about framing and so on, so at least remuxing is required. On re-encoding we could go either way: say that the stack automagically chooses not to re-encode when the configurations line up just right, or add a special control on recording that says "don't re-encode, and give me an error back if I try to set constraints or parameters that can't be satisfied without re-encoding the frames".
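The second alternative could be modeled like this. Every name here is hypothetical (`passthrough` is not part of MediaRecorderOptions); the sketch only illustrates the "error back instead of silently re-encoding" behavior.

```javascript
// Hypothetical validation for a recorder that refuses configurations it
// cannot satisfy without re-encoding. Not a real MediaRecorder API.
function validatePassthroughConfig(options, incomingCodec) {
  // With pass-through requested, an explicit mimeType naming a codec
  // other than the incoming one cannot be honored without re-encoding.
  if (options.passthrough && options.mimeType &&
      !options.mimeType.includes(incomingCodec)) {
    throw new Error('NotSupportedError: would require re-encoding');
  }
  return options;
}
```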

@henbos commented Sep 13, 2019

My take: MediaRecorder specifies the codec with MediaRecorderOptions.mimeType. The input to the recorder is the frames of the MediaStreamTrack; the recorder is essentially a sink, responsible for encoding. The recorder should ideally not know or care (wrong abstraction layer) whether the track originates from a camera, WebRTC, or something else.

That said - if an implementation wants to take advantage of the fact that the recording format matches that of the incoming WebRTC stream, it would be free to implement such an optimization. But that would be a browser optimization feature, not a spec mandate.

@henbos commented Sep 13, 2019

I spoke a bit more with some engineers, and there is a difference between a re-encoded stream and a WebRTC dump of the stream. For example, key frames might be different, which could affect whether or not the resulting recording can be seeked when playing it back.

It might make sense to allow the application some control here. For example, if we leave the mimeType unspecified we could allow skipping re-encoding, but if it is specified we always re-encode?
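That rule can be captured as a toy decision function (my reading of the proposal, not spec text): an explicit mimeType always re-encodes, while leaving it unspecified permits, but does not require, pass-through.

```javascript
// Toy sketch of the proposed rule; name and shape are illustrative only.
function recorderMayPassThrough(options) {
  // Unspecified mimeType: the UA is allowed to dump the incoming
  // encoded stream as-is. Explicit mimeType: always re-encode.
  return options.mimeType === undefined;
}
```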

@henbos commented Sep 26, 2019

This was discussed at TPAC, with consensus on proposal B for #139. What is needed is a clarification in the spec, and to allow updating the mimeType at onstart when the first frame arrives (that is when we know what we are recording, not synchronously when start() is called).

@henbos henbos added the Ready for PR label Sep 26, 2019
@Pehrsons (Contributor) commented Sep 26, 2019

@henbos the tracks are known synchronously in start(). What else is there to know about what we are recording?

@henbos commented Sep 26, 2019

If the track is sourced from something that is based on an encoded stream, such as a remote WebRTC track, and we want to allow the User Agent to avoid re-encoding the stream, then we have to wait for the first encoded frame to arrive before we know what codec is used, and as such we cannot synchronously update the mimeType attribute. We do know this before firing "onstart", though.

@henbos commented Sep 26, 2019

I suppose the application could ensure the track has already received frames before starting to record. According to spec the track should be muted before the first frame, which means you can listen to onunmute (but Chrome is implemented to fire onunmute even if we have not received RTP packets yet - that's a bug). In that case we can pick the codec synchronously, because the track knows which codec is used. But if you start recording prior to having frames, black frames would be recorded with an arbitrary codec, possibly forcing re-encoding.
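A minimal sketch of that application-side approach, assuming the spec behavior (receiving tracks stay muted until the first frame) rather than Chrome's current one:

```javascript
// Resolve once the track has produced media, so the codec in use is
// knowable before calling start(). Assumes spec-compliant mute behavior.
function whenUnmuted(track) {
  if (!track.muted) return Promise.resolve(track);
  return new Promise(resolve =>
    track.addEventListener('unmute', () => resolve(track), { once: true }));
}

// In a browser:
// await whenUnmuted(remoteStream.getVideoTracks()[0]);
// new MediaRecorder(remoteStream).start();
```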

@Pehrsons (Contributor) commented Sep 26, 2019

The codec isn't really tied to the frames.

Can "track" fire on a peer connection before the codec has been negotiated? I guess I don't know webrtc-pc well enough...

@henbos commented Sep 26, 2019

The codec isn't a property of a frame, but WebRTC doesn't know the codec until it has received RTP packets with encoded frames.

WebRTC will give you remote tracks before codecs have been negotiated. Here's how it works:

  • If a transceiver is created, a receiver is also created, which has a track. This can happen whether or not we ever intend to use the track: a transceiver can be created by calling addTransceiver(), even prior to negotiation, or by processing an offer, which is half-way through negotiation (whether or not we intend to receive on it). Processing an offer can trigger ontrack, but at that point it is too early to know the codec. Receivers can exist because we are willing to receive something after negotiating, or they can be dummy objects that are never used.
  • Regardless how the "receiving track" was created, it is always muted by default.
  • At some point in time we complete negotiation. The track is still muted. At this point we know which codecs are allowed (a list of codecs) and we have to have the necessary decoders available in case incoming data arrives. But we don't know which of the codecs the other endpoint actually decides to use.
  • RTP packets arrive at the receiver! Hurray! We unmute the track and NOW we know which codec is used, because we can tell by inspecting the RTP packets.
  • (On the fly, the RTP stream could switch to a different codec that was negotiated, and we have to be prepared for this. In terms of recording, we could either start re-encoding or fail the recording.)
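The lifecycle above can be mirrored in a toy state model. The function names are mine, not webrtc-pc API; they only track the order in which information becomes available to a would-be recorder.

```javascript
// Steps 1-2: receiver and track exist (possibly pre-negotiation),
// always muted, codec unknown.
function createReceiverTrack() {
  return { muted: true, negotiatedCodecs: null, codecInUse: null };
}

// Step 3: negotiation completes. The list of allowed codecs is known,
// but the track stays muted and the codec actually used is unknown.
function completeNegotiation(track, codecs) {
  track.negotiatedCodecs = codecs;
}

// Step 4: first RTP packets arrive. The track unmutes and the codec in
// use is finally knowable by inspecting the packets.
function receiveRtp(track, codec) {
  track.muted = false;
  track.codecInUse = codec;
}
```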
@Pehrsons (Contributor) commented Sep 26, 2019

Thanks for the breakdown!

To stick with the TPAC decision of only keeping the original encoding from the wire when the requested mimeType is empty, deferring the setting of mimeType until we fire "start" does seem like the simplest solution.

I think we should also clarify that "start" is only fired after some data has been encoded then. Today the spec says:

Start recording all tracks in tracks using the MediaRecorder's current configuration, and gather the data into a Blob blob and queue a task, using the DOM manipulation task source, to fire an event named start at recorder, then return undefined.

"gather the data into a blob and queue a task, ..., to fire an event named start" is vague. What is the definition of "gather the data"?

  • "Wait until some data has been gathered and put it into blob, and keep gathering future data into blob as well"? Or
  • "Gather any incoming data into blob as it comes"?

The former would fire "start" after the recorder has seen some data, the latter could fire "start" before the recorder has seen any data.

I'll also note that there could be "pause" and "resume" events before "start", which seems unintuitive. Heck, even "stop" could happen before "start" if the blob was empty (and there would be no "start" at all). Also unintuitive. Are we OK with that?

FWIW Firefox will currently force the firing of "start" just before it fires "stop" if no "start" was fired yet. This behavior is old so I don't know the full reasoning behind it, but I guess it went like: if we have stopped, we must have started already...

@henbos henbos referenced a pull request that will close this issue Sep 26, 2019
@henbos commented Sep 26, 2019

I went with "As soon as data is available on all tracks". What do you think about #190?
