
How does a developer decide on a value for playoutDelay? #46

Closed
padenot opened this issue Jul 17, 2020 · 23 comments · Fixed by #160

Comments


padenot commented Jul 17, 2020

In general, APIs that take a numerical value for a tradeoff are bad, because it's hard to determine the threshold values between the various use-cases. How does a developer find what the best value is for a particular use-case?

If the best value for a use-case is fixed (which seems to be the case, looking at #8), then an enum is better. If not, then other APIs that allow determining this value for a particular environment must be available.


jan-ivar commented Jul 17, 2020

I think these are valid concerns. There's also a precedent here in latencyHint. Following that model we could imagine:

```webidl
enum RTCRtpReceiverLatencyCategory {
  "balanced",
  "interactive",
  "playback",
};

partial interface RTCRtpReceiver {
  attribute (RTCRtpReceiverLatencyCategory or double) playoutDelay;
};
```

This should be more web compatible, and remove the onus on web developers to discover the minimum allowed delay of each browser and work around it with tables of different values for well-known browsers.

Happy to bikeshed enum names based on use-cases. This should also hopefully produce a healthy discussion about what the default value should be (presumably web conferencing is "interactive" not "balanced", right?)
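
For illustration, usage of the proposed union might look like this (a sketch against the proposal above; `pc` is assumed to be an RTCPeerConnection, and none of this is a shipped API):

```js
// Sketch against the proposed union type above (not a shipped API).
const [receiver] = pc.getReceivers();
receiver.playoutDelay = "interactive"; // let the UA pick a low-latency target
// Apps needing finer control could still assign an explicit number of seconds:
receiver.playoutDelay = 0.25;
```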


henbos commented Jul 20, 2020

I think the number you would use is the number of seconds of delay you are happy with under good network conditions; it translates very well into how interactive you want the experience to be. I don't remember the details of the discussions since this was quite a long time ago now, but we flip-flopped between enums and numbers several times, and at the end of the day we concluded that a number is more well-defined than an enum, because an enum begs the question "how interactive is 'interactive'?", and then we'd have to arbitrarily pick a number of seconds that corresponds to something being "interactive".

I think one of the major interop issues is with the implementation: for audio tracks it can take several minutes before the delay you ask for is achieved. I didn't realize this when I wrote the spec; I had assumed that you get what you ask for within a few seconds.

I don't think an enum is more web compatible, because "interactive" could then mean milliseconds on one browser but a second on another browser?


henbos commented Jul 20, 2020

That said, I think there are good arguments for doing things with an enum as well, but I do think it would most likely map to some number of seconds internally when you do the implementation.


henbos commented Jul 20, 2020

The pro of an enum is that the implementation could adapt the number of seconds over time as it becomes more confident about the stability of the network, but that seems like the 2.0 implementation of an API for playout delay.


padenot commented Jul 20, 2020

> That said, I think there are good arguments for doing things with an enum as well, but I do think it would most likely map to some number of seconds internally when you do the implementation.

Yes, but the browser knows the network and local resources better than the web app developer.


henbos commented Jul 20, 2020

I agree with that. To allow a more powerful implementation that makes tradeoffs, I think an enum helps make it clear that the UA can be flexible, but it also makes it less testable, which might be OK. playoutDelay was originally named playoutDelayHint to allow the UA to override the decision, but to make things more testable I think it evolved into a more explicit delay knob and was renamed playoutDelay.

If there is interest in implementing an enum I'd support that, or in otherwise revisiting the definition. But if implementations are a basic "delay by X seconds", I think the current API is more well-defined, despite the issue of not knowing how to pick the best number of seconds. @jan-ivar Is Firefox interested in an API like this?


padenot commented Jul 20, 2020

A numeric value is OK if there is a feedback mechanism to inform developers programmatically that the value has been used as-is, or (for example) that it's been clamped or changed. Is this the case? This would be useful for testing as well, of course. Knowing the depth of the jitter buffer is necessary for A/V sync (especially when it can be set to a very high value).

Short of having this, an enum is preferable, but having information about the amount of buffering is necessary anyway.

It needs to be a hint (which is fine if there is a way to know the value). Throwing for values over 4.0 is arbitrary, and is not discoverable programmatically.
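
For reference, the closest existing read-back is probably the stats API: the RTCInboundRtpStreamStats fields jitterBufferDelay and jitterBufferEmittedCount let an app estimate the delay actually applied. A minimal sketch (the helper name and delta bookkeeping are illustrative):

```js
// jitterBufferDelay is a running total in seconds; dividing its delta by the
// delta of jitterBufferEmittedCount approximates the recent average delay.
async function measuredJitterBufferDelay(receiver, prev = {}) {
  const report = await receiver.getStats();
  for (const stats of report.values()) {
    if (stats.type !== "inbound-rtp") continue;
    const dDelay = stats.jitterBufferDelay - (prev.jitterBufferDelay || 0);
    const dCount = stats.jitterBufferEmittedCount -
        (prev.jitterBufferEmittedCount || 0);
    return { prev: stats, seconds: dCount > 0 ? dDelay / dCount : NaN };
  }
  return { prev, seconds: NaN };
}
```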

@murillo128

My two cents.

I am currently working on a use case that requires synchronized playback of a remote stream on two different devices. We add a custom delay on the primary node via Web Audio, then adjust the secondary via playoutDelayHint based on the RTT. So, having a numeric value makes sense, at least for us.
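
A sketch of that kind of adjustment, assuming the RTT is read from the nominated candidate-pair stats and that primaryDelaySeconds is the fixed Web Audio delay on the primary node (the function name and the half-RTT compensation are illustrative, not the commenter's exact code):

```js
// Illustrative: line the secondary device's playout up with the primary's
// fixed Web Audio delay, compensating for half the round-trip time.
async function syncSecondaryDelay(pc, receiver, primaryDelaySeconds) {
  const report = await pc.getStats();
  for (const stats of report.values()) {
    if (stats.type === "candidate-pair" && stats.nominated &&
        stats.currentRoundTripTime !== undefined) {
      receiver.playoutDelayHint =
          Math.max(0, primaryDelaySeconds - stats.currentRoundTripTime / 2);
      return;
    }
  }
}
```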

Regarding the time required to adjust the delay to the new value: it is an implementation detail of NetEq. We have modified it so it converges faster (within 1 second) to the value set by JS, by adding silence or dropping packets instead of the default NetEq behavior.

On a side note, it would be awesome if we could add more parameters to control the jitter buffer behavior (or even completely replace it), as NetEq, at least, is not tuned correctly for several use cases.


padenot commented Jul 21, 2020

> My two cents.
>
> So, having a numeric value makes sense, at least for us.

Yes, but in your case you've answered the question: you're using the RTT to change the jitter buffer depth. For your use case it makes sense. With your changes that make it converge faster, it's probably OK for A/V sync as well (if needed). What is missing is a way for regular apps to determine the best value for the jitter buffer depth, based on something (network conditions, machine load, etc.), and to know the duration of the jitter buffer, for A/V sync. This is what this issue is about.

If it's fixed for an app (as it seems to be for e.g. Meet), then a numerical value is bad and an enum is superior.

> On a side note, it would be awesome if we could add more parameters to control the jitter buffer behavior (or even completely replace it), as NetEq, at least, is not tuned correctly for several use cases.

This will have to happen as a natural consequence of the lowering of the abstraction level that seems to be happening in 2.0.


murillo128 commented Jul 21, 2020 via email


padenot commented Jul 21, 2020

The numeric value is not the problem. The absence of a way to determine what value is best is the problem.


henbos commented Jul 21, 2020

> Regarding the time required to adjust the delay to the new value: it is an implementation detail of NetEq. We have modified it so it converges faster (within 1 second) to the value set by JS, by adding silence or dropping packets instead of the default NetEq behavior.
>
> On a side note, it would be awesome if we could add more parameters to control the jitter buffer behavior (or even completely replace it), as NetEq, at least, is not tuned correctly for several use cases.

That's interesting to hear. I would like to see Chrome's NetEq implementation converge that fast too, but I have no idea about the tradeoffs involved. @minyuel FYI, an external developer has made playoutDelay more responsive for audio receivers.


AndrewJDR commented Jul 24, 2020

I wanted to weigh in in favor of finer-grained control than just an enum. As an application developer, I may have a different opinion about what constitutes glitch resilience (which I guess would be the "playback" enum value above) than a browser developer does. Also, WebRTC has a stats API; it's not like app developers are flying completely blind.

What if I think >= 1 dropped frame and >= 10 NACKs every minute means that not enough resilience is being provided?
10 dropped frames and 100 NACKs?
An RTT of > 100 ms?
Or some other combination of these that I came up with through a lot of experimentation with my specific application (with its own specific resolution/fps/etc. parameters)?

And what if I'm willing to give up some latency for greater glitch resilience, but I have a hard cutoff on how much latency I'm willing to give up (e.g. 300 ms, 500 ms, 700 ms)?

And what if I identify a technique for scaling up the delay at a rate of my own choosing, also found through experiments with my own application's specific use case?

This smells like something that needs a scalar value.

Also providing an enum sounds fine, though.
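
As a sketch of the kind of heuristic described above (the thresholds, step sizes, and polling period are placeholders; as the comment says, the real numbers would come from per-application experimentation):

```js
// Illustrative heuristic: grow the playout delay quickly on signs of trouble,
// shrink it slowly when the stream looks clean, with a hard latency cap.
const MAX_DELAY_S = 0.5; // the app's hard cutoff on added latency
let prev = { framesDropped: 0, nackCount: 0 };

async function adjustDelay(receiver) {
  const report = await receiver.getStats();
  for (const stats of report.values()) {
    if (stats.type !== "inbound-rtp") continue;
    const dropped = (stats.framesDropped || 0) - prev.framesDropped;
    const nacks = (stats.nackCount || 0) - prev.nackCount;
    prev = { framesDropped: stats.framesDropped || 0,
             nackCount: stats.nackCount || 0 };
    const current = receiver.playoutDelayHint || 0;
    receiver.playoutDelayHint = (dropped >= 1 || nacks >= 10)
        ? Math.min(MAX_DELAY_S, current + 0.1) // trouble: grow fast
        : Math.max(0, current - 0.01);         // clean: shrink slowly
  }
}
// Poll once a minute to match the per-minute thresholds above:
// setInterval(() => adjustDelay(receiver), 60000);
```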

@AndrewJDR

P.S. If the assertion is that there is not enough data available to the application to build a good heuristic for adjusting this value, let's beef up the stats, not water down the ability to adjust the value!


henbos commented Jul 24, 2020

I don't think the intent of this API was ever to be that fine a control knob. Generally speaking, the internal engine is in the best position to know how to adjust the delay so as to minimize poor quality, and it is the one in control of the jitter buffer. The problem is that the assumption was previously always that we want to play out received media as soon as possible, because that optimizes interactiveness, even though a shorter buffer necessarily entails a risk of reduced quality when packets are dropped or don't arrive on time.

The intent of playoutDelay was to give the application the power to say: "you don't need to push the playout delay below this point, because the interactiveness requirements of my application's use case loosen these constraints... even if conditions are pretty good, I don't mind a bit of extra delay if it increases the odds of better quality".


henbos commented Jul 24, 2020

Example use case: I'm passively listening to a presentation. I don't care if I get the presentation a couple of seconds later, because I'm not interacting with that content in real time. playoutDelay = 2. Later, there's a Q&A session, and now interactiveness is important. playoutDelay = 0.
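
In code, that scenario is just two assignments over the lifetime of the session (assuming receiver is the relevant RTCRtpReceiver):

```js
// Passive presentation: a couple of seconds of buffering is fine.
receiver.playoutDelay = 2;
// Q&A starts: interactiveness matters again.
receiver.playoutDelay = 0;
```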


henbos commented Jul 24, 2020

If your application grades how interactive it requires the content to be, I think that is a better guide to what values to use for playoutDelay than trying to do the job of the internal engine and predict network quality.


AndrewJDR commented Jul 24, 2020

> The problem is that the assumption was previously always that we want to play out received media as soon as possible, because that optimizes interactiveness, even though a shorter buffer necessarily entails a risk of reduced quality when packets are dropped or don't arrive on time.

For what it's worth, this API has me excited, because in our application, while interactivity is quite important, smoothness is also fairly important. Typically it is important enough that we can usually tolerate 100-500 ms of latency if that's what it takes for smoothness, above which the latency is probably too much to be acceptable. At other, rarer moments, higher latencies of up to 2 seconds are acceptable in the name of smoothness. At still other rare moments, interactivity is crucial above all else, so playoutDelay = 0. This API seems to let me express that, and it maps well to what you described, so I think we're on the same page on that part, and I'm glad it supports that use case.

I still think that application developers, including myself, want the finer-grained control I described above. Ultimately, full control over the jitter buffer would be even better, but... baby steps. I think it's key to remember that WebRTC is being used for many things outside of video/audio conferencing, like remote rendering, gaming, AR/VR, animation playback, and remote production work. Developers out there are going to want to do things like grow their jitter buffer incredibly fast at the first sign of trouble (perhaps far faster than the browser's jitter buffer heuristic or enum presets would deem suitable) while still keeping it small when the connection is good. While I understand that kind of thing wasn't the ultimate goal of this API, I personally see it as a step in the right direction rather than as sullying the API. Folks doing cool, unexpected things with an API is not necessarily a bad thing.

@murillo128

IMHO this API would be quite useless if we allow setting a playoutDelayHint of 50 ms and end up with a jitter buffer delay of 2 s for several minutes. That would still be the case if we set it to "interactive" and get 2 s delays because internally NetEq decides it is better to converge slowly than to drop packets.

We could state that the hint is the minimum value that we want the jitter buffer to take, but again NetEq can decide to ramp up slowly and take several minutes until that delay is achieved.

While a bit more complicated, I think the best alternative is to be able to define min and max values for the jitter buffer that are strictly enforced. If jitter is lower than the min value, the receiver should buffer packets until the min is reached before starting playback. Also, if jitter is above the max, packets should be dropped so the delay is never bigger than the max value.
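
A hypothetical shape for that idea, with attribute names invented here purely for illustration (the min/max discussion is picked up in #28 below):

```js
// Hypothetical, strictly enforced bounds (not a shipped or specified API):
receiver.jitterBufferMinDelay = 0.05; // buffer at least 50 ms before playback
receiver.jitterBufferMaxDelay = 0.50; // drop packets rather than exceed 500 ms
```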


henbos commented Jul 24, 2020

Yeah, I definitely think the implementation needs to apply the playoutDelay faster. Maybe it is quicker at speeding things up again than slowing things down? Otherwise it would be quite "dangerous" when you become interactive again.


murillo128 commented Jul 24, 2020 via email

@jan-ivar

To help separate discussion, let's discuss min/max over in #28.

@dontcallmedom-bot

This issue was mentioned in WEBRTCWG-2023-04-18 (Page 81)
