
RTCRtpContributingSource.audioLevel not guaranteed to be in sync with audio playout #1085

Closed
taylor-b opened this issue Mar 16, 2017 · 6 comments

@taylor-b (Contributor)

My assumption is that this feature exists so that applications can show audio level UI indications for different participants of a call.

However, I don't see how this can be done in a robust manner, since, as the spec currently reads, the RTCRtpContributingSource objects are updated whenever a packet is received, not when audio is played out:

Each time an RTP packet is received, the RTCRtpContributingSource objects are updated.

Consider these situations:

  1. There is a noticeable delay between packets being received and audio playing out, due to poor network conditions, resulting in an audioLevel that's updated well in advance of audio playout; e.g., you see the volume indicator move before the speaker opens their mouth.
  2. Traffic is bursty, resulting in the audio level jumping around when there's a burst of traffic, then remaining stagnant for a while.
  3. Packets arrive out of order, and the timestamp actually decreases?

How can these problems be mitigated? Could we change "Each time an RTP packet is received" to "each time a frame of media is delivered to the MediaStreamTrack" (or whatever the right terminology there is)?

Otherwise, what can an application do? Use getStats to figure out the playout delay, and then delay updating the audio level UI for that amount of time?
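In code, that workaround might look something like the sketch below (TypeScript). The jitterBufferDelay and jitterBufferEmittedCount members come from the current 'inbound-rtp' stats dictionary and postdate this discussion, and updateLevelIndicator is a hypothetical UI hook, so treat this as illustrative only:

```ts
// Hypothetical UI hook; not part of any API.
declare function updateLevelIndicator(csrc: number, level?: number): void;

async function pollAudioLevels(receiver: RTCRtpReceiver): Promise<void> {
  // Estimate the playout delay as the average time a sample spends in the
  // jitter buffer. These stats members are from the current 'inbound-rtp'
  // dictionary and may not match what was shipping in 2017.
  let playoutDelayMs = 0;
  const stats = await receiver.getStats();
  stats.forEach((report) => {
    if (report.type === 'inbound-rtp' && report.jitterBufferEmittedCount > 0) {
      playoutDelayMs =
        (report.jitterBufferDelay / report.jitterBufferEmittedCount) * 1000;
    }
  });

  // Defer each UI update by the estimated delay so the indicator moves
  // roughly when the corresponding audio is actually played out.
  for (const source of receiver.getContributingSources()) {
    const { source: csrc, audioLevel } = source;
    setTimeout(() => updateLevelIndicator(csrc, audioLevel), playoutDelayMs);
  }
}
```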

@taylor-b (Contributor, Author)

Another option: add a method on RTCRtpReceiver to get the remote timestamp of the last frame that was played out.

An application could call getContributingSources and see a source with timestamp X, call getCurrentPlayoutTimestamp and get Y, and then wait X - Y before updating the audio level UI.

The advantages of this approach are that it's simpler from an implementation perspective, and it allows the application to get information sooner, in case that's ever desired.
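A minimal sketch of that flow, assuming a hypothetical getCurrentPlayoutTimestamp() (it does not exist in the spec) returning a value on the same clock and in the same units (milliseconds here) as RTCRtpContributingSource.timestamp, and the same hypothetical updateLevelIndicator UI hook as above:

```ts
// getCurrentPlayoutTimestamp() is hypothetical; it is not in the spec.
interface RTCRtpReceiverWithPlayout extends RTCRtpReceiver {
  getCurrentPlayoutTimestamp(): number;
}

declare function updateLevelIndicator(csrc: number, level?: number): void;

function scheduleLevelUpdates(receiver: RTCRtpReceiverWithPlayout): void {
  const playoutTs = receiver.getCurrentPlayoutTimestamp(); // Y
  for (const source of receiver.getContributingSources()) {
    // source.timestamp is X; the audio received at X will play out
    // roughly X - Y from now.
    const waitMs = Math.max(0, source.timestamp - playoutTs);
    setTimeout(
        () => updateLevelIndicator(source.source, source.audioLevel),
        waitMs);
  }
}
```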


jesup commented Mar 17, 2017

Moving the point from packet reception to packet-coming-out-of-jitter-buffer (i.e. when it's played) is straightforward, and roughly what was intended.

The only reason timestamps would make sense is if the application is polling the stats: it could then set a timeout to update the UI (and maybe switch elements around) in sync with the timestamp it got for the level. This might avoid something like: the app polls while an audio level change is still in the jitter buffer for 1 more ms, the level changes 1 ms later, and the next poll isn't for 100 ms - so the update would lag by 99 ms.

The downside of timestamps is that you'll always be setting timers to update the UI. In practice an app might apply UI changes either immediately (ahead of the change) or on the next poll/update. Applying changes immediately would probably be better, since in reality any indication you get of a change means "the level changed sometime since you last polled".

That brings us (barring crazy jitter depths or applications that poll every 10 ms) to a place where the current text actually isn't bad in practice: getting the notification 'early' compensates for the lag due to polling. Not perfect, but a partial/rough compensation. The timestamp idea would make it a more correct approximation (modulo polling frequency), but in practice it would mean every poll is followed by a timeout to update the UI, multiplied by the number of sources you're displaying.

@taylor-b (Contributor, Author)

Moving the point from packet reception to packet-coming-out-of-jitter-buffer (i.e. when it's played) is straightforward, and roughly what was intended.

So it sounds like you're in favor of this approach? Do you have any suggestion for the correct spec terminology? Since the spec has no concept of a jitter buffer, would it be accurate to say "when the RTCRtpReceiver's remote source produces a frame of media", or maybe "delivers a frame of media to the MediaStreamTrack"?

getting the notification 'early' compensates for the lag due to polling

I don't feel good about this, though; there's no guarantee that the polling lag and jitter buffer delay will always cancel each other out perfectly.

@taylor-b (Contributor, Author)

Another issue that was brought to my attention recently: this part of the description of audioLevel means that implementations are required to decode a packet and compute the audio level as soon as a packet is received:

If an RFC 6464 extension header is not present, the browser will compute the value as if it had come from RFC 6464 and use that.

Doing this would be bad for performance. Chrome currently only decodes a packet and computes the audio level (for getStats) when more data is needed for playout.
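For reference, an application that only needs a per-track level (rather than a per-CSRC one) can read the level the browser computed on the decode/playout path from getStats. The 'inbound-rtp' audioLevel and kind members assumed below are from the modern stats dictionary and postdate this discussion:

```ts
// Read the decoder-side audio level via getStats. The audioLevel member on
// 'inbound-rtp' is an assumption here (modern stats dictionary).
async function getDecodedAudioLevel(
    receiver: RTCRtpReceiver): Promise<number | undefined> {
  const stats = await receiver.getStats();
  let level: number | undefined;
  stats.forEach((report) => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      level = report.audioLevel; // computed when samples are decoded for playout
    }
  });
  return level;
}
```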

taylor-b added a commit to taylor-b/webrtc-pc that referenced this issue Mar 29, 2017
Fixes w3c#1085.

May not be the correct terminology, but the intention is that the
contributing source objects are updated at playout time, such that if
an application is using them to drive an audio level UI, that UI will be
in sync with the audio played out by the browser.
@taylor-b (Contributor, Author)

Tried making a PR. I think the main question is whether we can come up with a definition of a point in time for "playout" whose interpretation isn't too ambiguous.


fippo commented Mar 29, 2017

What is the general usage model for showing the audio level -- polling the value of audioLevel inside a requestAnimationFrame to update the UI?

The alternative here would be to have the contributing source emit an event when its value changes. Which might be 50 times a second...
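As a sketch of the polling model in question (same hypothetical updateLevelIndicator hook as above):

```ts
declare function updateLevelIndicator(csrc: number, level?: number): void;

// Sample the contributing sources once per rendered frame (~60 Hz).
// A real app would likely throttle this; an audio-level UI rarely needs
// updates more often than every 50-100 ms.
function startLevelMeter(receiver: RTCRtpReceiver): void {
  function tick(): void {
    for (const source of receiver.getContributingSources()) {
      updateLevelIndicator(source.source, source.audioLevel);
    }
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```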

taylor-b added a commit to taylor-b/webrtc-pc that referenced this issue Apr 6, 2017