
Add current audio latency metric #124

Merged: 14 commits merged into w3c:main on Dec 1, 2023

Conversation

@henbos henbos commented Oct 28, 2023

Fixes #119.

Part of #96. See also follow-up #128.



henbos commented Oct 28, 2023

This is the follow-up to the audio frame counters, covering only the latency metrics (current and total).

The wording of the algorithm is tracked in #108 and needs a separate PR (edit: and here it is: #127) to refactor both audio and video in the same PR.

henbos commented Oct 28, 2023

Where the discussion left off from earlier is: #117 (comment)

An audio frame is the group of audio samples that happen at the same time, one per audio channel.

This is my understanding too.

Just poll track.stats at two points in time and take the delta delivered delay / delta delivered frames. [...]

How can you do this if you don't know if you have an average, rolling average, or something else? Besides, it's still nonsensical. If I have three audio buffers delivered, of 1024 frames each, and latencies are respectively 10 frames, 500 frames and 36 frames, it goes:

(10 + 500 + 36) / (1024 + 1024 + 1024) = 0.177734375

this number doesn't mean anything. (10 + 500 + 36) / 3 means something, that's a regular average, but it masks jitter by definition because it's an average, so it's a low pass.

My understanding is that if "current audio latency" has a meaningful definition, then so does the total audio latency, because it is the same metric except that you increment for each audio frame rather than only exposing the latest value.

It is up to the UA to estimate the latency of an audio frame. The app does not need to care as long as it is accurate.

henbos commented Oct 28, 2023

If I have three audio buffers delivered, of 1024 frames each, and latencies are respectively 10 frames, 500 frames and 36 frames, it goes:

Do you mean the first set of 1024 frames has a latency of 10 ms, the second set of 1024 frames has a latency of 500 ms, and the third set of 1024 frames has a latency of 36 ms? If so, I would expect:

deliveredFramesLatency = 1024 * 10 + 1024 * 500 + 1024 * 36 = 559104 ms
deliveredFrames = 1024 + 1024 + 1024 = 3072

Meaning the average latency per frame is 559104 / 3072 = 182 ms.

(And if buffering frames in order to send them in a batch adds latency, then that would be part of the latency measurement.)
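
For illustration, here is a minimal TypeScript sketch of the delta computation above. The counter names deliveredFrames and deliveredFramesLatency follow this discussion and are not necessarily the final attribute names on MediaStreamTrack.stats:

interface AudioStatsSnapshot {
  deliveredFrames: number;        // cumulative frames delivered to sinks
  deliveredFramesLatency: number; // cumulative per-frame latency, in ms
}

// Average latency per delivered frame between two polls of track.stats.
function averageLatencyMs(a: AudioStatsSnapshot, b: AudioStatsSnapshot): number {
  const frames = b.deliveredFrames - a.deliveredFrames;
  const latencyMs = b.deliveredFramesLatency - a.deliveredFramesLatency;
  return frames > 0 ? latencyMs / frames : 0;
}

// The numbers above: three buffers of 1024 frames with latencies of 10 ms, 500 ms and 36 ms.
const before: AudioStatsSnapshot = { deliveredFrames: 0, deliveredFramesLatency: 0 };
const after: AudioStatsSnapshot = {
  deliveredFrames: 3 * 1024,                      // 3072
  deliveredFramesLatency: 1024 * (10 + 500 + 36), // 559104
};
console.log(averageLatencyMs(before, after)); // 182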

henbos commented Oct 30, 2023

I tried to clarify that latency includes buffering. @o1ka who is the Chrome WebRTC audio expert, does this PR make sense to you?

o1ka commented Nov 1, 2023

The definition makes sense to me. The question re: average / rolling average is not quite clear. "current audio latency", i.e. the latency of the last delivered frame, has a specific value, it's not a statistic. The aggregated "delivered audio frames latency" allows the clients to have whatever statistics they need: overall average, the histogram of "average over polling interval", etc.

henbos commented Nov 1, 2023

@padenot @jan-ivar I tried to clarify that this is per frame, that it includes buffering, and that it is measured in ms. I tried to adjust the calculation padenot made earlier as I understood how it should work. Does it make sense now? What are the next steps?

henbos commented Nov 3, 2023

I forgot whether I linked this or not, but the "total measure + number of measurements" approach (in this case total latency and number of delivered frames) is based on the same principle described in the bottom half of WebRTC's Guidelines for design of stats objects, and MediaStreamTrack.stats will likely be used frequently together with RTCPeerConnection.getStats() even though they are independent APIs. Use cases include looking at several-second averages for sustained quality implications rather than temporary glitches, or even whole-call values for the purpose of A/B testing.

I am also strongly in favor of exposing most recent values (e.g. current latency), since I think they have more value than averages in cases such as WebAudio or other more real-time use cases. I think the two complement each other.
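
As a rough sketch of the monitoring side of this, here is how an application could approximate min/max/average itself by periodically sampling the most recent value (the kind of aggregation #128 would later provide natively). The latency attribute name and millisecond unit are assumed from this discussion:

function sampleLatency(getCurrentLatencyMs: () => number, periodMs = 1000) {
  let min = Infinity, max = -Infinity, sum = 0, count = 0;
  const timer = setInterval(() => {
    const latencyMs = getCurrentLatencyMs(); // e.g. () => track.stats.latency
    min = Math.min(min, latencyMs);
    max = Math.max(max, latencyMs);
    sum += latencyMs;
    count += 1;
  }, periodMs);
  return {
    stop: () => clearInterval(timer),
    summary: () => ({ min, max, avg: count > 0 ? sum / count : 0 }),
  };
}

Polling can only approximate the true min/max, since frames delivered between samples are not observed; that is part of the motivation for per-frame aggregates in the spec itself.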

@henbos henbos changed the title from "Add audio latency metrics." to "Add current audio latency metric" on Nov 8, 2023
henbos commented Nov 8, 2023

PTAL @padenot and @jan-ivar. As discussed in the meeting yesterday, I have updated this PR to:

  • Only add current latency, not total latency. A separate PR will be created to address min/max/avg; issue filed: [Audio Stats] Add average, min and max latency #128.
  • Change the wording a bit to say current latency, rather than the latency of the last audio frame specifically, so as to give the user agent room to expose a representative value even if the UA has different buffer sizes in different layers of the stack, which is an implementation detail we do not need to expose in a current latency metric.

henbos commented Nov 8, 2023

Assuming this PR can merge this week, I'll create the follow-up (min/max/avg) PR after this one has merged, since I apparently don't know how to create PR chains on GitHub.

henbos commented Nov 13, 2023

@jan-ivar @padenot Ping, is there anything that I've not addressed? If you prefer something to be phrased differently, please let me know how, but IIUC there is nothing preventing this from merging. Please let me know if that is not correct, and if so, what needs to happen.

padenot commented Nov 13, 2023

I don't understand why this contains things that have not been agreed upon, such as dropped frames.

index.html Outdated
@@ -545,14 +547,26 @@ <h4>The MediaStreamTrackAudioStats interface</h4>
such as if the track is muted or disabled, then the counters do
not increase.</p>
</div>
<li>
<p>The <dfn data-lt="current audio latency">current audio

This needs a rewrite. When defining something, it's best to proceed like so:

  • Define what a certain concept is. In our case, it's the time, in milliseconds, between the point in time an audio input device has acquired a signal, and the time it is able to provide it to an output device. Similarly, we could spec the second part as the point in time the audio data is available to script. It's also important to say that this must be the latest available figure. "Current" isn't very precise for something that changes 300 times a second.
  • Then say that it can be unavailable. Do we really want a nullable here, forcing authors to use an if statement (or something like that)?
  • Don't use the word "note" in normative statements, this is confusing. Or use an explicit non-normative section, in which case you can use the word may.
  • In a specification, say what needs to be done, not what doesn't need to be done: saying that this shouldn't include the output latency is bad; it should naturally follow from other definitions. If there's room for confusion, add an informative note.

Contributor Author

Define what a certain concept is [...] Don't use the word "note" [...] say what needs to be done, not what doesn't need to be done

Reworded. If this is not to your liking please suggest an edit that is.

I now have a definition of "input latency" which can be referenced by both the "latest input latency" (this PR) and for min/max/avg latency (future PR being discussed in #128).

In our case, it's the time, in milliseconds, between the point in time an audio input device has acquired a signal, and the time it is able to provide it to an output device.

The sink may not be an output device. But I reworded it to use your language except I say "to delivery of any of its sinks". Does that make sense to you?

Similarly, we could spec the second part as the point in time the audio data is available to script.

Not sure how to address this comment. Isn't "the script" just another sink?

Then say that it can be unavailable. Do we really want a nullable here, forcing authors to use an if statement (or something like that)?

I agree it would be nice to avoid "if not null" since it would only really be null before the first frames have been delivered. I guess it could be up to the user agent to make a sensible initial guess? I don't know; I removed nullable for API ergonomics reasons, but it's a bit under-specified what should happen if you call track.stats.latency inside the getUserMedia promise's resolve(), in which case I don't know whether a measurement has necessarily been made yet, depending on the implementation.
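
A minimal sketch of the ergonomics being discussed, assuming a non-nullable latency attribute on track.stats (the name comes from this thread; it is not in the standard TypeScript DOM typings, hence the cast):

async function logInitialLatency(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const [track] = stream.getAudioTracks();
  // Cast because MediaStreamTrack.stats is not in the built-in DOM typings.
  const stats = (track as unknown as { stats: { latency: number } }).stats;
  // With a non-nullable attribute this read is valid immediately after the
  // getUserMedia promise resolves, even if no frame has been delivered yet;
  // a nullable design would require a null check here instead.
  console.log(`current input latency: ${stats.latency} ms`);
}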

@@ -578,9 +592,11 @@ <h4>The MediaStreamTrackAudioStats interface</h4>
set {{MediaStreamTrackAudioStats/[[DeliveredFramesDuration]]}}
to [= delivered audio frames duration =],
set {{MediaStreamTrackAudioStats/[[DroppedFrames]]}} to
[= dropped audio frames =] and
[= dropped audio frames =],

I don't know why this is here.


henbos commented Nov 13, 2023

I don't understand why this contains things that have not been agreed upon, such as dropped frames.

This PR does not add dropped frames. Dropped frames were merged in a previous PR (#117). At the time I didn't understand that you had a problem with them, so I did what Jan-Ivar proposed and split up the PR in order to make progress (using the "editors can integrate" label). Since we haven't resolved this disagreement, I just filed #129.

As for this PR, it only adds input latency. Do you have any concerns with the definitions of latency? Edit: I saw your other comments now

@henbos henbos left a comment

Please take another look and suggest edits


henbos commented Nov 16, 2023

Friendly ping @padenot and @jan-ivar, do you have time to take a look before the editor's meeting today?

<div class="note">
<p>A sink that consumes audio may add additional processing
latency not included in this measurement, such as playout delay
or encode time.</p>
Contributor
@youennf youennf Nov 16, 2023

I do not think this value is intended for realtime processing.
Adding a note about the intent (the goal is to allow monitoring, not realtime processing) would be nice.
If we want very precise latency info for, say, web audio apps, it might be best to expose this value in Web Audio, measuring microphone to web audio sink.

Contributor Author

I added a note about monitoring, WDYT @youennf ?


This value is useful for Web Audio applications as is. Web Audio applications that record from an audio input device need to know how far in the past audio recordings should be shifted to be in phase with other elements. In a well-behaved system there is little expectation for this value to change, but it should be possible to see it change if need be (a common cause will be drift compensation when crossing time domains).

Examples: https://ampedstudio.com/manual/studio-settings/ (end of page, workaround by using a loop-back mic), https://www.w3.org/2021/03/media-production-workshop/talks/ulf-hammarqvist-audio-latency.html
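
As a hypothetical sketch of that alignment use case, assuming the input latency added by this PR is exposed as track.stats.latency in milliseconds: a recording made while monitoring playback lags the timeline by the input latency plus the output latency the user was hearing, so the captured samples are shifted earlier by that amount.

async function captureAlignmentOffsetSeconds(ctx: AudioContext): Promise<number> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const [track] = stream.getAudioTracks();
  // Cast because MediaStreamTrack.stats is not in the built-in DOM typings.
  const stats = (track as unknown as { stats?: { latency?: number } }).stats;
  const inputLatencySeconds = (stats?.latency ?? 0) / 1000;
  // Shift recorded samples earlier by this many seconds to bring them into
  // phase with the material the user was playing along to.
  return inputLatencySeconds + (ctx.outputLatency ?? 0);
}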

Contributor

If that value is useful, why not add it to WebAudio directly? This should be cheap enough to do and would give more precise latency info, given that it would compute the actual latency between the microphone and the web audio graph.


The Web Audio API does not deal with input, only output. It's problematic for various reasons, but it's how it was done. The object that is used to refer to the microphone is the MediaStreamTrack, which is why we're adding it there.

The latency between the microphone and the web audio graph is precisely what is being added here, in the sense that the web audio graph is a sink, because it runs off of a system audio output callback.

But there can be other sinks, such as an HTMLMediaElement, or using the microphone in some other way, directly feeding it to a MediaRecorder, etc.

Contributor

The Web Audio API does not deal with input,
The object that is used to refer to the microphone is the MediaStreamTrack

I do not understand this.
MediaStreamAudioSourceNode is the web audio sink of MediaStreamTrack.
Can you clarify why microphone-to-web-audio latency should not be put there instead?
At least in Safari, there is a small buffer at MediaStreamAudioSourceNode which could impact the web audio input latency.

there can be other sinks, such as an HTMLMediaElement

Exactly, with different latencies potentially. It makes more sense to put values there for those applications.
For instance, at least for Safari's implementation, droppedVideoFrames makes sense in https://w3c.github.io/media-playback-quality/#dom-htmlvideoelement-getvideoplaybackquality but does not make sense in MediaStreamTrack.

Contributor Author
@henbos henbos Nov 22, 2023

Re dropped video frames, in Chrome it is possible for frames to be dropped prior to sink delivery (measured by track.stats) or by the sink, for example the RTCPeerConnection (measured by pc.getStats). It would clearly be wrong to expose encoder frame drops inside track.stats. We could expose track frame drops inside the media-source stats object that is returned by getStats, but this would mean that every sink has to report something that is outside of its own scope. And let me remind you that the reason we're talking about track.stats rather than getStats is that when we added these and other metrics to webrtc-stats, we got pushback for adding stuff that is out of scope. (Which I think is justified, as there are non-WebRTC use cases that care about track stats.)

So likewise, if MediaStreamAudioSourceNode is adding latency that is not accounted for by track.stats, then that seems like a good place to measure it. But I still see value in each part of the pipeline measuring its own thing, rather than letting components be responsible for neighboring components and making assumptions about how many steps in the pipeline are of interest to the application.

Contributor Author

Friendly ping @padenot and @jan-ivar. Are there any changes requested regarding the latency definition?


I agree with @henbos. If MediaStreamAudioSourceNode (or its Track sibling) adds latency, then we can add something there. More likely than not we won't add something there, because if it adds latency there is a design problem in the implementation, and we'll argue in the exact same way we're arguing about audio frame drops: it's better for the Web that we fix our implementation than to add a way to measure avoidable latency overhead.

Contributor Author

If we agree, does that mean we can merge this PR?

Contributor
@youennf youennf left a comment

LGTM

Co-authored-by: youennf <youennf@users.noreply.github.com>
@alvestrand alvestrand merged commit 96c9d9d into w3c:main Dec 1, 2023
1 check passed