
[Audio Stats] Disagreement about audio dropped counters #129

Open
henbos opened this issue Nov 13, 2023 · 19 comments

henbos commented Nov 13, 2023

The spec currently describes both deliveredFrames and totalFrames, allowing the app to calculate:

droppedFrames = totalFrames - deliveredFrames

There is disagreement about whether measuring this is valuable.
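
For concreteness, here is a minimal sketch of how an app would consume these counters, assuming the draft's shape where they hang off track.stats (the property names are illustrative, not a confirmed surface):

```js
// Sketch only: assumes the [Audio Stats] draft exposes cumulative
// counters on MediaStreamTrack.stats; names are illustrative.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();

function droppedSoFar(track) {
  const { totalFrames, deliveredFrames } = track.stats;
  // Both counters are cumulative, so the difference is the cumulative
  // number of frames the UA failed to deliver.
  return totalFrames - deliveredFrames;
}
```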


henbos commented Nov 14, 2023

CC @jan-ivar @alvestrand @youennf for visibility. I'll try to summarize the positions; please correct me if I am characterizing anyone's position in a way they would not sign off on themselves:

Mozilla's position (@padenot) is that frames being dropped for any reason is a browser bug, even on poorly performing devices under heavy load, and should never happen. But if, for some unexplainable reason, it does happen, it is not actionable by the app anyway. Debugging browser bugs can be done with telemetry, developer tools, UMAs, etc., and does not need to be exposed to JavaScript. It is assumed that all user agents use real-time threads and that real-time threads ensure the best quality of service possible.

Google's position (@o1ka) is that while dropping frames is very rare on an individual basis, when it does happen we want to know about it. In aggregate, this information is important for A/B testing, and we already have real-world data showing that an app improving its performance, such as by changing the send codec, has a positive impact on glitch metrics. Even on an individual basis, when users report poor audio quality in the wild, having measurements helps us understand the issue better (which part of the pipeline is contributing what, as uploaded from a real-world example). This helps in understanding whether there are issues with user agents, OSes or devices.

henbos self-assigned this Nov 14, 2023

henbos commented Nov 16, 2023

There will be some time dedicated to discussing this at the Virtual Interim on Tuesday.


henbos commented Nov 16, 2023

@youennf mentioned Safari can drop audio frames too, detected based on timestamps IIUC. Feel free to clarify here. You asked about Chrome, and Chrome can detect the OS dropping audio frames based on timestamps (same as Safari, I think?), but drops can also occur due to audio processing or IPC message handling, to give other examples mentioned by the people I talked to.


vr000m commented Nov 19, 2023

In my opinion, this measures frames captured by the device and delivered to the sink, i.e., the WebRTC pipeline before encoding takes place. Ergo, knowing that the issue happened between the device and the WebRTC layer on the sender is really important. The simplest thing that the app can do as a consequence of this information is show an alert that a significant number of frames are being lost. The user can then decide what to do next: savvy users may close a few apps, others may choose to move to a different device.

Since this is happening before encoding, it is harder to decide whether switching to a lower-complexity codec would be the thing to do; at least it is not straightforward to conclude that the glitches are entirely CPU-related.


vr000m commented Nov 19, 2023

On a related question about Bluetooth mics --
I am assuming that wireless issues will not be reported, because the frame metrics are measured after Bluetooth delivers frames to the browser/UA?


o1ka commented Nov 20, 2023

> On a related question about Bluetooth mics -- I am assuming that wireless issues will not be reported, because the frame metrics are measured after Bluetooth delivers frames to the browser/UA?

In general, the UA can detect whether frames were dropped by the device, based on callback timestamps/buffer sizes/sample rate, unless the platform audio layer masked those drops somehow.
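
To illustrate the detection @o1ka describes (this is not actual UA code, just a sketch of the timestamp arithmetic): if the gap between consecutive capture callbacks is well beyond the nominal buffer duration, the missing span can be counted as dropped frames.

```js
// Illustrative sketch: inferring device-side drops from capture-callback
// timestamps, given a nominal sample rate and buffer size.
function makeDropDetector(sampleRate, framesPerBuffer) {
  const bufferSec = framesPerBuffer / sampleRate;
  let lastTs = null;
  let droppedFrames = 0;
  return (ts /* seconds */) => {
    if (lastTs !== null) {
      const gap = ts - lastTs;
      // Allow some scheduling jitter; a gap well beyond one buffer
      // period means the device skipped delivering audio for that span.
      if (gap > bufferSec * 1.5) {
        droppedFrames += Math.round((gap - bufferSec) * sampleRate);
      }
    }
    lastTs = ts;
    return droppedFrames;
  };
}

const onCaptureCallback = makeDropDetector(48000, 480); // 48 kHz, 10 ms buffers
```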


padenot commented Nov 20, 2023

> In my opinion, this measures frames captured by the device and delivered to the sink, i.e., the WebRTC pipeline before encoding takes place. Ergo, knowing that the issue happened between the device and the WebRTC layer on the sender is really important. The simplest thing that the app can do as a consequence of this information is show an alert that a significant number of frames are being lost. The user can then decide what to do next: savvy users may close a few apps, others may choose to move to a different device.

It's better that the UA shows that there is a problem. It's even better for the problem to not happen, or for the UA to fix the problem, e.g. by transparently increasing buffer sizes or switching to a cheaper AEC algorithm.

But I have a hard time believing that a device that is capable of doing WebRTC calls cannot do glitchless audio I/O and voice processing, regardless of the load. And even if load is high, cutting back on video encoding is going to be a lot more efficient and a lot better for call quality than anything related to audio. Audio takes precedence, always has, always will.

> Since this is happening before encoding, it is harder to decide whether switching to a lower-complexity codec would be the thing to do; at least it is not straightforward to conclude that the glitches are entirely CPU-related.

The important point here is that all of this (audio input, processing and output) is supposed to be happening on real-time threads.

If changing something else that is non-real-time on the system affects something that is real-time, and the machine isn't already critically under load with other real-time threads, then it's an implementation bug. It seems that this is known to happen in Chrome; it's therefore time to fix it, instead of adding an attribute to the web platform.


youennf commented Nov 20, 2023

I forgot to comment on this thread.

In Safari, I think we see dropping video frames as a bug (there is no buffering), so we would prefer not exposing this value, or at least to understand when this could actually happen.

With regards to audio, I concur with @o1ka. The API that Safari uses makes audio glitches possible and detectable. I am not exactly sure of the conditions under which this could happen: a crash of the audio daemon, buggy drivers, intermittent BT connectivity maybe. These are probably edge cases but may happen in practice. I heard that Chrome, for instance, was at some point using a dedicated audio process to prevent audio bugs from crashing the web app or Chrome. In that case, I would expect glitches to happen.

@padenot, FWIW, I do not think the intent for any of these stats is for the web app to do anything but passive logging; these are only stats.
This is the case for audio glitches as well as latency. As I said before, I would prefer Web Audio to expose the actual latency, so that the computation would be from microphone to Web Audio sink instead of microphone to virtual audio sink, and in the place (AudioWorkletGlobalScope?) where it makes the most sense.


padenot commented Nov 20, 2023

> In Safari, I think we see dropping video frames as a bug (there is no buffering), so we would prefer not exposing this value, or at least to understand when this could actually happen.

This isn't about video, but based on what you said I'm not sure whether you actually meant video?

> With regards to audio, I concur with @o1ka. The API that Safari uses makes audio glitches possible and detectable. I am not exactly sure of the conditions under which this could happen: a crash of the audio daemon, buggy drivers, intermittent BT connectivity maybe. These are probably edge cases but may happen in practice. I heard that Chrome, for instance, was at some point using a dedicated audio process to prevent audio bugs from crashing the web app or Chrome. In that case, I would expect glitches to happen.

Same for Firefox, anything can happen if the system is broken. My point is that this is an implementation bug, and that the Web Application cannot do anything meaningful and shouldn't attempt to do so. UAs can use telemetry to track those bugs and fix them if possible.

> @padenot, FWIW, I do not think the intent for any of these stats is for the web app to do anything but passive logging; these are only stats.
> This is the case for audio glitches as well as latency. As I said before, I would prefer Web Audio to expose the actual latency, so that the computation would be from microphone to Web Audio sink instead of microphone to virtual audio sink, and in the place (AudioWorkletGlobalScope?) where it makes the most sense.

As said elsewhere multiple times, latency is essential to write an audio application that performs recording from an input device. If you don't have this number, you cannot write an app that is acceptable to users, because all recorded audio tracks are going to be shifted a bit. It is also essential that this number is available separately from the audio output latency (which has been available for years). Latency is very much a requirement that is used for something other than logging.

https://github.com/w3c/mediacapture-extensions/pull/124/files#r1399495694 has two links, one of which goes in depth about this (a conference talk).
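
To make the alignment argument concrete, here is a sketch of the timeline arithmetic an overdubbing app needs; outputLatency exists today on AudioContext, while inputLatency is the missing number this discussion is about:

```js
// Sketch: placing a recorded overdub on the project timeline. The user
// heard the backing track outputLatency late, and their mic signal
// reached the app inputLatency late, so the clip must be shifted
// earlier by both to line up. inputLatency is hypothetical here.
function clipPlacementTime(recordStartTime, inputLatency, outputLatency) {
  return recordStartTime - (inputLatency + outputLatency);
}
```

Without inputLatency, every recorded take lands late by an unknown amount, which is exactly the "shifted a bit" problem above.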


youennf commented Nov 20, 2023

> As said elsewhere multiple times, latency is essential to write an audio application that performs recording from an input device.

I agree with this. If we put it in MediaStreamTrack, we have to invent a virtual sink, since a track might have no sink at all, or might have multiple sinks. By exposing it in, say, MediaStreamAudioSourceNode, we know the source and the sink, we know we want fresh values very often, and the contract is very clear.

AFAIK, the driving force for this API is stats, where there is no interest in getting a precise latency computation; this is more about latency metrics.
Why should we consider putting latency for audio processing apps in MediaStreamTrack if there is a better place where we are more confident about the definition and interoperability?
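
For what this could look like: the output side is already queryable today, and the input side would be the addition (the attribute below is purely hypothetical, sketching the shape youennf seems to be suggesting):

```js
const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
const ctx = new AudioContext();

// The output side exists today (real attributes, in seconds):
console.log(ctx.baseLatency, ctx.outputLatency);

// The input side is the missing piece; a hypothetical shape on the node
// that bridges the track into the graph (NOT a real attribute today):
const src = ctx.createMediaStreamSource(micStream);
// const micToGraphLatency = src.latency; // hypothetical
```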


o1ka commented Nov 21, 2023

> Same for Firefox, anything can happen if the system is broken. My point is that this is an implementation bug, and that the Web Application cannot do anything meaningful and shouldn't attempt to do so. UAs can use telemetry to track those bugs and fix them if possible.

Why should we deny the app its own telemetry for its critical paths, though? The UA's telemetry does not have the context the app does, and is not necessarily sensitive enough for the specific use cases the app invests in.


fippo commented Nov 21, 2023

Regarding the "use telemetry" argument, we had a very good example of where telemetry was not helping a few years back:
https://webrtchacks.com/troubleshooting-unwitting-browser-experiments-al-brooks/
This was a severe bug limited to headsets that were rare in the overall population of microphones but very common in the particular context of call centers. Browser telemetry was unable to capture this. In the scope of just the service itself, such metrics would be highly useful and actionable. They would also help browsers identify problematic devices and ultimately improve the quality of experience for users.


steely-glint commented Nov 21, 2023

To my (down in the weeds) point in the Nov WG session about the page raising ptime to avoid a problem:

A sensible implementation might try to make the capture interval match the ptime. I have no idea if user agents do this, but some antique code I wrote a long time ago attempted to do that on iOS:
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration, sizeof(preferredBufferSize), &preferredBufferSize);

In which case, changing ptime from 20 ms to 60 ms would reduce the load on the capture thread by 60%, which might plausibly reduce frame loss.

So I think it should be exposed.
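
For reference, the usual way a page exercises the ptime knob mentioned above is SDP munging rather than a formal API; a hedged sketch:

```js
// Sketch: a=ptime in a session description expresses the packetization
// time its owner prefers to receive. Apps commonly rewrite it before
// applying the description to nudge packet sizes (e.g. 20 ms -> 60 ms).
function withPtime(sdp, ptimeMs) {
  if (/^a=ptime:/m.test(sdp)) {
    return sdp.replace(/^a=ptime:.*$/m, `a=ptime:${ptimeMs}`);
  }
  // No existing ptime line: insert one right after the audio m= line.
  return sdp.replace(/^(m=audio .*\r?\n)/m, `$1a=ptime:${ptimeMs}\r\n`);
}
```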


youennf commented Nov 22, 2023

I did a quick test, and BT connectivity issues can trigger audio frame drops that are observable by native apps.
The web page could ask whether the user wants to switch to another, hopefully more stable, microphone, especially if the current one is known to be BT.
This is one example where this stat could directly improve user experience.
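
Sketching what an app could do with the counters in this situation (property names as in the draft proposal, and suggestMicrophoneSwitch is a hypothetical app-side function):

```js
// Hypothetical sketch: watch the proposed cumulative counters and prompt
// the user when the recent drop ratio on a suspect (e.g. BT) mic is high.
function monitorDrops(track, { intervalMs = 5000, maxDropRatio = 0.02 } = {}) {
  let prev = { totalFrames: 0, deliveredFrames: 0 };
  return setInterval(() => {
    const { totalFrames, deliveredFrames } = track.stats; // proposed API
    const total = totalFrames - prev.totalFrames;
    const dropped = total - (deliveredFrames - prev.deliveredFrames);
    prev = { totalFrames, deliveredFrames };
    if (total > 0 && dropped / total > maxDropRatio) {
      suggestMicrophoneSwitch(); // hypothetical app UX
    }
  }, intervalMs);
}
```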


henbos commented Nov 22, 2023

This would be incredibly valuable to Google Meet and other apps. Thanks for checking, Youenn. I should have included it in the slides. FTR, we do get a lot of user reports from frustrated bluetooth users.


youennf commented Nov 22, 2023

> FTR, we do get a lot of user reports from frustrated bluetooth users.

If that is frequent enough, I wonder whether stats monitoring is the best API to solve that issue.
Maybe this warrants a dedicated event-based API.


henbos commented Nov 22, 2023

I don't think an event is enough: you can have many issues or few, so you definitely don't want just a boolean that fires when some random threshold is reached. And even if we had an event, I don't see it replacing the need for the underlying metrics.


alfredh commented Nov 22, 2023

I would like to support this proposal, and confirm that the issue is real, in the wild.

Some reasons for audio glitches:

  • The DSP clock is running at almost the sample rate, e.g. 48000.0001 Hz.
    This will cause dropped samples on a regular basis.

  • The application cannot provide audio samples in time to be played
    out on the DSP/soundcard. For playback, sending null samples
    is possible, but it should still be counted as a "glitch".

  • The hardware is broken.

  • The audio driver software is buggy and/or broken.

  • Other unknown reasons outside our control.

I think a good way to collect this info is the number of samples dropped, which is an integer: for example, count the number of overruns/underruns for playback/recording. Some audio drivers can provide these stats, but not all.

Here is an example with PortAudio:

https://files.portaudio.com/docs/v19-doxydocs/portaudio_8h.html#a8a60fb2a5ec9cbade3f54a9c978e2710
