Web Audio API: RenderCapacity API #843

Open · hoch opened this issue May 10, 2023 · 10 comments

Comments

hoch commented May 10, 2023

I'm requesting a TAG review of the RenderCapacity API.

Generally, the Web Audio renderer's performance is affected by the machine's speed and the computational load of the audio graph. However, the Web Audio API does not expose a way to monitor this load, leaving developers with no way to detect glitches, which are an essential part of the UX in audio applications. Providing developers with a "glitch indicator" is becoming more important as audio applications grow larger and more complex. (Developers have been asking for this feature since 2018.)

Further details:

  • [v] I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: 2023 Q2~Q3
  • The group where the work on this specification is currently being done: W3C Audio WG
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): N/A
  • Major unresolved issues with or opposition to this specification: N/A
  • This work is being funded by: N/A

We'd prefer the TAG provide feedback as:
💬 leave review feedback as a comment in this issue and notify @hoch @padenot


hober commented Aug 2, 2023

How are you defining "load" (as exposed in averageLoad and peakLoad)? Is this the Unix load average, as reported by /usr/bin/w? Is there an equivalent concept on non-Unix platforms?


padenot commented Aug 3, 2023

Audio systems, when rendering an audio stream, typically work with a synchronous audio callback (called a system-level audio callback in the spec) that is invoked in an isochronous fashion by the system, on a real-time thread, with a buffer of n frames that the program must fill entirely and then return from as soon as possible.

This callback, invoked continuously during the lifetime of the audio stream, returns audio samples to the system, which hands them off to the rest of the OS. This audio might be post-processed and is usually output on an audio output device, such as headphones or speakers.

Let frames[i] be the number of frames that has to be rendered by the callback on the i-th iteration (a few hundred to a couple of thousand is typical in this situation).
Let sr be the sample rate at which the audio system runs (44100 Hz and 48000 Hz are typical values).
Let r[i] be the time, in seconds, it took to render frames[i] frames on the i-th iteration; in other words, the execution time of the callback.

frames[i] / sr is a number of audio frames divided by the sample rate, so it's a duration in seconds: the time a buffer of frames[i] samples takes to be played out.

The load for this render quantum is:

load[i] = r[i] / (frames[i] / sr)

In a nominal scenario, the load is below 1.0: it took less time to render the audio than it takes to play it out. In an overload scenario (called an under-run in audio programming jargon), the load can be greater than 1.0. At that point, the user can be expected to hear audio dropouts: discontinuities in the audio output that are very noticeable.
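For concreteness, here is a minimal sketch of that computation as it might appear inside an audio callback (the function names are illustrative, not any real API):

function audioCallback(outputBuffer, sampleRate) {
  const renderStart = performance.now();

  renderAudioGraph(outputBuffer); // hypothetical: fill the buffer entirely

  // Execution time of the callback, in seconds (r[i] above).
  const r = (performance.now() - renderStart) / 1000;
  // Playout duration of the buffer, in seconds (frames[i] / sr above).
  const budget = outputBuffer.length / sampleRate;
  // load[i] = r[i] / (frames[i] / sr); values above 1.0 mean an under-run.
  return r / budget;
}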

Because the time it takes to render the audio is usually directly controllable by authors (for example, by reducing the quality of the parts of the audio processing graph that are less essential to the application), authors would like to be able to observe this load.

A real-life example that could benefit from this new API is the excellent https://learningsynths.ableton.com/. If you open the menu via the icon at the top left (on desktop) and scroll down the panel, you'll see that the render quality is controllable.

Similarly, it's not uncommon for digital audio workstations or other professional audio software to display a load indicator in their user interface, to warn the user that there's too much processing for the system in its current configuration.

In the Web Audio API spec, this is defined in the section Rendering an audio graph.


cynthia commented Sep 7, 2023

This, Compute Pressure, and the Worker QoS proposal all seem somewhat connected in serving this kind of compute-time guarantee need (or lack of guarantee thereof). Would it make sense to distill some common patterns out of these for consistency?


hoch commented Sep 7, 2023

That's an interesting suggestion. However, the level of precision in the Compute Pressure API is not enough (4 buckets), and the design of the Worker QoS proposal seems quite distant from this API (i.e. you set the option at construction time).

Based on the developer survey we conducted, a bucket size of 4 is not suitable for anything useful. Another approach we're discussing at the moment is to use a strong permission signal (e.g. microphone access) to unlock the full precision of the capacity value; without explicit user permission, the API would only offer a limited number of buckets (~10).
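For illustration only, the gating idea amounts to something like this quantization step (hypothetical sketch; none of these names are the proposed API):

// Full precision only behind a strong permission signal; otherwise the
// raw value is quantized into a small number of buckets (~10).
function reportedCapacity(rawLoad, hasStrongPermission, buckets = 10) {
  const clamped = Math.min(Math.max(rawLoad, 0), 1);
  if (hasStrongPermission) return clamped;        // full precision
  return Math.round(clamped * buckets) / buckets; // e.g. 0.8437 -> 0.8
}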

[Screenshot attached]


cynthia commented Jan 23, 2024

Sorry for the long delay. We discussed this during our F2F, and having some level of consistency/interoperability between this proposal and Compute Pressure would be a better architectural direction. (We're setting aside QoS and how to make that proposal consistent, as it seems to be at a much earlier stage.)

Some questions for you:

  1. Could you consider having a common interface for the pressure signal, shared between Compute Pressure and RenderCapacity? If not, why not?
  2. Could you consider using an Observer for your use case? If not, why not? (We will ask the opposite about Compute Pressure and events; see the sketch at the end of this comment.)
  3. Where does the working group stand on limiting the granularity of the pressure signal? Do you have agreement about limiting granularity in the absence of some gating function, such as gaining permission for microphone access or similar?

With all of these questions, we think the use cases are valid, so no concerns there.
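To make question 2 concrete, an Observer-shaped variant of the API might look roughly like this (purely hypothetical, modelled on Compute Pressure's PressureObserver; none of these names exist in any spec):

const observer = new RenderCapacityObserver((records) => {
  for (const record of records)
    console.log(`average load: ${record.averageLoad}`);
}, { updateInterval: 1 }); // seconds

observer.observe(audioContext); // start receiving capacity records
// ...
observer.disconnect();          // stop observing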


kupix commented Feb 21, 2024

There is a way to estimate render capacity that works today: capture timestamps before and after processing on the audio thread. This method isn't without challenges: timestamps can only be captured with 1 ms precision (thanks, Spectre), and the audio chunk rate may beat against the 1 kHz timestamp clock. So aggregation/estimation requires a period on the order of 1 second to stabilise, although this could probably be improved with more sophisticated timestamp processing.

See it in action: https://bungee.parabolaresearch.com/bungee-web-demo.

There may be a further challenge with adapting processing complexity according to render capacity. Occasionally something (browser or OS) seems to detect a lightly used thread and either move it to an efficiency core or lower its clock. So, paradoxically, faster render code can sometimes result in an increased render-capacity reading. This is a "denominator problem" that needs more study.

Simple sample below (with simpler averaging than the demo linked above).

class NoiseGenerator extends AudioWorkletProcessor {
  constructor() {
    super();
    // Accumulated milliseconds spent inside process() (active) and
    // between calls to process() (idle).
    this.active = this.idle = 0;
  }

  process(inputs, outputs, parameters) {
    const start = Date.now();
    // Skip the very first call (idle is still 0); afterwards, close the
    // idle interval opened at the end of the previous call and report.
    if (this.idle) {
        this.idle += start;
        console.log("Render capacity: " + 100 * this.active / (this.active + this.idle + 1e-10) + "%");
    }
    this.active -= start; // open the active interval

    // Generate some noise.
    for (let channel = 0; channel < outputs[0].length; ++channel)
      for (let i = 0; i < outputs[0][channel].length; ++i)
        outputs[0][channel][i] = Math.random() * 2 - 1;

    const finish = Date.now();
    this.active += finish; // close the active interval
    this.idle -= finish;   // open the idle interval

    return true;
  }
}

registerProcessor('noise-generator', NoiseGenerator);
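For completeness, the main-thread setup to run this processor might look like the following (sketch; the module file name is an assumption):

// In a module script on the main thread; 'noise-generator.js' is assumed
// to contain the processor code above.
const ctx = new AudioContext();
await ctx.audioWorklet.addModule('noise-generator.js');
const node = new AudioWorkletNode(ctx, 'noise-generator');
node.connect(ctx.destination);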

LeaVerou commented:

Hello there! We looked at this today during a breakout.

Other TAG members will comment on other components of the review, but we had some questions about the API design. We need to better understand how this API fits into the general use cases where it will be used. Currently, the explainer includes a snippet of code showing the API in isolation, modifying parameters in the abstract. What is the scope of starting and stopping this kind of monitoring for the use cases listed? Are authors expected to listen continuously, or to sample short periods of time (because monitoring is expensive)?

If they are expected to listen continuously, then what is the purpose of the start() and stop() methods? If their only purpose is to set the update interval, that could be a property directly on AudioContext (in which case the event would be on AudioContext as well, and would be named in a more specific way, e.g. rendercapacitychange). The sketch below shows the two shapes side by side.
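For concreteness (both are paraphrased sketches, not spec text):

// Shape in the current explainer (paraphrased): a separate object with
// start()/stop() and an 'update' event.
const capacity = audioContext.renderCapacity;
capacity.addEventListener('update', (e) => console.log(e.averageLoad));
capacity.start({ updateInterval: 1 });

// Alternative suggested above (hypothetical): a property and a more
// specifically named event directly on AudioContext.
audioContext.renderCapacityUpdateInterval = 1;
audioContext.addEventListener('rendercapacitychange',
                              (e) => console.log(e.averageLoad));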

We were also unsure what the update interval does exactly. Does it essentially throttle the event, so that you can never get more than one event per period? Does it set the period over which the load is aggregated? Both? Can you get multiple update events without the load actually changing?

Lastly, as a very minor point, change is a far more widespread naming convention for events than update (see https://cdpn.io/pen/debug/dyvGYoV). update makes more sense if the event fires every updateInterval regardless of whether there was a change, but it produces better DX to only fire the event when the value has actually changed, so that every invocation is meaningful.

We were also wondering how this relates to #939.

martinthomson commented:

An addendum on security... We think that the general approach to managing side-channel risk is acceptable.

Overall, fewer buckets would be preferable: at most 10, though preferably 5. Though surveys indicate that some sites would be unhappy with fewer than 10 buckets, there is an opportunity to revise the number of buckets over time based on feedback from use. Increasing the number of buckets should be feasible without affecting site compatibility, so starting with a more private default is the conservative option. Increasing resolution carries a small risk in that change events are likely to occur more often (see the API design feedback above).

More detail is ultimately necessary to understand the design:

  1. Is hysteresis involved?
  2. Is the reported value a maximum/average/percentile?
  3. What happens when the load exceeds 100%?

martinthomson commented:

@hoch, @padenot, do you have any feedback on the questions above?


padenot commented Jul 2, 2024

This is somewhat on pause at the Audio WG level for now; implementors aren't exactly sure how to ship this.
