
Audio decoder output to regular buffers and not AudioBuffer #179

Closed
padenot opened this issue Apr 9, 2021 · 14 comments · Fixed by #228

Comments

@padenot
Collaborator

padenot commented Apr 9, 2021

Quite a few programs or libraries don't really use the Web Audio API with the native nodes etc., and rely on either an AudioWorklet or even a ScriptProcessorNode to play their audio. All the DSP happens in user code (probably WASM), because that code is already written, and the Web is "just" another target in this context.

It seems like we could add an output mode for our audio decoders that writes to a user-provided buffer, to minimize copies.

This would also sidestep the issue where (for now) the AudioBuffer object is always planar f32. Quite a few authors would like integers, both for memory-footprint reasons and because that's what codec libraries frequently output, so no useless conversion is needed.

Forcing the use of AudioBuffers (which have quite a few constraints) when any WASM user probably has their own audio engine seems like a sub-optimal solution for them. This would also simplify the spec and the code, I guess.

@sandersdan
Contributor

sandersdan commented Apr 12, 2021

Agreed, AudioBuffer is limiting. AudioFrame was designed to minimize dependence on it (the AudioBuffer only needs to be created when accessed), so we can add new ways to use the audio data in a backwards-compatible way.

We don't have a buffer pool API to hook this into, but there are other parts of WebCodecs that could benefit from such a thing, and I expect we'll work on that in v2.

Would you prefer that it be changed to a getter method rather than an attribute?

@padenot
Collaborator Author

padenot commented Apr 27, 2021

As discussed with @chcunningham, this is expected to be more important and useful for authors than using AudioBuffer, and it would be in addition to AudioBuffer, not instead of it, modulo the design point mentioned in https://github.com/WebAudio/web-audio-api-v2/issues/119#issuecomment-808330658.

This would remove the dependency on Web Audio API V2 and allow this to be self-contained. I'd be open to an implementation shipping with this and not the AudioBuffer variant, if the API allows for feature detection.

The general idea is to pass ArrayBuffers for input and output, detaching the buffers, and then giving both buffers back on output, to allow authors to implement their own recycling if they want. This is not a standard pattern (yet), but it should be, since it's the closest thing to good memory access patterns until there is read-only memory.
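A rough sketch of that pattern from the caller's side, for illustration only (the outputBuffer option and the buffer field on the output are hypothetical names for the idea described above, not a shipped API):

// Hypothetical BYOB-style decode with buffer recycling: decode()
// detaches outputBuffer, and the decoder hands it back in the output.
declare const decoder: {
  decode(chunk: unknown, opts: { outputBuffer: ArrayBuffer }): void;
};
declare const chunk: unknown;
declare function consumeSamples(buf: ArrayBuffer): void;

const pool: ArrayBuffer[] = [];

function acquire(byteLength: number): ArrayBuffer {
  return pool.pop() ?? new ArrayBuffer(byteLength);
}

function onOutput(output: { buffer: ArrayBuffer }): void {
  consumeSamples(output.buffer); // user-code DSP, e.g. in WASM
  pool.push(output.buffer);      // recycle instead of reallocating
}

decoder.decode(chunk, { outputBuffer: acquire(4096) });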

We might be able to do real zero-copy for WASM output if we do it correctly, but it means the API will have to diverge even more from what is usually done (essentially passing in the Memory object, an offset, and a length).
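And the zero-copy WASM variant alluded to would look roughly like this (again, every option name here is hypothetical):

// Hypothetical zero-copy output: the decoder writes decoded samples
// directly into WASM linear memory at a caller-chosen location.
declare const decoder: {
  decode(
    chunk: unknown,
    opts: { destination: WebAssembly.Memory; offset: number; length: number }
  ): void;
};
declare const chunk: unknown;

const memory = new WebAssembly.Memory({ initial: 16 }); // 16 pages = 1 MiB

decoder.decode(chunk, {
  destination: memory, // the Memory object itself
  offset: 65536,       // byte offset the decoder may write at
  length: 4096,        // capacity in bytes available at that offset
});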

@chcunningham
Collaborator

chcunningham commented Apr 28, 2021

This issue has some overlap with a parallel discussion in #162 (comment)

TL;DR - AudioBuffer forces some snapshotting behavior ("acquire the content") which is not super intuitive.

What if we do something like this:

interface AudioFrame {
    constructor(AudioFrameInit init);

    readonly attribute long long timestamp;  // microseconds

    // Deprecated. Will make a lazy copy.
    readonly attribute AudioBuffer buffer;

    readonly attribute unsigned long length; // in sample-frames
    readonly attribute double duration; // in seconds
    readonly attribute float sampleRate; // in sample-frames per second

    // Channel access
    readonly attribute unsigned long numberOfChannels;
    void copyFromChannel(
        Float32Array destination,
        unsigned long channelNumber,
        optional unsigned long bufferOffset = 0);

    // Creates an AudioFrame, which needs to be independently closed.
    AudioFrame clone();
    void close();
};

Basically, hoist up the immutable AudioBuffer methods and deprecate AudioBuffer. This fixes some of our issues while saving the matter of transferring a BYOB for later.
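As a usage sketch of the proposed shape (the AudioFrame type below mirrors the IDL above, and renderQuantum is just an illustrative consumer):

// Consuming a decoded frame channel-by-channel without touching the
// deprecated AudioBuffer attribute.
interface AudioFrame {
  readonly timestamp: number;
  readonly length: number;
  readonly numberOfChannels: number;
  copyFromChannel(
    destination: Float32Array,
    channelNumber: number,
    bufferOffset?: number
  ): void;
  close(): void;
}

declare function renderQuantum(channel: number, samples: Float32Array): void;

function onAudioFrame(frame: AudioFrame): void {
  const scratch = new Float32Array(frame.length);
  for (let ch = 0; ch < frame.numberOfChannels; ch++) {
    frame.copyFromChannel(scratch, ch); // planar f32 copy of channel ch
    renderQuantum(ch, scratch);
  }
  frame.close(); // release the underlying resources when done
}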

@chcunningham
Collaborator

The above needs a sampleFormat.

At first: planar float32 (we can bikeshed on the name).

We discussed whether to bake "planar" or "interleaved" into a single enum that simultaneously describes the format (float32)... no strong feeling. I lean toward the combined enum (for rough symmetry with VideoFrame's pixel format).

linear vs. alaw/ulaw:
Resolved: let's only have linear in "frames". Treat alaw/ulaw as "encoded": they would be acceptable in chunks (we'll decode to linear).

@chcunningham
Collaborator

Re: sample format, I propose we borrow the naming conventions used in ffmpeg, but drop the AV_SAMPLE_FMT prefix (analogous to the "I420" pixel format used in VideoFrame).

Something like this could be a good starting list:

U8,    ///< unsigned 8 bits
S16,   ///< signed 16 bits
S32,   ///< signed 32 bits
FLT,   ///< float
S16P,  ///< signed 16 bits, planar
S32P,  ///< signed 32 bits, planar
FLTP,  ///< float, planar

These are the formats Chromium seems to recognize from ffmpeg. LMK if this is missing something obvious or if one of these is really rare. My first PR will just add FLTP to keep it simple (matching what we already have with AudioBuffer).
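To make the planar/interleaved distinction concrete, here is a stereo buffer of three sample-frames in both layouts:

// "S16": interleaved. One plane, channels alternate per sample-frame.
const s16 = Int16Array.of(
  100, -100, // frame 0: left, right
  200, -200, // frame 1
  300, -300, // frame 2
);

// "S16P": planar. One contiguous plane per channel.
const s16p = [
  Int16Array.of(100, 200, 300),    // left plane
  Int16Array.of(-100, -200, -300), // right plane
];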

@padenot
Collaborator Author

padenot commented Apr 29, 2021

I was going to ask whether S32 is in fact used to represent 24-bit integer audio samples (which are quite common), but your link answers that (if we're doing it the same way ffmpeg does, which I think is sensible, for the same reasons: there is no 24-bit type in WASM either, and nobody wants to do computations by hand).
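For concreteness, the conventional widening left-justifies each packed little-endian 24-bit sample into the top three bytes of an S32 value (to my understanding this matches how ffmpeg carries s24 in s32); a minimal sketch:

// Widen packed little-endian 24-bit PCM into S32 by left-justifying
// each sample. The 24-bit sign bit lands in bit 31, so JavaScript's
// signed 32-bit bitwise semantics provide sign extension for free.
function s24leToS32(packed: Uint8Array): Int32Array {
  const out = new Int32Array(packed.length / 3);
  for (let i = 0; i < out.length; i++) {
    out[i] =
      (packed[3 * i] << 8) |
      (packed[3 * i + 1] << 16) |
      (packed[3 * i + 2] << 24);
  }
  return out;
}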

@padenot
Collaborator Author

padenot commented Apr 29, 2021

Resolved: let's only have linear in "frames". Treat alaw/ulaw as "encoded": they would be acceptable in chunks (we'll decode to linear).

To clarify with an example: ulaw would decode to 16-bit integers so that the dynamic range is not lost.
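For reference, the classic G.711 µ-law expansion that produces those 16-bit linear samples looks like this (a sketch of the standard Sun-derived formulation):

// G.711 mu-law expansion: 8-bit code -> signed 16-bit linear sample.
// Codes are stored bit-inverted: 1 sign bit, 3-bit exponent,
// 4-bit mantissa, with a bias of 0x84.
function ulawToLinear(code: number): number {
  const u = ~code & 0xff;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return u & 0x80 ? -magnitude : magnitude; // output range: -32124..32124
}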

@tguilbert-google
Member

Small comment on the proposed interface above:

Once buffer is removed, there will be no way to check if a frame is closed from JS. There should perhaps be an attribute indicating that the frame is closed, and that calling copyFromChannel will throw.

chcunningham added a commit that referenced this issue Apr 30, 2021
The mutability of AudioBuffer was undesirable. Also, we like having more
sample formats. See discussion in #179.
@padenot
Collaborator Author

padenot commented Apr 30, 2021

there will be no way to check if a frame is closed from JS

isn't the "classic" way to check this is buffer.length == 0 ? I reckon it's not explicit per se, but it's what authors to today to check whether a buffer has been detached.

@tguilbert-google
Member

there will be no way to check if a frame is closed from JS

isn't the "classic" way to check this is buffer.length == 0 ? I reckon it's not explicit per se, but it's what authors to today to check whether a buffer has been detached.

Yes. But if we remove buffer in favor of copyFromChannel(), as per Chris' comment:

// Deprecated. Will make a lazy copy.
readonly attribute AudioBuffer buffer;

We won't be able to check this anymore. Furthermore, the act of checking whether a frame is closed or not could force the potential lazy copy.

@chcunningham
Collaborator

For reference, I dug up the ECMAScript spec on detachment
https://tc39.es/ecma262/#sec-detacharraybuffer

WDYT of trying to follow the TypedArray style by similarly zeroing out the analogous fields of AudioData (length, duration, sampleRate, numberOfChannels, ...)? We can make this a clear signal by spec'ing that it is otherwise invalid to construct an AudioData from an empty BufferSource.
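Under that proposal, detecting a closed frame would not force the lazy AudioBuffer copy; assuming the zeroed-fields behavior, a check like this would suffice:

// A closed AudioData would report zeroed fields, so any of them can
// serve as the signal, without touching the deprecated buffer attribute.
declare const frame: { length: number; numberOfChannels: number; sampleRate: number };

const isClosed = frame.numberOfChannels === 0; // or frame.length === 0, etc.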

@chcunningham
Collaborator

For those reading along, the PR landed with unresolved issues (done to unblock approved dependent PRs). I'll send a follow up shortly.

@chcunningham added this to the V1 Launch Blockers milestone May 5, 2021
@padenot
Collaborator Author

padenot commented May 5, 2021

Thanks, I opened #223 for following up, with a link to the context from Dale, you, and myself.

@chcunningham
Collaborator

Closing this one, as follow-up work is now tracked in #223.
