Allowing decoupling of decoding and demuxing for images #205

padenot · 2021-04-29T15:59:21Z

A number of container formats can hold coded data from a variety of codecs, and the same coded data can appear in a variety of containers.

If a demuxing API is later considered (it comes up fairly often during calls and discussions with developers; see #24 for tracking, though it's certainly less urgent than decoders!), this could maybe be handled by desugaring the image decoding API?

dalecurtis · 2021-04-29T20:26:10Z

Are you aware of any containers here that are not ISOBMFF? If not, AFAIK images are always stored muxed, only multi-frame content based on a multi-frame codec like HEVC, AV1, or H264 is stored demuxed -- but those are served by VideoDecoder.

So my take is that we probably don't need any changes to the ImageDecoder API, just a demuxing API, and clients will use VideoDecoder for things that are demuxed codecs and ImageDecoder for things that are still complete image files.

padenot · 2021-04-30T09:50:11Z

@baumanj, if you can maybe explain more precisely what you had in mind with this?

baumanj · 2021-04-30T20:10:56Z

WebP uses RIFF, which is not ISOBMFF. Also, since there's lots of useful information that can be derived from containers without ever doing decoding, it seems useful to make a separation to allow for greater forward compatibility. Add that to the fact that IP restrictions make formats like HEIC undecodable on many platforms, but the same container interpretation code could give access to useful metadata. Finally, I don't assume that ISOBMFF, which is fairly complex, not-free and heavyweight will be the only container any new formats ever use, so separating the ability to decode the image data one may store within it today would increase flexibility and innovation in the future.
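As a rough illustration of the container-level distinction raised above, the container family can usually be told apart from the first few bytes without touching any coded data. This is a minimal sketch, not part of any WebCodecs API; `sniffContainer`, `isobmffMajorBrand`, and `ContainerKind` are hypothetical names, and it assumes the well-known signatures (a RIFF header with a `WEBP` form type, an ISOBMFF file starting with an `ftyp` box, and the PNG/JPEG/GIF magic numbers).

```typescript
// Hypothetical helper types and names, for illustration only.
type ContainerKind = "riff-webp" | "isobmff" | "png" | "jpeg" | "gif" | "unknown";

/** Read `length` bytes at `offset` as ASCII; returns "" if out of range. */
function ascii(bytes: Uint8Array, offset: number, length: number): string {
  if (offset + length > bytes.length) return "";
  let out = "";
  for (let i = 0; i < length; i++) out += String.fromCharCode(bytes[offset + i]);
  return out;
}

function sniffContainer(bytes: Uint8Array): ContainerKind {
  // RIFF: "RIFF", a 32-bit little-endian size, then the form type ("WEBP").
  if (ascii(bytes, 0, 4) === "RIFF" && ascii(bytes, 8, 4) === "WEBP") return "riff-webp";
  // ISOBMFF (HEIC, AVIF, ...): the first box is normally "ftyp", with its type at offset 4.
  if (ascii(bytes, 4, 4) === "ftyp") return "isobmff";
  // Tightly coupled formats are identified by their magic numbers.
  const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length >= 8 && pngSignature.every((b, i) => bytes[i] === b)) return "png";
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return "jpeg";
  const gifSignature = ascii(bytes, 0, 6);
  if (gifSignature === "GIF87a" || gifSignature === "GIF89a") return "gif";
  return "unknown";
}

/** For ISOBMFF input, the major brand (e.g. "avif", "heic") follows the "ftyp" type. */
function isobmffMajorBrand(bytes: Uint8Array): string {
  return ascii(bytes, 4, 4) === "ftyp" ? ascii(bytes, 8, 4) : "";
}
```

Note that the brand can be read even on a platform that cannot decode the payload, which is the kind of decode-free information the comment above is concerned with.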

dalecurtis · 2021-04-30T20:15:30Z

To me that sounds like you're just advocating for a containers API, which I agree is a nice value add. However, I can't think of any concrete changes that we'd make to the ImageDecoder API which would help this use case. Can you give some examples of what you're thinking about?

I.e., in a world with a browser provided RIFF, ISOBMFF, $future_format demuxer that vends demuxed samples, what would we add to the ImageDecoder API that's not there already? Since we take a raw array buffer + mime type, I think we can decode whatever may come if it's appropriate to home it on ImageDecoder instead of say VideoDecoder.

baumanj · 2021-04-30T20:31:18Z

I'm not suggesting this needs to be part of the ImageDecoder API itself, but rather that implicitly including demuxing as a black box inside the decoder API makes it less useful and forward-looking. Wouldn't it be better to keep the decoder API smaller and simpler and break out the demuxing concerns to an API which is appropriately focused on the details relevant to containers?

For one, the ability to get metadata about the image(s) without doing the decode seems like a natural one. Given containers with the ability to have multiple images encoded at different sizes, color spaces, etc., there are many questions that could be asked of a container that I don't think it makes sense to add to a decoder interface which is more appropriately streamlined to f(coded image) → pixel data. Also, the same container could conceivably contain multiple coded image types only some of which the platform has decoder support for. How would the Image Decoding API handle that?
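To make the "questions asked of a container" idea more concrete, here is a minimal sketch of walking the top-level boxes of an ISOBMFF file without decoding anything. It is an illustration only: `listTopLevelBoxes` and `BoxInfo` are hypothetical names, and the sketch assumes the standard box header layout (32-bit big-endian size, four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the file). In HEIC/AVIF the per-image properties sit in boxes nested further down, which this sketch does not descend into.

```typescript
// Hypothetical types and names, for illustration only.
interface BoxInfo {
  type: string;   // four-character box type, e.g. "ftyp", "meta", "mdat"
  offset: number; // byte offset of the box header within the file
  size: number;   // total box size in bytes, header included
}

function listTopLevelBoxes(bytes: Uint8Array): BoxInfo[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes: BoxInfo[] = [];
  let offset = 0;
  while (offset + 8 <= bytes.length) {
    let size = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7]);
    let headerSize = 8;
    if (size === 1) {
      // A 64-bit "largesize" follows the type field.
      if (offset + 16 > bytes.length) break;
      size = view.getUint32(offset + 8) * 4294967296 + view.getUint32(offset + 12);
      headerSize = 16;
    } else if (size === 0) {
      // The box extends to the end of the file.
      size = bytes.length - offset;
    }
    // Stop on malformed or truncated boxes rather than guessing.
    if (size < headerSize || offset + size > bytes.length) break;
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}
```

A container-focused API could answer questions such as "which items are present, and in which codec?" from this kind of structure alone, independently of whether the platform can decode the payload.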

dalecurtis · 2021-04-30T20:46:36Z

Images and their containers are highly coupled in the by-far most common cases (GIF, JPEG, etc), so anything that separates those processes seems bound to incur more complexity and not less. I don't think it makes sense in the common case to force authors to go through a separate demuxing and decoding phase for images. Aside from the complexity this feels like a performance issue in the single image case. I talk about this in the explainer.

I 100% agree that a containers API and metadata extraction is very useful and something we should consider going forward, but I think it's orthogonal to what we have in the ImageDecoder API. I'm absolutely on board with exposing what metadata makes sense through the tracks API that ImageDecoder has. In your example we'd only expose the tracks the platform can decode.

baumanj · 2021-05-04T16:02:04Z

Images and their containers are highly coupled in the by-far most common cases

I definitely agree that is the case with the legacy formats, but since WebP, it looks like things are moving in a different direction. All the formats developed in the past decade likely to see broad use that I'm aware of (HEIC, AVIF, JPEG-XL, etc.) decouple the container from the codec. Also, it looks like images are tending to become more video-like in their development, so I would expect to see more of the mixing and matching of container and codec that's so common in that space.

I don't think it makes sense in the common case to force authors to go through a separate demuxing and decoding phase for images

I agree. There should definitely be a simple path for the common case of inputting the whole container and getting out something renderable. I'm just advocating for the inclusion of container-level awareness as a first class abstraction here since I believe it will smooth adoption of new formats.

dalecurtis · 2021-05-04T17:41:32Z

It sounds like you're just in favor of issue #24 then. Do you have any concrete proposals for how we should change ImageDecoder? If not I think we should close this issue and move discussion to #24.

Per #205 (comment) I think ImageDecoder can already do everything we'd want to do in a post-containers API world and VideoDecoder cover any gaps in the demuxed packets case.

baumanj · 2021-05-05T16:41:53Z

Do you have any concrete proposals for how we should change ImageDecoder?

I think the addition of a metadata query would be very useful. That wouldn't strictly be part of a container API since it's also germane to JPEG, PNG and other highly coupled formats like you mentioned, but going forward, that should almost certainly be a container-level operation to extract metadata like dimensions, bitdepth, colorspace, existence of alpha, exif data, etc.
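For a tightly coupled format, the kind of metadata query described above can be sketched without running a decoder. The example below reads the PNG `IHDR` chunk, which the PNG format places directly after the 8-byte signature. `readPngHeader` and `PngHeader` are hypothetical names and not part of ImageDecoder or ImageTrack; the sketch only reports an alpha channel declared by the color type and ignores transparency signalled elsewhere (for example via a `tRNS` chunk).

```typescript
// Hypothetical types and names, for illustration only.
interface PngHeader {
  width: number;
  height: number;
  bitDepth: number;
  colorType: number;
  hasAlphaChannel: boolean;
}

function readPngHeader(bytes: Uint8Array): PngHeader | null {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  // Signature (8) + chunk length (4) + chunk type (4) + width (4) + height (4)
  // + bit depth (1) + color type (1) = 26 bytes are needed.
  if (bytes.length < 26) return null;
  if (!signature.every((b, i) => bytes[i] === b)) return null;
  const chunkType = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (chunkType !== "IHDR") return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const colorType = view.getUint8(25);
  return {
    width: view.getUint32(16),  // big-endian
    height: view.getUint32(20), // big-endian
    bitDepth: view.getUint8(24),
    colorType,
    // Color types 4 (gray + alpha) and 6 (truecolor + alpha) carry an alpha channel.
    hasAlphaChannel: colorType === 4 || colorType === 6,
  };
}
```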

VideoDecoder cover any gaps in the demuxed packets case

Can you elaborate?

dalecurtis · 2021-05-08T03:41:47Z

I think the addition of a metadata query would be very useful. That wouldn't strictly be part of a container API since it's also germane to JPEG, PNG and other highly coupled formats like you mentioned, but going forward, that should almost certainly be a container-level operation to extract metadata like dimensions, bitdepth, colorspace, existence of alpha, exif data, etc.

As mentioned above, I'm in total agreement on adding such metadata; I designed the ImageTrack interface for such things. Today it's just frame count and other simple data, but I envision it holding all the things you're talking about and more. Metadata is decoded automatically in the current spec language. It is indeed a separate step from decoding.

So apologies, but I'm still confused on what your request is for the current API shape. Are you just suggesting those metadata fields? That seems a minor addition and less of a fundamental shape thing. Can you elaborate more?

VideoDecoder cover any gaps in the demuxed packets case

Can you elaborate?

I.e., in some future world where we have a WebContainers API (which we're all in favor of, just maybe not as part of WebCodecs or at least not in v1), you'd pass a bytestream to said container API and after some track selection, in addition to metadata, you'd get demuxed packets of codec XYZ. If that codec happens to be a video one, you can use the VideoDecoder API as it stands today. If it's an image one, there's no reason the current API shape can't accept those as a ReadableStream of bytes or typed chunks.
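The routing described above could look roughly like the sketch below. Everything here is hypothetical: no containers API exists in WebCodecs, so `DemuxedTrack`, `chooseDecoder`, and the idea that a demuxer labels tracks with either a codec string or an image MIME type are assumptions made purely for illustration. The only grounded part is the set of video codec string prefixes, which follow the forms used for VideoDecoder configuration.

```typescript
// Hypothetical shape of what a future demuxer might vend for a selected track.
interface DemuxedTrack {
  codec: string; // assumed: a codec string ("av01.0.04M.08") or an image MIME type
}

type DecoderChoice = "VideoDecoder" | "ImageDecoder" | "unsupported";

// Codec string prefixes for video codecs (AV1, H.264, HEVC, VP9).
const VIDEO_CODEC_PREFIXES = ["av01.", "avc1.", "avc3.", "hev1.", "hvc1.", "vp09."];

function chooseDecoder(track: DemuxedTrack): DecoderChoice {
  const codec = track.codec.toLowerCase();
  // Demuxed samples of a video codec go to VideoDecoder, which takes its
  // configuration separately from the coded chunks.
  const isVideo = codec === "vp8" ||
    VIDEO_CODEC_PREFIXES.some((prefix) => codec.indexOf(prefix) === 0);
  if (isVideo) return "VideoDecoder";
  // Anything still identified by an image MIME type stays on ImageDecoder.
  if (codec.indexOf("image/") === 0) return "ImageDecoder";
  return "unsupported";
}
```

Under these assumptions, an AV1 item demuxed from an AVIF file would be routed to VideoDecoder, while a complete JPEG would stay on ImageDecoder.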

baumanj · 2021-05-11T19:05:18Z

So apologies, but I'm still confused on what your request is for the current API shape. Are you just suggesting those metadata fields? That seems a minor addition and less of a fundamental shape thing. Can you elaborate more?

I agree the shape is reasonable (providing metadata via the ImageTrack interface), but I didn't realize that was the intention because it doesn't have most of the kind of metadata I'd assumed would be included (dimensions, color space, transforms, etc.). Is there a reason that can't be included now?

dalecurtis · 2021-05-11T19:11:34Z

Sorry that wasn't clearer! The only reason is that we were trying to be conservative in what was exposed currently. We dropped exif rotation for now since we couldn't agree on the best way to expose it. Dimensions seems like an easy and non-controversial one to add now. Are there any others you'd prefer would be in WebCodecs v1?

Color space and transforms will need to wait until we figure out the right language for describing them, which may not be a part of v1 since we'll probably need to consider interplay between the new canvas color spaces and such.

chcunningham · 2021-05-12T05:14:53Z

Triage note: tentatively marking 'extension' as recent discussion proposes new attributes/metadata.

padenot · 2021-05-12T12:57:13Z

We dropped exif rotation for now since we couldn't agree on the best way to expose it.

Because <img> does it now, this needs to be figured out, otherwise we can't reimplement <img> with, say, ImageDecoder and canvas.

And even without considering that, it's not great to not be able to draw images in the right orientation...

dalecurtis · 2021-05-12T14:19:56Z

Sorry, to be clear: we just dropped a public accessor for the exif rotation code metadata -- orientation works correctly. It's all handled under the hood (there are extensive WPT for this) just like it is for img.

baumanj · 2021-05-12T19:46:52Z

Another thing that occurs to me that maybe represents an actual difference to API shape. If ImageBufferSource can only be a containerized image (for formats which have containers), I worry that this will discourage innovation. We already have formats that can be used in various container contexts, and providing a decode-only interface would allow a consumer of this API to deal with the container details themselves instead of waiting for them to be implemented by the browser. I'm pretty sure CDNs are going to be interested in ways to slim down images into a format which is more minimalistic given that the defaults tend to prioritize flexibility over minimizing the byte overhead.

How would people feel about having an interface that can take a raw coded frame w/o metadata (other than what the codec defines) in addition to the convenience interface that can be passed the entire containerized image?
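As an illustration of a consumer "dealing with the container details themselves", the sketch below pulls the coded image payload out of a still WebP file in user code. It covers only the demuxing half; it makes no claim about how (or whether) ImageDecoder would accept such a bare payload. `listWebpChunks`, `extractCodedImage`, and `RiffChunk` are hypothetical names, and the sketch assumes the RIFF chunk layout (four-character code, 32-bit little-endian size, payload padded to an even length) and ignores animated files, where frame data is nested inside `ANMF` chunks.

```typescript
// Hypothetical types and names, for illustration only.
interface RiffChunk {
  fourcc: string;      // e.g. "VP8 ", "VP8L", "VP8X", "ICCP", "EXIF"
  payload: Uint8Array; // view onto the chunk payload (no copy)
}

function listWebpChunks(bytes: Uint8Array): RiffChunk[] {
  const tag = (o: number): string =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);
  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WEBP") return [];
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const chunks: RiffChunk[] = [];
  let offset = 12; // chunks start after "RIFF" + size + "WEBP"
  while (offset + 8 <= bytes.length) {
    const size = view.getUint32(offset + 4, true); // little-endian
    const start = offset + 8;
    if (start + size > bytes.length) break; // truncated chunk
    chunks.push({ fourcc: tag(offset), payload: bytes.subarray(start, start + size) });
    offset = start + size + (size % 2); // odd-sized payloads are followed by a pad byte
  }
  return chunks;
}

/** Return the first lossy ("VP8 ") or lossless ("VP8L") bitstream chunk, if any. */
function extractCodedImage(bytes: Uint8Array): RiffChunk | null {
  const chunks = listWebpChunks(bytes);
  for (let i = 0; i < chunks.length; i++) {
    if (chunks[i].fourcc === "VP8 " || chunks[i].fourcc === "VP8L") return chunks[i];
  }
  return null;
}
```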

dalecurtis · 2021-05-12T19:52:45Z

That's what I was referring to with this above:

If that codec happens to be a video one, you can use the VideoDecoder API as it stands today. If it's an image one, there's no reason the current API shape can't accept those as a ReadableStream of bytes or typed chunks.

ReadableStreams are dynamically typed, so we can always allow a stream of chunks later on combined with a mime type requirement. I don't think we need this quite yet, but it wouldn't be breaking to add at any point in the future.

sandersdan · 2021-05-12T19:57:46Z

How would people feel about having an interface that can take a raw coded frame w/o metadata

We have an API for decoding raw frames, VideoDecoder. The problem is that advanced image formats don't have standardized raw formats, so we can't easily specify how you would ask VideoDecoder to do that work.

That's not in scope for WebCodecs V1, and I doubt that inventing bespoke formats is ever going to be in-scope for WebCodecs.

In cases where there is a standardized raw format, it would make sense for UAs to implement support for them. Whether those make more sense in ImageDecoder vs VideoDecoder would depend on the individual formats.

baumanj · 2021-05-18T17:07:23Z

We have an API for decoding raw frames, VideoDecoder. The problem is that advanced image formats don't have standardized raw formats, so we can't easily specify how you would ask VideoDecoder to do that work.

AVIF is based on AV1, which has a standardized, free-to-use, publicly available format, right?

That's not in scope for WebCodecs V1, and I doubt that inventing bespoke formats is ever going to be in-scope for WebCodecs.

I'm not clear what you mean by "bespoke" in this context. If you mean whatever a random individual creates, I agree that's not an important use case, but if you mean something that is early in the standards development process which hasn't had time to be implemented by all browsers, allowing early adopters to provide container interpretation while still leaning on a standard API to handle the decoding seems like a boon to innovation.

In cases where there is a standardized raw format, it would make sense for UAs to implement support for them. Whether those make more sense in ImageDecoder vs VideoDecoder would depend on the individual formats.

I'm not really clear whether you support adding an interface to decode raw (that is, demuxed) coded data to the image decoding API here or not.

sandersdan · 2021-05-18T17:18:00Z

AVIF is based on AV1, which has a standardized, free-to-use, publicly available format, right?

Yes, and AV1 is directly supported by VideoDecoder.

I'm not clear what you mean by "bespoke" in this context.

An example here would be PNG or JPEG. These formats are tightly coupled to their containers, so it's not clear what a "raw", uncontainered version of these would be. We could invent our own, which would be "bespoke".

dalecurtis · 2021-05-18T18:09:43Z

I'm not really clear whether you support adding an interface to decode raw (that is, demuxed) coded data to the image decoding API here or not.

To be clear: We believe such data should be processed by the VideoDecoder API, it's properly designed to handle all the intricacies of demuxed (implying configuration is separate -- a crucial detail) coded data. However we're not opposed to extending ImageDecoder to take a ReadableStream of EncodedVideoChunks for formats the user agent accepts in <img> if a strong enough use case is presented.

That said, I think our conversation is meandering quite a bit - to the point that I'm not really sure what we're discussing anymore. @baumanj can you please provide a concrete list of your requests? As far as I can tell, it seems you have two:

* Accepting demuxed data in ImageDecoder.
* Adding more metadata to the ImageTrack.

For the first I haven't heard any reasons why the VideoDecoder API is insufficient. For the second, we should split into individual issues for each piece of metadata you would like to add.

baumanj · 2021-05-19T17:31:00Z

I'm not clear what you mean by "bespoke" in this context.

An example here would be PNG or JPEG. These formats are tightly coupled to their containers, so it's not clear what a "raw", uncontainered version of these would be. We could invent our own, which would be "bespoke".

For containerless formats like PNG or JPEG, I'd say the demuxing operation is a noop, and the same content should be accepted as inputs to a theoretical "raw" input mechanism for the ImageDecoder API. Is there a downside to that?

That said, I think our conversation is meandering quite a bit - to the point that I'm not really sure what we're discussing anymore. @baumanj can you please provide a concrete list of your requests? As far as I can tell, it seems you have two:
* Accepting demuxed data in ImageDecoder.
* Adding more metadata to the ImageTrack.

I think that's a fair summary. Thanks for refocusing.

For the first I haven't heard any reasons why the VideoDecoder API is insufficient.

Implementation-wise, I expect the same underlying decoder libraries to be used for both ImageDecoder and VideoDecoder where appropriate, but I do not think it's appropriate for VideoDecoder to have full responsibility for decoding raw image data. The fact that several major recent still image formats are based on video codecs is a coincidence, not a fundamental property that should drive the shape of the API. Would JPEG-XL be supported by VideoDecoder? Since we're talking about images, what's the downside of providing a facility within ImageDecoder for handling demuxed input?

For the second, we should split into individual issues for each piece of metadata you would like to add.

Sounds good

sandersdan · 2021-05-19T18:31:43Z

For containerless formats like PNG or JPEG, I'd say the demuxing operation is a noop, and the same content should be accepted as inputs to a theoretical "raw" input mechanism for the ImageDecoder API. Is there a downside to that?

I don't follow, ImageDecoder can already decode PNG and JPEG, so the existing API is already "raw" here.

The fact that several major recent still image formats are based on video codecs is a coincidence

I'd say we are seeing a bifurcation in image formats that is likely to continue in the future. These codec-based formats have features that align well with VideoDecoder, while non-codec formats align better with ImageDecoder.

What is a coincidence is that none of the non-codec formats can be meaningfully demuxed, but without an example of something different I don't think we're ready to propose an API for it.

Would JPEG-XL be supported by VideoDecoder?

I think JPEG XL falls squarely within the ImageDecoder feature set using the current API.

padenot added the image issues related to image decoding and encoding label Apr 29, 2021

chcunningham added the extension Interface changes that extend without breaking. label May 12, 2021

dalecurtis mentioned this issue Jun 14, 2021

Should WebCodecs be exposed in Window environments? #211

Closed
