Add Image Decoding, associated interfaces and algorithms #152

chcunningham · 2021-03-17T02:51:33Z

Fixes #50.

chcunningham · 2021-03-17T02:55:57Z

@dalecurtis, mind taking a first pass? Interface wise, this is everything we discussed. But behind those friendly interfaces hides quite a bit of state.

@aboba, FYI - will request review formally once Dale has had a go.
@padenot FYI

chcunningham

Thanks @dalecurtis, great feedback.

chcunningham · 2021-03-19T21:49:10Z

@cconcolato, we're interested in your take on how ImageDecoder describes a list of ImageTracks (ImageDecoder.tracks). We think it maps well onto video-based image formats like AVIF (given that "tracks" is a longstanding concept for video), but there's some debate about whether it is over-engineered. That is, do formats like AVIF support an arbitrary number of tracks? If we instead expect all image formats to define at most 2 tracks (one animated, one still), then we could simplify the API to remove all mention of tracks and make selections simply by giving a value for preferAnimation.
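A minimal sketch of the trade-off above, assuming a track list shaped like the ImageTrack attributes in this PR (animated, frameCount, repetitionCount). The helper `chooseTrackIndex` and its fallback rule (first track whose `animated` matches preferAnimation, else the container's default track) are hypothetical illustrations, not the spec's selection algorithm. The point is that preferAnimation can still act as a selection hint even when a general track list is exposed.

```typescript
// Hypothetical structural type mirroring the ImageTrack attributes discussed here.
interface TrackInfo {
  animated: boolean;
  frameCount: number;
  repetitionCount: number;
}

// Hypothetical selection rule: if preferAnimation is given, pick the first track
// whose `animated` flag matches it; otherwise (or if nothing matches) keep the
// default track signalled by the container. Returns -1 when there are no tracks.
function chooseTrackIndex(
  tracks: readonly TrackInfo[],
  defaultIndex: number,
  preferAnimation?: boolean,
): number {
  if (tracks.length === 0) return -1;
  if (preferAnimation !== undefined) {
    const match = tracks.findIndex((t) => t.animated === preferAnimation);
    if (match !== -1) return match;
  }
  return defaultIndex;
}

// Example: one still image plus one image sequence.
const exampleTracks: TrackInfo[] = [
  { animated: false, frameCount: 1, repetitionCount: 0 },
  { animated: true, frameCount: 24, repetitionCount: Infinity },
];
console.log(chooseTrackIndex(exampleTracks, 0, true)); // 1
console.log(chooseTrackIndex(exampleTracks, 0, false)); // 0
```

With only two tracks (one still, one animated), the list adds little over the boolean; it pays off only if files expose more alternatives.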

chcunningham · 2021-03-23T20:09:18Z

@aboba, I think Dale's review is sufficiently concluded for you to begin. Standing by for questions and feedback.

cconcolato · 2021-03-23T21:44:01Z

> @cconcolato, we're interested in your take on how ImageDecoder describes a list of ImageTracks (ImageDecoder.tracks). We think it maps well onto video-based image formats like AVIF (given that "tracks" is a longstanding concept for video), but there's some debate about whether it is over-engineered. That is, do formats like AVIF support an arbitrary number of tracks? If we instead expect all image formats to define at most 2 tracks (one animated, one still), then we could simplify the API to remove all mention of tracks and make selections simply by giving a value for preferAnimation.

@chcunningham Not sure I have enough information to answer. Let me try.
First, I'm a bit confused when you say "at most 2 tracks (one animated, one still)". In AVIF or HEIF-based formats, there are 2 distinct concepts: image items and image sequence tracks. One image item represents one image (ignoring scalable, layered image items). An image sequence track contains multiple images that are meant to be displayed in sequence, following some optional timing information (duration and loop). Of course, if it helps the design of the API and the implementations, you could expose an item as a single-image track.

Then, an ISOBMFF file can indeed contain many tracks. For example, an mp4 file could contain an AV1 video track, an audio track, a subtitle track. It could also contain an image sequence track (e.g. using only the key frames of the video track) meant to be viewed as a GIF-like animation before the video is clicked. It could also contain one image item to give a representative image of the video. Theoretically, you can construct files with as many tracks and items as you want. It could have N video tracks, K audio tracks, P image sequence tracks, Q subtitle tracks, R image items, etc... In practice, typical files will have simple configurations.

Reading:

> - [[animated]]: Indicates whether this track contains an animated image with multiple frames.
> - [[frame count]]: The number of frames in this track.
> - [[repetition count]]: The number of times the animation is intended to repeat.
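To make the three slots concrete, here is a small sketch that derives how many frame presentations they imply. It assumes "repeat" means additional passes after the first one and that an endlessly looping animation is represented as Infinity; `ImageTrackSlots` and `totalFramePresentations` are hypothetical names.

```typescript
// Hypothetical model of the internal slots quoted above.
interface ImageTrackSlots {
  animated: boolean; // [[animated]]
  frameCount: number; // [[frame count]]
  repetitionCount: number; // [[repetition count]]; Infinity assumed to mean "loop forever"
}

// Total frames shown: one initial pass plus `repetitionCount` repeats of the
// whole animation. A still image is shown exactly once.
function totalFramePresentations(track: ImageTrackSlots): number {
  if (!track.animated) return 1;
  if (track.frameCount === 0) return 0; // no frames parsed yet
  return track.frameCount * (1 + track.repetitionCount);
}

console.log(totalFramePresentations({ animated: true, frameCount: 4, repetitionCount: 2 })); // 12
console.log(totalFramePresentations({ animated: true, frameCount: 4, repetitionCount: Infinity })); // Infinity
```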

It seems to me that you want to expose only image items and image sequences (which is fine). I'm curious how an implementation is supposed to decide if it exposes a sequence of images as a video track or as an image track. Do you expect the track container to guide the implementation? For example, in ISOBMFF, video tracks and image sequence tracks are differentiated by the track handler vide vs. pict.

In practice, I don't think files containing images will contain more than one image sequence. As for image items, there are use cases where people think about storing multiple image items in the same file. This is for example because they are the result of a capture burst, or bracketed images, or multi-angle, multi-view images, or even to package multiple resolutions in the same file. But these use cases are rather rare IMHO. If you want to keep the API simple for now, the preferAnimation approach seems reasonable and matches the hypothetical reader API that MIAF defines:

> Inputs to a MIAF reader are:
> - a file with a FileTypeBox containing at least one brand specified in this document;
> - optionally one of the following:
>   - item_ID of the item to be output (psItemId),
>   - track_ID of the track to be output (psTrackId),
>   - a selection between a static image (psImagePreferredFlag equal to 1) or track (psImagePreferredFlag equal to 0) to be output,
> - optionally constraints, such as the maximum width and height of an image item or track;
> - optionally one or more of the following roles of the image or track to be output:
>   - master (default),
>   - thumbnail,
>   - auxiliary, which may be further classified by the type.
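One way to see how these MIAF reader inputs line up with preferAnimation is to model them as data. The type names below are hypothetical, and the mapping (psImagePreferredFlag equal to 1 meaning preferAnimation false, 0 meaning true, explicit item or track IDs having no preferAnimation equivalent) is an interpretation of the quoted text, not something MIAF or this PR defines.

```typescript
// Hypothetical encoding of the "optionally one of the following" selection.
type MiafSelection =
  | { kind: "item"; psItemId: number }
  | { kind: "track"; psTrackId: number }
  | { kind: "preference"; psImagePreferredFlag: 0 | 1 };

type MiafRole = "master" | "thumbnail" | "auxiliary";

interface MiafReaderInput {
  selection?: MiafSelection;
  maxWidth?: number; // optional constraints
  maxHeight?: number;
  roles?: MiafRole[]; // "master" is the default role
}

// Map a MIAF-style selection onto a single preferAnimation hint.
// Returns undefined when there is no equivalent (explicit IDs, or no selection).
function toPreferAnimation(input: MiafReaderInput): boolean | undefined {
  const sel = input.selection;
  if (sel === undefined || sel.kind !== "preference") return undefined;
  return sel.psImagePreferredFlag === 0; // 0 = track preferred, 1 = static image preferred
}

console.log(toPreferAnimation({ selection: { kind: "preference", psImagePreferredFlag: 1 } })); // false
```

Roles such as thumbnail have no counterpart in a boolean hint, which is what the preferThumbnail question is getting at.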

Maybe you want to consider adding preferThumbnail?

@joedrago may have additional feedback based on the libavif API and its integration in browsers
@dwsinger may have additional feedback based on ISOBMFF/HEIF/MIAF specs.

chcunningham · 2021-03-24T06:19:48Z

Thank you @cconcolato!

> It seems to me that you want to expose only image items and image sequences (which is fine).

Yes. And I'm calling both a "track", where an image item is just a track with one frame.

> I'm curious how an implementation is supposed to decide if it exposes a sequence of images as a video track or as an image track. Do you expect the track container to guide the implementation? For example, in ISOBMFF, video tracks and image sequence tracks are differentiated by the track handler vide vs. pict.

When you say "video track", I take it you mean an ImageTrack for which track.animated=true. If so, my intent with animated is to provide an early signal that this track will have a frameCount > 1. Ideally we would do away with animated entirely and just have frameCount, but frameCount is not always known at the outset (particularly for GIF), so relying on frameCount alone may cause folks to prematurely treat a track that so far reports 1 frame as non-animated.
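The progressive-GIF concern can be illustrated with snapshots of a track's state as bytes arrive. The snapshot values are made up for illustration, and the `animated` flag is assumed to be known early; the point is only that a `frameCount > 1` check gives the wrong answer on early snapshots while the up-front signal does not.

```typescript
// Hypothetical snapshot of a track's state while data is still arriving.
interface TrackSnapshot {
  animated: boolean;
  frameCount: number; // frames parsed so far; may still grow
}

// Naive check: infers animation from the number of frames seen so far.
function looksAnimatedByCount(t: TrackSnapshot): boolean {
  return t.frameCount > 1;
}

// Made-up progression for an animated GIF arriving in three chunks.
const snapshots: TrackSnapshot[] = [
  { animated: true, frameCount: 1 },
  { animated: true, frameCount: 5 },
  { animated: true, frameCount: 12 },
];

// Snapshots where the naive check disagrees with the early `animated` signal.
const misclassified = snapshots.filter((s) => looksAnimatedByCount(s) !== s.animated);
console.log(misclassified.length); // 1 (the first snapshot)
```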

> This is for example because they are the result of a capture burst, or bracketed images, or multi-angle, multi-view images, or even to package multiple resolutions in the same file. But these use cases are rather rare IMHO. If you want to keep the API simple for now, the preferAnimation approach seems reasonable and matches the hypothetical reader API that MIAF defines:

An earlier draft did just have preferAnimation without the tracks mechanism. But then we had the frameCount and animated properties directly on ImageDecoder. This works, but it's limiting if we later do want to add some description of alternative tracks. I noticed that HTML has long defined AudioTrackList and VideoTrackList interfaces, so this seemed like a pattern we might follow with ImageTrack(List).

> Maybe you want to consider adding preferThumbnail?

Could do. What happens when preferThumbnail and preferAnimated compete? Maybe prefer should be an enum w/ either type?
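A sketch of the enum idea: replacing two booleans with one preference value removes the conflicting case by construction. The `ImagePreference` type, the `isThumbnail` field, and the fallback to track 0 are all hypothetical; none of them exist in this PR.

```typescript
// Hypothetical single preference value instead of preferAnimation + preferThumbnail.
type ImagePreference = "animation" | "still" | "thumbnail";

// Hypothetical per-track info; `isThumbnail` might come from a container role such as MIAF's.
interface CandidateTrack {
  animated: boolean;
  isThumbnail: boolean;
}

function matchesPreference(track: CandidateTrack, pref: ImagePreference): boolean {
  if (pref === "thumbnail") return track.isThumbnail;
  if (track.isThumbnail) return false; // thumbnails only win when explicitly requested
  return pref === "animation" ? track.animated : !track.animated;
}

// Pick the first matching track, falling back to track 0 (assumed default).
function selectByPreference(tracks: readonly CandidateTrack[], pref: ImagePreference): number {
  if (tracks.length === 0) return -1;
  const i = tracks.findIndex((t) => matchesPreference(t, pref));
  return i === -1 ? 0 : i;
}

const candidates: CandidateTrack[] = [
  { animated: false, isThumbnail: false },
  { animated: false, isThumbnail: true },
  { animated: true, isThumbnail: false },
];
console.log(selectByPreference(candidates, "thumbnail")); // 1
```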

cconcolato · 2021-03-26T01:55:54Z

> When you say "video track", I take it you mean an ImageTrack for which track.animated=true.

No, I really meant a VideoTrack. This spec defines an ImageTrack, and the HTML spec defines a VideoTrack. Once both are implemented, and the browser is presented with an MP4 video track or MP4 image sequence track, how will it decide whether to use a VideoTrack or an ImageTrack?

> Could do. What happens when preferThumbnail and preferAnimated compete? Maybe prefer should be an enum w/ either type?

Maybe, no strong opinion.

chcunningham · 2021-03-26T20:19:00Z

> No, I really meant a VideoTrack. This spec defines an ImageTrack, and the HTML spec defines a VideoTrack. Once both are implemented, and the browser is presented with an MP4 video track or MP4 image sequence track, how will it decide whether to use a VideoTrack or an ImageTrack?

Ah, I follow. Do you expect to see files like this in the wild? How often? Would it be reasonable to describe these with the image/* MIME type (vs video/*)?

cconcolato · 2021-03-30T16:18:18Z

I can't predict the future, but I could envisage people creating dual-headed files (with an image sequence track and a video track, possibly sharing coded frames) to be used in both <img> and <video>. The same file could be served with either MIME type.

chcunningham · 2021-03-31T06:10:38Z

> I can't predict the future, but I could envisage people creating dual-headed files (with an image sequence track and a video track, possibly sharing coded frames) to be used in both and

Thanks. Probably you meant <video> and <img>. I think it's reasonable for us to draw the lines like so:

- ImageDecoder ignores the VideoTracks in these files. If you're using ImageDecoder, we assume you want image things.
- You can still decode the video content for such files using WebCodecs, but this requires the app to perform the typical demuxing and creation of {{EncodedVideoChunk}}s.
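The split above can be sketched for an ISOBMFF-based file, assuming a demuxer has already listed its image items and its tracks with their handler types (`pict` for image sequences, `vide` for video, as mentioned earlier in the thread). `ParsedFile`, `ExposedTrack`, and `exposedImageTracks` are hypothetical; the sketch only shows the filtering rule: items become single-frame tracks, `pict` tracks become image tracks, and everything else, including `vide` tracks, is left to regular WebCodecs demuxing.

```typescript
// Hypothetical output of an ISOBMFF/HEIF demuxer.
interface ParsedTrack {
  trackId: number;
  handlerType: string; // e.g. "pict", "vide", "soun"
  sampleCount: number;
}

interface ParsedFile {
  itemIds: number[]; // image items
  tracks: ParsedTrack[];
}

// What an image decoder might expose for each candidate.
interface ExposedTrack {
  source: "item" | "track";
  id: number;
  animated: boolean;
  frameCount: number;
}

function exposedImageTracks(file: ParsedFile): ExposedTrack[] {
  // An image item is treated as a track with exactly one frame.
  const fromItems = file.itemIds.map(
    (id): ExposedTrack => ({ source: "item", id, animated: false, frameCount: 1 }),
  );
  // Only image sequence tracks qualify; video, audio, and other tracks are ignored.
  const fromTracks = file.tracks
    .filter((t) => t.handlerType === "pict")
    .map(
      (t): ExposedTrack => ({
        source: "track",
        id: t.trackId,
        animated: t.sampleCount > 1,
        frameCount: t.sampleCount,
      }),
    );
  return fromItems.concat(fromTracks);
}

// A dual-headed file: one image item, a video track, an image sequence, and audio.
const dualHeaded: ParsedFile = {
  itemIds: [1],
  tracks: [
    { trackId: 1, handlerType: "vide", sampleCount: 300 },
    { trackId: 2, handlerType: "pict", sampleCount: 30 },
    { trackId: 3, handlerType: "soun", sampleCount: 500 },
  ],
};
console.log(exposedImageTracks(dualHeaded).length); // 2
```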

chcunningham · 2021-04-09T07:53:29Z

@aboba, just checking in - any feedback? @padenot I know your feedback is still being compiled.

mathiasbynens · 2021-04-09T14:39:37Z

index.src.html

```diff
+  undefined reset();
+  undefined close();
+
+  static Promise<boolean> isTypeSupported(DOMString type);
```


Why is this a promise-based API? Can we instead return the boolean synchronously? https://github.com/dalecurtis/image-decoder-api/issues/6

chcunningham · 2021-04-09

This was made promise-based because we anticipate cases where the UA may not synchronously have the answer. As image formats have started to use video codecs, decoding an image may require instantiating video decoders backed by platform APIs. A browser architecture may be such that these APIs are called in a separate process, sandboxed for improved security. Determining the supported types then becomes a question for these same APIs, which is answered via async IPC.

Earlier media capability detection APIs (<video>.canPlayType(), MSE.isTypeSupported(), and WebRTC's RTCRtpSender.getCapabilities()) have all been synchronous. This has at times been problematic for implementers. Newer APIs like MediaCapabilities.decodingInfo() were made async. Note that the other isConfigSupported() interfaces in WebCodecs are also async.
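A rough model of the constraint described above: when the support answer lives in another (sandboxed) process, the page-facing check can only return a promise. `queryDecoderProcess` is a hypothetical stand-in for the async IPC, backed here by a fixed list purely so the sketch runs, and the free function `isTypeSupported` only mirrors the promise-returning signature in this PR. The caller pattern (await the check before deciding to construct a decoder) is the part that carries over.

```typescript
// Hypothetical stand-in for an IPC round trip to a sandboxed decoder process.
// A real browser would ask platform/OS decoders here; the fixed list is illustrative.
async function queryDecoderProcess(type: string): Promise<boolean> {
  const platformTypes = ["image/avif", "image/heic"];
  await Promise.resolve(); // models the asynchronous hop
  return platformTypes.indexOf(type.toLowerCase()) !== -1;
}

// Page-facing check: it can only resolve once the other process has answered.
function isTypeSupported(type: string): Promise<boolean> {
  return queryDecoderProcess(type);
}

// Caller pattern: check each candidate type before constructing a decoder.
async function pickDecodableType(candidates: string[]): Promise<string | null> {
  for (const type of candidates) {
    if (await isTypeSupported(type)) return type;
  }
  return null;
}

pickDecodableType(["image/jxl", "image/avif"]).then((t) => console.log(t)); // image/avif
```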

mathiasbynens · 2021-04-12

I understand that actual image decoding might depend on another process, but answering the question "do I know what to do with this image type at all (without necessarily doing the work)" seems orthogonal to that. Couldn't browsers just maintain a list of supported image types and synchronously check it whenever isTypeSupported is called?

If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.
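For comparison, the synchronous alternative suggested above amounts to the UA consulting a list it already holds. The type list here is hypothetical. The sketch is only valid if every listed format can always be decoded in-process (for example by a bundled software decoder), which is exactly the assumption questioned in the following replies.

```typescript
// Hypothetical list of formats the UA can always decode in-process.
const STATIC_SUPPORTED_TYPES: ReadonlySet<string> = new Set([
  "image/png",
  "image/jpeg",
  "image/gif",
  "image/webp",
]);

// Synchronous check: no IPC, no platform query, just a lookup.
// MIME type and subtype names are case-insensitive, so normalize first.
function isTypeSupportedSync(type: string): boolean {
  return STATIC_SUPPORTED_TYPES.has(type.trim().toLowerCase());
}

console.log(isTypeSupportedSync("IMAGE/PNG")); // true
console.log(isTypeSupportedSync("image/heic")); // false: would need a platform decoder
```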

chcunningham · 2021-04-14

> I understand that actual image decoding might depend on another process, but answering the question "do I know what to do with this image type at all (without necessarily doing the work)" seems orthogonal to that. Couldn't browsers just maintain a list of supported image types and synchronously check it whenever isTypeSupported is called?

If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically which codecs are supported (particularly for newer formats). Instead, we may be forced to query OS APIs that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privileged sandboxed process.

> If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.

Open to suggestions. I liked this name for its similarity actually. It is performing essentially the same function as its predecessor. I think any confusion would be pretty immediately resolved at dev-time.

padenot · 2021-04-27

> If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically which codecs are supported (particularly for newer formats). Instead, we may be forced to query OS APIs that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privileged sandboxed process.

In my experience, this is mostly true for video, in the sense that it's quite possible that the device that allows (say) power- or CPU-efficient decoding is simply physically removed, so it's impossible to store the capabilities somewhere in the browser for synchronous access.

Do we have any evidence of a similar constraint for images? In our experience, hardware decoding for images has the problem that the setup time dwarfs the decoding time, and the setup needs to happen per image (with a possible edge case when lots of images share the same dimensions/format, maybe?).

dalecurtis · 2021-04-29

At least in Chrome, in the event of a GPU process crash, it is possible that support for some formats is no longer available.

Otherwise I agree, we're just making this async for symmetry and a hypothetical. Maybe @jernoble or @aboba want to chime in to avoid a decision that would preclude any formats they might want to support.

jrmuizel · 2021-04-30

@dalecurtis what formats aren't supported in Chrome when the GPU process crashes?

dalecurtis · 2021-04-30

In the hypothetical world where a browser only has support for HEIC via a platform decoder, there is a case where repeated GPU crashes may put the browser into a software-rendering/no-GPU mode that prevents access to the platform decoder.

As a practical example, in some cases the platform decoder may have hard limits on the number of instances available and/or may not be working reliably (e.g., it's hanging or crashing too frequently) to the point that it's disabled at runtime. This happens today in Chrome on Android for H.264 support.

jrmuizel · 2021-04-30

Ok, I think it's unlikely that a browser would choose to support an image format and not have a software fallback, but even if such a browser did exist, the web author would still need to deal with decoding support going away after they've already checked for support with isTypeSupported().

dalecurtis · 2021-04-30

Practically, we don't have a software fallback on Android for H.264 today, so that at least is a real case. In the case where the codec goes away during usage, the decoder would trigger a decoding error. The same would occur if the loss occurred between construction and the first decode call.

I think authors would have to handle this case regardless of whether iTS is sync or not, though. For security reasons (e.g., malicious invocation of the platform decoder), even if software fallback is available, it's unclear that automatic fallback is the right behavior.

Ultimately my initial implementation was always synchronous, so if everyone wants this to be sync and isn't swayed by the hypotheticals I don't mind switching it back. I believe @chcunningham only suggested it for symmetry with the rest of the WebCodecs APIs.
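Whichever way isTypeSupported() goes, the point above means callers need a decode-failure path even after a positive support check. A minimal sketch against a structural decoder type: `FrameDecoderLike` mirrors only the shape of decode()/close() and is an assumption, and `decodeFrameOr`, its fallback value, and closing the decoder on failure are illustrative choices. No particular error type is assumed.

```typescript
// Hypothetical structural type covering just what this sketch needs.
interface FrameDecoderLike<T> {
  decode(options?: { frameIndex?: number }): Promise<{ image: T; complete: boolean }>;
  close(): void;
}

// Decode one frame; if the decoder fails (e.g. the platform codec went away
// after the support check), release it and return a fallback instead of throwing.
async function decodeFrameOr<T>(
  decoder: FrameDecoderLike<T>,
  frameIndex: number,
  fallback: T,
): Promise<T> {
  try {
    const result = await decoder.decode({ frameIndex });
    return result.image;
  } catch {
    decoder.close();
    return fallback;
  }
}
```

The fallback could be a placeholder image or a signal to retry with a different format; the sketch leaves that policy to the caller.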

Queue task to establish tracks upon construction. As this is no longer user-driven, we replace the method with an attribute to let users know when the track list is "ready". Establishing tracks was previously user-driven, via a call to decodeMetadata(). After further consideration, this seems needlessly complex. Decoding track metadata is not resource intensive, and we expect that most uses of ImageDecoder will want to decode actual frames right away. Users who wish to defer decoding track metadata may still do so by deferring construction of the ImageDecoder. This commit also includes other small fixes (formatting, output timestamp and duration, and closing ImageDecoder if we fail to establish tracks once data is "complete").
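With tracks established by a task queued at construction, a caller waits on the ready signal instead of calling decodeMetadata(). The sketch uses structural types whose member names (tracks.ready, tracks.selectedTrack, frameCount, decode({ frameIndex }), image) follow the WebCodecs ImageDecoder interface as later published; it assumes the data is already complete so frameCount is final, and the fake decoder is hypothetical. With a real ImageDecoder the decoded images are VideoFrames, which should be closed when no longer needed.

```typescript
interface TrackLike {
  animated: boolean;
  frameCount: number;
}

interface TrackListLike {
  ready: Promise<void>; // resolves once the track list has been established
  selectedTrack: TrackLike | null;
}

interface ImageDecoderLike<T> {
  tracks: TrackListLike;
  decode(options?: { frameIndex?: number }): Promise<{ image: T; complete: boolean }>;
}

// Wait for the track list, then decode every frame of the selected track in order.
// Assumes complete data, so frameCount will not grow while we iterate.
async function decodeAllFrames<T>(decoder: ImageDecoderLike<T>): Promise<T[]> {
  await decoder.tracks.ready;
  const track = decoder.tracks.selectedTrack;
  if (track === null) return [];
  const frames: T[] = [];
  for (let i = 0; i < track.frameCount; i++) {
    const result = await decoder.decode({ frameIndex: i });
    frames.push(result.image);
  }
  return frames;
}

// Hypothetical fake decoder producing strings instead of VideoFrames.
const fakeDecoder: ImageDecoderLike<string> = {
  tracks: { ready: Promise.resolve(), selectedTrack: { animated: true, frameCount: 3 } },
  decode: (o) => Promise.resolve({ image: `frame${o && o.frameIndex}`, complete: true }),
};
decodeAllFrames(fakeDecoder).then((frames) => console.log(frames.join(",")));
```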

chcunningham · 2021-04-23T19:57:35Z

@padenot @aboba - this PR is over a month old. I've iterated on the design only very slightly since its original posting (incorporating feedback from others). If there are no objections, I will go ahead and merge by EOD Wednesday 4/28.

padenot · 2021-04-27T17:42:18Z

We have quite a few feedback items already, and this is being circulated through various teams internally at Mozilla. Because of the type and number of comments we have, I think it's best to fix a few immediate things here (not a lot), merge, and then I'll open a series of issues on this repo, tagged with an appropriate tag (image-decoding?), so that people that mostly care about images can quickly follow only the issue about this. Would that be appropriate? I think it would make the discussions a bit more structured and workable.

That said, more than a month is not exactly a long time to review a fundamentally new way to decode images on the web, given that the space hasn't really moved in years and that quite a few image formats are currently being added, each with its own peculiarities and possible optimizations that could make this a really compelling option for authors and unlock new classes of apps.

dalecurtis · 2021-04-27T17:47:38Z

Thanks Paul! I look forward to your reports. Your proposal sgtm.

While the PR may have only been out for a month, the API hasn't changed materially around the primary use case from the initial explainer and that's been circulating for over a year.

chcunningham · 2021-04-28T04:25:24Z

Clicked by mistake!

chcunningham · 2021-04-28T04:30:19Z

@padenot

> I think it's best to fix a few immediate things here (not a lot)

To clarify, do I understand correctly that you will soon provide a short list of things to consider for immediate fixes?

padenot · 2021-04-28T14:36:22Z

> To clarify, do I understand correctly that you will soon provide a short list of things to consider for immediate fixes?

I was going to add a regular review here like usual for a couple of things yes.

padenot · 2021-04-28T17:56:45Z

index.src.html

```diff
+  undefined reset();
+  undefined close();
+
+  static Promise<boolean> isTypeSupported(DOMString type);
```


Do you mean HEIC? HEIF is supported in software today by Chrome and Firefox.

padenot · 2021-04-29T15:56:10Z

I went ahead and created a tag called image so that folks who mostly care about (or specialize in) images can easily follow new developments. I also filed an initial set of issues with this tag.

chcunningham

@padenot @aboba - I've split off the one outstanding issue of async isTypeSupported() into #213 for follow-up. It should have better visibility/readability there. Going ahead with the merge since this update is otherwise agreed on and we have other issues referencing it.

@chcunningham

SHA: f5a294b Reason: push, by @chcunningham Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Add Image Decoding, associated interfaces and algorithms

8479a9b

Fixes #50.

chcunningham mentioned this pull request Mar 17, 2021

WebCodecs (again!) w3ctag/design-reviews#612

Closed

dalecurtis reviewed Mar 17, 2021

Various fixes to address dalecurtis@ feedback

2921c86

chcunningham commented Mar 19, 2021

Defer decoding track metadata until decodeMetadata() or decode()

12fd233

Also includes other minor fixes.

Add missing dfn for ImageDecoderInit.preferAnimation

5f92556

chcunningham commented Mar 24, 2021

chcunningham requested a review from padenot March 24, 2021 17:01

chcunningham commented Mar 25, 2021

mpetroff mentioned this pull request Apr 9, 2021

Img: add method to check if image format is supported whatwg/html#6324

Open

mathiasbynens reviewed Apr 9, 2021

chcunningham mentioned this pull request Apr 9, 2021

Sniffing considerations #169

Open

chcunningham added 2 commits April 20, 2021 21:24

Merge remote-tracking branch 'origin/main' into image_decoder

29ddca4

Address lingering ImageDecoder TODOs, dalecurtis@ feedback

a7890de

chcunningham mentioned this pull request Apr 21, 2021

Add AbortError or EncodingError exception as argument for rejected flush() promises #188

Closed

chcunningham added 2 commits April 23, 2021 12:31

Small language fixes

a63ee9b

chcunningham closed this Apr 28, 2021

chcunningham reopened this Apr 28, 2021

padenot reviewed Apr 28, 2021

chcunningham added 2 commits April 28, 2021 16:00

Clarify that ImageDecoder.tracks returns a live object

a6fe505

Merge remote-tracking branch 'origin/main' into image_decoder

114979c

padenot reviewed Apr 29, 2021

padenot mentioned this pull request Apr 29, 2021

Image encoding API #204

Open

chrisn mentioned this pull request Apr 29, 2021

Web Codecs and image decoding w3c/charter-media-wg#26

Closed

chcunningham added 2 commits April 29, 2021 22:27

Add completed promise. Fix typo

051cc10

Merge remote-tracking branch 'origin/main' into image_decoder

8b7c907

chcunningham commented May 3, 2021

chcunningham merged commit f5a294b into main May 3, 2021

github-actions bot added a commit that referenced this pull request May 3, 2021

Merge pull request #152 from w3c/image_decoder

2d01278

SHA: f5a294b Reason: push, by @chcunningham Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions bot added a commit that referenced this pull request May 3, 2021

Merge pull request #152 from w3c/image_decoder

121ef00

SHA: f5a294b Reason: push, by @chcunningham Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions bot added a commit that referenced this pull request May 3, 2021

Merge pull request #152 from w3c/image_decoder

56460af

SHA: f5a294b Reason: push, by @chcunningham Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

chcunningham deleted the image_decoder branch June 2, 2021 03:44

padenot mentioned this pull request Nov 29, 2023

Why is isTypeSupported() promise-based? #750

Closed


