Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Image Decoding, associated interfaces and algorithms #152

Merged
merged 12 commits into from May 3, 2021

Conversation

chcunningham
Copy link
Collaborator

@chcunningham chcunningham commented Mar 17, 2021

Fixes #50.


Preview | Diff

@chcunningham
Copy link
Collaborator Author

@dalecurtis, mind taking a first pass? Interface wise, this is everything we discussed. But behind those friendly interfaces hides quite a bit of state.

@aboba, FYI - will request review formally once Dale has had a go.
@padenot FYI

index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@chcunningham chcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dalecurtis, great feedback.

index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
index.src.html Show resolved Hide resolved
index.src.html Outdated Show resolved Hide resolved
@chcunningham
Copy link
Collaborator Author

@cconcolato, we're interested to get your take how ImageDecoder describes a list of of ImageTracks (ImageDecoder.tracks). We think it maps well onto video-based image formats like avif (given that "tracks" is a longstanding concept for video), but there's some debate about whether it is over engineered. That is, do formats like avif support an arbitrary number of tracks? If we instead expect all image formats to define at most 2 tracks (one animated, one still), then we could simplify the API to remove all mention of tracks and make selections simply by giving a value for preferAnimation.

@chcunningham
Copy link
Collaborator Author

@aboba, I think Dale's review is sufficiently concluded for you to begin. Standing by for questions and feedback.

@cconcolato
Copy link

@cconcolato, we're interested to get your take how ImageDecoder describes a list of of ImageTracks (ImageDecoder.tracks). We think it maps well onto video-based image formats like avif (given that "tracks" is a longstanding concept for video), but there's some debate about whether it is over engineered. That is, do formats like avif support an arbitrary number of tracks? If we instead expect all image formats to define at most 2 tracks (one animated, one still), then we could simplify the API to remove all mention of tracks and make selections simply by giving a value for preferAnimation.

@chcunningham Not sure I have enough information to answer. Let me try.
First, I'm a bit confused when you say "at most 2 tracks (one animated, one still)". In AVIF or HEIF-based formats, there are 2 distinct concepts: image items and image sequence tracks. One image item represents one image (ignoring scalable, layered image items). A image sequence track contains multiple images that are meant to be displayed in sequence, following some optional timing information (duration, and loop). Of course, if it helps the design of the API and the implementations, you could expose an item as a single-image track.

Then, an ISOBMFF file can indeed contain many tracks. For example, an mp4 file could contain an AV1 video track, an audio track, a subtitle track. It could also contain an image sequence track (e.g. using only the key frames of the video track) meant to be viewed as a GIF-like animation before the video is clicked. It could also contain one image item to give a representative image of the video. Theoretically, you can construct files with as many tracks and items as you want. It could have N video tracks, K audio tracks, P image sequence tracks, Q subtitle tracks, R image items, etc... In practice, typical files will have simple configurations.

Reading:

[[animated]]
Indicates whether this track contains an animated image with multiple frames.
[[frame count]]
The number of frames in this track.
[[repetition count]]
The number of times the animation is intended to repeat.

It seems to me that you want to expose only image items and image sequences (which is fine). I'm curious how an implementation is supposed to decide if it exposes a sequence of images as a video track or as an image track. Do you expect the track container to guide the implementation? For example, in ISOBMFF, video tracks and image sequence tracks are differentiated by the track handler vide vs. pict.

In practice, I don't think files containing images will contain more than one image sequence. As for image items, there are use cases where people think about storing multiple image items in the same file. This is for example because they are the result of a capture burst, or bracketed images, or multi-angle, multi-view images, or even to package multiple resolutions in the same file. But these use cases are rather rare IMHO. If you want to keep the API simple for now, the preferAnimation approach seems reasonable and matches the hypothetical reader API that MIAF defines:

Inputs to a MIAF reader are:
— a file with a FileTypeBox containing at least one brand specified in this document;
— optionally one of the following:
— item_ID of the item to be output (psItemId),
— track_ID of the track to be output (psTrackId),
— a selection between a static image (psImagePreferredFlag equal to 1) or track (psImagePreferredFlag equal to 0) to be output,
— optionally constraints, such as the maximum width and height of an image item or track;
— optionally one or more of the following roles of the image or track to be output:
— master (default),
— thumbnail,
— auxiliary, which may be further classified by the type.

Maybe you want to consider adding preferThumbnail?

@joedrago may have additional feeback based on the libavif API and its integration in browsers
@dwsinger may have additional feedback based on ISOBMFF/HEIF/MIAF specs.

@chcunningham
Copy link
Collaborator Author

chcunningham commented Mar 24, 2021

Thank you @cconcolato!

It seems to me that you want to expose only image items and image sequences (which is fine).

Yes. And I'm calling both a "track", where an image item is just a track with one frame.

I'm curious how an implementation is supposed to decide if it exposes a sequence of images as a video track or as an image track. Do you expect the track container to guide the implementation? For example, in ISOBMFF, video tracks and image sequence tracks are differentiated by the track handler vide vs. pict.

When you say "video track", I take it you mean an ImageTrack for which track.animated=true. If so, the my intent with animated is to provide an early signal that this track will have a frameCount > 1. Ideally we would do away with animated entirely and just have frameCount, but frameCount is not always known at the outset (particularly for gif), so this may cause folks to prematurely consider a track with 1 frame as non-animated.

This is for example because they are the result of a capture burst, or bracketed images, or multi-angle, multi-view images, or even to package multiple resolutions in the same file. But these use cases are rather rare IMHO. If you want to keep the API simple for now, the preferAnimation approach seems reasonable and matches the hypothetical reader API that MIAF defines:

An earlier draft did just have preferAnimation without the tracks mechanism. But then we had the frameCount and animated properties directly on ImageDecoder. This works, but its limiting if we later do want to add some description of alternative tracks. I noticed that html has long defined AudioTrackList and VideoTrackList interfaces, so this seemed like a pattern we might follow with ImageTrack(List).

Maybe you want to consider adding preferThumbnail?

Could do. What happens when preferThumbnail and preferAnimated compete? Maybe prefer should be an enum w/ either type?

@cconcolato
Copy link

When you say "video track", I take it you mean an ImageTrack for which track.animated=true.

No I really meant a VideoTrack. This spec defines an ImageTrack and that the HTML spec defines a VideoTrack. When both will be implemented, and the browser is presented with an MP4 video track or MP4 image sequence track, how will it decide if it uses a VideoTrack or an ImageTrack?

Could do. What happens when preferThumbnail and preferAnimated compete? Maybe prefer should be an enum w/ either type?

Maybe, no strong opinion.

@chcunningham
Copy link
Collaborator Author

No I really meant a VideoTrack. This spec defines an ImageTrack and that the HTML spec defines a VideoTrack. When both will be implemented, and the browser is presented with an MP4 video track or MP4 image sequence track, how will it decide if it uses a VideoTrack or an ImageTrack?

Ah, I follow. Do you expect to see files like this in the wild? How often? Would it be reasonable to describe these with the image/* mimetype (vs video/*)?

@cconcolato
Copy link

cconcolato commented Mar 30, 2021

I can't predict the future, but I could envisage people creating dual-headed files (with an image sequence track and a video track, possibly sharing coded frames) to be used in both <img> and <video>. The same file could be served with either MIME type.

@chcunningham
Copy link
Collaborator Author

I can't predict the future, but I could envisage people creating dual-headed files (with an image sequence track and a video track, possibly sharing coded frames) to be used in both and

Thanks. Probably you meant <video> and <img>. I think its reasonable for us to draw the lines like so:

  • ImageDecoder ignores the VideoTracks in these files. If you're using ImageDecoder, we assume you want image things.
  • You can still decode the video content for such files using WebCodecs, but this requires the app perform the typical demuxing and creation of {{EncodedVideoChunk}}s

@chcunningham
Copy link
Collaborator Author

@aboba, just checking in - any feedback? @padenot I know your feedback is still being compiled.

undefined reset();
undefined close();

static Promise<boolean> isTypeSupported(DOMString type);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a promise-based API? Can we instead return the boolean synchronously? https://github.com/dalecurtis/image-decoder-api/issues/6

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was made promise as we anticipate cases where the UA may not synchronously have the answer. As image formats have started to use video codecs, decoding an image may require instantiating video decoders backed by platform APIs that. A browser architecture may be such that these APIs are called in a separate process, sandboxed for improved security. Supported types then becomes a question for these same APIs, which is implemented via async IPC.

Earlier media capability detection APIs, <video>.canPlayType(), MSE.isTypeSupported(), and WebRTC's RTCRtpSender.getCapabilities() have all been sync. This has at times been problematic for implementers. Newer APIs like MediaCapabilities.decodingInfo() were made async. Note that the other isConfigSupported() interfaces in WebCodecs are also async.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that actual image decoding might depend on another process, but answering the question "do I know what to do with this image type at all (without necessarily doing the work)" seems orthogonal to that. Couldn't browsers just maintain a list of supported image types and synchronously check it whenever isTypeSupported is called?

If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that actual image decoding might depend on another process, but answering the question "do I know what to do with this image type at all (without necessarily doing the work)" seems orthogonal to that. Couldn't browsers just maintain a list of supported image types and synchronously check it whenever isTypeSupported is called?

If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically what codecs are supported (particularly for newer formats). Instead, we may be forced to query OS apis that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privaledged sandboxed process.

If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.

Open to suggestions. I liked this name for its similarity actually. It is performing essentially the same function as its predecessor. I think any confusion would be pretty immediately resolved at dev-time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically what codecs are supported (particularly for newer formats). Instead, we may be forced to query OS apis that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privaledged sandboxed process.

In my experience, this is mostly true for video, in the sense that it's well possible that the device that allows (say) power or cpu-efficient decoding is simply physically removed, and so that it's impossible to store the capabilities somewhere on the browser for synchronous access.

Do we have any evidence of a similar constraint for images? In our experience, hardware decoding for images has this problem where the setup time drawfs the decoding time, and the setup time need to happen per image (with a possible edge case when lots of images have the same dimensions/format maybe?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in Chrome, in the event of a gpu process crash, it is possible support for some formats is no longer available.

Otherwise I agree, we're just making this async for symmetry and a hypothetical. Maybe @jernoble or @aboba want to chime in to avoid a decision that would preclude any formats they might want to support.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dalecurtis what formats aren't supported in Chrome when the GPU process crashes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the hypothetical world where a browser only has support for HEIC via a platform decoder, there is a case where repeated GPU crashes may put the browser into a software-rendering/no-gpu mode that prevents access to the platform decoder.

As a practical example, in some cases the platform decoder may have hard limits on the number of instances available and/or may not be working reliably (e.g., it's hanging or crashing too frequently) to the point that it's disabled at runtime. This happens today in Chrome on Android for H.264 support.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think it's unlikely that a browser would choose to support an image format and not have a software fallback, but even if such a browser did exist the web author would still need to deal with decoding support going away after they've already checked for support withisTypeSupported().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Practically we don't have software fallback on Android for H.264 today -- so that at least is a real case. In the case where the codec goes away during usage the decoder would trigger a decoding error. The same would occur if the loss occurred between construction and the first decode call.

I think authors would have to handle this case regardless of if iTS is sync or not though. For security reasons (e.g., malicious invocation of the platform decoder), even if software fallback is available, it's unclear that automatic fallback is the right operation.

Ultimately my initial implementation was always synchronous, so if everyone wants this to be sync and isn't swayed by the hypotheticals I don't mind switching it back. I believe @chcunningham only suggested it for symmetry with the rest of the WebCodecs APIs.

Queue task to establish tracks upon construction. As this is no longer
user driven, we replace the method with an attribute to let users know
when the track list is "ready".

Establishing tracks was previously user driven by a call to
decodeMetadat(). After further consideration, this seems needlessly
complex. Decoding track metadata is not resource intensive and we expect
that most usage of ImageDecoder is such that they wish to decode actual
frames right away. Users who wish to defer decoding track ImageData may
still do so by deferring construction of the ImageDecoder.

This commit also includes other small fixes (formatting, output
timestamp and duration, and closing ImageDecoder if we fail to establish
tracks once data is "complete").
@chcunningham
Copy link
Collaborator Author

@padenot @aboba - this PR is over a month old. I've iterated on the design only very slightly since it's original posting (incorporating feedback from others). If no objection, I will go ahead and merge by EOD Wednesday 4/28.

@padenot
Copy link
Collaborator

padenot commented Apr 27, 2021

We have quite a few feedback items already, and this is being circulated through various teams internally at Mozilla. Because of the type and number of comments we have, I think it's best to fix a few immediate things here (not a lot), merge, and then I'll open a series of issues on this repo, tagged with an appropriate tag (image-decoding?), so that people that mostly care about images can quickly follow only the issue about this. Would that be appropriate? I think it would make the discussions a bit more structured and workable.

That said, taking more than a month to review a fundamental new way to decode image on the web while the space hasn't really moved for years, while quite a few image formats are being added currently, all with the peculiarities and possible optimizations to make this a really compelling option for authors and unlocking new classes of apps is not exactly a long time.

@dalecurtis
Copy link
Contributor

Thanks Paul! I look forward to your reports. Your proposal sgtm.

While the PR may have only been out for a month, the API hasn't changed materially around the primary use case from the initial explainer and that's been circulating for over a year.

@chcunningham
Copy link
Collaborator Author

Clicked by mistake!

@chcunningham chcunningham reopened this Apr 28, 2021
@chcunningham
Copy link
Collaborator Author

@padenot

I think it's best to fix a few immediate things here (not a lot)

To clarify, do I understand correctly that you will soon private a short list of things to consider for immediate fixes?

@padenot
Copy link
Collaborator

padenot commented Apr 28, 2021

To clarify, do I understand correctly that you will soon private a short list of things to consider for immediate fixes?

I was going to add a regular review here like usual for a couple of things yes.

index.src.html Show resolved Hide resolved
index.src.html Show resolved Hide resolved
undefined reset();
undefined close();

static Promise<boolean> isTypeSupported(DOMString type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean HEIC? HEIF is supported in software today by Chrome and Firefox.

index.src.html Outdated Show resolved Hide resolved
@padenot padenot mentioned this pull request Apr 29, 2021
@padenot
Copy link
Collaborator

padenot commented Apr 29, 2021

I went ahead and created a tag called image so that folks that mostly care (or are specialized) about images can follow easily new developments. I also filed an initial set of issues with this tag.

Copy link
Collaborator Author

@chcunningham chcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@padenot @aboba - I've split off the one outstanding issue of async isTypeSupported() into #213 for followup. Should have better visibility/readability there. Going ahead with merge since this update is otherwise agreed on and we have other issues referencing it.

@chcunningham chcunningham merged commit f5a294b into main May 3, 2021
github-actions bot added a commit that referenced this pull request May 3, 2021
SHA: f5a294b
Reason: push, by @chcunningham

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-actions bot added a commit that referenced this pull request May 3, 2021
SHA: f5a294b
Reason: push, by @chcunningham

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-actions bot added a commit that referenced this pull request May 3, 2021
SHA: f5a294b
Reason: push, by @chcunningham

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@chcunningham chcunningham deleted the image_decoder branch June 2, 2021 03:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for images
7 participants