
Streaming image decoding #13

Closed
jakearchibald opened this issue Sep 2, 2019 · 33 comments
Labels: maybe (Ideas that might be in scope, and worth discussing)

Comments

@jakearchibald commented Sep 2, 2019

const decoded = await imageDecoder.decode(input);
const canvas = ...;
canvas.getContext('2d').putImageData(decoded, 0, 0);

The above (from the explainer) suggests that decoding doesn't stream, which feels like a missed opportunity.

In its current state, it feels like it'd be better to change createImageBitmap so it could accept a stream.

Maybe that should happen, whereas a whole new API could expose streamed image decoding.

Images can stream in a few ways:

  • Yielding pixel data from top to bottom.
  • Yielding pixel data with increasing detail (progressive/interlaced).
  • Yielding frames.

This would allow partially-loaded images to be used in things like <canvas>.
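To make those three modes concrete, here is a toy sketch of the kinds of chunks each might yield to a consumer. This is purely illustrative: the generator names and chunk shapes are made up for the example, not a real or proposed API.

// Toy illustration of the three streaming modes; nothing here is a real API.
async function* decodeTopToBottom(encodedBytes) {
  // Mode 1 (e.g. baseline JPEG): yield horizontal bands as rows finish decoding.
  yield { rowStart: 0, rowEnd: 64, pixels: new Uint8ClampedArray(0) };
}
async function* decodeProgressive(encodedBytes) {
  // Mode 2 (e.g. progressive JPEG, interlaced PNG): yield whole-image passes of increasing detail.
  yield { pass: 1, pixels: new Uint8ClampedArray(0) };
}
async function* decodeAnimated(encodedBytes) {
  // Mode 3 (e.g. animated GIF/WebP): yield complete frames over time.
  yield { frameIndex: 0, delayMs: 100, pixels: new Uint8ClampedArray(0) };
}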

@pthatcherg (Contributor)

I added this to the possible WebIDL:

partial interface ImageData {
  readonly attribute ReadableStream readable; // of bytes
};

That would allow the data (the pixel data) to stream, at least from the decoder.

I'm interested to hear what you mean by "streaming frames". That makes it sound a lot like a video decoder, which is hopefully well supported by the previous portions of the explainer. Is there something different you are looking for? Perhaps what you mean is different quality versions of the same image rather than different frames over time?

@jakearchibald (Author)

I'm interested to hear what you mean by "streaming frames". That makes it sound a lot like a video decoder

By yielding frames I mean animated gif/webp. Maybe that could be handled by the video decoder, but the platform doesn't treat these things the same elsewhere.

Perhaps what you mean is different quality versions of the same image rather than different frames over time?

Nah, "yielding pixel data with increasing detail (progressive/interlaced)" was supposed to cover that case.

@jakearchibald (Author)

I added this to the possible WebIDL:

partial interface ImageData {
  readonly attribute ReadableStream readable; // of bytes
};

I don't think that makes sense as ImageData is a synchronous data structure. Maybe you mean ImageBitmap? But even then it seems weird as APIs expect ImageBitmap to represent a 'decoded' image source.

@pthatcherg (Contributor)

A couple of different things at once:

  • Other than containerization (which is an interesting topic for video vs. image codecs in its own right), how is an animated webp file different than a vp8 video?

  • For ImageData.readable, my intention was to allow a WHATWG stream version of .data. I was hoping it could then be piped through transform streams (to edit the image) before being piped elsewhere (such as an encoder).

  • I'm a little new to ImageData vs. ImageBitmap, but reading through the Chromium source code, it appears that ImageData is the lower-level concept of an image source, used in many places, so it seemed like a more fitting structure to represent what would come out of a decoder. But I could be wrong.

  • How could a progressive stream of raw/unencoded bytes of increasing quality work?

@jakearchibald (Author)

Other than containerization (which is an interesting topic for video vs. image codecs in its own right), how is an animated webp file different than a vp8 video?

I'm not sure. But we don't allow <video> to play gifs, or <img> to play muted mp4s. It'd be great if that changed IMO 😄.

For ImageData.readable, my intention was to allow a WHATWG stream version of .data. I was hoping it could then be piped through transform streams (to edit the image) before being piped elsewhere (such as an encoder).

I think that would be better handled by a helper that converts an array/sequence to a stream. But I don't see the benefit.

The benefit of streaming is you can do things in chunks, or ideally in parallel, so a streaming encoder/decoder is only useful if it can provide meaningful data before the whole operation is complete.

How could a progressive stream of raw/unencoded bytes of increasing quality work?

Good question, and the answer would take a lot of careful design, but here are some half-baked ideas:

If the image format only touches pixels once during decode (baseline jpeg, webp), I'd expect the stream to yield data structures like this:

  • Final image width.
  • Final image height.
  • ImageData of some decoded data.
  • Target x position of image data.
  • Target y position of image data.
  • Target width of image data.
  • Target height of image data.

I guess 'multi-scan' formats would have the same format, but the target x & y would always be 0, and the target width & height would be the same as the final width & height.
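To make that concrete, a single chunk from such a stream might look like the following. This is a purely hypothetical shape, not a proposed API; the field names simply mirror the list above.

// Hypothetical chunk from a streaming decode of a 1024×768 baseline JPEG:
// a 64-row band of decoded pixels partway down the final image.
const chunk = {
  width: 1024,                        // final image width
  height: 768,                        // final image height
  imageData: new ImageData(1024, 64), // decoded pixels for this band
  targetX: 0,                         // where the band lands in the final image
  targetY: 192,
  targetWidth: 1024,
  targetHeight: 64,
};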

@pthatcherg (Contributor) commented Sep 4, 2019 via email

@guest271314 (Contributor)

@jakearchibald

In its current state, it feels like it'd be better to change createImageBitmap so it could accept a stream.

Note: the createImageBitmap and ImageCapture.grabFrame() implementations have issues which, until fixed, impact the reliability of those APIs.

The first list item below is an insidious bug. The only way I was able to achieve the expected result, before reading the second list item (which clarified what the issue was), was to assign the ReadableStreamDefaultController to a variable in start() so that close() could be called on it from outside the ReadableStream instance.

@jakearchibald (Author) commented Sep 4, 2019

@pthatcherg

a streaming encoder/decoder is only useful if it can provide meaningful data before the whole operation is complete.

How useful that will be for images is still definitely uncertain. So far, it's just an idea to possibly explore.

Every browser has taken advantage of streaming decoding for over a decade now. I think its usefulness has been well proven.

You originally said stream of bytes

I think I said pixel data. But yes, there needs to be metadata that tells the developer where that data exists in the overall image.

I guess what I'm trying to say, at a higher level, is: All browsers ship streaming image decoders that allow them to improve performance by handling image data of partially decoded images. Let's give developers access to this.

@jakearchibald (Author)

@guest271314 this issue is about streaming image decoding, not getting images from a video. Please start another issue, or post on an existing relevant issue, if you have something different to discuss.

@guest271314 (Contributor)

I only posted here because you mentioned using or modifying createImageBitmap(), which has issues. If you interpret the comments as pertaining only to video, and the information is not useful to you, so be it. Good luck!

@jakearchibald (Author)

@guest271314 The existence of an implementation bug in one browser does not justify the creation of a whole new standard/API.

@w3c deleted a comment from guest271314 Sep 4, 2019
@pthatcherg (Contributor)

@pthatcherg

a streaming encoder/decoder is only useful if it can provide meaningful data before the whole operation is complete.

How useful that will be for images is still definitely uncertain. So far, it's just an idea to possibly explore.

Every browser has taken advantage of streaming decoding for over a decade now. I think its usefulness has been well proven.

I meant more specifically the .readable attribute on ImageData giving a stream of bytes rather than .data for an array of bytes.

The "stream of data structures" thing that you meant (and I misunderstood at first) is something different.

You originally said stream of bytes

I think I said pixel data. But yes, there needs to be metadata that tells the developer where that data exists in the overall image.

Ah.... my mistake, then.

I guess what I'm trying to say, at a higher level, is: All browsers ship streaming image decoders that allow them to improve performance by handling image data of partially decoded images. Let's give developers access to this.

I think we're on the same page now.

Let's see if I can take a shot at what the WebIDL would look like:

interface ImageDecoder {
  // You get more than one as more information becomes available.
  ReadableStream decode((EncodedImageData or ReadableStream) data);
};

interface DecodedImage {
  // Things that aren't available yet are null
  readonly attribute unsigned long? width;
  readonly attribute unsigned long? height;
  readonly attribute unsigned long? targetWidth;
  readonly attribute unsigned long? targetHeight;
  readonly attribute unsigned long? targetX;
  readonly attribute unsigned long? targetY;
  readonly attribute ImageData? imageData;
};

That could probably be cleaned up a bit, but first let's make sure that's generally what you're looking for (and if it's implementable :).

@jakearchibald (Author)

It should probably be a transform stream rather than a function that takes a readable, but seems good!

However, I haven't looked at how streaming codecs actually work, and what they output. They might do something much better, or cater for something we're missing.

@pthatcherg (Contributor)

"Justify" and "justice" have separate etymologies.

But also, wow.

FYI, I deleted a comment that I think was distracting from the meaningful conversation.

@pthatcherg (Contributor)

It should probably be a transform stream rather than a function that takes a readable, but seems good!

However, I haven't looked at how streaming codecs actually work, and what they output. They might do something much better, or cater for something we're missing.

OK, how about this:

interface ImageDecoder {
  attribute WritableStream writable;  // of encoded bytes
  attribute ReadableStream readable;  // of DecodedImage (progressively more info)
};

interface DecodedImage {
  // Things that aren't available yet are null
  readonly attribute unsigned long? width;
  readonly attribute unsigned long? height;
  readonly attribute unsigned long? targetWidth;
  readonly attribute unsigned long? targetHeight;
  readonly attribute unsigned long? targetX;
  readonly attribute unsigned long? targetY;
  readonly attribute ImageData? imageData;
};
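For illustration, here is how that writable/readable pair might be consumed, assuming the hypothetical ImageDecoder and DecodedImage sketched above (a proposal sketch, not a shipped API): pipe the response body into the writable side, and paint each progressively more complete DecodedImage as it arrives.

// Hypothetical usage of the ImageDecoder sketched above; assumes the
// proposed WebIDL in this comment, which is not a shipped API.
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

const decoder = new ImageDecoder();
const response = await fetch('/photo.jpg');
response.body.pipeTo(decoder.writable); // feed encoded bytes as they download

const reader = decoder.readable.getReader();
let sized = false;
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  if (!sized && value.width !== null && value.height !== null) {
    // Resizing clears a canvas, so size it once, as soon as dimensions are known.
    canvas.width = value.width;
    canvas.height = value.height;
    sized = true;
  }
  if (value.imageData) {
    ctx.putImageData(value.imageData, value.targetX, value.targetY);
  }
}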

@jakearchibald (Author)

👍

This doesn't cater for animated formats, but yeah, it does feel like the video stuff (or something similar) would be in a better position to handle that.

@pthatcherg (Contributor)

I think what I'll do is remove what is there currently for images (I meant it to be a PR and just pushed to the wrong branch anyway) and then make a PR like this.

@pthatcherg (Contributor)

As I'm updating the explainer to represent this, I had the following question:

What is it that the browser can do here that you can't do already in wasm/JS?

@jakearchibald (Author)

In terms of the things discussed in this thread, it can all be done if you throw enough wasm at it.

However, the browser already has well-optimised streaming image decoders, so why not let developers use those rather than importing their own?

@guest271314 (Contributor)

Is this issue aiming for something similar to http://www.http2demo.io and https://http2.akamai.com/demo?

Does not HTTP require the entire resource to be downloaded, e.g., using fetch() even where ReadableStream is used to read the response? Meaning even if "Streaming image decoding" is used at HTTP it is an illusion after the fact of the resource being downloaded (save for EventSource and WebSocket, web-platform-tests/wpt#18335)?

If the total size of ImageData is known beforehand, individual pixels can be set at the ImageData instance similar to what AFAICT is described at whatwg/html#4785, see https://github.com/dsanders11/imagebitmap-getimagedata-demo.

@guest271314 (Contributor)

"Justify" and "justice" have separate etymologies.

But also, wow.

FYI, Wiktionary is not a primary source for the etymology or meaning of English words, terms, or phrases. In fact no such primary source exists, as English is an equivocal language. There is no prohibition against arbitrarily creating and re-defining words or terms, or against providing no definition at all. One example is the term "justice", which is not defined in U.S. or State law. Similarly for the term "justify" or "justifiable". E.g., the phrase "justifiable homicide" can have vastly different interpretations depending on who is being evaluated for the action and whose life was taken: the State can conclude "justifiable"; the family can reject such an assertion, for example in the case of Stephon Clark in Sacramento, California. Another example is the term "cheaper by the dozen" (https://english.stackexchange.com/q/486088), which has several different interpretations, or "etymologies", if you prefer. The closest you can get to a primary source for the meaning of a word, term, or phrase in English is a technical document, e.g., a specification or standard, or a law enacted by a legislative body, where the codified rules of statutory construction apply.

@jakearchibald (Author)

@guest271314

Is this issue aiming for something similar to http://www.http2demo.io and https://http2.akamai.com/demo?

No.

Does not HTTP require the entire resource to be downloaded, e.g., using fetch() even where ReadableStream is used to read the response?

No.

Meaning even if "Streaming image decoding" is used at HTTP it is an illusion after the fact of the resource being downloaded (save for EventSource and WebSocket, web-platform-tests/wpt#18335)?

No.

If the total size of ImageData is known beforehand, individual pixels can be set at the ImageData instance similar to what AFAICT is described at whatwg/html#4785, see https://github.com/dsanders11/imagebitmap-getimagedata-demo.

Yes, but you'd still want to know when pixels changed, and what changed. For progressive/interlaced formats, your proposal doesn't let you provide your own interpolation.

@guest271314 (Contributor)

I have had the concept that, if the pixel dimensions of the image to be displayed are the same, a single "array" could be created for the purposes of this use case, and each pixel that differs could be replaced in the original array when a new "chunk" arrives, avoiding the creation of multiple ImageBitmap or ImageData instances.

Now, mapping which pixels changed would at least require creating a map and keeping track of which pixels change in both directions, which may or may not consume as many resources as creating multiple ImageBitmap and ImageData instances, due to traversing the given "array" and "map" data structures, though they should only need to be created once.

For an animation use case I would consider utilizing the Web Animations API, where images or pixels could be "streamed" to be displayed as a background-image using an "array", async generator, ReadableStream, etc. I created such a concept some time ago, though lost the code. I created another such proof-of-concept for the use case of creating a timeline where the input is a MediaStream from canvas.captureStream(), which does not have an ending (it is a live stream). The basic code:

keyframes.push({
  backgroundImage: `url(${canvas.toDataURL("image/webp")})`,
  width: width + "px",
  height: height + "px",
});
stream.requestFrame();
// ...

let t = Math.floor(duration * 1000);
const animation = picture.animate(keyframes, { duration: t });
animation.play();

I am far better suited to trying to solve challenging use cases where the use case is clear and at least some code exists to test, in order to determine what the bugs are and what is ultimately not possible using current technologies.

What are you not able to achieve right now?

@jakearchibald (Author)

@guest271314

I appreciate you trying to help, but given that you don't understand the streaming nature of HTTP or image decoders, I don't think this conversation is constructive, and will only continue to derail the thread for the rest of us.

Rather than ask me questions about what HTTP and images can/can't do, perhaps do some research. You can get answers to the things you've asked with some pretty basic tests, or by putting your query to a search engine.

@guest271314 (Contributor)

I have done research. That is why I posted the previous link describing HTTP/2. It is not clear what you are trying to achieve that you cannot do right now. If you have an incoming stream of data [0, 10, 255], where the first two elements are x and y and the last is the "color code", you can simply swap the existing color for a different color. That can be drawn onto a canvas or streamed as a background image at a frame rate to any HTML element. By "streaming" an image, are you describing streaming a single image in "chunks", or streaming arbitrary pixels to form a single image for the purpose of an animation? Either procedure should be presently possible.

The accusation of "derailing" is not correct at all. I am asking specific questions attempting to gather what you are actually trying to do that you are not capable of doing right now. The questions posed are no different from any other questions on any board. If you already had the answers to your own question then there would be no need for you to file this issue in the first place; by that fact we are in the exact same space with regard to asking questions.

@guest271314 (Contributor)

@jakearchibald I am at the front end. "The rest of us", since you are speaking for everybody other than yourself, should likewise be able to figure out what you are actually trying to do and to solve the problem that is not clear to this novice coder who just writes code. I do not really care at all whether you or any other person or entity likes the questions posed or not. I am not here to make friends. I am asking technical questions for my own edification. If answering such basic questions is beneath the scale of your status then just state that: you are over-qualified to answer such questions. Which, again, leads back to why you even needed to file this issue in the first place. You can solve your own inquiry using your own expertise. I am certain that in the final analysis the outcome will be useful and clear. Best of luck with your project.

@pthatcherg added the "maybe" label (Ideas that might be in scope, and worth discussing) Sep 18, 2019
@nadavsinai commented Oct 28, 2019

Hi, we at Philips-Algotec are developing a medical imaging application and would benefit very much from this proposed WebCodecs extension, letting us use the browser's decoders in our own code.
In addition, allowing the user to register a custom decoder via JS/WASM would be truly amazing; this would really manifest the low-end extensibility that the Extensible Web Manifesto speaks of.
Something like a module adhering to the ImageDecoder interface which can be registered for a given MIME type:

const jpegXLDecoder = new JpegXLDecoder(); // do your WASM/JS magic here and adhere to the interface
const imageDecoder = new ImageDecoder({ mimetype: 'image/jpegxl', decoder: jpegXLDecoder });
navigator.registerDecoder(imageDecoder); // imaginary API...
// From here on, any <img> tag which loads a source with the right MIME type will use our decoder.
// And also imperatively:
const decoded = await imageDecoder.decode(input); // some streaming input
const canvas = ...;
canvas.getContext('2d').putImageData(decoded, 0, 0);

Actually, supporting non-streaming inputs as part of the ImageDecoder interface makes sense in this case.
What are your thoughts?

@pthatcherg (Contributor)

There are two separate things here:

  1. Making a WASM image decoder and then using it to decode input from JS.

  2. Making a WASM image decoder and then expecting an img tag to use it.

The first I believe you can do already today. No new APIs are needed.

The second is somewhat interesting, but in that case, what is the advantage of using the img tag instead of canvas?

@dalecurtis (Contributor) commented Apr 17, 2020

I've written an explainer and implemented a prototype of how this might work:
https://github.com/dalecurtis/image-decoder-api/blob/master/explainer.md

Please take a look and let me know if that approach sounds good. If so, we may eventually want to merge it with the WebCodecs explainer/spec.

@eeeps commented Aug 27, 2020

Just want to say that I'm starting to try to build a demo of JPEG-XL's native progressiveness, and how it might be useful for low-quality image placeholders or even single-file responsive images. I would absolutely love an ImageDecoder that dealt in Streams!

@dalecurtis (Contributor)

ImageDecoder can already do that. You just give it a ReadableStream as the data value.
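For anyone landing here later, a streaming decode with the WebCodecs ImageDecoder looks roughly like this. This is a sketch rather than a definitive example; check the current spec for exact details, and note that JPEG-XL support and the /progressive.jxl URL are assumptions for the sake of the demo eeeps describes.

// Sketch of streaming decode with the WebCodecs ImageDecoder.
const response = await fetch('/progressive.jxl');
const decoder = new ImageDecoder({
  type: 'image/jxl',   // MIME type of the encoded data (support is assumed here)
  data: response.body, // a ReadableStream: decoding can begin before the download completes
});

// completeFramesOnly: false asks the decoder to also emit partial
// (progressive) versions of the frame as data arrives.
const { image } = await decoder.decode({ frameIndex: 0, completeFramesOnly: false });

const canvas = document.querySelector('canvas');
canvas.getContext('2d').drawImage(image, 0, 0); // decoded frames are drawable directly
image.close(); // release the frame's memory when done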

@chcunningham (Collaborator) commented Feb 19, 2021

Old issue. Dale's explainer supports streaming; it is now implemented in Chrome behind the WebCodecs flag/origin trial and is actively being spec'ed (tracked in #50).
