Text chunk processing should be in terms of the Encoding Standard #273
Hi @annevk, sounds good. Do you have examples of other specs that correctly describe handling of invalid UTF-8 byte sequences by referencing Encoding? Browsers tend not to display the text chunks inside images, so it is hard to test how they handle bad Latin-1 in practice.
See https://encoding.spec.whatwg.org/#specification-hooks for the possible UTF-8 paths. I suspect you want "UTF-8 decode". If this is true Latin-1, you want https://infra.spec.whatwg.org/#isomorphic-decode. Otherwise, you want https://encoding.spec.whatwg.org/#decode with the windows-1252 encoding. I guess we could see what various operating systems do and go with that?
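To make the difference between the three paths concrete, here is a minimal Python sketch built on the standard codecs. The function names are hypothetical, and the mapping to the specs is approximate: `utf8_decode` relies on Python's decoder replacing each ill-formed subsequence with U+FFFD, which we believe matches Encoding's behavior, and `windows_1252_decode` covers only the windows-1252 decoder itself, not the BOM sniffing that Encoding's full "decode" algorithm runs first.

```python
# Hypothetical helpers approximating the three decoding paths; not spec text.

# WHATWG windows-1252 maps these five bytes to the C1 control with the same
# value, while Python's "cp1252" codec leaves them undefined, so patch them in.
_CP1252_GAPS = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def utf8_decode(data: bytes) -> str:
    """Approximates Encoding's "UTF-8 decode": drop one leading BOM, never fail."""
    # "utf-8-sig" strips a leading EF BB BF; "replace" substitutes U+FFFD
    # for ill-formed byte sequences instead of raising.
    return data.decode("utf-8-sig", errors="replace")


def isomorphic_decode(data: bytes) -> str:
    """Infra's "isomorphic decode": byte 0xNN becomes code point U+00NN."""
    # Python's latin-1 codec is exactly this byte-to-code-point mapping.
    return data.decode("latin-1")


def windows_1252_decode(data: bytes) -> str:
    """windows-1252 decoder only (no BOM sniffing); single-byte, never fails."""
    return "".join(
        chr(b) if b in _CP1252_GAPS else bytes([b]).decode("cp1252")
        for b in data
    )
```

The paths disagree on exactly the bytes that matter here: `b"\x80"` decodes to U+FFFD under UTF-8, to U+0080 under isomorphic decode, and to U+20AC (€) under windows-1252.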
Looks like we will need some WPT tests with
TweakPNG can be used to view and edit the content of
Yeah, TweakPNG is nice. I used it for the tRNS WPT test. Do browsers not ignore iTXt? Is there a way for the browser to access it, so it can be used in a WPT?
It is possible to get at any part of a PNG file from JavaScript, though it does require writing one's own parser. For an example, see PNG file chunk inspector.
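The core of such a parser is small in any language. A minimal Python sketch (not the chunk inspector mentioned above), assuming the whole file is already in memory; `iter_chunks` is a hypothetical name. It walks the length / type / data / CRC layout that every PNG chunk shares and verifies each CRC.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes):
    """Yield (chunk_type, data) for each chunk, validating structure and CRC."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        if pos + 8 > len(png):
            raise ValueError("truncated chunk header")
        # 4-byte big-endian length, then the 4-byte chunk type.
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        if not ctype.isalpha():  # chunk types are ASCII letters only
            raise ValueError(f"invalid chunk type {ctype!r}")
        end = pos + 8 + length
        if end + 4 > len(png):
            raise ValueError("truncated chunk")
        data = png[pos + 8:end]
        (crc,) = struct.unpack(">I", png[end:end + 4])
        # The CRC covers the chunk type and data, not the length field.
        if zlib.crc32(ctype + data) != crc:
            raise ValueError(f"CRC mismatch in {ctype!r}")
        yield ctype.decode("ascii"), data
        pos = end + 4
        if ctype == b"IEND":
            break
```

A test could feed the `data` of each `iTXt` chunk to whatever decoding rule the spec settles on, independently of how (or whether) the browser renders it.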
(I'm pretty sure browsers don't have an API to get at image data currently, though there have been proposals in the past.)
I might be able to compile libpng to wasm, giving us a parser. But this seems pretty extreme for a WPT, and it wouldn't really be testing the browser's iTXt handling anyway.
How are we writing tests at the moment? I think ideally tests are in some kind of format that many different kinds of implementations can consume. Implementations that end up ignoring iTXt chunks then simply don't run those tests (or only run them to ensure they do the correct thing decoding-wise).
Currently, the tests let the browser rasterize the image and then query pixels to confirm they are the expected color.
The problem I'm referring to is that I bet most (all?) browsers see an iTXt chunk and skip past it. I don't think they parse the contents at all.
Right, so
I don't think we want to drop the feature, so that's the best we can do here.
Oh, I think I follow now.
@annevk wrote:
PNG already has this: critical chunks are "absolutely required in order to successfully decode a PNG image" while ancillary chunks "may be ignored by a decoder."
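The critical/ancillary distinction is encoded in the chunk type itself: bit 5 (value 0x20) of the first type byte is the ancillary bit, so an uppercase first letter means critical. A minimal sketch of how a decoder could apply that rule to chunks it does not recognize; `KNOWN`, `handle_chunk`, and its return values are hypothetical.

```python
def is_critical(chunk_type: bytes) -> bool:
    """A chunk is critical when bit 5 (0x20) of its first type byte is 0."""
    return not (chunk_type[0] & 0x20)


# Hypothetical minimal decoder that understands only the core image chunks.
KNOWN = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}


def handle_chunk(chunk_type: bytes) -> str:
    if chunk_type in KNOWN:
        return "process"
    if is_critical(chunk_type):
        # Unknown critical chunk: the image cannot be decoded safely.
        raise ValueError(f"unknown critical chunk {chunk_type!r}")
    # Unknown ancillary chunk (e.g. iTXt in a viewer that ignores text): skip it.
    return "skip"
```

Under this rule a browser that only rasterizes the image can skip iTXt entirely without violating the format.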
Covered under PNG decoders and viewers ... Error handling, specifically:
and
or, in this specific case, a program whose sole purpose is to extract text annotations should decode UTF-8 correctly (insert link to Encoding Standard).
@ProgramMax wrote:
Yes.
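For a program that does extract text annotations, the decoding choices map onto the iTXt field layout: the keyword is Latin-1, while the translated keyword and the text are UTF-8 (the text optionally zlib-compressed). A minimal sketch, assuming Python's `errors="replace"` as a stand-in for the Encoding Standard's U+FFFD substitution; `parse_itxt` and `ITXt` are hypothetical names, and unlike Encoding's "UTF-8 decode" this does not strip a leading BOM.

```python
import zlib
from typing import NamedTuple


class ITXt(NamedTuple):
    keyword: str
    language: str
    translated_keyword: str
    text: str


def parse_itxt(data: bytes) -> ITXt:
    """Split an iTXt chunk's data field into its parts and decode each one."""
    keyword, sep, rest = data.partition(b"\x00")
    if not sep or not 1 <= len(keyword) <= 79 or len(rest) < 2:
        raise ValueError("malformed iTXt header")
    flag, method, rest = rest[0], rest[1], rest[2:]
    # maxsplit=2 keeps any NUL bytes inside (possibly compressed) text intact.
    parts = rest.split(b"\x00", 2)
    if len(parts) != 3:
        raise ValueError("missing null separator in iTXt chunk")
    language, translated, text = parts
    if flag == 1:
        if method != 0:  # 0 (zlib) is the only method PNG defines
            raise ValueError("unknown compression method")
        text = zlib.decompress(text)
    elif flag != 0:
        raise ValueError("invalid compression flag")
    return ITXt(
        keyword=keyword.decode("latin-1"),  # isomorphic decode
        language=language.decode("ascii", errors="replace"),
        # Ill-formed UTF-8 becomes U+FFFD instead of aborting the whole chunk.
        translated_keyword=translated.decode("utf-8", errors="replace"),
        text=text.decode("utf-8", errors="replace"),
    )
```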
There are a couple of issues with the text here, as far as I can tell:
I'm not sure it needs to say anything about encoders having data in other encodings. As there are no fields to store other encodings, it seems self-evident they have to convert.
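Seen from the encoder side, the conversion amounts to encoding to UTF-8 when the chunk is written. A minimal sketch that writes an uncompressed iTXt chunk, including the length and CRC fields; the `make_itxt_chunk` name and signature are hypothetical, and it assumes the caller's text has already been turned into a Python `str`, whatever encoding it originally came from.

```python
import struct
import zlib


def make_itxt_chunk(keyword: str, text: str, language: str = "",
                    translated_keyword: str = "") -> bytes:
    # Keywords are Latin-1; encode() raises UnicodeEncodeError (a ValueError)
    # for characters outside Latin-1 instead of silently mangling them.
    kw = keyword.encode("latin-1")
    if not 1 <= len(kw) <= 79 or b"\x00" in kw:
        raise ValueError("keyword must be 1-79 bytes with no NUL")
    data = (kw + b"\x00"
            + b"\x00\x00"  # compression flag 0 (uncompressed), method 0
            + language.encode("ascii") + b"\x00"
            + translated_keyword.encode("utf-8") + b"\x00"
            + text.encode("utf-8"))  # the only encoding iTXt text can carry
    ctype = b"iTXt"
    # Length counts only the data; the CRC covers type + data.
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))
```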