Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text chunk processing should be in terms of the Encoding Standard #273

Closed
annevk opened this issue Mar 3, 2023 · 13 comments · Fixed by #318
Closed

Text chunk processing should be in terms of the Encoding Standard #273

annevk opened this issue Mar 3, 2023 · 13 comments · Fixed by #318
Labels
blocking-3rd-edition-cr i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. needs edits needs testcase (wpt)

Comments

@annevk
Copy link
Member

annevk commented Mar 3, 2023

There's a couple of issues with the text here as far as I can tell:

I'm not sure it needs to say anything about encoders having data in other encodings. As there are no fields to store other encodings, it seems self-evident they have to convert.

@xfq xfq added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Mar 5, 2023
@svgeesus
Copy link
Contributor

svgeesus commented Mar 6, 2023

Hi @annevk sounds good, do you have examples of other specs that correctly describe handling of invalid UTF-8 byte sequences by referencing Encoding?

Browsers tend not to display the text chunks inside images, so it is hard to test how they handle bad Latn-1 in practice.

@annevk
Copy link
Member Author

annevk commented Mar 6, 2023

See https://encoding.spec.whatwg.org/#specification-hooks for the possible UTF-8 paths. I suspect you want "UTF-8 decode".

If this is true Latin1 you want https://infra.spec.whatwg.org/#isomorphic-decode. Otherwise you want https://encoding.spec.whatwg.org/#decode with the windows-1252 encoding. I guess we could see what various operating systems do and go with that?

@svgeesus
Copy link
Contributor

Looks like we will need some WPT tests with iTXt containing a UTF-8 BOM, to check it gets discarded rather then presented as literal characters or replacement characters.

@svgeesus
Copy link
Contributor

svgeesus commented Mar 14, 2023

TweakPNG can be used to view, and edit, the content of iTXt so should be good for making test files too.

image

@ProgramMax
Copy link
Collaborator

Yeah, TweakPNG is nice. I used it for the tRNS WPT test.

Do browsers not ignore iTXt? Is there a way for the browser to access it? (So it can be used in a WPT?)

@svgeesus
Copy link
Contributor

It is possible to get at any part of a PNG file from JavaScript, though it does require writing one's own parser. As an example, PNG file chunk inspector

@annevk
Copy link
Member Author

annevk commented Apr 27, 2023

(I'm pretty sure browsers don't have an API to get at image data currently, though there have been proposals in the past.)

@ProgramMax
Copy link
Collaborator

I might be able to compile libpng to wasm, giving us a parser. But this seems pretty extreme for a WPT. And it wouldn't really be testing the browser's iTXt handling, anyway.

@annevk
Copy link
Member Author

annevk commented Apr 28, 2023

How are we writing tests at the moment?

I think ideally tests are in some kind of format that many different kind of implementations can consume. And then implementations that end up ignoring iTXt chunks simply don't run those tests (or only run them to ensure they do the correct thing decoding-wise).

@ProgramMax
Copy link
Collaborator

Currently, the tests let the browser raster the image and then query pixels to confirm they are the expected color:
https://wpt.fyi/results/png?label=master&label=experimental&aligned&view=subtest&q=png

The problem I mean is I bet most (all?) browsers see a iTXt chunk and skip past it. I don't think they parse the contents at all.
If I write a parser, I'm not testing the browser. And if my parser then handles the iTXt chunk correctly, all I've done is confirm my own parser behaves correctly. It wouldn't test any sort of conformance.

@annevk
Copy link
Member Author

annevk commented Apr 28, 2023

Right, so

  1. iTXt probably needs to be optional in terms of that it is okay to skip past it. That also means that if you do process it you probably don't want to bail out on the entire image if there's an error (i.e., don't use a fatal text decoder).
  2. Some aspects of this can be tested across all implementations.
  3. Some tests won't be useful for all implementations.

I don't think we want to drop the feature so that's the best we can do here.

@ProgramMax
Copy link
Collaborator

Oh, I think I follow now.
We can create an image with a deliberately broken iTXt chunk and make sure the tests still display an image.

@svgeesus
Copy link
Contributor

@annevk wrote:

iTXt probably needs to be optional in terms of that it is okay to skip past it.

PNG already has this: critical chunks are "absolutely required in order to successfully decode a PNG image" while ancillary chunks "may be ignored by a decoder." iTXt is ancillary.

That also means that if you do process it you probably don't want to bail out on the entire image if there's an error (i.e., don't use a fatal text decoder).

Covered under PNG decoders and viewers ... Error handling, specifically:

Recover from an error, if possible; otherwise fail gracefully. Errors that have little or no effect on the processing of the image may be ignored,

and

This is a specific kind of criticality and one that is not necessarily relevant to every conceivable decoder. For example, a program whose sole purpose is to extract text annotations (for example, copyright information) does not require a viewable image. Another decoder might consider the tRNS and gAMA chunks essential to its proper execution.

or, in this specific case, a program whose sole purpose is to extract text annotations should decode UTF-8 correctly (insert link to Encoding Standard).

@ProgramMax wrote:

We can create an image with a deliberately broken iTXt chunk and make sure the tests still display an image.

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocking-3rd-edition-cr i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. needs edits needs testcase (wpt)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants