Doesn't flag invalid @charset when @charset follows BOM #249

Closed
dd8 opened this issue Jul 15, 2019 · 4 comments

dd8 commented Jul 15, 2019

The @charset pattern should only be recognised at position zero in the file - @charset anywhere else (including after the BOM) is invalid/ignored.

https://drafts.csswg.org/css-syntax/#input-byte-stream

If the first 1024 bytes of the stream begin with the hex sequence
40 63 68 61 72 73 65 74 20 22 XX* 22 3B
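
As a rough illustration of the rule quoted above, here is a minimal Python sketch that checks whether a byte stream begins with that sequence. The function name `charset_at_start` is hypothetical, and the range allowed for the XX bytes (any ASCII byte other than 0x22) is my reading of the css-syntax draft, not something stated in the excerpt above.

```python
import re

# '@charset "' + XX* + '";' as raw bytes.
# Assumption: each XX byte is ASCII (0x00-0x7F) and is not the quote 0x22.
_CHARSET_RE = re.compile(rb'@charset "([\x00-\x21\x23-\x7f]*)";')


def charset_at_start(data: bytes):
    """Return the declared label if @charset sits at byte offset 0, else None."""
    # Only the first 1024 bytes are examined; match() anchors at offset 0.
    m = _CHARSET_RE.match(data[:1024])
    return m.group(1).decode("ascii") if m else None
```

With this check, `charset_at_start(b'@charset "UTF-8";')` returns the label, while the same bytes preceded by a BOM return `None`, because the pattern no longer starts at position zero.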

This file flags no errors and starts with the UTF-8 BOM:
ef bb bf 40 63 68 61 72 73 65 74 20 22 55 54 46 |...@charset "UTF|
http://test.csswg.org/suites/css21_dev/20110323/html4/support/at-charset-014.css

These files flag no errors and start with the UTF-16LE BOM:
ff fe 40 00 63 00 68 00 61 00 72 00 73 00 65 00 |..@.c.h.a.r.s.e.|
http://test.csswg.org/suites/css21_dev/20110323/html4/support/at-charset-015.css
http://test.csswg.org/suites/css21_dev/20110323/html4/support/at-charset-060.css

This file flags no errors and starts with the UTF-16BE BOM:
fe ff 00 40 00 63 00 68 00 61 00 72 00 73 00 65 |...@.c.h.a.r.s.e|
http://test.csswg.org/suites/css21_dev/20110323/html4/support/at-charset-016.css
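
The check these files call for could look roughly like the following Python sketch. It is not the validator's actual code; `charset_after_bom` is a hypothetical helper, and it only recognises the three BOMs shown in the dumps above.

```python
import codecs

# The three BOMs from the examples above, with the encoding each one implies.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def charset_after_bom(data: bytes) -> bool:
    """Return True if the stream starts with a BOM followed directly by @charset."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            rest = data[len(bom):]
            # Compare against "@charset" encoded the way the BOM says the
            # rest of the file is encoded.
            return rest.startswith("@charset".encode(encoding))
    return False


# First 16 bytes of at-charset-014.css, as dumped above.
print(charset_after_bom(bytes.fromhex(
    "ef bb bf 40 63 68 61 72 73 65 74 20 22 55 54 46")))  # True
```

A `True` result is the case that should be reported: the rule is present but no longer acts as an encoding declaration.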

dd8 commented Jul 18, 2019

I think there's a difference between the css-syntax-3 spec and the CSS 2.1/2.2 spec, demonstrated by this file:
http://test.csswg.org/suites/css2.1/20110323/html4/support/at-charset-001.css

It's served as Content-Type: text/css; charset=shift_jis. It also starts with a Shift_JIS byte sequence that happens to match the UTF-8 BOM, which makes it a good test case:

ef bb bf 2e e5 b9 b3 e5 92 8c 0d 0a 7b 0d 0a 20 |............{.. |

CSS 2.1/2.2 specifies that Content-Type wins over any BOM:
https://drafts.csswg.org/css2/syndata.html#charset

css-syntax-3 uses the 'Decode' algorithm with a fallback encoding derived from Content-Type, with a note saying

Note: The decode algorithm gives precedence to a byte order mark (BOM), and only uses the fallback when none is found.
https://drafts.csswg.org/css-syntax/#input-byte-stream

The CSS 2.x algorithm gives the correct encoding for this file (Shift_JIS), whereas the css-syntax-3 algorithm gives the wrong one (UTF-8, because of the apparent BOM).
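
The difference in precedence can be sketched in Python as below. This is a deliberately simplified model, assuming only two inputs (the Content-Type charset and a BOM) and ignoring @charset and the other fallback steps in both specs; all function names are hypothetical.

```python
import codecs

# BOMs checked by this sketch, mapped to the encoding each one signals.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def encoding_css2(data: bytes, http_charset=None):
    # CSS 2.1/2.2 ordering: Content-Type charset first, then the BOM.
    return http_charset or sniff_bom(data) or "utf-8"


def encoding_css_syntax_3(data: bytes, http_charset=None):
    # css-syntax-3 ordering: BOM first, Content-Type only as the fallback.
    return sniff_bom(data) or http_charset or "utf-8"


# First 16 bytes of at-charset-001.css, as dumped above.
sample = bytes.fromhex("ef bb bf 2e e5 b9 b3 e5 92 8c 0d 0a 7b 0d 0a 20")
print(encoding_css2(sample, "shift_jis"))          # shift_jis
print(encoding_css_syntax_3(sample, "shift_jis"))  # utf-8
```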

Have reported this as
w3c/csswg-drafts#4126

tabatkins commented

CSS Syntax spec editor here. This issue is correct: an @charset only works if it's the very first bytes in the file. Placing one after a BOM renders it useless, so it should be flagged as an error. (It's no longer an encoding declaration, just an invalid and unrecognized rule.)

ylafon added the bug label Feb 20, 2020

ylafon commented Feb 20, 2020

It is indeed a bug, due to the way the BOM is handled.
Will fix that.

ylafon added a commit that referenced this issue Feb 25, 2020
…That way, we can check in the case of >CSS3 if the charset rule is valid or not, see #249

ylafon commented Mar 2, 2020

@dd8 can you check that at least for CSS3 it works as intended?
Thanks,
