[css‑syntax] `<string‑token>` should probably handle valid surrogate pairs #6352

ExE-Boss · 2021-06-05T11:02:04Z

Spec: https://drafts.csswg.org/css-syntax-3/#consume-string-token

The 6-digit escape syntax for <string‑token> was only added to CSS in CSS2, and the CSS Syntax module restricts escapes so that they cannot produce surrogate code points. Even so, I would expect a string containing a valid escaped surrogate pair to work like the equivalent 6-digit escape, e.g.:

.foo:before {
	/*
	The "CSS1" and "CSS2.1" specifications parse this as:
		- U+D83D (High Surrogate)
		- U+DD25 (Low Surrogate)

	The "CSS Syntax Level 3" specification parses this as:
		- U+FFFD (Replacement Character)
		- U+FFFD (Replacement Character)

	I would expect this to parse as the U+D83D and U+DD25 surrogate code points,
	which would decode to U+1F525 (Fire) at parse time.
	*/
	content: "\D83D\DD25";
}

and

.foo:before {
	/*
	The "CSS1" specification parses this as:
		- U+1F52 (Greek Small Letter Upsilon With Psili And Varia)
		- U+35 (Digit Five)
	The "CSS2.1" and "CSS Syntax Level 3" specifications parse this as:
		- U+1F525 (Fire)
	*/
	content: "\1F525";
}

would be equivalent.
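
For reference, the equivalence claimed above rests on ordinary UTF-16 surrogate arithmetic. The following is a minimal sketch of that arithmetic only; `combine_surrogates` is a hypothetical helper name chosen for illustration and is not taken from any CSS specification or implementation.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"U+{high:04X} is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"U+{low:04X} is not a low surrogate")
    # Each surrogate carries 10 bits; the pair encodes an offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair from the first example maps to the code point in the second one.
fire = combine_surrogates(0xD83D, 0xDD25)
print(f"U+{fire:X}")  # U+1F525
```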

tabatkins · 2021-06-08T01:11:16Z

Do you have a compat need for this? I'd prefer not to allow it if possible. It would require us to either allow lone surrogates (escapes are the only way these can be produced, as they're otherwise "censored" away during parsing) or add some complication to escaping: if you decode a high surrogate, you immediately check whether the next characters are an escape for a low surrogate, then decode the two together and emit the combined code point.

That's possible, I'd just like to avoid it if it's not necessary.
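
To make the second option concrete, here is a rough sketch of the look-ahead it describes. This is an assumption-laden illustration, not spec text or any browser's code: `decode_hex_escapes` and its `pair_surrogates` flag are hypothetical names, only hex escapes are handled (other escapes and string delimiters are ignored), and the regular expression is a simplification of the real tokenizer rules.

```python
import re

# Up to six hex digits, optionally followed by one whitespace character.
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\n]?")
REPLACEMENT = 0xFFFD


def sanitize(cp: int) -> int:
    """Map zero, surrogates, and out-of-range values to U+FFFD."""
    if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return REPLACEMENT
    return cp


def decode_hex_escapes(text: str, pair_surrogates: bool = False) -> str:
    out = []
    pos = 0
    while pos < len(text):
        match = HEX_ESCAPE.match(text, pos)
        if match is None:
            out.append(text[pos])  # not a hex escape: copy through unchanged
            pos += 1
            continue
        cp = int(match.group(1), 16)
        pos = match.end()
        if pair_surrogates and 0xD800 <= cp <= 0xDBFF:
            # Look ahead: is the very next thing an escaped low surrogate?
            nxt = HEX_ESCAPE.match(text, pos)
            if nxt is not None:
                low = int(nxt.group(1), 16)
                if 0xDC00 <= low <= 0xDFFF:
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                    pos = nxt.end()
        out.append(chr(sanitize(cp)))
    return "".join(out)


print(decode_hex_escapes(r"\D83D\DD25") == "\ufffd\ufffd")             # True
print(decode_hex_escapes(r"\D83D\DD25", True) == "\U0001F525")         # True
```

With `pair_surrogates` left off, the sketch replaces each surrogate escape with U+FFFD, which is the CSS Syntax Level 3 behaviour described in the first comment. Turning it on adds the extra look-ahead step, and a lone surrogate still ends up as U+FFFD.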

I'm not sure what the significance of your comment about CSS1 only allowing 4-digit escapes is, sorry.

ExE-Boss · 2021-11-25T10:50:27Z

Well, back when WebKit didn’t correctly support the CSS2.1 syntax (before r114876), they implemented support for surrogate pairs as a workaround (see the mailing list).

> I'm not sure what the significance of your comment about CSS1 only allowing 4-digit escapes is, sorry.

The CSS1 syntax for Unicode escapes (CSS1, Appendix B), in case‑insensitive flex notation (CSS1 reference [16]), is:

unicode		\\[0-9a-f]{1,4}

Which means that anything after the 4th hex digit is not part of the escaped code point in CSS1:

.foo:before {
	/*
	The "CSS1" specification parses this as:
		- U+0000 (Null) → U+FFFD (Replacement Character)
		- U+0034 (Digit Four)
		- U+0031 (Digit One)

	The "CSS2.1" and "CSS Syntax Level 3" specifications parse this as:
		- U+0041 (Latin Capital Letter A)
	*/
	content: "\000041";
}
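
The difference between the two grammars can be reproduced with a pair of regular expressions. This is a sketch under stated assumptions: `CSS1_ESCAPE` mirrors the flex rule quoted above, `CSS2_ESCAPE` models only the longer 1 to 6 digit form (it ignores the optional trailing whitespace), and `split_escape` is a hypothetical helper name.

```python
import re

CSS1_ESCAPE = re.compile(r"\\([0-9a-f]{1,4})", re.IGNORECASE)  # rule quoted above
CSS2_ESCAPE = re.compile(r"\\([0-9a-f]{1,6})", re.IGNORECASE)  # digits only, simplified


def split_escape(pattern: re.Pattern, text: str) -> tuple[int, str]:
    """Return the escaped value and whatever text follows the escape."""
    match = pattern.match(text)
    if match is None:
        raise ValueError("text does not start with a hex escape")
    return int(match.group(1), 16), text[match.end():]


for source in (r"\000041", r"\1F525"):
    for name, pattern in (("CSS1", CSS1_ESCAPE), ("CSS2.1+", CSS2_ESCAPE)):
        value, rest = split_escape(pattern, source)
        print(f"{name:8} {source:9} -> U+{value:04X}, remaining {rest!r}")
```

Under the 4-digit rule the leftover digits (`41` and `5`) become ordinary characters, which matches the CSS1 readings given in the comments of the examples.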

tabatkins · 2021-11-30T21:55:33Z

Sure, impls did all sorts of weird things in the bad old days, but is there a current compat need for this? Do you know of pages that are currently broken with the specified Syntax behavior, but would be fixed if we allowed lone surrogates to be produced by the escape syntax?

> The CSS1 syntax for Unicode escapes[...]

Sure, I'm still just not sure how a CSS1 spec detail is relevant to anything here. CSS2 was first published more than twenty years ago.

tabatkins added the css-syntax-3 label Jun 8, 2021
