Make MathML attributes ASCII case-insensitive #178

Open
fred-wang opened this issue Dec 13, 2019 · 11 comments
Labels
css / html5 Issues related to CSS or HTML5 interoperability MathML 4 Issues affecting the MathML 4 specification

Comments

@fred-wang

This is a follow-up to #22; we decided to follow HTML/CSS, which treat such values as ASCII case-insensitive. Concretely, ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt has this line:

017F; C; 0073; # LATIN SMALL LETTER LONG S

which means that falſe is case-insensitively equal to false. However, it is not ASCII case-insensitively equal to false (only the a-z <-> A-Z equivalences are considered in that case).
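
To make the distinction concrete, here is a minimal Python sketch. It assumes that Python's str.casefold() is an acceptable stand-in for the Unicode folding described by CaseFolding.txt; the helper names are hypothetical and are not taken from any specification or browser code.

```python
def ascii_lowercase(s: str) -> str:
    """Lowercase only A-Z and leave every other code point untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    # Only the a-z <-> A-Z equivalences are considered.
    return ascii_lowercase(a) == ascii_lowercase(b)


def unicode_case_insensitive_equal(a: str, b: str) -> bool:
    # str.casefold() applies Unicode case folding (assumption: close enough
    # to CaseFolding.txt for this illustration).
    return a.casefold() == b.casefold()


long_s_false = "fal\u017Fe"  # "false" written with LATIN SMALL LETTER LONG S

print(unicode_case_insensitive_equal(long_s_false, "false"))  # True
print(ascii_case_insensitive_equal(long_s_false, "false"))    # False
print(ascii_case_insensitive_equal("FALSE", "false"))         # True
```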

Currently, the MathML Core spec just says "case-insensitive".

Note: for CSS colors, I reported w3c/csswg-drafts#4599

@fred-wang fred-wang added css / html5 Issues related to CSS or HTML5 interoperability MathML Core Issues affecting the MathML Core specification need resolution Issues needing resolution at MathML Refresh CG meeting need specification update Issues requiring specification changes need tests Issues related to writing WPT tests labels Dec 13, 2019
@fred-wang
Author

I did a quick check and, for the MathML-specific definitions, I only see case-insensitive matching against strings made of ASCII letters and dashes. So the only difference would be for "LATIN SMALL LETTER LONG S", "KELVIN SIGN" and maybe a few "LATIN SMALL LIGATURE" characters (e.g. double-ﬆruck, written with U+FB06). Unlikely for "LATIN CAPITAL LETTER I WITH DOT ABOVE" if the Turkish rule is used. See w3c/csswg-drafts#4599 (comment)
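
A quick check of this kind can be sketched in Python as below. This is only an illustration, assuming str.casefold() (full Unicode case folding, without the Turkish rule) is a fair proxy for CaseFolding.txt; the exact list printed depends on the Unicode version bundled with the interpreter, and the function name is made up for this sketch.

```python
import unicodedata


def folds_to_ascii_letters(ch: str) -> bool:
    """True if a character case-folds to a string of ASCII letters only."""
    folded = ch.casefold()
    return folded.isascii() and folded.isalpha()


# Scan every non-ASCII code point for characters that could make a keyword
# match under Unicode case-insensitivity but not under ASCII case-insensitivity.
candidates = [
    chr(cp) for cp in range(0x80, 0x110000) if folds_to_ascii_letters(chr(cp))
]

for ch in candidates:
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{ord(ch):04X} {name} -> {ch.casefold()!r}")
```

With the default (non-Turkish) folding, LATIN CAPITAL LETTER I WITH DOT ABOVE folds to "i" followed by a combining dot, so it does not show up in this scan.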

@NSoiffer
Contributor

That's a good catch. I'm pretty sure we all agree that we only mean ASCII case-insensitivity. I suggest we add the following to the spec, which is a slight rewording from the HTML spec:

Many strings in the HTML and CSS syntax (e.g. the names of elements and their attributes) are case-insensitive, but only for ASCII upper alphas and ASCII lower alphas. For convenience, in this specification this is just referred to as "case-insensitive".

I suggest this goes into Appendix G.1: Document Conventions.

@fred-wang
Author

I would prefer to be explicit everywhere and use "ASCII case-insensitive" with a link to https://infra.spec.whatwg.org/#ascii-case-insensitive; this seems to be what the HTML and CSS specifications do (or how they would be fixed, e.g. w3c/csswg-drafts#4599 (comment)). I'm sure that if we just keep "case-insensitive" as it is now, people will easily miss the appendix. We should also avoid duplicating definitions from HTML5, as was mentioned in another issue.

@fred-wang
Author

Consensus from 2019/12/16: Move to ASCII case-insensitiveness

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Dec 17, 2019
The mathsize and dir attributes are defined modulo ASCII
case-insensitive equivalence and are mapped to CSS font-size and
direction properties [1] [2]. Since the CSS keywords are themselves
defined modulo ASCII case-insensitive equivalence [3], there is no need
to filter out other (Unicode) case-insensitive equivalent keywords
(e.g. "ſmall") in the MathML code; they will be rejected by the CSS
parser.

This CL replaces DeprecatedEqualIgnoringCase with EqualIgnoringASCIICase
and adds tests to ensure that (Unicode) case-insensitive equivalent
strings remain disallowed.

[1] https://mathml-refresh.github.io/mathml-core/#global-attributes
[2] w3c/mathml#178
[3] https://www.w3.org/TR/css-values-4/#keywords

Bug: 6606
Bug: 627682
Change-Id: Ice84368c8cc7e8fff9faccb454c23fad87b99d59
@fred-wang
Author

These are the attributes, with the behavior changes that will require tests:

  • boolean attributes: Behavior change for falſe.
  • dir: No behavior change.
  • mathvariant: Behavior change for e.g. double-ſtrucK
  • display: Behavior change for blocK.
  • form: Behavior change for poſtfix.
  • notation: Behavior change for e.g. updiagonalſtriKe
  • frame: Behavior change for e.g. daſhed

In these examples, ſ is U+017F LATIN SMALL LETTER LONG S and K is U+212A KELVIN SIGN.

Other attributes rely on CSS ( https://mathml-refresh.github.io/mathml-core/#types-for-mathml-attribute-values ) so nothing is changed here (although tests can always be added).
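
As a rough aid for writing such tests, the sketch below derives, for a given keyword, strings that are Unicode case-insensitively equal to it but not ASCII case-insensitively equal. The substitution table and function names are assumptions made for this illustration (only three of the relevant characters are covered), and str.casefold() is used as a proxy for Unicode case folding.

```python
# Non-ASCII characters whose Unicode case folding yields ASCII letters
# (illustrative subset, not an exhaustive table).
SUBSTITUTIONS = {
    "s": "\u017F",   # LATIN SMALL LETTER LONG S
    "k": "\u212A",   # KELVIN SIGN
    "st": "\uFB06",  # LATIN SMALL LIGATURE ST
}


def ascii_lowercase(s: str) -> str:
    """Lowercase only A-Z and leave every other code point untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def non_ascii_variants(keyword: str) -> list[str]:
    """Return variants of keyword with one ASCII substring swapped for a
    non-ASCII character that case-folds back to that substring."""
    variants = []
    for ascii_part, special in SUBSTITUTIONS.items():
        start = keyword.find(ascii_part)
        while start != -1:
            end = start + len(ascii_part)
            variants.append(keyword[:start] + special + keyword[end:])
            start = keyword.find(ascii_part, start + 1)
    return variants


for keyword in ["false", "ltr", "rtl", "block", "postfix", "dashed"]:
    for variant in non_ascii_variants(keyword):
        # Equal under Unicode folding, different under ASCII lowercasing.
        assert variant.casefold() == keyword
        assert ascii_lowercase(variant) != keyword
    print(keyword, non_ascii_variants(keyword))
```

Under these assumptions, "ltr" and "rtl" produce no variants because they contain none of the affected letters, which is consistent with dir being listed as having no behavior change.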

fred-wang added a commit to w3c/mathml-core that referenced this issue Dec 17, 2019
fred-wang pushed a commit to web-platform-tests/wpt that referenced this issue Dec 17, 2019
…20807)

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Dec 17, 2019
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1970615
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Frédéric Wang <fwang@igalia.com>
Cr-Commit-Position: refs/heads/master@{#725481}
@davidcarlisle
Collaborator

If we want to keep the RELAX NG schema, there are two choices: either we say in words that values should be ASCII-lowercased before validation, or we make the schema do the case-insensitive match.

That would mean for example changing

attribute mathvariant {"normal" | "bold" | "italic" | "bold-italic" | "double-struck" | "bold-fraktur" | "script" | "bold-script" | "fraktur" | "sans-serif" | "bold-sans-serif" | "sans-serif-italic" | "sans-serif-bold-italic" | "monospace" | "initial" | "tailed" | "looped" | "stretched"}?,

to

attribute mathvariant {xsd:string{pattern="[Nn][Oo][Rr][Mm][Aa][Ll]|[Bb][Oo][Ll][Dd]|[Ii][Tt][Aa][Ll][Ii][Cc]|[Bb][Oo][Ll][Dd]-[Ii][Tt][Aa][Ll][Ii][Cc]|[Dd][Oo][Uu][Bb][Ll][Ee]-[Ss][Tt][Rr][Uu][Cc][Kk]|[Bb][Oo][Ll][Dd]-[Ff][Rr][Aa][Kk][Tt][Uu][Rr]|[Ss][Cc][Rr][Ii][Pp][Tt]|[Bb][Oo][Ll][Dd]-[Ss][Cc][Rr][Ii][Pp][Tt]|[Ff][Rr][Aa][Kk][Tt][Uu][Rr]|[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]|[Bb][Oo][Ll][Dd]-[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]|[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]-[Ii][Tt][Aa][Ll][Ii][Cc]|[Ss][Aa][Nn][Ss]-[Ss][Ee][Rr][Ii][Ff]-[Bb][Oo][Ll][Dd]-[Ii][Tt][Aa][Ll][Ii][Cc]|[Mm][Oo][Nn][Oo][Ss][Pp][Aa][Cc][Ee]|[Ii][Nn][Ii][Tt][Ii][Aa][Ll]|[Tt][Aa][Ii][Ll][Ee][Dd]|[Ll][Oo][Oo][Pp][Ee][Dd]|[Ss][Tt][Rr][Ee][Tt][Cc][Hh][Ee][Dd]"}}?,

which works but isn't very human readable or informative.
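
A pattern like this would presumably be generated rather than written by hand. The following Python sketch shows one way to do that; it assumes keywords consist only of lowercase ASCII letters and dashes, uses hypothetical function names, and checks the result with Python's re module, which agrees with XSD pattern syntax for these simple character classes (XSD patterns are implicitly anchored, hence re.fullmatch).

```python
import re

# The mathvariant values from the existing schema fragment.
MATHVARIANT_VALUES = [
    "normal", "bold", "italic", "bold-italic", "double-struck",
    "bold-fraktur", "script", "bold-script", "fraktur", "sans-serif",
    "bold-sans-serif", "sans-serif-italic", "sans-serif-bold-italic",
    "monospace", "initial", "tailed", "looped", "stretched",
]


def keyword_pattern(keyword: str) -> str:
    """Turn 'bold' into '[Bb][Oo][Ll][Dd]'; dashes are copied unchanged."""
    return "".join(
        f"[{c.upper()}{c}]" if "a" <= c <= "z" else c for c in keyword
    )


def enumeration_pattern(keywords: list[str]) -> str:
    """Join the per-keyword patterns into a single alternation."""
    return "|".join(keyword_pattern(k) for k in keywords)


pattern = enumeration_pattern(MATHVARIANT_VALUES)

# Mixed ASCII case matches; a long s (U+017F) in place of "s" does not.
print(bool(re.fullmatch(pattern, "Double-Struck")))
print(bool(re.fullmatch(pattern, "double-\u017Ftruck")))
```

Generating the pattern keeps the readable keyword list as the source of truth, although the emitted schema itself is no easier to read.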

Since we already need some pre-processing described in words to allow data-foo attributes (or onfoo attributes) to be ignored, I'm tempted to suggest we keep the existing string match, but could be persuaded otherwise.
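
For comparison, the other option (a pre-processing step described in words, followed by the existing exact string match) could look roughly like this Python sketch. The attribute table is a small illustrative subset, and the rule of dropping names that start with "data-" or "on" is an assumption about what that pre-processing would do, not text from the specification.

```python
# Illustrative subset of enumerated attributes and their allowed values.
ENUMERATED = {
    "display": {"block", "inline"},
    "dir": {"ltr", "rtl"},
    "form": {"prefix", "infix", "postfix"},
}


def ascii_lowercase(s: str) -> str:
    """Lowercase only A-Z and leave every other code point untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def preprocess(attributes: dict[str, str]) -> dict[str, str]:
    """Drop data-* and on* attributes, then ASCII-lowercase the values of
    enumerated attributes so that an exact string match can be used."""
    result = {}
    for name, value in attributes.items():
        if name.startswith("data-") or name.startswith("on"):
            continue  # ignored before validation (assumed rule)
        result[name] = ascii_lowercase(value) if name in ENUMERATED else value
    return result


def is_valid(attributes: dict[str, str]) -> bool:
    """Exact, case-sensitive match against the schema after pre-processing."""
    cleaned = preprocess(attributes)
    return all(
        value in ENUMERATED[name]
        for name, value in cleaned.items()
        if name in ENUMERATED
    )


print(preprocess({"display": "BLOCK", "data-foo": "X", "onclick": "f()"}))
print(is_valid({"display": "Block"}))
print(is_valid({"display": "bloc\u212A"}))  # KELVIN SIGN in place of "k"
```

This keeps the schema readable at the cost of moving part of the validation rule into prose.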

@fred-wang
Author

I think this was already the case since #22; I'm not sure how important it is for legacy XML applications. I wonder what is done for HTML5?

@davidcarlisle
Collaborator

davidcarlisle commented Dec 17, 2019 via email

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Dec 23, 2019
…to validate mathsize and dir, a=testonly

Automatic update from web-platform-tests
[mathml] Use ASCII case-insensitiveness to validate mathsize and dir (#20807)

--

wpt-commits: 0cc6eb16b46a76070b11ce771e4026da2e328a6d
wpt-pr: 20807
xeonchen pushed a commit to xeonchen/gecko that referenced this issue Dec 23, 2019
@ByteEater-pl

legacy XML applications

Could you, please, define this term, @fred-wang? I don't know which XML applications are legacy and which aren't.

@fred-wang
Author

legacy XML applications

Could you, please, define this term, @fred-wang? I don't know which XML applications are legacy and which aren't.

I believe I was talking about XML-based MathML3 implementations.

@fred-wang fred-wang removed need resolution Issues needing resolution at MathML Refresh CG meeting need specification update Issues requiring specification changes labels Mar 21, 2020
@fred-wang fred-wang added the MathML 4 Issues affecting the MathML 4 specification label May 22, 2020
@fred-wang
Author

Removing "tests" label, we have tests for mathsize and dir. It's not exhaustive, but HTML or CSS don't test exhaustively either...

Also removing the core label, since the only remaining changes are in MathML full.

@fred-wang fred-wang removed MathML Core Issues affecting the MathML Core specification need tests Issues related to writing WPT tests labels May 22, 2020
4 participants