should it be called "case-sensitive"? #5067

Closed
aphillips opened this issue Nov 7, 2019 · 6 comments · Fixed by #5538
Labels
i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on.

Comments

@aphillips
Contributor

https://html.spec.whatwg.org/multipage/infrastructure.html#case-sensitivity-and-string-comparison

When discussing issue #5066 members of the I18N WG expressed concern that the use of the term "case-sensitive" was misleading, since what is actually happening is codepoint-by-codepoint comparison. Should "case-sensitive" be called something else, such as "identical" or "codepoint"?
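A minimal Python sketch of the comparison under discussion follows. It assumes that comparing "code point for code point" means checking that both strings have the same length and the same code point at each index. The helper name `code_point_equal` is hypothetical and does not come from any spec. Note that nothing in it mentions case. Strings that differ only in case come out unequal simply because their code points differ.

```python
def code_point_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings code point by code point."""
    # A Python 3 str is a sequence of Unicode code points, so len() counts
    # code points and iteration yields one code point at a time.
    if len(a) != len(b):
        return False
    for ca, cb in zip(a, b):
        # No case folding and no normalization: only the code point values matter.
        if ord(ca) != ord(cb):
            return False
    return True


print(code_point_equal("Hello", "hello"))  # False: U+0048 vs U+0068
print(code_point_equal("Hello", "Hello"))  # True
# Same letters, same case, yet unequal: U+00E9 vs U+0065 U+0301.
print(code_point_equal("caf\u00e9", "cafe\u0301"))  # False
```

The last line is the crux of the naming concern: the strings are not distinguished by case at all, yet the "case-sensitive" comparison still treats them as different.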

Note: I'm aware that the term "case sensitive" is already used and linked widely, not only in HTML but also in many attendant specs that refer to the definition provided by HTML.

@annevk annevk added the i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. label Nov 7, 2019
@annevk
Member

annevk commented Nov 7, 2019

I'd be okay with minting "equals" for this (or perhaps "is", since I suspect that's generally what we use these days), in https://infra.spec.whatwg.org/#strings. But someone would have to do the work of identifying all specifications and getting them all changed.

@Yay295
Contributor

Yay295 commented Nov 8, 2019

"visually equivalent"? Though I suppose that would have other issues.

@aphillips
Contributor Author

@Yay295 There are plenty of visually equivalent strings that should not be treated as equivalent. Cf. here
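The point about visually equivalent strings can be made concrete with a Latin/Cyrillic homoglyph. The Python illustration below uses example strings chosen only for this purpose. The two strings usually render identically, but a code point comparison (Python `==` on `str`) keeps them distinct. Even NFKC normalization does not merge them.

```python
import unicodedata

latin = "pay"        # U+0070 U+0061 U+0079, all Latin
mixed = "p\u0430y"   # U+0430 CYRILLIC SMALL LETTER A in the middle

# Usually rendered the same, but the code point sequences differ.
print(latin == mixed)  # False

for ch in mixed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0070 LATIN SMALL LETTER P
# U+0430 CYRILLIC SMALL LETTER A
# U+0079 LATIN SMALL LETTER Y

# Compatibility normalization does not fold the Cyrillic letter into Latin "a".
print(unicodedata.normalize("NFKC", mixed) == latin)  # False
```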

@Celedence

"Case dependent" might be a good way to go too?

@aphillips
Contributor Author

I reviewed the current use of case-sensitive in HTML. There are 39 occurrences of it (not counting the 2 in the definition). In a number of cases, the phrase is used to emphasize that value-matching expects string equality, especially with regard to case. And this is, obviously, only in the HTML spec. Before making PRs, I thought I ought to check on the right course of action. What I had in mind was:

  1. In Infra, define "is" as code point-by-code point comparison. Include a synonym, "identical to", for cases where the spec needs to emphasize the equality.
    a. Include a note emphasizing that this comparison is sensitive to both case and code point sequence (normalization).
  2. In HTML, replace "case-sensitive" with a reference to "is".
    a. Include a note explaining that this was formerly called "case-sensitive", and retain the anchor/definition for other, yet-to-be-modified specs.
  3. Replace the 39 occurrences with suitable edits to "is"/"identical to" (or whatever we decide to spell them as).
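The sensitivity that note 1a describes can be shown in a few lines of Python. In the sketch below, `is_` is a hypothetical stand-in for the proposed "is"/"identical to", modeled as plain code point equality, which is what Python's `==` on `str` does. The normalization calls come from the standard `unicodedata` module. The sketch shows that canonically equivalent strings are not "is"-equal unless both sides are normalized first.

```python
import unicodedata


def is_(a: str, b: str) -> bool:
    # Hypothetical model of the proposed "is"/"identical to": exact code point
    # sequence equality, with no case folding and no normalization.
    return a == b


precomposed = "caf\u00e9"   # é as one code point, U+00E9 (NFC form)
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT (NFD form)

print(is_(precomposed, decomposed))   # False: different code point sequences
print(is_("Caf\u00e9", precomposed))  # False: case differs (U+0043 vs U+0063)

# Canonically equivalent strings only compare equal after normalizing both sides.
print(is_(unicodedata.normalize("NFC", precomposed),
          unicodedata.normalize("NFC", decomposed)))  # True
```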

Does this sound like the right approach?

@aphillips aphillips self-assigned this May 7, 2020
@domenic
Member

domenic commented May 7, 2020

Generally sounds good to me.

In Infra, define "is" as code point-by-code point comparison. Include a synonym, "identical to", for cases where the spec needs to emphasize the equality.

Probably you'd also want to include a note that by default, specs using Infra compare strings in this manner. I don't want people to think that if a spec compares strings without linking to "is" or "identical to", then it's undefined. (This would basically be a port of HTML's "Except where otherwise stated, string comparisons must be performed in a case-sensitive manner.")

a. Include a note explaining that this was formerly called case-sensitive and retaining the anchor/definition for other, yet-to-be-modified specs.

Since HTML doesn't have automatic cross-linking anyway, we'd have to add a bullet point for the relevant "is" to the Infra section of https://html.spec.whatwg.org/#dependencies. At that point maybe you should just add the appropriate anchor there (probably as an empty <span id="case-sensitive"></span>). That seems cleaner than having a leftover <dfn> in https://html.spec.whatwg.org/#case-sensitivity-and-string-comparison.

At that point https://html.spec.whatwg.org/#case-sensitivity-and-string-comparison will be almost deleted; the remainder would just be the uses of "prefix match" which are in AppCache and which @annevk preferred to leave alone until we remove AppCache.
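The Python sketch below illustrates a code-point "prefix match". It assumes the HTML definition amounts to two conditions. First, the pattern is no longer than the string. Second, truncating the string to the pattern's length (in code points) leaves two strings that compare equal code point by code point. The function name `prefix_match` and the example URLs are illustrative only.

```python
def prefix_match(pattern: str, s: str) -> bool:
    # Hypothetical sketch: the pattern must not be longer than the string.
    if len(pattern) > len(s):
        return False
    # Truncate s to the pattern's length (counted in code points) and compare
    # the two code point sequences exactly; no case folding or normalization.
    return s[: len(pattern)] == pattern


print(prefix_match("https://example.com/app/", "https://example.com/app/page.html"))  # True
print(prefix_match("https://Example.com/", "https://example.com/app/"))  # False: case differs
print(prefix_match("https://example.com/app/page.html", "https://example.com/"))  # False: pattern longer
```

Because the truncated comparison is the same exact code point equality, "prefix match" could in principle be expressed in terms of "is" once that term exists.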

annevk pushed a commit to whatwg/infra that referenced this issue May 12, 2020
Adds definitions for string equality comparisons "is" and "identical to". Includes a blanket statement making these the default for string equality. Also includes a note spelling out the relationship to HTML's former "case-sensitive" comparison and appropriate health warning about visual and encoding sequence identity.

Helps with whatwg/html#5067.
domenic pushed a commit that referenced this issue May 15, 2020