Do we need to discuss regular expressions? #54

Closed
aphillips opened this issue Feb 2, 2016 · 8 comments

@aphillips
Contributor

There is an issue comment in the current text that says:

Following requirements added 2013-10-29. Needs discussion of regular expressions.

Here is the proposed requirement (also in the current text):

[S][I] Specifications that define a regular expression syntax MUST provide at least Basic Unicode Level 1 support per [UTS18] and SHOULD provide Extended or Tailored (Levels 2 and 3) support.

Should we add more discussion of regular expressions? Is that beyond the scope of our document? Should we keep the above requirement? Or should we do something different?

@aphillips
Contributor Author

We need to consider this question in WG.

@aphillips
Contributor Author

Discussed in teleconference https://www.w3.org/2017/06/01-i18n-minutes.html

The notes there are not very helpful, since we didn't record the conversation, but changes to the spec will follow.

@aphillips aphillips removed the question label Jun 4, 2017
@aphillips aphillips self-assigned this Jun 4, 2017
@aphillips
Contributor Author

Currently we have this text in 5.1:

Regular expression syntaxes are sometimes useful in defining a format or protocol, since they allow users to specify values that are only partially known or which can vary. The definition or use of regular expression syntaxes or wildcards when considered over the range of Unicode encoding variations, and particularly when considering character or grapheme boundaries brings with it additional considerations.

[S][I] Specifications that define a regular expression syntax MUST provide at least Basic Unicode Level 1 support per [UTS18] and SHOULD provide Extended or Tailored (Levels 2 and 3) support.
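To make the boundary concern concrete, here is a minimal Python sketch. It is an illustration only: Python's built-in `re` module is used for convenience and is not claimed to satisfy UTS #18 Level 1, and the helper `full_match` is hypothetical. It shows that a pattern written against one spelling of "café" fails on the canonically equivalent decomposed spelling, because `.` and literal characters match single code points rather than user-perceived characters. Normalizing the subject text first is one possible mitigation.

```python
import re
import unicodedata
from typing import Optional

# Two canonically equivalent spellings of "café".
PRECOMPOSED = "caf\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "cafe\u0301"  # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def full_match(pattern: str, text: str, form: Optional[str] = None) -> bool:
    """Hypothetical helper: optionally normalize `text` (not `pattern`) before matching."""
    if form is not None:
        text = unicodedata.normalize(form, text)
    return re.fullmatch(pattern, text) is not None


# "." consumes one code point, so the trailing U+0301 is left unmatched.
print(full_match(r"caf.", PRECOMPOSED))  # True
print(full_match(r"caf.", DECOMPOSED))   # False
# A literal precomposed "é" in the pattern does not match the decomposed text either.
print(full_match("caf\u00e9", DECOMPOSED))  # False
# After NFC, the decomposed text becomes U+00E9 and both patterns match.
print(full_match(r"caf.", DECOMPOSED, "NFC"))      # True
print(full_match("caf\u00e9", DECOMPOSED, "NFC"))  # True
```

Normalizing only the text is a simplification: a regex syntax also has to decide how patterns themselves are treated and whether `.` should match a code point or a grapheme cluster, which are among the questions UTS #18 addresses at its higher levels.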

Is that enough to close? Do we need a section discussing regex?

@asmusf

asmusf commented Nov 26, 2017

"Unicode encoding variations" is not a defined term, and if people look up "variation", they will only find standardized variation sequences, which I am sure were not the main concern here.

I can't tell (even after looking at the original in more detail) whether the concern about "variation" was about encoding forms or normalization forms. The header of the section mentions normalization, but encoding forms are also discussed.

It may be worth noting that in some cases comparisons are preferably done in NFD; this is the case for comparing domain names against confusable variants, to give one example.
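As a minimal Python sketch of comparing in NFD (the helper names `nfd` and `canonically_equal` are hypothetical, and this covers only the normalization step, not the full UTS #39 confusable "skeleton" procedure, which itself begins by converting to NFD):

```python
import unicodedata


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: canonically equivalent strings have identical NFD forms."""
    return nfd(a) == nfd(b)


# Precomposed vs. decomposed "é": equal once both are in NFD.
print(canonically_equal("caf\u00e9", "cafe\u0301"))  # True
# NFD also puts combining marks in canonical order
# (U+0323 DOT BELOW, ccc 220, sorts before U+0307 DOT ABOVE, ccc 230).
print(canonically_equal("q\u0307\u0323", "q\u0323\u0307"))  # True
# NFD exposes the base letter and the mark as separate code points,
# which is the starting point for comparing a label against look-alike variants.
print([f"U+{ord(c):04X}" for c in nfd("\u00e9")])  # ['U+0065', 'U+0301']
```

For a plain equality test NFC and NFD give the same answer; NFD is the more convenient form when a later step needs to inspect base letters and combining marks separately, as confusable checks do.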

@aphillips
Contributor Author

It's about the various different ways text can be encoded. It's not meant to be a term. That paragraph should be made clearer.

aphillips added a commit to aphillips/charmod-norm that referenced this issue Nov 27, 2017
aphillips added a commit that referenced this issue Nov 27, 2017
Address #54. Edited text to better address @asmusf's comment.
@aphillips
Contributor Author

@asmusf check the above edit and see if that works better. Suggest edits as needed.

@asmusf

asmusf commented Nov 28, 2017

Definitely better.

Source code for the text is now a single very long line per paragraph, just pointing that out if it matters.

@aphillips
Contributor Author

Closing this issue. Reopen if needed.
