Simplify source bidi isolation rules #781
I wish you'd added this as a separate alternative. I don't like that the isolates are part of the You removed unquoted literals from being amenable to bidi isolation, but they should still be isolatable, no? |
Including the isolates in
They are, covered by the change to unquoted = name / number-literal
The problem with allowing isolates into
Actually, numbers are complicated in bidi because digits are weakly directional. The minus sign can swing around onto the "wrong" side visually. The other reason I had unquoted and quoted together is that it simplifies what tools have to do. A tool can blindly isolate any literal separate from the decision to quote it and can blindly remove isolates from literals without looking at the contents. |
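The "blind" tooling behaviour described in this comment can be sketched as follows. This is a minimal illustration, not code from the PR: the helper names `isolate` and `strip_isolate` are hypothetical, and the sketch assumes a literal is handled as an already-serialized source string.

```python
# Unicode bidi isolate controls: LRI, RLI and FSI open an isolate; PDI closes it.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
OPEN_ISOLATES = (LRI, RLI, FSI)


def isolate(literal_source: str, opener: str = FSI) -> str:
    """Wrap a serialized literal in an isolate without looking at its contents."""
    if opener not in OPEN_ISOLATES:
        raise ValueError("opener must be LRI, RLI, or FSI")
    return f"{opener}{literal_source}{PDI}"


def strip_isolate(literal_source: str) -> str:
    """Remove one surrounding isolate pair if present; otherwise return the input."""
    if (
        len(literal_source) >= 2
        and literal_source[0] in OPEN_ISOLATES
        and literal_source[-1] == PDI
    ):
        return literal_source[1:-1]
    return literal_source
```

Neither helper needs to know whether the literal is quoted or unquoted, which is the simplification the comment argues for.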
As proposed, both of those strings would match the So the parsed value of the name would be "name" for both of the above, and they would be considered equal.
But
The proposed change doesn't change the number of constructs for which this can be done; it replaces "unquoted literals" with "names". Doing so lets us remove needing to separately and additionally pick out the LRM/RLM/ALM from the productions that include |
Please change this PR to make your proposal an additional option, not overwriting the original design.
exploration/bidi-usability.md
Outdated
/ (quoted / (unquoted [bidi]))
quoted-pattern = ( open-isolate "{{" pattern "}}" close-isolate)
/ ("{{" pattern "}}")
name = (open-isolate name-body close-isolate) / name-body
This is a problem because `name` is used to build a variety of other constructs (`variable`, `reserved-keyword`, `identifier`, etc.). This change puts the isolates inside these constructs, e.g. `$\u2066name\u2069`, rather than on the outside.
This will make it harder for implementations, since they can't take the parsed token and compare it immediately. They have to stop to remove isolates. My original design avoided this problem by making the isolates not parse into names/identifiers/tokens.
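To make the concern concrete, here is a rough sketch of what a parser would need to do if the isolates sit inside `name`, as in the excerpted rule `name = (open-isolate name-body close-isolate) / name-body`. The `NAME_BODY` pattern is a deliberately simplified, ASCII-only stand-in for the real name productions, and `parse_variable_name` is a hypothetical helper, not part of any implementation discussed here.

```python
import re

OPEN_ISOLATE = "\u2066\u2067\u2068"  # LRI, RLI, FSI
PDI = "\u2069"

# Simplified stand-in for name-start *name-char; the real rules allow far more.
NAME_BODY = r"[A-Za-z_][A-Za-z0-9_.-]*"

# "$" followed by either an isolated name body or a bare name body.
VARIABLE = re.compile(
    r"\$(?:[" + OPEN_ISOLATE + r"](" + NAME_BODY + r")" + PDI
    + r"|(" + NAME_BODY + r"))"
)


def parse_variable_name(source: str) -> str:
    """Return the variable's name with any surrounding isolate pair dropped."""
    match = VARIABLE.fullmatch(source)
    if match is None:
        raise ValueError(f"not a variable: {source!r}")
    # Exactly one of the two groups is set, depending on which branch matched.
    return match.group(1) or match.group(2)
```

With this shape, `$name` and `$\u2066name\u2069` both yield `name`, but only because the parser explicitly discards the isolates. Comparing the raw text after `$` would treat the two as different, which is the extra step the comment objects to.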
As requested, refactored as an alternative to the proposed solution. Also addressed the concerns identified in #787 and #788, and added an example showing how I have also validated this solution by implementing it in my parser. |
The requested changes are editorial. Otherwise I would approve this addition.
I don't think I agree with this option. The strongly directional marks are included in the proposed solution for a different reason than might be assumed (I mention this below) and I don't think putting isolates into `name` has been fully accounted for.
2. Rather than patching the `name` rule with an optional trailing LRM/RLM/ALM,
allow for its proper isolation.
I wouldn't call what we did above "patching". The strongly directional marks are allowed above so that bidi users can include them (to make the string look okay in a normal text editor) the way they might normally do when editing text. The productions we used don't make these marks part of the token, so they don't affect processing.
Allowing isolation is a separate consideration.
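The distinction can be illustrated with a small tokenizer sketch in which a trailing LRM/RLM/ALM is consumed but never becomes part of the token. The simplified `NAME_BODY` pattern and the `read_name` helper are assumptions made for illustration. The actual productions in the proposal are more detailed.

```python
import re

LRM, RLM, ALM = "\u200e", "\u200f", "\u061c"

# Simplified stand-in for name-start *name-char; the real rules allow far more.
NAME_BODY = r"[A-Za-z_][A-Za-z0-9_.-]*"

# The name is captured; an optional trailing mark is matched outside the group.
NAME_THEN_MARK = re.compile(r"(" + NAME_BODY + r")[" + LRM + RLM + ALM + r"]?")


def read_name(source: str, pos: int = 0) -> tuple[str, int]:
    """Read a name at pos and return (name, next position past any trailing mark)."""
    match = NAME_THEN_MARK.match(source, pos)
    if match is None:
        raise ValueError(f"no name at position {pos}")
    return match.group(1), match.end()
```

Here the returned token is identical whether or not the author typed a mark after it, so later comparisons need no cleanup.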
Quoted patterns, quoted literals, and names may be isolated by LRI/RLI/FSI...PDI.
For names and quoted literals, the isolate characters are outside the body of the token,
but for quoted patterns, the isolates are in the middle of the `{{` and `}}` characters.
"middle" could mean anywhere inside the pattern quotes.
Suggested change:
- but for quoted patterns, the isolates are in the middle of the `{{` and `}}` characters.
+ but for quoted patterns, the isolates are in between the `{` and `}` in the `{{` and `}}` sequences.
```abnf
name = [open-isolate] name-start *name-char [close-isolate]
quoted = [open-isolate] "|" *(quoted-char / quoted-escape) "|" [close-isolate]
quoted-pattern = "{" [open-isolate] "{" pattern "}" [close-isolate] "}"
```
This puts the isolate inside the `{{` and `}}`? Asking to be sure I'm reading this right. The above text didn't seem to mean this, although now I see your intention.
Yes, that's the intent: {\u2066{
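As a rough sketch of that placement, the matcher below mirrors the excerpted rule `quoted-pattern = "{" [open-isolate] "{" pattern "}" [close-isolate] "}"`. It treats the pattern body as opaque text, which is an assumption made only for brevity, and `quoted_pattern_body` is a hypothetical helper name.

```python
import re

OPEN_ISOLATE = "\u2066\u2067\u2068"  # LRI, RLI, FSI
PDI = "\u2069"

# "{" [open-isolate] "{" body "}" [close-isolate] "}"
# As in the excerpted rule, each isolate is optional on its own.
QUOTED_PATTERN = re.compile(
    r"\{[" + OPEN_ISOLATE + r"]?\{(.*?)\}" + PDI + r"?\}",
    re.DOTALL,
)


def quoted_pattern_body(source: str) -> str:
    """Return the text between the inner braces of a quoted pattern."""
    match = QUOTED_PATTERN.fullmatch(source)
    if match is None:
        raise ValueError(f"not a quoted pattern: {source!r}")
    return match.group(1)
```

Under this rule `{\u2066{Hello}\u2069}` is accepted, while isolates placed outside the braces, as in `\u2066{{Hello}}\u2069`, are not.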
Co-authored-by: Addison Phillips <addison@unicode.org>
- Drop the `bidi` rule, and allow `name` to be LR/RL/FS-isolated.
- Allow an LRI immediately after a non-content newline.
- Relax expression & markup isolation to not require pairing on a syntactic level, as the LRI can also be terminated by a newline.