-
-
Notifications
You must be signed in to change notification settings - Fork 35
Disallow whitespace as the first character of a reserved-body in a reserved-statement. #730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow whitespace as the first character of a reserved-body in a reserved-statement. #730
Conversation
spec/message.abnf
Outdated
|
|
||
| ; Reserve additional .keywords for use by future versions of this specification. | ||
| reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression) | ||
| reserved-statement = reserved-keyword [s reserved-body-trimmed] 1*([s] expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming here is unlike our usual naming. It would be better to follow the start/part convention and eschew trimmed. Something like:
reserved-body = reserved-body-start [reserved-body-part]
reserved-body-start = *([s] 1*reserved-body-part)
reserved-body-part = reserved-char / reserved-escape / quotedNote that my version makes an empty body possible and also a 1 character body possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to follow the start/part convention
I see what you mean. However, the changes you proposed are a no-op. Let me formulate two alternative proposals.
e418a4e to
43c9264
Compare
…served-statement. In the 'reserved-statement' nonterminal, there is an ambiguity if there is more than one whitespace character between the 'reserved-keyword' and the first non-whitespace character of the 'reserved-body', because these whitespace characters can be seen as part of the 's' nonterminal or as part of the 'reserved-body' nonterminal. According to the principles explained in unicode-org#725 and the proposed resolution of unicode-org#721, it is not desired that a 'reserved-body' starts with a whitespace character; rather, such a whitespace character is meant to be interpreted as part of the preceding 's' nonterminal. Test case: ``` .regex /foo/{xyz}{{hello}} ``` This patch removes this ambiguity, by disallowing whitespace as the first character of a 'reserved-body' in a reserved-statement. It thus fixes the first part of unicode-org#721. Details: - Other occurrences of 'resolved-body' (after a 'reserved-annotation' or 'private-use-annotation') are not affected. - A new nonterminal 'resolved-body-part' is introduced, referenced twice. - A new nonterminal 'reserved-body-in-statement' is introduced, referenced once. Its purpose is to clarify that the two parts belong together. - A new nonterminal 'reserved-body-in-statement-start' is introduced, in order to follow the common *-start / *-part idiom.
43c9264 to
bbf6880
Compare
|
|
||
| ; Reserve additional .keywords for use by future versions of this specification. | ||
| reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression) | ||
| reserved-statement = reserved-keyword [s reserved-body-in-statement] 1*([s] expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should steer clear of the -in-statement part.
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
reserved-body = reserved-body-start *([s] 1*reserved-body-part)
reserved-body-start = *([s] 1*reserved-body-part)
reserved-body-part = reserved-char / reserved-escape / quotedI recognize that reserved-body is also used by private-use-annotation. However, the intention there should not be to capture ending whitespace either. private-use-annotation is part of annotation and has the same relationship to e.g. the optional attribute in expression. Trimming private use is thus also desirable to eliminate ambiguity.
|
In response to your comment, I am proposing two alternatives: one in this pull request, and one in #731. Pick the one you prefer and discard the other one. |
|
@aphillips writes:
The ending whitespace has already been handled through #727. Here we are discussing the whitespace at the start / before 'reserved-body'. |
|
Canceling in favour of #731 (group decision from 2024-03-18). |
In the 'reserved-statement' nonterminal, there is an ambiguity if there is more than one whitespace character between the 'reserved-keyword' and the first non-whitespace character of the 'reserved-body', because these whitespace characters can be seen as part of the 's' nonterminal or as part of the 'reserved-body' nonterminal.
According to the principles explained in #725 and the proposed resolution of #721, it is not desired that a 'reserved-body' starts with a whitespace character; rather, such a whitespace character is meant to be interpreted as part of the preceding 's' nonterminal.
Test case:
This patch removes this ambiguity, by disallowing whitespace as the first character of a 'reserved-body' in a reserved-statement.
It thus fixes the first part of #721.
Details: