Grammar ambiguity #2954
Comments
To be precise, a Parsing Expression Grammar (PEG) is by definition unambiguous: the first match is the correct match. A simple solution is to enclose the identifier in parentheses:
align((Expr):1:32)
And just for fun, to demonstrate the grammar precedence (not an honest suggestion):
align(blk: {
    break :blk Expr;
}:1:32)
The problem is not that it cannot eventually be disambiguated. It is that this issue breaks the possibility of making a context-free grammar, e.g. using LALR(1), which I happen to be using.
pub fn main() void {
    const identifier: usize = 8;
    const test1 = *align(label: { break :label 8; }:1:32) u8;
    const test2 = *align(identifier:1:32) u8;
}
issue.zig:4:37: error: invalid token: '1'
If your goal is to write a
Regarding the error from your example, that is the expected result with today's grammar.
What I meant was to describe the rule-matching behavior of a PEG in general. Note that syntax ambiguities are precisely what inspired moving to the new grammar type. Hope that helps or clears some things up. EDIT: I mistakenly said that a CFG might not be possible to write, when I should have said LR
A Parsing Expression Grammar (PEG) will indeed be able to parse Increasing lookahead or context dependence slows down compilers, either through excessive branching in the parser logic or by postponing the problem to semantic analysis. It is trivial to fix the bug in the Zig compiler (after ALIGN LPAREN, check for the edge case before branching into ast_parse_expression). But let's rather fix the syntax instead of forcing the compiler into submission with manual edge-case handling.
From what I can tell, this is a proposal and not a bug. Maybe I'm conflating the spec and its implementation. But I should disengage from the conversation since I seem to be out of my depth. Apologies for convoluting the issue.
No apologies needed. It gave me a chance to discover the zig-spec PEG implementation and give it a test run.
In PtrTypeStart we have the following syntax:
KEYWORD_align LPAREN Expr (COLON INTEGER COLON INTEGER)? RPAREN
Or, as a simplified example:
align(Expr:1:32)
This conflicts with the block expression syntax:
Identifier: {...}
The issue arises because Identifier is a valid Expr production and thus it becomes impossible to deduce the meaning of the colon token without consuming additional input. In more theoretical terms this is an instance of a shift-reduce conflict.
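The conflict can be made concrete with a toy lexer. The following Python sketch is only an illustration under stated assumptions: `tokenize` and `first_difference` are hypothetical helpers, not part of the Zig compiler or zig-spec, and the token set covers just the characters used in the examples of this issue. It shows that the two forms share the token prefix `align ( IDENT :` and differ only in the token after the colon, so a parser that sees only the colon as lookahead cannot choose between reducing `Identifier` to `Expr` and shifting the colon as part of a block label.

```python
import re

# Hypothetical toy lexer; only the tokens needed for the examples in this issue.
KEYWORDS = {"align", "break"}
TOKEN_RE = re.compile(
    r"\s*(?:(?P<INTEGER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<PUNCT>[(){}:;*]))"
)


def tokenize(src):
    """Turn source text into a list of token kinds."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        # Punctuation and keywords stand for themselves; other tokens by kind.
        tokens.append(text if kind == "PUNCT" or text in KEYWORDS else kind)
        pos = m.end()
    return tokens


def first_difference(a, b):
    """Index of the first position where two token lists differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


plain = tokenize("align(identifier:1:32)")
labeled = tokenize("align(label: { break :label 8; }:1:32)")

# Both streams start with: align ( IDENT :
# They first differ one token after the colon (INTEGER versus "{").
print(plain[:4], first_difference(plain, labeled))
```

In this sketch the decision point is the token at index 3 (the colon), while the first distinguishing token sits at index 4, which is why a single token of lookahead is not enough here.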
The simplest solution is to introduce a :: token and use it instead of the leading colon. It looks almost identical.
KEYWORD_align LPAREN Expr (COLONCOLON INTEGER COLON INTEGER)? RPAREN
Or, as a simplified example:
align(Expr::1:32)
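To illustrate the effect of the proposed token, here is a small Python sketch of a toy lexer that recognizes `::` as a single token before falling back to `:`. The helpers `tokenize` and `first_difference` are hypothetical and exist only for this illustration, and the labeled-block form `align(label: { ... }::1:32)` is an assumption about how the proposal would combine with block labels. Under those assumptions, the plain and labeled forms already differ at the token right after the identifier.

```python
import re

# Hypothetical toy lexer; "::" is listed before ":" so it is matched as one token.
KEYWORDS = {"align", "break"}
TOKEN_RE = re.compile(
    r"\s*(?:(?P<INTEGER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<PUNCT>::|[(){}:;*]))"
)


def tokenize(src):
    """Turn source text into a list of token kinds."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        # Punctuation and keywords stand for themselves; other tokens by kind.
        tokens.append(text if kind == "PUNCT" or text in KEYWORDS else kind)
        pos = m.end()
    return tokens


def first_difference(a, b):
    """Index of the first position where two token lists differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


plain = tokenize("align(identifier::1:32)")
labeled = tokenize("align(label: { break :label 8; }::1:32)")

# After "align ( IDENT" the next token is "::" in one stream and ":" in the other.
print(plain[3], labeled[3], first_difference(plain, labeled))
```

With the separate token, the lookahead immediately after the identifier is enough to tell the alignment suffix apart from a block label in this toy setting.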