Grammar ambiguity #2954

andersfr · 2019-07-27T13:33:23Z

In PtrTypeStart we have the following syntax:
KEYWORD_align LPAREN Expr (COLON INTEGER COLON INTEGER)? RPAREN

Or as simplified example: align(Expr:1:32)

This conflicts with the block expression syntax: Identifier: {...}

The issue arises because Identifier is a valid Expr production and thus it becomes impossible to deduce the meaning of the colon token without consuming additional input. In more theoretical terms this is an instance of a shift-reduce conflict.
The simplest solution is to introduce a :: token and use it instead of the leading colon. It looks almost identical.
KEYWORD_align LPAREN Expr (COLONCOLON INTEGER COLON INTEGER)? RPAREN

Or as simplified example: align(Expr::1:32)

The text was updated successfully, but these errors were encountered:

hryx · 2019-07-27T20:01:17Z

To be precise, a Parsing Expression Grammars (PEG) is by definition unambiguous -- the first match is the correct match.

A simple solution is to enclose the identifier in parentheses:

align((Expr):1:32)

And just for fun, to demonstrate the grammar precedence (not an honest suggestion):

align(blk: {
    break :blk Expr;
}:1:32)

andersfr · 2019-07-28T10:06:32Z

The problem is not that it cannot eventually be disambiguated. It is that this issue breaks the possibility of making a context-free grammar, e.g. using LALR(1) which I happen to be using.
Your conclusion on first match is also incorrect. Upon seeing Identifier: the first match is to assume it is a block label that must be followed by a block, because we are currently parsing an expression. The parser then either throws an error upon seeing an IntegerLiteral or must resort to backtracking to resolve the conflict.
The Zig compiler indeed breaks on this problem as explained:

pub fn main() void {
    const identifier: usize = 8;
    const test1 = *align(label: { break :label 8; }:1:32) u8;
    const test2 = *align(identifier:1:32) u8;
}

issue.zig:4:37: error: invalid token: '1'
const test2 = *align(identifier:1:32) u8;

hryx · 2019-07-28T22:15:18Z

using LALR(1) which I happen to be using

If your goal is to write a ~~CFG~~ LL/LR/LALR for Zig, that may not be possible. I'm not an expert, but from Wikipedia:

[...] but the unlimited lookahead capability of the grammar formalism is then lost. Therefore, not all languages that can be expressed using parsing expression grammars can be parsed by LL or LR parsers.

Regarding the error from your example, that is the expected result with today's grammar. identifier:1:32 will not work. Of course this won't help if you're writing a parser, but as a user, enclosing identifier in parentheses makes the intention possible.

Your conclusion on first match is also incorrect.

What I meant was to describe the rule-matching behavior of a PEG in general.

Note that syntax ambiguities are precisely what inspired moving to the new grammar type. Hope that helps or clears some things up.

EDIT: I mistakenly said that a CFG might not be possible to write, when I should have said LR

andersfr · 2019-07-29T06:26:33Z

A Parsing Expression Grammar (PEG) will indeed be able to parse identifier:1:32 because it has unlimited lookahead and this is only ambiguous with lookahead less than 2. It is not an instance of prioritization between rules (they are mutually exclusive).

Increasing lookahead or context dependence slows down compilers. Either by having excessive branching in the parser logic or postponing the problem to semantic analysis.

It is trivial to fix the bug in the Zig compiler (after ALIGN LPAREN check for the edge case before branching into the ast_parse_expression). But let's rather fix the syntax instead of forcing the compiler into submission with manual edge case handling.

hryx · 2019-07-29T06:54:08Z

From what I can tell, this is a proposal and not a bug. Maybe I'm conflating the spec and its implementation. But I should disengage from the conversation since I seem to be out of my depth. Apologies for convoluting the issue.

andersfr · 2019-07-29T07:10:12Z

No apologies needed. It gave me chance to discover the zig-spec PEG implementation and give it a test run.
I will close the issue and resubmit as proposal for changing the align syntax.

andersfr closed this as completed Jul 29, 2019

andersfr mentioned this issue Jul 29, 2019

Proposal: change the align(Expr:IntegerLiteral:IntegerLiteral) syntax #2973

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Grammar ambiguity #2954

Grammar ambiguity #2954

andersfr commented Jul 27, 2019

hryx commented Jul 27, 2019

andersfr commented Jul 28, 2019 •

edited

hryx commented Jul 28, 2019 •

edited

andersfr commented Jul 29, 2019

hryx commented Jul 29, 2019

andersfr commented Jul 29, 2019

Grammar ambiguity #2954

Grammar ambiguity #2954

Comments

andersfr commented Jul 27, 2019

hryx commented Jul 27, 2019

andersfr commented Jul 28, 2019 • edited

hryx commented Jul 28, 2019 • edited

andersfr commented Jul 29, 2019

hryx commented Jul 29, 2019

andersfr commented Jul 29, 2019

andersfr commented Jul 28, 2019 •

edited

hryx commented Jul 28, 2019 •

edited