block and hash disambiguation #2

Closed
vickenty opened this issue Jul 19, 2016 · 5 comments

@vickenty
Collaborator

The ambiguity between a block and a hash literal arises when a pair of curly braces is used as a stand-alone statement in a code block (sub, eval, etc.), or as the first argument to one of the operators below:

  • print, printf, say
  • system, exec
  • sort, grep, map

(This is not related to prototypes: these operators are always parsed using special rules, even if parentheses are used around the arguments. In expressions, including after the return keyword, curlies are always treated as a hash literal.)

While a sufficiently powerful parser could probably handle this, the rules perl uses for this disambiguation are rather idiosyncratic and would complicate the parser too much. They are also not fully documented. In brief, perl checks whether there is a comma right after the first token inside the braces (full details below).

I'd like curlies to always be interpreted as a block, because:

  • for the operators above, a hash argument only makes sense for map;
  • top-level hashes are rare (e.g. a file loaded with do "config.pl", or sub { { foo => 1 } }).

Several possible solutions:

  1. Require a semicolon after the opening brace in ambiguous situations.

    map { $_ => 0 } @a; # not ok
    map {; $_ => 0 } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # ok
    { my $x; sub foo { $x } } # not ok
    print { $fh } "hi"; # not ok
  2. Require parens around expressions with comma or fat-comma inside ambiguous blocks.

    map { $_ => 0 } @a; # not ok
    map { ($_ => 0) } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # ok
    { my $x; sub foo { $x } } # ok
    print { $fh } "hi"; # ok
  3. Require parens around all expressions with a comma or fat comma, when they are not inside another expression. This is a global change, but in return, code block syntax becomes the same everywhere (unless I missed something).

    map { $_ => 0 } @a; # not ok
    map { ($_ => 0) } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # not ok
    { my $x; sub foo { $x } } # ok
    print { $fh } "hi"; # ok

Disambiguation rules

The exact details are not really important to this issue, but I include them here for reference and entertainment.

Perl parses a pair of curly braces as a hash if one of the following is true:

  • there is nothing in the braces;
  • the second token is a fat comma;
  • the second token is a regular comma and the first token starts with 'q' or with any character that is not a lowercase letter (so digits and uppercase letters both qualify).

Here, a token means a quoted string or command ('', "", ``, q{}, qq{} and qx{}) or a sequence of word characters.

{ } # hash
{ 1 } # block
{ 1, 2 } # hash
{ fuss, 2 } # block
{ Pack, 2 } # hash
{ quiz, 2 } # hash
{ fuss => 2 } # hash
{ qq{} => 2 } # hash
{ qr{} => 2 } # block
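The rules above can be sketched as a small Perl function. This is a simplified approximation for illustration, not perl's tokenizer: classify_braces is a hypothetical name, it receives only the text between the braces, and it recognises q{}, qq{} and qx{} only with non-nested {} delimiters (other quote-like forms and delimiters are not handled).

```perl
use strict;
use warnings;

# Hypothetical helper: classify the text between a pair of braces as
# 'hash' or 'block', following the three rules listed above.
sub classify_braces {
    my ($inner) = @_;
    $inner =~ s/^\s+//;

    # Rule 1: nothing inside the braces.
    return 'hash' if $inner eq '';

    # First token: q{}/qq{}/qx{} (non-nested {} only), '', "", ``,
    # or a run of word characters. Anything else means a block.
    $inner =~ /\A(q[qx]?\{[^}]*\}|'[^']*'|"[^"]*"|`[^`]*`|\w+)/
        or return 'block';
    my $first = $1;

    my $rest = substr($inner, length $first);
    $rest =~ s/^\s+//;

    # Rule 2: the second token is a fat comma.
    return 'hash' if $rest =~ /\A=>/;

    # Rule 3: a regular comma, and the first token starts with 'q'
    # or with any character that is not a lowercase letter.
    return 'hash' if $rest =~ /\A,/ && $first =~ /\A(?:q|[^a-z])/;

    return 'block';
}

# Run the examples from the list above.
for my $src ('{ }', '{ 1 }', '{ 1, 2 }', '{ fuss, 2 }', '{ Pack, 2 }',
             '{ quiz, 2 }', '{ fuss => 2 }', '{ qq{} => 2 }', '{ qr{} => 2 }') {
    (my $inner = $src) =~ s/\A\{(.*)\}\z/$1/s;
    printf "%-16s # %s\n", $src, classify_braces($inner);
}
```

Note that qr{} is not in the token list, so in { qr{} => 2 } the first token is the word qr and the second is {, which is why that example comes out as a block.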

@vickenty
Collaborator Author

Personally, I like option 3 the best. It is the biggest change to the language, but it makes all blocks parse the same.

@xsawyerx
Owner

xsawyerx commented Aug 3, 2016

Let's embrace the solution then?

vickenty added a commit that referenced this issue Sep 9, 2016
Only top-level blocks and the do keyword are added. The main goal is to
expose the parsing ambiguity between blocks and hashes discussed in #2.

Tests in t/block.t codify solution no. 3 from #2. They currently fail
because the grammar is ambiguous.

@vickenty
Collaborator Author

vickenty commented Sep 9, 2016

Before we embrace it, I want to prove that this solution works by implementing it in the Marpa parser. I already have code blocks and hash literals implemented, and tests that check for ambiguity in the parser.

@xsawyerx
Owner

xsawyerx commented May 7, 2020

I think we resolved this using NonBraceExpression, in #14.

The rule, for posterity, is:

  • Keywords that support blocks (whether optional or not) cannot receive any argument that starts with a left brace, namely a hashref.
  • Top level expressions cannot include brace-leading expressions.

For example, using print {STUFF} MORE_STUFF:

  • print uses NonBraceExpression as the first argument. This means that {STUFF} will only be a block, never a hashref.
  • Since NonBraceExpression supports the Unary operator, the statement print +{...} will still correctly treat the braces as a hashref.
  • Additionally, sub foo () { {...} } will always view the last {...} as a block.
  • If there is a comma (or fat comma) inside it, we will loudly complain because that's not allowed.
  • return {} will work because return accepts any Expression, not just NonBraceExpression. (It also doesn't accept a Block as an argument, so it will not accidentally misparse it.)
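A rough way to see how these rules play out is a text-level sketch in Perl. It is only an approximation under stated assumptions, not guac's grammar: the keyword list and the names classify_first_arg, brace_body and has_top_level_comma are hypothetical, and "top-level comma" is approximated by scanning characters outside (), [], {} and simple quoted strings, whereas the real parser works on parsed expressions and may treat cases such as unparenthesized sub-call arguments differently.

```perl
use strict;
use warnings;

# Hypothetical keyword set: operators whose leading '{' is always a block.
my %block_keywords = map { ($_ => 1) } qw(print printf say system exec sort grep map);

# Approximation: true if $body has ',' or '=>' outside (), [], {} and outside
# simple '...' / "..." strings (no escape handling). Skips the '=>' in '<=>'.
sub has_top_level_comma {
    my ($body) = @_;
    my ($depth, $quote) = (0, undef);
    for my $i (0 .. length($body) - 1) {
        my $c = substr($body, $i, 1);
        if (defined $quote) {
            undef $quote if $c eq $quote;
            next;
        }
        if    ($c eq q{'} || $c eq q{"}) { $quote = $c }
        elsif ($c =~ /[(\[{]/)           { $depth++ }
        elsif ($c =~ /[)\]}]/)           { $depth-- }
        elsif ($depth == 0) {
            return 1 if $c eq ',';
            return 1 if substr($body, $i, 2) eq '=>'
                && ($i == 0 || substr($body, $i - 1, 1) ne '<');
        }
    }
    return 0;
}

# Text between the leading '{' of $src and its matching '}'.
sub brace_body {
    my ($src) = @_;
    my $depth = 0;
    for my $i (0 .. length($src) - 1) {
        my $c = substr($src, $i, 1);
        $depth++ if $c eq '{';
        return substr($src, 1, $i - 1) if $c eq '}' && --$depth == 0;
    }
    die "unbalanced braces\n";
}

# Classify the first argument of $keyword as 'block', 'hashref' or 'other';
# die when braces in block position contain a top-level comma.
sub classify_first_arg {
    my ($keyword, $arg) = @_;
    $arg =~ s/\A\s+//;
    return 'hashref' if $arg =~ /\A\+\s*\{/;    # unary plus makes it an expression
    return 'other'   unless $arg =~ /\A\{/;
    return 'hashref' if $keyword eq 'return';   # return accepts any Expression
    return 'other'   unless $block_keywords{$keyword};
    die "comma not allowed at top level of a block after '$keyword'\n"
        if has_top_level_comma(brace_body($arg));
    return 'block';
}
```

With this sketch, print { $fh } "hi" and sort { $a <=> $b } @foo give 'block', print +{ a => 1 } and return { 1 => 2 } give 'hashref', and map { $_ => 0 } @a dies while map { ($_ => 0) } @a is accepted. An empty body such as print {} $b is also classified as a block, since it contains no comma.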

There is still the case of sub foo () { {} }, where the inner {} is parsed as a block rather than a hashref. We can handle this by requiring the last statement in such a subroutine to begin with return.

@vickenty did I miss anything?

@xsawyerx xsawyerx closed this as completed May 7, 2020
xsawyerx added a commit that referenced this issue May 7, 2020
sort has multiple options:

1. sort { $a <=> $b } @foo;
2. sort @foo;
3. sort $subname @foo;
4. sort subname @foo;

Options #1, #2, and #3 are supported.
Option #3 explicitly requires a scalar variable, not an expression.
Option #4 is very odd. It explicitly expects a bareword sub name.

Option #4 is *NOT* supported.

@vickenty
Collaborator Author

vickenty commented May 8, 2020

This may require some fine-tuning later: print {} $b would be parsed by guac, but not by perl. We may need to additionally prohibit empty blocks in ambiguous positions, or something similar.

We also removed comma and fat comma as top-level operators in blocks, so sub foo { 1, 2 } is not valid syntax. This was done to avoid perl's disambiguation rules, which look for a comma inside the braces.
