Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow precedences to be specified using strings and a partial ordering relation #939

Merged
merged 3 commits into from Feb 25, 2021

Conversation

maxbrunsfeld
Copy link
Contributor

@maxbrunsfeld maxbrunsfeld commented Feb 25, 2021

Background

For some grammars, it is very awkward to specify all of their rules' precedence as a total ordering, using integers. The problem is that we might need to statically (via precedence) resolve conflicts between rules e, f, and g, and also between rules t, u, and v, but want to avoid resolving any conflicts between, say, e and u, or g and t.

This bug in the TypeScript grammar is an example of a case where it's really hard to do this using the integer precedence system.

What we need is a way of specifying precedences as a partial ordering. There's some discussion about this model in this blog post: Operator Precedence: We Can Do Better.

Solution

This PR makes two backward-compatible changes to the grammar DSL:

  1. Grammars have a new optional property called precedences, that specifies a set of precedence names, and a set of ordering relations between those names. Each ordering relation is structured as an array of names, in descending order of their strength.
  2. Within grammar rules, the prec family of functions can now take a precedence name instead of an integer. When constructing the parser, these names will be compared according to the grammar's precedences lists. For some pairs of names, there will be no ordering relation that includes both names. These two names will be treated as equal in strength, for conflict-resolution purposes.

Example

We might use these features in the TypeScript grammar, doing something like this:

grammar({
  // ...
  precedences: () => [
    // Within expressions,
    // * dot binds tighter than comparison
    // * comparison binds tighter than logical-and
    // * logical-and binds tighter than logical-or
    [
      'member',
      'compare',
      'and',
      'or',
    ],
    // Within types:
    // * dot binds tighter than intersection
    // * intersection binds tighter than union
    [
      'type_member',
      'type_intersection',
      'type_union',
    ]
  ],

  rules: {
    // ...
    
    // This binds more tightly than other expression rules, but has *no* static resolution
    // for any conflicts with *type*-related rules.
    member_expression: $ => prec('member', seq(
      $._primary_expression,
      '.',
      $.identifier
    )),
    
    // This binds more tightly than other type rules, but has *no* static resolution
    // for any conflicts with *expression*-related rules.
    nested_type_identifier: $ => prec('type_member', seq(
      choice(
        $._type_identifier,
        $.generic_type,
        $.nested_type_identifier,
      ),
      '.',
      $._type_identifier
    )),
  }
});

Details

Grammars using integer precedence will not be affected. It's also valid to use a mixture of integer precedences and named precedences: those two groups will not be compared to each other.

Right now, the strings are not used in comparisons, but they
are passed through the grammar processing pipeline, and are
available to the parse table construction algorithm.

This also cleans up a confusing aspect of the parse table
construction, in which precedences and associativities were
temporarily stored in the parse table data structure itself.
@maxbrunsfeld maxbrunsfeld merged commit e49a56e into master Feb 25, 2021
@maxbrunsfeld maxbrunsfeld deleted the partial-order-precedence branch February 25, 2021 21:16
@ubolonton
Copy link
Contributor

Thanks so much for this! IMO this is a huge usability improvement.

I stopped working on the Scala grammar due to the frustration with the total ordering. With this I think I'll start working on it again.

@maxbrunsfeld
Copy link
Contributor Author

That's great to hear!

@razzeee
Copy link
Contributor

razzeee commented Feb 27, 2021

This seems to be missing from the dsl types?

/// <reference types="tree-sitter-cli/dsl" />
// @ts-check

Causes this to error

@ahlinc
Copy link
Contributor

ahlinc commented Feb 27, 2021

/// <reference types="tree-sitter-cli/dsl" />
// @ts-check

@razzeee could you tell where do you usually put such lines? Is it globals.d.ts file or somewhere else?

@razzeee
Copy link
Contributor

razzeee commented Feb 27, 2021

Top of your grammar file. I just have one which is grammar.js

https://www.github.com/elm-tooling/tree-sitter-elm/tree/master/grammar.js

@maxbrunsfeld
Copy link
Contributor Author

Thanks for the heads up @razzeee.

@maxbrunsfeld
Copy link
Contributor Author

seems to be missing from the dsl types?

Fixed on master.

@alemuller
Copy link
Contributor

It would be nice if commenting a partial ordering relation on precedence showed a conflict example, like it happens when commenting a tuple of rules on conflicts. I find the "Unresolved conflict" message useful to remember the conflicts.

I've been using the following strategy to workaround the "Undeclared precedence" error:

//['foo', 'bar']
['foo']
['bar']

Maybe the "Unresolved conflict" could be thrown after "Undeclared precedence" error?

By the way, I appreciate the implementation of this elegant solution for the precedence problem.

@maxbrunsfeld
Copy link
Contributor Author

Ok, an update on this - In Tree-sitter v0.19.1, I've added a feature make the partial orderings easier to specify. You can control the precedence of an entire grammar rule by including a reference to the rule (using the $.my_rule syntax) directly in the precedences list.

For example, in JavaScript, to control the precedence of await expression and arrow functions relative to binary operators, you can simply do this:

precedences: $ => [
  [
    'binary_times',
    'binary_plus',
    // ...
    $.await_expression,
    $.arrow_function
  ],
],

I'll write up more documentation on this in the main docs site soon.

@woolsweater
Copy link

This is a great change; looking forward to using it. I had already been trying to do something like this (very informally) myself 😄 :

const PRECEDENCE = {
    //...
    // Prefer `type_identifier` after the colon in '{ ( identifier : identifier'
    type_over_primary: 1,
    // Prefer `parameter_list` over `void_type` in `func f(_: () -> Foo)`
    param_over_void: 1,
    // Prefer expression _patterns_ over bare expressions
    pattern_over_expr: 1,
    // ...
}

@mingodad
Copy link

Also need to update the file ´cli/src/generate/grammar-schema.json´ to include the precedences key.

@kammoh
Copy link

kammoh commented Oct 14, 2021

This is a great feature and should be included in the documentations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants