How to implement states for conditional language features #244

Closed
wsnyder opened this issue Dec 1, 2018 · 4 comments

@wsnyder

wsnyder commented Dec 1, 2018

SystemVerilog has the following:

  `begin_keywords "1364-2001"
  bit foo;  // bit is a module name, foo a cell name
  reg bit;  // bit is an identifier
  `begin_keywords "1800-2017"
  bit foo;  // bit is a data type, foo an identifier
  reg bit;  // syntax error

That is, the grammar changes based on the current language revision, and the parsing itself can change the revision.

How can this be expressed in a grammar, e.g., by suppressing certain rules based on runtime state?

In flex this was handled with different states, e.g.

  <S13642001,S18002017>{
    "reg" { return REG; }
  }
  <S18002017>{
    "bit" { return BIT; }
  }

This seems similar to the documented support for multi-language documents, but I would prefer not to maintain (or generate by preprocessing) a separate grammar for each language version.
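
For readers unfamiliar with flex start conditions, the behavior described above can be sketched in plain C as a keyword lookup that depends on the active revision. This is only an illustration, not code from any existing SystemVerilog lexer: the names `Revision`, `Token`, `classify_word`, and `select_revision` are hypothetical, and only the two revisions and two keywords from the example are covered.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names mirroring the flex states S13642001 and S18002017. */
typedef enum { REV_1364_2001, REV_1800_2017 } Revision;
typedef enum { TOK_IDENTIFIER, TOK_REG, TOK_BIT } Token;

/* Classify one word under the currently active revision. */
Token classify_word(Revision rev, const char *word) {
    /* "reg" is a keyword in both revisions. */
    if (strcmp(word, "reg") == 0) return TOK_REG;
    /* "bit" is a keyword only under 1800-2017. */
    if (strcmp(word, "bit") == 0 && rev == REV_1800_2017) return TOK_BIT;
    return TOK_IDENTIFIER;
}

/*
 * Update the revision from the quoted argument of `begin_keywords.
 * Returns false and leaves *rev unchanged for an unrecognized string.
 */
bool select_revision(Revision *rev, const char *version) {
    if (strcmp(version, "1364-2001") == 0) { *rev = REV_1364_2001; return true; }
    if (strcmp(version, "1800-2017") == 0) { *rev = REV_1800_2017; return true; }
    return false;
}
```

The point of the sketch is that the revision is ordinary mutable state consulted at tokenization time, which is exactly what a context-free grammar cannot express on its own.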

Thanks

@maxbrunsfeld
Contributor

In general, our approach has been to handle things like this approximately: we parse a superset of the actual language and try to parse ambiguous constructs in the way that's most likely to be correct.

For example, in C it's technically impossible to parse a single source file by itself, without looking at all the other source files that are brought in via #include, and knowing what -D flags were passed to the compiler. Certain constructs are ambiguous without this information. So tree-sitter-c has to do the best job it can at parsing, given its constraints.

Similarly, tree-sitter-python needs to be able to handle arbitrary source files without knowing whether they're Python 2 or Python 3, so we try to handle the union of the two language versions.

I don't know anything about SystemVerilog, but this may be the most practical approach for you: design the parser so that it handles reg bit without error (in case keywords 1364-2001 is in use), and parses bit foo as something reasonable (for the purposes of syntax highlighting and in-editor code analysis) but not 100% correct.

You could also use an external scanner for parsing reg, bit, and begin_keywords directives. Then you could carry arbitrary state about which language revision is active. Are begin_keywords directives always local to a given source file? If so, this approach might work.
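
As a rough sketch of what "carrying state" could look like, the C below keeps the active revision in a small struct and round-trips it through a byte buffer. In a real tree-sitter external scanner these bodies would sit behind the `tree_sitter_<language>_external_scanner_create`, `_destroy`, `_serialize`, and `_deserialize` entry points; here they are written as standalone functions with hypothetical names (`scanner_create` and so on) so the example compiles without the tree-sitter headers. The default revision and the treatment of an empty buffer as a reset are assumptions, not something stated in this thread.

```c
#include <stdlib.h>

typedef enum { REV_1364_2001 = 0, REV_1800_2017 = 1 } Revision;

/* Assumption: files start in the newest revision until a directive says otherwise. */
#define DEFAULT_REVISION REV_1800_2017

typedef struct {
    Revision revision;  /* revision selected by the last `begin_keywords seen */
} ScannerState;

/* Allocate the scanner state; may return NULL if allocation fails. */
void *scanner_create(void) {
    ScannerState *state = (ScannerState *)malloc(sizeof *state);
    if (state != NULL) state->revision = DEFAULT_REVISION;
    return state;
}

void scanner_destroy(void *payload) {
    free(payload);
}

/* Copy the state into the buffer and report how many bytes were written. */
unsigned scanner_serialize(void *payload, char *buffer) {
    const ScannerState *state = (const ScannerState *)payload;
    buffer[0] = (char)state->revision;
    return 1;
}

/* Restore the state; an empty or unrecognized buffer resets to the default. */
void scanner_deserialize(void *payload, const char *buffer, unsigned length) {
    ScannerState *state = (ScannerState *)payload;
    state->revision = DEFAULT_REVISION;
    if (length >= 1 &&
        (buffer[0] == REV_1364_2001 || buffer[0] == REV_1800_2017)) {
        state->revision = (Revision)buffer[0];
    }
}
```

Serializing the revision matters because the parser may resume from a saved position during incremental reparsing, so state that lives only in the struct would otherwise be lost.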

@wsnyder
Author

wsnyder commented Dec 2, 2018

Thanks. Let's say I carry around which revision is active (i.e., a state). How do I use that to enable/disable rules in the grammar.json code?

@ahlinc
Contributor

ahlinc commented Aug 18, 2023

This is a very late answer; I wasn't here at the time.

wsnyder wrote: "How do I use that to enable/disable rules in the grammar.json code?"

You can introduce several external scanner tokens, such as _mod2001 and _mod2017, that the external scanner emits depending on its state, and write your rules like this:

  externals: $ => [
    // mode capturing externals
    $.mod2001,
    $.mod2017,
    // mode enforcing externals
    $._mod2001,
    $._mod2017,
  ],

  rules: {
    _statements: $ => choice(
      $._mod_selection,
      seq($._mod2001, $._statements_2001),
      seq($._mod2017, $._statements_2017),
    ),

    _statements_2001: $ => choice(
      // ... repeat here common statements + 1364-2001 specific
    ),

    _statements_2017: $ => choice(
      // ... repeat here common statements + 1800-2017 specific
    ),

    _mod_selection: $ => seq("`begin_keywords",
      '"',
      choice(
        alias($.mod2001, "1364-2001"),
        alias($.mod2017, "1800-2017"),
      ),
      '"'
    ),
  }

The mode-enforcing externals are special invisible, zero-width tokens that have no impact on the position of other tokens in the resulting parse tree.
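
The scanner side of this scheme might look roughly like the following C sketch. It is not a real tree-sitter `scan` callback: the lexer is replaced by a plain string (`text`) and a `consumed` byte count so the example stays self-contained, and the names `TokenType`, `Revision`, and `scan_mode_token` are hypothetical. The enum order is assumed to match the order of the `externals` array above, with `valid` playing the role of the `valid_symbols` argument.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed to follow the order of the `externals` array in the grammar. */
enum TokenType { MOD2001, MOD2017, ENFORCE_MOD2001, ENFORCE_MOD2017 };

typedef enum { REV_1364_2001, REV_1800_2017 } Revision;

/*
 * Hypothetical stand-in for the scan callback.
 * On success, *symbol is the token to emit and *consumed is its width in bytes.
 */
bool scan_mode_token(Revision *rev, const bool *valid, const char *text,
                     int *symbol, size_t *consumed) {
    /* Mode capturing: consume the version string and remember the revision. */
    if (valid[MOD2001] && strncmp(text, "1364-2001", 9) == 0) {
        *rev = REV_1364_2001; *symbol = MOD2001; *consumed = 9;
        return true;
    }
    if (valid[MOD2017] && strncmp(text, "1800-2017", 9) == 0) {
        *rev = REV_1800_2017; *symbol = MOD2017; *consumed = 9;
        return true;
    }
    /* Mode enforcing: emit a zero-width token only for the active revision. */
    if (valid[ENFORCE_MOD2001] && *rev == REV_1364_2001) {
        *symbol = ENFORCE_MOD2001; *consumed = 0;
        return true;
    }
    if (valid[ENFORCE_MOD2017] && *rev == REV_1800_2017) {
        *symbol = ENFORCE_MOD2017; *consumed = 0;
        return true;
    }
    return false;  /* no external token applies here */
}
```

Because the enforcing token for the inactive revision is never produced, the parser can only continue into the branch of `_statements` that matches the current state, which is how the runtime state ends up enabling or disabling grammar rules.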

@ahlinc
Contributor

ahlinc commented Aug 18, 2023

I'm closing this because it's a very old question. Feel free to re-open it or move the discussion to the Discussions tab.

ahlinc closed this as completed Aug 18, 2023