[question] Is this a bug? I can't seem to skip over an optional block #1665

Claudenw · 2022-02-24T15:02:46Z

I am not certain how to describe this issue, but I will give it a try. I am attempting to create a Cassandra CQL parser. One of the simple statements is DROP USER [IF EXISTS] [KEYSPACE . ] NAME. I have included a minimal grammar.js below.

When I try to parse "DROP USER IF EXISTS boone" the parser fails.

If I comment out the optional( seq( keyspace, dot)) line in the drop_user rule, it works fine.
If I change the query to "DROP USER IF EXISTS keyspace.boone", it also works (as does "DROP USER IF EXISTS keyspace . boone").

What is the proper technique to get the parser to skip the keyspace segment if it is not present?

As a note, the keywords in the language are not case sensitive, hence the kw() function, which creates a case-insensitive regex.

const

    squote = "'",
    dot = ".",

    if_exists = seq(kw( "IF"),kw("EXISTS")),

    name_chars  = /[a-zA-Z][A-Za-z0-9_$]+/,
    qname  = choice( name_chars, seq(squote, name_chars, squote)),

    user = field( "user", qname),
    keyspace = field( "keyspace",  qname )


module.exports = grammar({
    name: 'cql',

    conflicts: ($, original) => original.concat([
    ]),
    rules: {
        source_file: $ => repeat($._statement),

        _statement: $ =>
            seq(
                choice(
                    $.drop_user,
                ),
                optional(";"),
            ),

        keyspace : $ => token( keyspace ),

        drop_user : $ =>
            seq(
                kw( "DROP"),
                kw("USER"),
                optional( if_exists ),
                optional( seq( keyspace, dot)),
                $.user,
            ),
        user : $ => token( user ),
    },
});

function kw(keyword) {
    if (keyword.toUpperCase() != keyword) {
        throw new Error(`Expected upper case keyword got ${keyword}`);
    }
    return alias(new RegExp( keyword, "i" ), keyword);
}

Answered by maxbrunsfeld

maxbrunsfeld (Maintainer) · 2022-02-25T01:04:59Z

You have user as its own distinct token in your grammar:

        user : $ => token( user ),

But that means that you're asking the lexer to distinguish between a user token and any other usage of qname. The lexer doesn't have enough information to do that: it's just matching regular expressions. When two different tokens both match the same sequence of characters, the lexer has very limited options for how to decide between the two: see the documentation about conflicting tokens: https://tree-sitter.github.io/tree-sitter/creating-parsers#conflicting-tokens.

In your case, the lexer is probably picking the keyspace rule semi-arbitrarily, because it is listed first in the grammar.
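
To make the ambiguity concrete, here is a toy JavaScript model of a regex-only lexer. It is not tree-sitter's implementation: the `tokenKinds` table, the `lexOne` function, and its tie-break (longest match first, then whichever kind is listed first) are simplifying assumptions used only for illustration. Both token kinds are built from the same qname pattern, so matching alone can never tell them apart.

```javascript
// Toy model only: NOT tree-sitter's lexer. It shows why two tokens with
// identical patterns cannot be distinguished by regex matching alone.
const NAME = "[a-zA-Z][A-Za-z0-9_$]+";

// Both kinds use the same pattern (bare name or 'quoted' name), mirroring
// the keyspace and user tokens in the grammar from the question.
const tokenKinds = [
  { kind: "keyspace", pattern: new RegExp(`^(?:${NAME}|'${NAME}')`) },
  { kind: "user", pattern: new RegExp(`^(?:${NAME}|'${NAME}')`) },
];

// Hypothetical tie-break: the longest match wins; on a tie, the kind that
// is listed first is kept.
function lexOne(input) {
  let best = null;
  for (const { kind, pattern } of tokenKinds) {
    const m = pattern.exec(input);
    if (m && (best === null || m[0].length > best.text.length)) {
      best = { kind, text: m[0] };
    }
  }
  return best;
}

console.log(lexOne("boone").kind); // prints "keyspace"
```

In this model "boone" always comes back as a keyspace, so a parser that then expects a dot after the keyspace has nothing left to match as the user.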

You should probably make qname an explicit rule in the grammar, and use it everywhere. That way, when matching a drop_user rule for example, the lexer only needs to return a qname, and the parser can decide whether it is a user or a keyspace by using its usual logic (i.e. looking ahead by one token and making the appropriate decision).

module.exports = grammar({
  rules: {
    // ...
    qname: $ => token(choice(name_chars, seq(squote, name_chars, squote))),
  }
});

If you want to suppress the qname from the final syntax tree, you can use an alias to make it appear with the name user or keyspace, or whatever.
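
As a rough, untested sketch of how the alias suggestion might look in the drop_user rule: the fragment below reuses `kw`, `squote`, `dot`, and `name_chars` from the question's grammar.js, and it only makes sense when processed by the tree-sitter CLI, which supplies `grammar`, `seq`, `optional`, `choice`, `token`, and `alias`. The choice to alias both occurrences of `qname` is an assumption about the desired tree shape, not something the original grammar requires.

```javascript
module.exports = grammar({
  name: 'cql',
  rules: {
    // ... source_file and _statement as in the question

    drop_user: $ =>
      seq(
        kw("DROP"),
        kw("USER"),
        optional(seq(kw("IF"), kw("EXISTS"))),
        // The lexer only ever produces qname; the parser sees whether a dot
        // follows and picks the matching branch, and the alias renames the node.
        optional(seq(alias($.qname, $.keyspace), dot)),
        alias($.qname, $.user),
      ),

    // Single lexical token shared by every identifier-like position.
    qname: $ => token(choice(name_chars, seq(squote, name_chars, squote))),
  },
});
```

With a grammar along these lines, the separate `keyspace` and `user` token rules from the question would no longer be needed, since the syntax tree gets those node names from the aliases.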
