[question] Is this a bug? I can't seem to skip over an optional block #1665
I am not certain how to describe this issue, but I will give it a try. I am attempting to create a Cassandra CQL parser. One of the simple statements is DROP USER. When I try to parse "DROP USER IF EXISTS boone", the parser fails.
What is the proper technique to get the parser to skip the keyspace segment if it is not present? As a note, the keywords in the language are not case sensitive, hence the kw function at the end of the grammar.
const
  squote = "'",
  dot = ".",
  if_exists = seq(kw( "IF"),kw("EXISTS")),
  name_chars = /[a-zA-Z][A-Za-z0-9_$]+/,
  qname = choice( name_chars, seq(squote, name_chars, squote)),
  user = field( "user", qname),
  keyspace = field( "keyspace", qname )

module.exports = grammar({
  name: 'cql',

  conflicts: ($, original) => original.concat([
  ]),

  rules: {
    source_file: $ => repeat($._statement),

    _statement: $ =>
      seq(
        choice(
          $.drop_user,
        ),
        optional(";"),
      ),

    keyspace : $ => token( keyspace ),

    drop_user : $ =>
      seq(
        kw( "DROP"),
        kw("USER"),
        optional( if_exists ),
        optional( seq( keyspace, dot)),
        $.user,
      ),

    user : $ => token( user ),
  },
});

function kw(keyword) {
  if (keyword.toUpperCase() != keyword) {
    throw new Error(`Expected upper case keyword got ${keyword}`);
  }
  return alias(new RegExp( keyword, "i" ), keyword);
}
Replies: 1 comment
You have user as its own distinct token in your grammar:

    user : $ => token( user ),

But that means that you're asking the lexer to distinguish between a user token and any other usage of qname. The lexer doesn't have enough information to do that: it's just matching regular expressions. When two different tokens both match the same sequence of characters, the lexer has very limited options for how to decide between the two; see the documentation about conflicting tokens: https://tree-sitter.github.io/tree-sitter/creating-parsers#conflicting-tokens.

In your case, the lexer is probably picking the keyspace rule semi-arbitrarily, because it is listed first in the grammar.

You should probably make qname a token of its own:

    module.exports = grammar({
      rules: {
        // ...
        qname: $ => token(choice(name_chars, seq(squote, name_chars, squote))),
      }
    });

If you want to suppress the …
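To see why two tokens built from the same pattern cannot be told apart, here is a toy model of a regex lexer in plain JavaScript. It is only a sketch, not tree-sitter's implementation: the nextToken helper, the tokenKinds table, and the "longest match, then first listed" tie-break are assumptions made for illustration, and the real lexer applies further rules described in the documentation linked above.

```javascript
// Toy regex lexer (hypothetical, for illustration only; not tree-sitter code).
// Both token kinds use the same name pattern as the grammar in the question.
const NAME_SOURCE = "[a-zA-Z][A-Za-z0-9_$]+";

// Order matters in this model: earlier entries win ties.
const tokenKinds = [
  { kind: "keyspace", pattern: new RegExp(NAME_SOURCE, "y") },
  { kind: "user", pattern: new RegExp(NAME_SOURCE, "y") },
];

// Return the token starting exactly at `pos`, or null if nothing matches.
// Assumed tie-break: the longest match wins, then the kind listed first.
function nextToken(input, pos) {
  let best = null;
  for (const { kind, pattern } of tokenKinds) {
    pattern.lastIndex = pos; // sticky flag anchors the match at `pos`
    const m = pattern.exec(input);
    if (m && (best === null || m[0].length > best.text.length)) {
      best = { kind, text: m[0] };
    }
  }
  return best;
}

// Both kinds match "boone" with equal length, so the first one is reported.
const result = nextToken("boone", 0);
console.log(result); // kind is "keyspace", even where a user was intended
```

Nothing in the characters themselves says whether "boone" is a keyspace or a user, so the model falls back on list order. With a single qname token the tie disappears, and the keyspace/user distinction can be expressed in the parser instead, for example by wrapping $.qname in field("keyspace", ...) and field("user", ...) inside drop_user. That last step is my assumption about how the suggestion would be applied, not something stated in the reply.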