Rethink whitespace handling #58
Whitespace has to be handled in sequences and rule references. Quantifiers are out of scope here, because quantified lexical rules need no special treatment regarding whitespace and quantified rule references are already handled (see above). Current problem: To handle whitespace correctly in a sequence, the parser needs to know whether the current rulePart of the sequence is lexical. If the rulePart is a subrule, this is not known until the subrule has been parsed. Solution: We need to return whitespace but not combine the ruleParts of the sequence (see 2nd case above).
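One possible reading of this solution, as a minimal sketch rather than the actual Javaslang implementation: the sequence skips whitespace before each rulePart but returns it as a node of its own, and it leaves the parsed parts uncombined, so the decision can be made once it is known whether each part was lexical. All names here (`SequenceWhitespaceSketch`, `Node`, `Kind`, `RulePart`, `literal`, `parserRule`, `sequence`) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not the Javaslang parser API.
final class SequenceWhitespaceSketch {

    enum Kind { LEXICAL, TREE, WHITESPACE }

    /** Result of one rulePart; its kind is only known once parsing has finished. */
    static final class Node {
        final Kind kind;
        final String text;
        final int end; // index of the first character after this node

        Node(Kind kind, String text, int end) {
            this.kind = kind;
            this.text = text;
            this.end = end;
        }
    }

    interface RulePart {
        Optional<Node> parse(String input, int pos);
    }

    /** A literal is always lexical. */
    static RulePart literal(String s) {
        return (input, pos) -> {
            if (input.startsWith(s, pos)) {
                return Optional.of(new Node(Kind.LEXICAL, s, pos + s.length()));
            }
            return Optional.empty();
        };
    }

    /** Assumption: a reference to a parser rule yields a tree node, not a token. */
    static RulePart parserRule(RulePart body) {
        return (input, pos) -> body.parse(input, pos)
                .map(n -> new Node(Kind.TREE, n.text, n.end));
    }

    /** Parses the parts in order, returning whitespace as separate nodes and combining nothing. */
    static Optional<List<Node>> sequence(String input, int pos, List<RulePart> parts) {
        List<Node> out = new ArrayList<>();
        int cur = pos;
        for (RulePart part : parts) {
            int ws = cur;
            while (ws < input.length() && Character.isWhitespace(input.charAt(ws))) {
                ws++;
            }
            if (ws > cur) {
                out.add(new Node(Kind.WHITESPACE, input.substring(cur, ws), ws));
            }
            Optional<Node> result = part.parse(input, ws);
            if (!result.isPresent()) {
                return Optional.empty();
            }
            out.add(result.get());
            cur = result.get().end;
        }
        return Optional.of(out);
    }

    public static void main(String[] args) {
        List<RulePart> parts = Arrays.asList(literal("a"), parserRule(literal("b")));
        sequence("a  b", 0, parts).ifPresent(nodes ->
                nodes.forEach(n -> System.out.println(n.kind + " '" + n.text + "'")));
    }
}
```

Because the whitespace node is kept next to its neighbours, a parent node could still reject or merge adjacent lexical parts afterwards; that later step is deliberately left out of the sketch.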
TODO: Fix |
In general we talk about lexical tokens, which are atomic units of text, i.e. sequences of characters. A parser parses a stream of tokens. The Javaslang parser determines lexical tokens while parsing, i.e. there is only one parse phase. This is in contrast to ANTLR, which has two phases: a lexing phase where tokens are gathered and a parsing phase that works on the sequence of known tokens. So the overall goal here is to determine, while reading the input stream char by char, whether the Javaslang parser has to skip whitespace and whether tokens have to be combined. The underlying data structure of the parser is a tree. Because it is a descent parser, the children of a node are parsed first. Therefore the node has to decide
Syntax
The core grammar syntax describes a formal language targeted to parsers of finite character sequences.
Character Groups
These are the atomic building parts to match groups of characters:
Single character matchers (Any, Range and Charset) and
The following rules hold:
The operator precedence of
Rules
We distinguish between parser rules and lexer rules, which have the same syntax:
where
In the following, the difference between lexer and parser rules is described.
Lexer rules
Parser rules
Lexical Token Definitions within Parser Rules
A parser rule part
Purely lexical parse results are combined into a token.
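As a rough illustration of this combination step, assuming a parse result simply records whether it is lexical: if every child result is lexical, their texts are concatenated into one token, otherwise the children are kept as a tree node. The names `TokenCombineSketch`, `Result`, `token`, `tree` and `combine` are hypothetical and not taken from the Javaslang code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the Javaslang parser API.
final class TokenCombineSketch {

    static final class Result {
        final boolean lexical;
        final String text;           // token text; empty for tree nodes
        final List<Result> children; // empty for tokens

        private Result(boolean lexical, String text, List<Result> children) {
            this.lexical = lexical;
            this.text = text;
            this.children = children;
        }

        static Result token(String text) {
            return new Result(true, text, Collections.<Result>emptyList());
        }

        static Result tree(List<Result> children) {
            return new Result(false, "", new ArrayList<>(children));
        }
    }

    /** Combines purely lexical parts into one token; otherwise keeps the parts as children. */
    static Result combine(List<Result> parts) {
        StringBuilder text = new StringBuilder();
        for (Result part : parts) {
            if (!part.lexical) {
                return Result.tree(parts); // at least one non-lexical part: no combination
            }
            text.append(part.text);
        }
        return Result.token(text.toString());
    }

    public static void main(String[] args) {
        Result number = combine(Arrays.asList(Result.token("1"), Result.token("2")));
        System.out.println(number.lexical + " " + number.text);
    }
}
```

In this sketch an empty list of parts also yields an (empty) token; whether that is the desired behaviour is left open.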
Deducing the parser design from the grammar description above:
I think it is necessary to allow lexical token definitions within parser rules (see the definition of purely lexical parser rules above). This allows us to write
Note: Because a literal
See also #23.
Rule names consisting of one char:
a matches 1.
A matches 3.