Uniform attributation: skip vs. fragment vs. hidden #27

danieldietrich · 2014-09-13T11:46:50Z

The Antlr parser has two phases: lexing and parsing.

The lexer may skip characters the parser does not need to see, e.g. whitespace and comments.
The parser hides so-called fragment rules from the parse tree.

We see that Antlr distinguishes these ways of hiding something from the result on purely technical grounds. Parts of the grammar are attributed in different ways because the author of the grammar implicitly knows how the Antlr framework works.

The Jslp (Javaslang Parser) has only one phase, which combines lexing and parsing. The author of a Jslp grammar should be able to attribute parts of the grammar in a uniform way. For example, Antlr lets us declare the associativity of operators as <assoc=right> and <assoc=left>. Additionally, it marks rules with the prefix fragment, and it lets us declare (lexer?) rule alternatives as -> skip. These are three different ways to attribute something, which is too diverse, imo.

Therefore I suggest simplifying attribution, e.g. like this:

rule<hidden> : alternative1
             | alternative2<hidden>
             | ( subrule1 | subrule2 )<hidden>
             | INT op<assoc=right> INT
             | ( '/*' ~'*/'* '*/' )<combined, hidden> // same as <combined=true, hidden=true>
             ;
WS : [ \t\r\n]+<hidden> ; // same as WS<combined, hidden> : [ \t\r\n]+ ;

Attributes are technically <key=value> pairs and semantically properties of the attributed element. In the case of a boolean property, value may be omitted if it is true, i.e. <hidden=true> is the same as <hidden>.
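The shorthand rule above can be sketched in a few lines. This is only an illustration in Python, not part of Jslp: the function name parse_attributes and the choice to return a dict are assumptions made for the example.

```python
def parse_attributes(text):
    """Parse an attribute list such as '<combined, hidden>' into a dict.

    Hypothetical helper for illustration; not an API of Jslp or Antlr.
    """
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("attribute list must be enclosed in < and >")
    attributes = {}
    for part in text[1:-1].split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        key = key.strip()
        value = value.strip()
        if not sep:
            # A bare key is a boolean property that is true.
            attributes[key] = True
        elif value in ("true", "false"):
            attributes[key] = value == "true"
        else:
            attributes[key] = value
    return attributes


if __name__ == "__main__":
    print(parse_attributes("<combined, hidden>"))
    print(parse_attributes("<assoc=right>"))
```

With this reading, <combined, hidden> and <combined=true, hidden=true> produce the same mapping, while <assoc=right> keeps its value as plain text.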

Perhaps it is better for readability to add the rule attributes after ;, like this:

rule : alternative1
     | alternative2<hidden>
     | ( subrule1 | subrule2 )<hidden>
     | INT op<assoc=right> INT
     | ( '/*' ~'*/'* '*/' )<combined, hidden> // same as <combined=true, hidden=true>
     ;<hidden>
WS : [ \t\r\n]+<hidden> ; // same as WS : [ \t\r\n]+ ;<combined, hidden>

But on the other hand, Java's annotations are prefixed, so we may also do the same here:

<hidden>
rule : alternative1
     | alternative2<hidden>
     | ( subrule1 | subrule2 )<hidden>
     | INT op<assoc=right> INT
     | ( '/*' ~'*/'* '*/' )<combined, hidden> // same as <combined=true, hidden=true>
     ;
WS : [ \t\r\n]+<hidden> ;
// same as:
// <combined, hidden>
// WS : [ \t\r\n]+ ;


danieldietrich · 2014-10-03T08:41:17Z

Consider adding attribution with annotations. It is more consistent to use a single syntactic element to augment grammar rules with meta information for processing parser input.

At rule definitions:

@skip
* : WS | COMMENT

rule : ID ':' rulePart ( '|' rulePart )*

@fragment
rulePart : ruleRef | sequence | ...

Notes:

Annotations are written in front of the element they apply to
@skip is the same as @skip=true

At rule parts:

* : @skip WS | COMMENT // skips whitespace

* : @skip ( WS | COMMENT ) // skips whitespace and comments
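The difference between the two lines above is one of scope: an annotation applies to the next element only, and a parenthesized group passes it on to all of its members. The following Python sketch makes that rule explicit. It is a simplification under stated assumptions: the function name skipped_names is hypothetical, only @skip is recognized, and each alternative is a single rule name or group (no sequences).

```python
def skipped_names(expr):
    """Return the rule names covered by @skip in a small alternatives expression.

    Hypothetical helper for illustration; it is not part of Jslp.
    """
    tokens = expr.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    skipped = set()

    def parse_alternatives(inherited_skip):
        nonlocal pos
        parse_element(inherited_skip)
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            parse_element(inherited_skip)

    def parse_element(inherited_skip):
        nonlocal pos
        skip = inherited_skip
        if tokens[pos] == "@skip":
            # The annotation applies to the next element only.
            skip = True
            pos += 1
        if tokens[pos] == "(":
            pos += 1
            # A group hands the annotation down to each of its members.
            parse_alternatives(skip)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
        else:
            if skip:
                skipped.add(tokens[pos])
            pos += 1

    parse_alternatives(False)
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return skipped


if __name__ == "__main__":
    print(sorted(skipped_names("@skip WS | COMMENT")))
    print(sorted(skipped_names("@skip ( WS | COMMENT )")))
```

Under these assumptions the first expression marks only WS as skipped, and the second marks both WS and COMMENT.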

Example:

* : @skip [ \n\r\t]+                                   // whitespace
  | '//' @name=text !( EOL | EOF ) @skip ( EOL | EOF ) // single-line comment
  | '/*' @name=text !'*/' @skip '*/'                   // multi-line comment

For better readability parentheses may be used:

* : @skip [ \n\r\t]+                                           // whitespace
  | '//' ( @name=text !( EOL | EOF ) ) ( @skip ( EOL | EOF ) ) // single-line comment
  | '/*' ( @name=text !'*/' ) ( @skip '*/' )                   // multi-line comment

Tip: Use fragments when more attributes are needed:

* : @skip [ \n\r\t]+                         // whitespace
  | '//' ( @name=text !End ) ( @skip End )   // single-line comment
  | '/*' ( @name=text !'*/' ) ( @skip '*/' ) // multi-line comment

@fragment
@attribute1=value1
@attribute2=value2
@attribute3=value3
End : EOL | EOF

danieldietrich · 2014-10-03T16:03:46Z

Another idea is to write

-rule instead of @fragment rule (semantic: produces (a b c) instead of (a (rule b c)))
-rulePart instead of @skip rulePart (semantic: omits a node (and its children) completely)

because this adds less noise.
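The two semantics can be illustrated on a small parse tree. This is a sketch in Python, not the Jslp implementation: the Node class and the functions prune and show are names made up for the example, and fragments and skipped nodes are identified simply by rule name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse tree node (hypothetical model, for illustration only)."""
    name: str
    children: list = field(default_factory=list)


def prune(node, fragments=frozenset(), skipped=frozenset()):
    """Return a copy of the tree with fragments spliced and skipped nodes dropped."""
    children = []
    for child in node.children:
        if child.name in skipped:
            # Skip: omit the node and all of its children.
            continue
        pruned = prune(child, fragments, skipped)
        if child.name in fragments:
            # Fragment: drop the node itself but keep its children in place.
            children.extend(pruned.children)
        else:
            children.append(pruned)
    return Node(node.name, children)


def show(node):
    """Render a tree in the (parent child child) notation used above."""
    if not node.children:
        return node.name
    return "(" + " ".join([node.name] + [show(c) for c in node.children]) + ")"


if __name__ == "__main__":
    tree = Node("a", [Node("rule", [Node("b"), Node("c")])])
    print(show(tree))
    print(show(prune(tree, fragments={"rule"})))
    print(show(prune(tree, skipped={"rule"})))
```

Treating rule as a fragment turns (a (rule b c)) into (a b c), whereas skipping rule leaves only a.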

danieldietrich · 2016-10-23T21:14:14Z

Moved to javaslang-parser.

danieldietrich added the enhancement label Sep 13, 2014

danieldietrich self-assigned this Sep 13, 2014

danieldietrich mentioned this issue Sep 14, 2014

Fix whitespace handling #23

Closed

danieldietrich added this to the 1.1.0 M1 milestone Sep 23, 2014

danieldietrich removed their assignment Sep 25, 2014

danieldietrich modified the milestones: 1.1.0 M5 Additional Parser Features, 1.1.0 M1 - Parser Core Sep 25, 2014

danieldietrich added the [parser] label Oct 3, 2014

danieldietrich changed the title ~~[parser] Uniform attributation: skip vs. fragment vs. hidden~~ Uniform attributation: skip vs. fragment vs. hidden Oct 3, 2014

danieldietrich mentioned this issue Oct 4, 2014

Let the grammar smile #60

Closed

danieldietrich modified the milestones: 1.1.0 M2 Additional Parser Features, BACKLOG Oct 5, 2014

danieldietrich removed enhancement labels Apr 26, 2015

danieldietrich removed this from the 1.2.0 M3 Extending the Parser milestone Apr 29, 2015

danieldietrich added the design/refactoring/improvement label May 1, 2015

danieldietrich added this to the ?.?.? Parser milestone May 16, 2015

danieldietrich closed this as completed Oct 23, 2016
