Uniform attributation: skip vs. fragment vs. hidden #27

Closed
danieldietrich opened this issue Sep 13, 2014 · 3 comments

Comments

@danieldietrich
Contributor

The Antlr parser has two phases: lexing and parsing.

  • The lexer may skip characters the parser does not need to see, e.g. whitespace, comments, etc.
  • So-called fragment rules are lexer helpers: they never produce tokens of their own, so they stay hidden from the parse tree.

We see that Antlr expresses "hide this from the result" through several distinct technical mechanisms. Parts of the grammar are attributed in different ways, and choosing between them relies on the grammar author implicitly knowing how the Antlr framework works.

The Jslp (Javaslang Parser) has only one phase, which combines lexing and parsing. The author of a Jslp grammar should be able to attribute parts of the grammar in a uniform way. For example, Antlr lets us declare the associativity of operators as <assoc=right> and <assoc=left>. Additionally, it marks rules with the prefix keyword fragment, and it lets us declare (lexer?) rule alternatives as -> skip. Those are three different ways to attribute something, which is too diverse, imo.

Therefore I suggest simplifying attribution, e.g. like this:

rule<hidden> : alternative1
             | alternative2<hidden>
             | ( subrule1 | subrule2 )<hidden>
             | INT op<assoc=right> INT
             | ( '/*' ~'*/'* '*/' )<combined, hidden> // same as <combined=true, hidden=true>
             ;
WS : [ \t\r\n]+<hidden> ; // same as WS<combined, hidden> : [ \t\r\n]+ ;

Attributes are technically <key=value> pairs and semantically properties of the attributed element. In the case of a boolean property, value may be omitted if it is true, i.e. <hidden=true> is the same as <hidden>.
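
To make the key=value rule concrete, here is a minimal Java sketch of how an attribute list such as <combined, hidden> could be read into a map. It is not taken from Jslp: the class name AttributeParser and the method parse are hypothetical, and the sketch assumes that attribute values contain neither commas nor angle brackets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jslp: reads "<key=value, flag>" into a map.
public class AttributeParser {

    /** Parses an attribute list; a bare key is shorthand for key=true. */
    static Map<String, String> parse(String text) {
        String trimmed = text.trim();
        if (!trimmed.startsWith("<") || !trimmed.endsWith(">")) {
            throw new IllegalArgumentException("expected <...> but got: " + text);
        }
        Map<String, String> attributes = new LinkedHashMap<>();
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        if (body.isEmpty()) {
            return attributes; // "<>" carries no attributes
        }
        for (String part : body.split(",")) {
            // Split on the first '=' only, so the value keeps any later '='.
            String[] keyValue = part.split("=", 2);
            String key = keyValue[0].trim();
            if (key.isEmpty()) {
                throw new IllegalArgumentException("empty attribute name in: " + text);
            }
            // Boolean shorthand: <hidden> means <hidden=true>.
            String value = keyValue.length == 2 ? keyValue[1].trim() : "true";
            attributes.put(key, value);
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(parse("<combined, hidden>")); // {combined=true, hidden=true}
        System.out.println(parse("<assoc=right>"));      // {assoc=right}
    }
}
```

With this reading, <combined, hidden> and <combined=true, hidden=true> yield equal maps, which is the equivalence stated in the grammar comments above.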

Perhaps it is better for readability to add the rule attributes after ;, like this:

rule : alternative1
     | alternative2<hidden>
     | ( subrule1 | subrule2 )<hidden>
     | INT op<assoc=right> INT
     | ( '/*' ~'*/'* '*/' )<combined, hidden> // same as <combined=true, hidden=true>
     ;<hidden>
WS : [ \t\r\n]+<hidden> ; // same as WS : [ \t\r\n]+ ;<combined, hidden>

But on the other hand, Java's annotations are prefixed, so we may also do the same here:

<hidden>
rule : alternative1
     | alternative2<hidden>
     | ( subrule1 | subrule2 )<hidden>
     | INT op<assoc=right> INT
     | ( '/*' ~'*/'* '*/' )<combined, hidden> // same as <combined=true, hidden=true>
     ;
WS : [ \t\r\n]+<hidden> ;
// same as:
// <combined, hidden>
// WS : [ \t\r\n]+ ;
@danieldietrich danieldietrich self-assigned this Sep 13, 2014
@danieldietrich danieldietrich added this to the 1.1.0 M1 milestone Sep 23, 2014
@danieldietrich danieldietrich removed their assignment Sep 25, 2014
@danieldietrich danieldietrich modified the milestones: 1.1.0 M5 Additional Parser Features, 1.1.0 M1 - Parser Core Sep 25, 2014
@danieldietrich
Contributor Author

Consider adding attribution with annotations. It is more consistent to use a single syntactic element to augment grammar rules with meta information for processing parser input.

  1. At rule definitions:
@skip
* : WS | COMMENT

rule : ID ':' rulePart ( '|' rulePart )*

@fragment
rulePart : ruleRef | sequence | ...

Notes:

  • Annotations are written in front of the element they apply to
  • @skip is the same as @skip=true
  2. At rule parts:
* : @skip WS | COMMENT // skips whitespace

* : @skip ( WS | COMMENT ) // skips whitespace and comments

Example:

* : @skip [ \n\r\t]+                                   // whitespace
  | '//' @name=text !( EOL | EOF ) @skip ( EOL | EOF ) // single-line comment
  | '/*' @name=text !'*/' @skip '*/'                   // multi-line comment

For better readability, parentheses may be used:

* : @skip [ \n\r\t]+                                           // whitespace
  | '//' ( @name=text !( EOL | EOF ) ) ( @skip ( EOL | EOF ) ) // single-line comment
  | '/*' ( @name=text !'*/' ) ( @skip '*/' )                   // multi-line comment

Tip: Use fragments when more attributes are needed:

* : @skip [ \n\r\t]+                         // whitespace
  | '//' ( @name=text !End ) ( @skip End )   // single-line comment
  | '/*' ( @name=text !'*/' ) ( @skip '*/' ) // multi-line comment

@fragment
@attribute1=value1
@attribute2=value2
@attribute3=value3
End : EOL | EOF

@danieldietrich danieldietrich changed the title [parser] Uniform attributation: skip vs. fragment vs. hidden Uniform attributation: skip vs. fragment vs. hidden Oct 3, 2014
@danieldietrich
Contributor Author

Another idea is to write

  • -rule instead of @fragment rule (semantic: produces (a b c) instead of (a (rule b c)))
  • -rulePart instead of @skip rulePart (semantic: omits a node (and its children) completely)

because it produces less noise.
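
The two semantics described in the list above can be sketched on a toy tree in Java. This is only an illustration under assumptions: the Node class, its fragment and skip flags, and the apply and show methods are hypothetical names and do not come from Jslp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two semantics; none of these names come from Jslp.
public class TreeAttributes {

    static final class Node {
        final String name;
        final boolean fragment; // children are inlined into the parent
        final boolean skip;     // node and its children are omitted
        final List<Node> children;

        Node(String name, boolean fragment, boolean skip, Node... children) {
            this.name = name;
            this.fragment = fragment;
            this.skip = skip;
            this.children = new ArrayList<>(Arrays.asList(children));
        }

        static Node of(String name, Node... children) {
            return new Node(name, false, false, children);
        }

        static Node fragment(String name, Node... children) {
            return new Node(name, true, false, children);
        }

        static Node skip(String name, Node... children) {
            return new Node(name, false, true, children);
        }
    }

    /** Returns the nodes that take the place of 'node' in its parent. */
    static List<Node> apply(Node node) {
        if (node.skip) {
            return new ArrayList<>(); // omit the whole subtree
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : node.children) {
            kept.addAll(apply(child));
        }
        if (node.fragment) {
            return kept; // drop the node itself, keep its children
        }
        List<Node> result = new ArrayList<>();
        result.add(new Node(node.name, false, false, kept.toArray(new Node[0])));
        return result;
    }

    /** Renders a tree in the (parent child child) notation used above. */
    static String show(Node node) {
        if (node.children.isEmpty()) {
            return node.name;
        }
        StringBuilder text = new StringBuilder("(").append(node.name);
        for (Node child : node.children) {
            text.append(' ').append(show(child));
        }
        return text.append(')').toString();
    }

    public static void main(String[] args) {
        Node plain = Node.of("a", Node.of("rule", Node.of("b"), Node.of("c")));
        Node inlined = Node.of("a", Node.fragment("rule", Node.of("b"), Node.of("c")));
        Node omitted = Node.of("a", Node.skip("rule", Node.of("b"), Node.of("c")));
        System.out.println(show(apply(plain).get(0)));   // (a (rule b c))
        System.out.println(show(apply(inlined).get(0))); // (a b c)
        System.out.println(show(apply(omitted).get(0))); // a
    }
}
```

The sketch keeps the two flags separate even though the proposal writes both with a leading minus; in the proposal the position (rule definition vs. rule part) decides which meaning applies.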

@danieldietrich danieldietrich modified the milestones: 1.1.0 M2 Additional Parser Features, BACKLOG Oct 5, 2014
@danieldietrich danieldietrich removed this from the 1.2.0 M3 Extending the Parser milestone Apr 29, 2015
@danieldietrich danieldietrich added this to the ?.?.? Parser milestone May 16, 2015
@danieldietrich
Contributor Author

Moved to javaslang-parser.
