Implicit / missing details about label selection #29
Comments
I also agree that labels in general deserve deeper documentation: they are a powerful feature of Polygen which I struggled to master due to the lack of examples in the docs. I never realized you could do some of the things in the examples you've brought forth. Adding more and better examples could reduce the learning curve significantly. Although formal definitions of PML features are important, IMO they are easier to grasp through examples, i.e. examples allow the reader to make sense of the rules, much more than the other way around. I suggest that if we amend the PML concrete syntax in §5.1 then we should also bump up the MINOR version number in the next release (i.e. from |
Labels are an advanced feature: most Polygen authors in the past struggled to understand labels and to use them proficiently. A brief historical digression: I have never been fully satisfied with the label system. The main feature of Polygen 2.0 (which has been under development for years - lazy me :D) is indeed a totally redesigned label system, together with the import primitive for enabling libraries of non-terminal symbols. The new label system, though, has a major disadvantage, which is also a major reason for not having released it yet: it is syntactically incompatible with the current one. As for @tajmone's discovery regarding the supposedly wrong parsing rules, that is a wanted feature. I know that's unsound (and that's partly why I have always felt the label system needed an improvement), but that way selection becomes powerful, albeit hardly predictable. At the time being, however, I'm not sure the grammar fix @tajmone suggests is totally desirable. One easy way to find out whether it is or not is to apply the fix and compile: if the parser semantic actions (the code annotated on the right for each production) still compile, then the grammar fix is compatible with the current AST data structures, meaning that the program will work. |
Just a clarification: it's @RBastianini who spotted the problem and suggested the fix (credit where credit is due). Also, I've never quite grasped the full extent of label usage (although I did use labels in some simple way), so I'm not sure about their concrete syntax. But @RBastianini's fix proposal (
Mhh... this seems a rather complex way to approach it (especially with all the trouble of compiling OCaml code under Windows). Also, I think that the case at hand here has more to do with Polygen's BNF meta-grammar, which is presented to the reader as a clarifying tool against which he/she might double check the learned notions. This BNF doesn't necessarily match an actual grammar used by a parser generator (which might have to handle some subtleties which are irrelevant in this context).
Externalizing Examples' Code and Result via Live Polygen Executions
Another approach I actually thought of, although it tackles the problem from a slightly different angle, would be to externalize all of the examples' code and productions into real sources and transcripts, execute them via Polygen, and then include them selectively in the source documents.
Lately, I've actually been using a similar approach for a project documenting a text adventure (IF) library, where I was facing problems with the constant updates to the library code, as well as to the IF language itself, which would make examples and their output obsolete from time to time. What I did was to move all code examples into real text adventure sources, compile them, run them against automated command scripts and capture the game session transcripts, which are then imported into the source documents in real time. This now allows me to catch any broken example (due to a library update) from the compile error report monitored by the build toolchain, and also to ensure that game session transcripts match the output of a real use-case scenario.
The project in question doesn't use pandoc but Asciidoctor, which makes the whole process simpler thanks to selective text inclusion via the
I did tinker with the idea of switching to this system at some point in the future, even if it would mean switching to Asciidoctor, but have refrained due to some considerations:
Anyhow, this idea has been on my mind for a while and I wanted to share it, so I just grabbed this opportunity to present it to you. Keep it in mind, in case it might be helpful in the future (for the PML Spec or any other documents that might be added, e.g. tutorials, etc.). |
Yes, you are correct, I was actually referring to the concrete and abstract notation paragraphs of the documentation. There were a few changes that I suggested in addition to the one you mention (
About externalising the code examples in the documentation to use the Polygen parser, I believe that it's actually a great idea. But since, as you noted, using Polygen to generate the example productions would be impractical, it could only be used to check the correctness of the example sources, so I don't know whether this alone would justify switching to a different documentation tool... |
@RBastianini, I'm trying to figure out the fixes to the syntax you propose by studying the current EBNF grammars in App. 5 (§5.1 and §5.2). I tend to lose myself when following all the possible ways in which grammar definitions might branch out ... I'm still not quite sure whether these proposed fixes are already implicit in the current EBNF grammars or not. I really need to find a few free hours in which I can look into this with due time, a relaxed mind and no distractions, but these are surely improvements that will have to go into a future update. Some suggestions might be syntactically correct but bring little benefit to the reader, e.g. the
Editing "§5.5 Translation rules" would require careful consideration and extreme caution, for that section is rather complex (not to mention how many times I had to double check it to ensure that the correct styles and colors were being reproduced).
Externalizing Examples' Code...
The question of whether it's worth (or justified) switching to Asciidoctor is a bit complex for various reasons. The main problem is that currently Asciidoctor only supports inclusion of external code from UTF-8 encoded files, which means that Polygen sources (and output) would have to be first converted from ISO-8859-1 to UTF-8 via tools like
Another problem is the need to be able to include selective snippets from a source file, because many of the examples do not define the
Asciidoctor allows marking regions of text to be imported by using comment lines that set tags at the beginning and end of each region, which makes it very easy to pack multiple examples from the Spec into a single large source, and then extract single non-terminal definitions as required. With pandoc, on the other hand, we need to rely on PP for a similar feature, and PP does provide a native
So, on the one hand I'm tempted to switch from pandoc to Asciidoctor, but I would definitely wait for the next Asciidoctor release, which will allow including contents from non-UTF-8 files. As for externalizing examples in order to check their correctness using Polygen, I think it would be a worthwhile effort only if:
Surely, both the switch to Asciidoctor and the externalization of code are (and will still be) on my mind; I'm just waiting to see how things evolve. As a general rule, I tend to always use Asciidoctor for my documentation projects, but for this repo I picked pandoc for various reasons: simplicity of toolchain setup, ease of extending the limits of pandoc via PP, higher control over document templates (compared to Asciidoctor), and because it would be easier to create a GitHub Pages website in the future (i.e. adding navigation menus to the documents via templates). Weighing the pros and cons of pandoc vs Asciidoctor is not easy: they are both exceptional tools which shine in their own right for the tasks they were designed for. |
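To make the two preprocessing steps discussed above more concrete (re-encoding ISO-8859-1 sources as UTF-8, and pulling one tagged region out of a larger source), here is a minimal Python sketch. It is not part of the current toolchain: the function names are hypothetical, the sample grammar is invented for illustration, and it assumes that tag markers are written inside Polygen `(* ... *)` comments using Asciidoctor's `tag::name[]` / `end::name[]` convention.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("iso-8859-1").encode("utf-8")


def extract_region(source: str, tag: str) -> str:
    """Return the lines between 'tag::TAG[]' and 'end::TAG[]' marker lines.

    Simplified: marker lines of *other* tags nested inside the region
    are kept as-is, which a real include processor would filter out.
    """
    start, end = f"tag::{tag}[]", f"end::{tag}[]"
    kept, inside = [], False
    for line in source.splitlines():
        if end in line:
            inside = False
        elif start in line:
            inside = True
        elif inside:
            kept.append(line)
    return "\n".join(kept)


# Invented sample: two examples packed into one source file.
SAMPLE = (
    "(* tag::greet[] *)\n"
    'S ::= "perché" | hello ;\n'
    "(* end::greet[] *)\n"
    "(* tag::other[] *)\n"
    "T ::= world ;\n"
    "(* end::other[] *)\n"
)

raw_latin1 = SAMPLE.encode("iso-8859-1")        # as stored on disk
utf8_text = latin1_to_utf8(raw_latin1).decode("utf-8")
print(extract_region(utf8_text, "greet"))
```

Running the conversion before the extraction keeps the snippet that ends up in the document in UTF-8, whatever the encoding of the original source.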
In both the English and Italian versions of the PML documentation, there are some missing / implicit details regarding a few behaviours around label selection. They might not be of importance, but since I was surprised when I discovered them, I thought they were worth bringing to your attention.
A consequence of some of these findings is that the formal language definition (§5.1) is incorrect / incomplete.
Label group concatenation
§2.5.2 shows how multiple label selection groups can be concatenated to reduce the verbosity of the grammar definition, such as in
S ::= Conjug.(S|P).(sp|pp) ;
However, I believe this contradicts the concrete syntax in §5.1, where an atom is defined as follows:
and thus I think the correct definition should instead be
In order to implement compatibility with this syntax in Polygen-PHP (which I wrote according to the abstract and concrete language definitions from the readme on the official site and thus, did not support it at first), I resorted to adding this extra abstract-to-concrete-syntax conversion step (I think you might at most be interested in the docblock at the beginning of the file, where the conversion step is described through an example). Depending on how this was originally implemented in Polygen, it might also mean that there exists an extra undocumented conversion step, and thus that §5.5 needs amending as well. However I might be completely wrong on this part, as support for this syntax might be embedded in the parser for the concrete definition. I remember close to nothing about my OCaml days at the university, so I couldn't figure this out on my own.
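As an illustration of what such a conversion step could look like, here is a minimal Python sketch. It is neither the Polygen-PHP code nor Polygen's own implementation: it assumes that a label group is shorthand for a set of alternatives, each selecting one label of the group, and the function name `expand_groups` is hypothetical.

```python
from itertools import product


def expand_groups(atom, groups):
    """Expand atom.(a|b).(c|d) into flat selections atom.a.c, atom.a.d, ...

    `groups` is a list of label groups, each a list of label names.
    Assumption: concatenated groups combine as a cartesian product.
    """
    return [".".join([atom, *combo]) for combo in product(*groups)]


alternatives = expand_groups("Conjug", [["S", "P"], ["sp", "pp"]])
print("(" + " | ".join(alternatives) + ")")
# (Conjug.S.sp | Conjug.S.pp | Conjug.P.sp | Conjug.P.pp)
```

Under this reading, a grammar that only supports single-label selection can still accept the concatenated form, because the groups are rewritten away before the rest of the pipeline sees them.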
Single label concatenation
Although it might be considered somewhat implicit in §2.5.2, multiple label selections can be concatenated not only when in groups (S ::= Something.(l1|l2).(l3|l4);), but also when taken singly (S ::= Something.l1.l2.l3.l4;). Both declarations are correctly accepted by Polygen and influence label selection as expected. I believe that this means that the concrete syntax for ATOM is again partially incorrect and should be amended as follows:
Dot concatenation
For completeness' sake, although not particularly useful, it is possible to concatenate multiple dots:
S ::= Something.....;
This does not change the behaviour of the dot operator: the output is the same whether one or one thousand dots are employed. Again, this requires updating the ATOM concrete definition:
Resetting on non-(non-terminals)
Another oddity I discovered is that the label reset token can be employed not only on non-terminals, but also on terminals, although it does not seem to affect the selection in any way. To prove this for Polygen-PHP I wrote this test, where the following grammar demonstrating this property can be found.
This grammar is correctly parsed by Polygen, but can only produce one result, which is
and b and
To be fair, although I don't think this is explicitly stated in the documentation, it is correctly represented by the ATOM definition (which also tells us that we can use the selection reset token on ^, \ and _, which similarly does not affect the label scope).
Selecting on non-(non-terminals)
Everything in the previous section also applies when using labels (as correctly reported by the TERM definition). So both S ::= Something and.also.this; and S ::= Something and.(this|that); are acceptable declarations, but the selected labels don't affect the generation in either example.
Mixing labels, groups and dots
Another interesting discovery I made is that dot labels, label groups and label reset tokens can be mixed when selecting, as in the following declaration:
S ::= Something.and..(notice|the|double|dots).before.the.round.braces.(and|at|the).end..;
This once again means that the ATOM definition is incorrect. I'm unsure about how to fix this, but I believe this could work:
Label precedence
A consequence of the previous discovery is that we can mix together label selection and label reset, so I got curious about the precedence of these label operations during parsing.
It turns out that in Polygen, label operations are processed from right to left, as demonstrated by this test in Polygen-PHP.
The test uses the following grammar
that can only produce the following output:
a aa b bb
, basically ignoring every other selection that comes to the right of the label reset token.
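The behaviour described in the last two sections can be modelled with a short Python sketch. It is only a model built from the observations above, not Polygen's or Polygen-PHP's actual code: the names `parse_selectors` and `effective_labels` are hypothetical, the label identifier pattern is an assumption, and a label group is resolved by picking one of its labels.

```python
import random
import re

# Assumed shape of a label identifier; the real lexer may differ.
LABEL = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def parse_selectors(suffix):
    """Split a selector chain such as '.a..(b|c).' into operations.

    Returns a list of (kind, labels) pairs, where kind is
    'label', 'group' or 'reset'.
    """
    ops, i = [], 0
    while i < len(suffix):
        if suffix[i] != ".":
            raise ValueError(f"expected '.' at position {i}")
        i += 1
        if i < len(suffix) and suffix[i] == "(":
            close = suffix.find(")", i)
            if close == -1:
                raise ValueError("unterminated label group")
            labels = suffix[i + 1:close].split("|")
            if not all(LABEL.fullmatch(name) for name in labels):
                raise ValueError("malformed label group")
            ops.append(("group", labels))
            i = close + 1
        else:
            match = LABEL.match(suffix, i)
            if match:
                ops.append(("label", [match.group()]))
                i = match.end()
            else:
                ops.append(("reset", []))  # a dot with no label after it
    return ops


def effective_labels(ops, inherited=frozenset(), choose=random.choice):
    """Apply operations from right to left; a reset clears what came before."""
    active = set(inherited)
    for kind, labels in reversed(ops):
        if kind == "reset":
            active.clear()
        else:
            active.add(choose(labels))
    return active


print(effective_labels(parse_selectors(".l1..l2.l3")))  # {'l1'}
```

Because the rightmost operations are applied first, everything selected to the right of a reset token has already been cleared by the time the selections to its left are applied, which matches the observation that those selections are ignored.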