Lexer confusion about operators #3066

Closed
ericmorand opened this issue Jun 17, 2019 · 4 comments

@ericmorand
Contributor

ericmorand commented Jun 17, 2019

Consider the following template:

{{in}}

When lexed, here is what is returned:

VAR_START_TYPE()
NAME_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()

Now consider the following one:

{{in }}

When lexed, here is what is returned:

VAR_START_TYPE()
OPERATOR_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()

As you can see, in the latter case, in is recognized as an operator, while in the former it is a name. The lexer is not able to distinguish an operator from a variable name. It is confused by formatting characters (in the second template, the space before the }}) that are not supposed to be relevant inside blocks:

{{ foo.bar }} is lexically identical to {{foo.bar}} in Twig, like {% foo %} and {%foo%}.

More generally, the lexer is not very robust when it comes to operators. It is hard to predict whether the lexer will emit an operator token or a name token:

{% for in in in %} is tokenized into:

BLOCK_START_TYPE()
NAME_TYPE(for)
OPERATOR_TYPE(in)
OPERATOR_TYPE(in)
OPERATOR_TYPE(in)
BLOCK_END_TYPE()
EOF_TYPE()

Yet the first and last in are actually variable names.

{{ in.in }} is tokenized into:

VAR_START_TYPE()
NAME_TYPE(in)
PUNCTUATION_TYPE(.)
OPERATOR_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()

While the lexically identical template {{in.in}} is tokenized into:

VAR_START_TYPE()
NAME_TYPE(in)
PUNCTUATION_TYPE(.)
NAME_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()
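
The token dumps above are consistent with a lexer that treats a word operator as an operator only when it is immediately followed by whitespace or a parenthesis. The following is a minimal Python sketch under that assumption. It is not TwigPHP's code: the function names, the regular expressions and the punctuation set are hypothetical, and only the in operator is modelled.

```python
import re

# Assumption: a word operator only counts as an operator when it is
# followed by whitespace or a parenthesis. Only "in" is modelled here.
OPERATOR_RE = re.compile(r"in(?=[\s()])")
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
PUNCTUATION = ".,|()[]{}:?"

# (opening delimiter, closing delimiter, start token, end token)
DELIMITERS = (
    ("{{", "}}", "VAR_START_TYPE", "VAR_END_TYPE"),
    ("{%", "%}", "BLOCK_START_TYPE", "BLOCK_END_TYPE"),
)


def lex_body(body):
    """Tokenize the text found between a pair of delimiters."""
    tokens = []
    pos = 0
    while pos < len(body):
        if body[pos].isspace():
            pos += 1
            continue
        # The operator pattern is tried first, so "in " wins over a name.
        match = OPERATOR_RE.match(body, pos)
        if match:
            tokens.append("OPERATOR_TYPE(%s)" % match.group())
            pos = match.end()
            continue
        match = NAME_RE.match(body, pos)
        if match:
            tokens.append("NAME_TYPE(%s)" % match.group())
            pos = match.end()
            continue
        if body[pos] in PUNCTUATION:
            tokens.append("PUNCTUATION_TYPE(%s)" % body[pos])
            pos += 1
            continue
        raise ValueError("unexpected character %r" % body[pos])
    return tokens


def lex(template):
    """Tokenize a template made of a single {{ ... }} or {% ... %} tag."""
    for opening, closing, start, end in DELIMITERS:
        if template.startswith(opening) and template.endswith(closing):
            body = template[len(opening):-len(closing)]
            return [start + "()"] + lex_body(body) + [end + "()", "EOF_TYPE()"]
    raise ValueError("not a single Twig tag: %r" % template)


if __name__ == "__main__":
    for template in ("{{in}}", "{{in }}", "{{ in.in }}", "{{in.in}}"):
        print(template, lex(template))
```

Under this rule the trailing space alone decides the token type, which is why {{in}} and {{in }} diverge even though they should be lexically identical.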

I can't find the official lexical specs of the language - I assume it is an internal document at Symfony's, thus I can't be sure that this is the expected behavior. But from an external point of view, this makes the lexer not very robust and not quite what is expected from a lexical analysis tool.

@stof
Member

stof commented Jun 17, 2019

> I can't find the official lexical specs of the language - I assume it is an internal document at Symfony's, thus I can't be sure that this is the expected behavior.

I don't think there is such a document actually.

@ericmorand
Contributor Author

Well, I always considered the TwigPHP sources as the de facto specification, so that's fine. But the PHP lexer is quite confusing, and while writing a standalone Node.js lexer for Twig, I first thought that my implementation was faulty.

I'll stick with my mantra: TwigPHP sources are the reference implementation of the specs. But I think something should be improved there in the future.

@fabpot
Contributor

fabpot commented Aug 5, 2019

Let's close as I won't have time to actually change this (especially as it works as is).

fabpot closed this as completed Aug 5, 2019
@ericmorand
Contributor Author

Actually, it's not a real issue with TwigPHP as long as the parser knows what to do with the lexed tokens - and it does.

It's more an issue for people that want to use the lexer to provide other implementations, linting and code analysis: they have to write a lexer that gives different results than the reference implementation.
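
For tools that consume the token stream directly, one possible workaround is a normalization pass that runs after lexing. The Python sketch below is hypothetical and is not what the TwigPHP parser does: it assumes tokens are (type, value) tuples named after the dumps above, and it only handles the attribute case, where a word-like operator token directly follows a dot.

```python
def reclassify_attribute_names(tokens):
    """Turn word-like operator tokens that follow a "." into name tokens.

    tokens is a list of (type, value) tuples; a new list is returned.
    """
    result = []
    for token_type, value in tokens:
        previous = result[-1] if result else None
        if (
            token_type == "OPERATOR_TYPE"
            and value.isidentifier()  # "in" qualifies, "+" does not
            and previous == ("PUNCTUATION_TYPE", ".")
        ):
            token_type = "NAME_TYPE"
        result.append((token_type, value))
    return result


if __name__ == "__main__":
    # Token stream reported above for {{ in.in }}
    lexed = [
        ("VAR_START_TYPE", ""),
        ("NAME_TYPE", "in"),
        ("PUNCTUATION_TYPE", "."),
        ("OPERATOR_TYPE", "in"),
        ("VAR_END_TYPE", ""),
        ("EOF_TYPE", ""),
    ]
    print(reclassify_attribute_names(lexed))
```

With this pass, the stream for {{ in.in }} becomes identical to the one for {{in.in}}. It does not resolve cases such as {% for in in in %}, where telling names from operators depends on the grammar of the tag.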

For example, @PolyPik, who is working on twig.js, has some concerns with the TypeScript lexer we wrote because it does not match your reference lexer:

NightlyCommit/twig-lexer#10
NightlyCommit/twig-lexer#12

The lexer works quite well, is lossless, and removes some of the confusion (but unfortunately not the one we are talking about here), but his concerns remain valid: he'd like to write a parser that handles the official tokens instead of the arbitrary ones of twig-lexer. Without specs, though, nothing can be guaranteed, and we had to make some choices that we are not sure you would make if you rewrote your lexer in the future.

I see that Twig 3 is in the works. Is there something new about the lexer? Can we help establish some specs or something similar? By "we" I mean the community, mainly the Node.js one, because this is where Twig support has been very active recently.
