Lexer confusion about operators #3066

ericmorand · 2019-06-17T12:12:08Z

Consider the following template:

{{in}}

When lexed, here is what is returned:

VAR_START_TYPE()
NAME_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()

Now consider the following one:

{{in }}

When lexed, here is what is returned:

VAR_START_TYPE()
OPERATOR_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()

As you can see, in the latter case, in is recognized as an operator, while in the former it is a name. The lexer is not able to distinguish an operator from a variable name. It is confused by formatting characters (in the second template, the space before the }}) that are not supposed to be relevant inside blocks:

{{ foo.bar }} is lexically identical to {{foo.bar}} in Twig, like {% foo %} and {%foo%}.

More generally, the lexer is not very robust when it comes to operators. It is hard to predict whether the lexer will emit an operator token or a name token:

{% for in in in %} is tokenized into:

BLOCK_START_TYPE()
NAME_TYPE(for)
OPERATOR_TYPE(in)
OPERATOR_TYPE(in)
OPERATOR_TYPE(in)
BLOCK_END_TYPE()
EOF_TYPE()

However, the first and last in are actually variable names; only the middle one is an operator.

{{ in.in }} is tokenized into:

VAR_START_TYPE()
NAME_TYPE(in)
PUNCTUATION_TYPE(.)
OPERATOR_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()

While the lexically identical template {{in.in}} is tokenized into:

VAR_START_TYPE()
NAME_TYPE(in)
PUNCTUATION_TYPE(.)
NAME_TYPE(in)
VAR_END_TYPE()
EOF_TYPE()
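
All of the outputs above are consistent with one simple rule: a word operator such as in only matches when it is followed by whitespace or a parenthesis, and operators are tried before names. I have not confirmed here that this is how the PHP lexer is written, so treat the following TypeScript as a minimal sketch under that assumption. The rule table and the lex function are hypothetical names, and the sketch only covers a template made of a single tag:

```typescript
// Hypothetical, heavily simplified lexer: it only handles a template that
// consists of a single {{ ... }} or {% ... %} tag.
interface Rule {
  type: string;
  pattern: RegExp;
  keepValue: boolean; // delimiters are printed with an empty value
}

// Rules are tried in order, so the operator rule wins over the name rule.
const RULES: Rule[] = [
  { type: "VAR_START_TYPE", pattern: /^\{\{/, keepValue: false },
  { type: "VAR_END_TYPE", pattern: /^\}\}/, keepValue: false },
  { type: "BLOCK_START_TYPE", pattern: /^\{%/, keepValue: false },
  { type: "BLOCK_END_TYPE", pattern: /^%\}/, keepValue: false },
  // Assumption: "in" is an operator only when followed by whitespace or a parenthesis.
  { type: "OPERATOR_TYPE", pattern: /^in(?=[\s()])/, keepValue: true },
  { type: "NAME_TYPE", pattern: /^[a-zA-Z_][a-zA-Z0-9_]*/, keepValue: true },
  { type: "PUNCTUATION_TYPE", pattern: /^\./, keepValue: true },
];

function lex(template: string): string[] {
  const tokens: string[] = [];
  let rest = template;
  while (rest.length > 0) {
    // Whitespace between tokens produces no token of its own.
    const space = /^\s+/.exec(rest);
    if (space !== null) {
      rest = rest.slice(space[0].length);
      continue;
    }
    let consumed = 0;
    for (const rule of RULES) {
      const match = rule.pattern.exec(rest);
      if (match !== null) {
        tokens.push(`${rule.type}(${rule.keepValue ? match[0] : ""})`);
        consumed = match[0].length;
        break;
      }
    }
    if (consumed === 0) {
      throw new Error(`Unexpected character "${rest.charAt(0)}"`);
    }
    rest = rest.slice(consumed);
  }
  tokens.push("EOF_TYPE()");
  return tokens;
}
```

Calling lex on {{in}}, {{in }}, {{ in.in }}, {{in.in}} and {% for in in in %} reproduces the five token lists shown above, which suggests that the whitespace sensitivity comes from that single lookahead rather than from any knowledge of the surrounding expression.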

I can't find the official lexical specs of the language - I assume it is an internal document at Symfony's, thus I can't be sure that this is the expected behavior. But from an external point of view, this makes the lexer not very robust and not quite what is expected from a syntactic analyzer tool.


stof · 2019-06-17T12:59:30Z

> I can't find the official lexical specs of the language - I assume it is an internal document at Symfony's, thus I can't be sure that this is the expected behavior.

I don't think there is such a document actually.

ericmorand · 2019-06-17T13:08:57Z

Well, I have always considered the TwigPHP sources to be the de facto specification, so that's fine. But the PHP lexer is quite confusing: while writing a standalone nodejs lexer for Twig, I first thought that my implementation was faulty.

I'll stick with my mantra: the TwigPHP sources are the reference implementation of the specs. But I think something should be improved there in the future.

fabpot · 2019-08-05T16:27:44Z

Let's close as I won't have time to actually change this (especially as it works as is).

ericmorand · 2019-08-06T07:51:15Z

Actually, it's not a real issue with TwigPHP as long as the parser knows what to do with the lexed tokens - and it does.

It's more an issue for people that want to use the lexer to provide other implementations, linting and code analysis: they have to write a lexer that gives different results than the reference implementation.

For example, @PolyPik, who is working on twig.js, has some concerns about the TypeScript lexer we wrote because it does not match your reference lexer:

NightlyCommit/twig-lexer#10
NightlyCommit/twig-lexer#12

The lexer works quite well, is lossless, and removes a few sources of confusion (though unfortunately not the one we are discussing here), but his concerns remain valid: he'd like to write a parser that handles the official tokens instead of the arbitrary ones of twig-lexer. Without specs, though, nothing can be guaranteed, and we had to make some choices that we are not sure you would make if you rewrote your lexer in the future.
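
To illustrate the kind of workaround a linter or an alternative parser ends up needing when it consumes the reference token stream, here is a minimal TypeScript sketch. It is not part of TwigPHP or twig-lexer: the Token shape and the normalizeAttributeNames helper are hypothetical, and the rule it applies (an operator-looking word directly after a . is really an attribute name) is only the narrowest case from this issue:

```typescript
// Hypothetical token shape, mirroring the TYPE(value) notation used in this issue.
interface Token {
  type: string;
  value: string;
}

// Assumption: a name is a letter or underscore followed by letters, digits or underscores.
const NAME_PATTERN = /^[a-zA-Z_][a-zA-Z0-9_]*$/;

// Rewrites OPERATOR_TYPE tokens that directly follow a "." into NAME_TYPE tokens,
// so that {{ in.in }} and {{in.in}} yield the same stream. Other tokens are untouched.
function normalizeAttributeNames(tokens: Token[]): Token[] {
  return tokens.map((token, index) => {
    const previous = index > 0 ? tokens[index - 1] : undefined;
    const afterDot =
      previous !== undefined &&
      previous.type === "PUNCTUATION_TYPE" &&
      previous.value === ".";
    if (afterDot && token.type === "OPERATOR_TYPE" && NAME_PATTERN.test(token.value)) {
      return { type: "NAME_TYPE", value: token.value };
    }
    return token;
  });
}
```

This deliberately leaves {% for in in in %} and {{in }} alone: deciding that those in tokens are names requires knowing what the tag or expression expects at that position, which is parser knowledge rather than something a token-level pass can recover.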

I see that Twig 3 is in the works. Is there something new about the lexer? Can we help (by we I mean the community, mainly the nodejs one, because this is where Twig support has been very active recently) with establishing some specs or something?

fabpot closed this as completed Aug 5, 2019
