Char class in comment gives m4 "ERROR: end of file in string" #553

Closed
gitamohr opened this issue Feb 17, 2023 · 17 comments

Comments

@gitamohr

The following input to flex 2.6.4 gives an m4 error:

%%
A { return 'A'; }
    /*
     * Bug: [[:alnum:]_]
     */
%%
> flex bug.ll
/bin/m4:stdin:1315: ERROR: end of file in string

The error disappears if I remove the underscore from the comment, e.g. * Bug: [[:alnum:]]

@Mightyjo
Contributor

Mightyjo commented Feb 17, 2023 via email

@gitamohr
Author

Well, the contents of a comment should not affect the output. But FWIW, using that char class in the grammar works fine:

%%
[[:alnum:]_] { return 'Z'; }
    /*
     * Bug: [[:alnum:]]
     */
%%

(If I add the underscore back into the comment like [[:alnum:]_] the error returns.)

@Mightyjo
Contributor

Mightyjo commented Feb 17, 2023 via email

@gitamohr
Author

gitamohr commented Feb 17, 2023

No, that's just what the file happens to be named; I renamed it to bug.l. I'm invoking flex with no arguments other than the input file. I built flex 2.6.4 into /usr/local with ./configure && make && sudo make install. Here's a complete terminal session repro, reduced as much as I could:

> cat bug.l
%%
A { return 'A'; }
    /*
     * Bug: [[:alnum:]_]
     */
%%
> flex bug.l
/bin/m4:stdin:1315: ERROR: end of file in string
> flex -V
flex 2.6.4

And just to say it, this isn't just a spurious error; flex's output is truncated and invalid. I can work around it by modifying my comments, but it seems like a bona fide bug in flex's comment handling, so I wanted to report it.

@gitamohr
Author

Here's a slightly more reduced repro. This fails:

%%
A { return 'A'; }
    /* [[:alnum:]_] */
%%

This works:

%%
A { return 'A'; } /* [[:alnum:]_] */
%%

@gitamohr
Author

There is something crucial about having additional characters after the [:alnum:] character class expression in the comment. Having characters before it (like [_[:alnum:]]) works fine. The particular characters that follow don't seem to matter; I've tried whitespace, letters, digits, and special characters. The character class name doesn't matter either: I've tried :alpha:, :digit:, and even :bogus:, and they all reproduce the error.

@Mightyjo
Contributor

First, sorry for saying [[:alnum:]_] was a misspelling. I was holding on to a false notion that character class names in flex included the outer square braces. Probably because of the next thing.

Second, I found the problem but I can't fix it right now. It's peculiar to comment handling, as you noticed. Flex wraps comments in its customized M4 quotes, which happen to be [[ and ]]. Because the character classes aren't being scanned and replaced in the comments, M4 is reading the braces around them as quotation marks. This is usually okay when they are balanced (e.g. [[:alnum:]]). It leads to the error you saw when they look like unbalanced quotes (e.g. [[:alnum:]_]).

Options:

  • Replace the outer square braces with parens
  • Unquote the outer square braces. This will be fiddly and possibly fragile. It looks like:
    ]][[[ ... ]]][[
  • Express the combined character class using pipe unions

Sorry the comment quoting makes this edge case complicated.

@gitamohr
Author

No worries -- thanks for taking a look. I can easily work around this. For what it's worth, this example works in version 2.5.39, so the bug was introduced somewhere between 2.5.39 and 2.6.4.

@Mightyjo
Contributor

Mightyjo commented Feb 28, 2023 via email

@Mightyjo
Contributor

Mightyjo commented Mar 6, 2023

Think I found it. Mainly for my reference when writing a test & patch: we aren't escaping m4qstart and m4qend in the COMMENT_DISCARD condition the same way we are in COMMENT. I think that's the source of this. I'll write tests based on the cases above, thanks for those!

@Mightyjo
Contributor

Mightyjo commented Mar 8, 2023

Nope, none of that worked.

@gitamohr, exactly what example did you test in 2.5.39? I'm trying to reproduce a working test from your comments above and finding no differences between 2.5.39 and HEAD.

%%
g {; } /* after action comment [[:alnum:]] */
h {; }
/*
after action comment [[:alnum:]
] */
%%

Flex accepts g but dies on h.

Here's what's up:
The comment after the h action is scanned as ... I don't know what. Could be a comment, could be an action. Looks like it just gets echoed a byte at a time either way.

However! The following construction works for long comments in 2.5.39 and HEAD:

i {; } /*
* after action comment [[:alnum:]_] */

Outcomes:
I'm adding tests for multiline comments with unmatched braces to tests/quotes.l. I'll include the g and i constructions only for now.

@gitamohr
Author

gitamohr commented Mar 8, 2023

I just tried the shortest example from above:

> cat bug.l
%%
A { return 'A'; }
    /* [[:alnum:]_] */
%%

> flex bug.l
/bin/m4:stdin:1315: ERROR: end of file in string

> flex -V
flex 2.6.4

> /old/flex bug.l

> /old/flex -V
flex 2.5.39

cheers!

@Mightyjo
Contributor

Mightyjo commented Mar 8, 2023

In this thread: I show myself to be an idiot. I have my trusty old "2.5.39" folder connected to the 2.6.4 tag for some reason.

Beg your pardon. Be back with better results shortly.

@Mightyjo
Contributor

Mightyjo commented Mar 8, 2023

Well, I'm back where we started. I see the issue, but I can't fix it for a while.

> cat bug.l
%%
A { return 'A'; }
    /* [[:alnum:]_] */
%%

Flex sees the comment after A's action as a "CODE_COMMENT". Those aren't m4-quoted the same way as other comments because quoting them causes other problems. Until we get rid of the m4 dependency, I can't change the behavior back to what you came to expect in 2.5.39 without breaking other functionality.

That said, you can use the constructions I provided above instead. I'm about done with the tests for them so we'll notice before losing any more comment functionality.

@gitamohr
Author

gitamohr commented Mar 8, 2023

Yep no worries, as I've mentioned this is no real impediment; just something I noticed.

@westes
Owner

westes commented Mar 9, 2023

fixed by #557

@westes westes closed this as completed Mar 9, 2023
@matthew-wozniczka