What happens at the end of the file for lex? - input() return value #448

Open
spth opened this issue Jun 3, 2020 · 2 comments

Comments


spth commented Jun 3, 2020

I wonder what is supposed to happen when a lex scanner calls input() and
reaches the end of the input.

This should be documented in the manual.
The only relevant information I found in the flex manual is:

If 'input()' encounters an end-of-file the normal 'yywrap()' processing
is done. A "real" end-of-file is returned by 'input()' as 'EOF'.

That passage is the same in flex 2.5.4 and flex 2.6.4, yet the two versions behave quite differently. Also, what is an "end-of-file" as opposed to a "'real' end-of-file"?
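One reading of the manual's wording is a two-level scheme: reaching the end of the current input file is an ordinary end-of-file, which triggers yywrap(); only when yywrap() returns nonzero (no further input) does input() report the "real" end-of-file. The C sketch below models that reading outside flex. The names struct reader, model_input() and model_yywrap() are hypothetical and do not describe flex's implementation; returning EOF follows the manual and flex 2.5.4, not flex 2.6.4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the manual's wording, not flex's implementation.
   Each "file" is a string; the reader walks through them in order. */
struct reader {
    const char *const *files; /* input files, in order */
    size_t nfiles;            /* number of files (at least 1) */
    size_t cur;               /* index of the file being read */
    size_t pos;               /* read offset within that file */
    int wrap_calls;           /* how often model_yywrap() has run */
};

/* Stand-in for yywrap(): switch to the next file and return 0 so that
   reading continues, or return 1 when no files are left. */
static int model_yywrap(struct reader *r)
{
    r->wrap_calls++;
    if (r->cur + 1 < r->nfiles) {
        r->cur++;
        r->pos = 0;
        return 0;
    }
    return 1;
}

/* Stand-in for input(): an ordinary end-of-file (end of the current file)
   runs the yywrap() processing; only when yywrap() returns nonzero is the
   "real" end-of-file reported, here as EOF as the manual describes. */
static int model_input(struct reader *r)
{
    assert(r != NULL && r->nfiles > 0);
    for (;;) {
        const char *f = r->files[r->cur];
        if (f[r->pos] != '\0')
            return (unsigned char) f[r->pos++];
        if (model_yywrap(r) != 0)
            return EOF;
    }
}
```

Calling model_input() again after the real end-of-file runs model_yywrap() once more and still yields EOF. The repeated "yywrap!" lines in the output below show flex calling yywrap() on every such input() call as well, although 2.6.4 then returns 0 rather than EOF.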

The breaking change between 2.5.4 and 2.6.4 should be documented in the manual. I also wonder why it was made.

This small example reproduces the difference:


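%%
.       {
          /* After matching one character, read four more with input()
             and print the value each call returns. */
          for (int i = 0; i < 4; i++) {
              int ch = input();
              printf("%d\n", ch);
          }
        }

%%
int
main (void)
{
  yylex();
  return 0;
}

int
yywrap (void)
{
  printf("yywrap!\n");
  return 1;
}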

Invoked on a file containing a single character followed by a newline, I see

philipp@notebook5:/tmp$ ./a.out < test.c
10
yywrap!
-1
yywrap!
-1
yywrap!
-1
yywrap!

i.e. input() returns EOF (-1) with flex 2.5.4, and

philipp@notebook5:/tmp$ ./a.out < test.c
10
yywrap!
0
yywrap!
0
yywrap!
0
yywrap!

i.e. input() returns 0 with flex 2.6.4.
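Until the documentation settles which value is correct, scanner code that has to build with either version could accept both. A minimal sketch, assuming the action only needs to stop reading at the end of input; input_at_end() is a hypothetical helper, not part of flex:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of flex: classify a value returned by
   input() inside a rule action. flex 2.5.4 reports the end of input as
   EOF, flex 2.6.4 as 0, so both are treated as end of input. */
static bool input_at_end(int ch)
{
    return ch == EOF || ch == 0;
}
```

In the reproducer's action this would be `if (input_at_end(ch)) break;` right after `int ch = input();`. The cost of this choice is that a genuine NUL byte in the input also ends the loop, because input() returns 0 for an actual NUL character too.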

There is also a Debian bug report about this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911415
An earlier issue reported here, #394, was closed without comment.

This change breaks, for example, the Small Device C Compiler.


spth commented Jun 3, 2020

The change was made in the following commit, which gives no rationale and makes no corresponding documentation change:
f863c94


markjdb commented Feb 12, 2021

This also breaks the scanner used by libdtrace in FreeBSD.

freebsd-git pushed a commit to freebsd/freebsd-src that referenced this issue Feb 17, 2021
Importing flex 2.6.4 has introduced a regression: input() now returns 0
instead of EOF to indicate that the end of input was reached, just like
traditional AT&T and POSIX lex.  Note the behavior contradicts flex(1).
See "INCOMPATIBILITIES WITH LEX AND POSIX" section for information.
This incompatibility traces back to the original version and is documented
in its manual page by Vern Paxson.

Apparently, it has been reported in a few places, e.g.,

westes/flex#448
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911415

Unfortunately, this also breaks the scanner used by libdtrace and
dtrace is unable to resolve some probe argument types as a result.  See
PR253440 for more information.

Note the regression was introduced by the following upstream commit
without any explanation or documentation change:

westes/flex@f863c94

Now we restore the traditional flex behavior unless lex-compatibility
mode is set with "-l" option because I believe the author originally
wanted to make it more lex and POSIX compatible.

PR:		253440
Reported by:	markj
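The rule this commit restores comes down to a single choice of return value. The sketch below models it; end_of_input_value() and its lex_compat flag are hypothetical names and do not reproduce FreeBSD's patch or flex's internals:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the rule in the FreeBSD commit message: the value
   input() returns at the real end of input, depending on whether the
   scanner was generated in lex-compatibility mode (flex -l). */
static int end_of_input_value(bool lex_compat)
{
    /* 0 as in AT&T/POSIX lex; EOF as flex(1) documents */
    return lex_compat ? 0 : EOF;
}
```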
freebsd-git pushed a commit to freebsd/freebsd-src that referenced this issue Feb 22, 2021
(cherry picked from commit 6b7e592)
freebsd-git pushed a commit to freebsd/freebsd-src that referenced this issue Feb 22, 2021
Approved by:	re (gjb)

(cherry picked from commit 6b7e592)
ericbsd pushed a commit to ghostbsd/ghostbsd-src that referenced this issue Mar 24, 2021
(cherry picked from commit 6b7e592)
brooksdavis pushed a commit to CTSRD-CHERI/cheribsd that referenced this issue Oct 28, 2021