What happens at the end of the file for lex? - input() return value #448

spth · 2020-06-03T08:29:25Z

I wonder what is supposed to happen when a lex lexer reaches the end of
the input while calling input().

This information needs to be included in the manual.
The only information I found in the flex manual states:

If 'input()' encounters an end-of-file the normal 'yywrap()' processing
is done. A "real" end-of-file is returned by 'input()' as 'EOF'.

That part is the same for flex 2.5.4 and flex 2.6.4. But the two versions behave quite differently. Also, what is an "end-of-file" as opposed to a "real" end-of-file?

The breaking change between 2.5.4 and 2.6.4 should be documented in the manual. And I wonder why it was made.

This small example reproduces the difference:


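%{
#include <stdio.h>
%}

%%
.       { /* After the first match, call input() four times and print each result. */
          for (int i = 0; i < 4; i++) { int ch = input(); printf("%d\n", ch); } }

%%
int
main (void)
{
  yylex();
  return 0;
}

/* Report each call and return 1 to signal that no further input follows. */
int
yywrap (void)
{
  printf("yywrap!\n");
  return 1;
}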

Invoked on input containing only one character followed by a newline, I see

philipp@notebook5:/tmp$ ./a.out < test.c
10
yywrap!
-1
yywrap!
-1
yywrap!
-1
yywrap!

i.e. input() returning EOF for flex 2.5.4, and

philipp@notebook5:/tmp$ ./a.out < test.c
10
yywrap!
0
yywrap!
0
yywrap!
0
yywrap!

i.e. input() returning 0 for flex 2.6.4.
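Until the behavior is settled, a scanner that has to build with both versions could normalize the value itself. The following is only a minimal sketch of that idea: normalize_input_result is a hypothetical helper name, not part of flex, and it assumes the scanned text never contains a literal NUL byte, since a 0 returned by input() cannot otherwise be told apart from one.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of flex: map both end-of-input
 * conventions seen above (EOF from flex 2.5.4, 0 from flex 2.6.4)
 * to EOF. Assumption: the input never contains a literal NUL byte.
 */
static int
normalize_input_result (int ch)
{
  if (ch == EOF || ch == 0)
    return EOF;   /* end of input under either convention */
  return ch;      /* ordinary character, passed through unchanged */
}
```

In an action, the call would then read int ch = normalize_input_result(input()); and the following code could compare against EOF only.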

There is also a Debian bug report about this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911415
and there was a previous issue reported here, but closed without comment:
#394

This change breaks e.g. the Small Device C Compiler.


spth · 2020-06-03T08:43:07Z

The change was made here, but there is no information as to why, and no corresponding change in documentation:
f863c94

markjdb · 2021-02-12T19:49:14Z

This also breaks the scanner used by libdtrace in FreeBSD.

Importing flex 2.6.4 has introduced a regression: input() now returns 0 instead of EOF to indicate that the end of input was reached, just like traditional AT&T and POSIX lex. Note the behavior contradicts flex(1). See the "INCOMPATIBILITIES WITH LEX AND POSIX" section for information. This incompatibility traces back to the original version and was documented in its manual page by Vern Paxson.

Apparently, it has been reported in a few places, e.g.,
westes/flex#448
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911415

Unfortunately, this also breaks the scanner used by libdtrace, and dtrace is unable to resolve some probe argument types as a result. See PR253440 for more information.

Note the regression was introduced by the following upstream commit without any explanation or documentation change:
westes/flex@f863c94

Now we restore the traditional flex behavior unless lex-compatibility mode is set with the "-l" option, because I believe the author originally wanted to make it more lex and POSIX compatible.

PR: 253440
Reported by: markj
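The policy described in that commit message can be modeled in a few lines. This is a hedged illustration only, not the actual FreeBSD patch or flex skeleton code: end_of_input_value and its lex_compat parameter are hypothetical names standing in for whatever the generated scanner uses to record that "-l" was given.

```c
#include <stdio.h>

/*
 * Hypothetical model of the described policy, not real flex code:
 * return the value input() would report at end of input.
 * lex_compat is nonzero when lex-compatibility mode ("-l") is on.
 */
static int
end_of_input_value (int lex_compat)
{
  return lex_compat ? 0 : EOF;  /* 0 like AT&T/POSIX lex, else traditional EOF */
}
```

Under this model, a scanner generated without "-l" sees EOF again, as with flex 2.5.4, while one generated with "-l" sees 0.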

SouravKB mentioned this issue Dec 6, 2022

Calling input() at EOF doesn't return EOF character(-1). #548

Open
