Description
This is really a long-standing bug; I believe it was introduced between versions 2.5.4 and 2.5.33, with the refactoring of case-insensitivity.
A small test case:
%option case-insensitive
%%
i{1,2} { printf("%s\n", yytext); }
.|\n ;
Given the input ii II, the expected output is:
ii
II
but the actual output is:
ii
I
I
DIAGNOSIS
The problem is triggered in parse.y around line 769 where case-insensitive single-character singletons are turned into an alternation:
$$ = mkor (mkstate($1), mkstate(reverse_case($1)));
But the real problem is in mkor(first, second), which does not modify firstst[first] before returning. Effectively, it assumes that first < second, although that precondition is not noted. Since the order of evaluation of arguments to C functions is unspecified, it is quite likely that the second argument to the call to mkor above is evaluated before the first argument, in which case the unstated requirement will not be satisfied. The consequence is that when dupmachine is eventually called in the expansion of the brace repeat, it fails to copy one of the alternatives, leading to the behaviour noted above.
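The evaluation-order hazard can be shown in isolation. What follows is only a minimal sketch, not flex's code: toy_mkstate, toy_ordered, build_unsequenced and build_sequenced are hypothetical names, and toy_mkstate simply hands out increasing ids, on the assumption that this is how the real mkstate numbers its states.

```c
#include <stdio.h>

/* Hypothetical stand-in for a state allocator: each call returns the next id. */
static int next_state = 0;

static int toy_mkstate(int sym)
{
    (void)sym;              /* the symbol is irrelevant to the ordering issue */
    return ++next_state;
}

/* Returns 1 when the ids satisfy the unstated precondition first < second. */
static int toy_ordered(int first, int second)
{
    return first < second;
}

/* Both allocations happen inside one argument list. C leaves the order in
 * which the two arguments are evaluated unspecified, so the result may be
 * 0 or 1 depending on the compiler. */
static int build_unsequenced(void)
{
    return toy_ordered(toy_mkstate('i'), toy_mkstate('I'));
}

/* Hoisting the calls into separate statements sequences them, so the first
 * id is always the smaller one. */
static int build_sequenced(void)
{
    int first = toy_mkstate('i');
    int second = toy_mkstate('I');
    return toy_ordered(first, second);
}
```

In this model build_sequenced() always returns 1, whereas build_unsequenced() may return either value. Sequencing the calls at the call site would only hide the problem in one place, which is why the fix is better made inside mkor itself.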
SUGGESTED FIX
I believe the best fix to the problem would be to add
firstst[first] = MIN(firstst[first], firstst[second]);
at the end of mkor.
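To illustrate why that one line matters, here is a deliberately simplified model. It is a sketch under stated assumptions, not flex's implementation: the toy_ names are hypothetical, the update of toy_lastst inside toy_mkor is an assumption of the model, and toy_states_copied assumes that a copy visits every state from the machine's first state to its last.

```c
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

enum { TOY_MAX = 16 };

/* Toy bookkeeping: lowest and highest state number owned by each machine. */
static int toy_firstst[TOY_MAX];
static int toy_lastst[TOY_MAX];

/* A single-state machine whose machine id equals its state number. */
static void toy_single(int state)
{
    toy_firstst[state] = state;
    toy_lastst[state] = state;
}

/* Merge `second` into `first`. When with_fix is nonzero, apply the
 * proposed line so that the lower bound also covers `second`. */
static int toy_mkor(int first, int second, int with_fix)
{
    toy_lastst[first] = MAX(toy_lastst[first], toy_lastst[second]);
    if (with_fix)
        toy_firstst[first] = MIN(toy_firstst[first], toy_firstst[second]);
    return first;
}

/* Number of states a range-based copy of `mach` would visit. */
static int toy_states_copied(int mach)
{
    return toy_lastst[mach] - toy_firstst[mach] + 1;
}

/* Build two single-state machines, join them, and count the copied states. */
static int toy_copied_after_or(int first, int second, int with_fix)
{
    toy_single(first);
    toy_single(second);
    return toy_states_copied(toy_mkor(first, second, with_fix));
}
```

With first < second the model covers both states even without the fix. When the ids arrive in the opposite order, the unfixed version covers only one of them, mirroring the missing alternative in the test case, and the version with the MIN line covers both again.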
HISTORY AND IMPACT
This problem seems to have been noticed in 2009, when an attempt to build the Shakespeare language failed. (See http://stackoverflow.com/questions/1948372/compiling-and-executing-the-shakespeare-programming-language-translator-spl2c-on). Unfortunately, it was incorrectly diagnosed as a bad regular expression for Roman numerals. Although the original regular expression was correct, it used lower-case letters, while the suggested replacement regular expression happened to use upper-case letters. Since the source code of the Shakespeare example programs uses upper-case, the replacement regular expression worked whereas the original did not. (The Shakespeare flex input file has %option case-insensitive so it should not matter, and when Shakespeare was written, it didn't.)
A problem with Shakespeare was reported on Stack Overflow a couple of days ago, referencing the 2009 link, and in the course of investigating it I uncovered the Flex bug. There are sporadic reports of the same problem here and there on the 'net, but since Shakespeare is not, as far as I know, used in mission-critical applications, the problem was never investigated fully.
This suggests that using the brace-repeat operator with single-character case-insensitive arguments is fairly rare (Roman numerals are one of the few examples I could think of) and so the impact is low.
Nonetheless, I think the fix is simple enough that it is worth doing.