unput doesn't work correctly #104

Closed
asselin opened this Issue Jun 6, 2012 · 6 comments


asselin commented Jun 6, 2012

unput() puts a character back in the stream, but doesn't fix up yytext, etc., so the same character appears in the result twice. unput() needs something like this added to it:

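    // Drop the pushed-back character from the token text and its length.
    this.yytext = this.yytext.substr(0, this.yytext.length-1);
    this.yyleng--;
    this.match = this.match.substr(0, this.match.length-1);
    this.matched = this.matched.substr(0, this.matched.length-1);
    // Undo the line count if the pushed-back character was a newline.
    var lines = ch.match(/\n/);
    if (lines) this.yylineno--;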
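
To make the proposed bookkeeping concrete, here is a minimal sketch on a toy lexer object. This is not Jison's real lexer: `makeLexer` and the `_input` field as used here are assumptions for illustration, and only the fields named in the snippet above are modelled.

```javascript
// Toy lexer state (hypothetical, NOT Jison's implementation).
function makeLexer(input) {
  return {
    _input: input, // remaining, unscanned text
    yytext: "",
    yyleng: 0,
    match: "",
    matched: "",
    yylineno: 0,
    // Consume one character and append it to the current token.
    input() {
      const ch = this._input[0];
      this._input = this._input.slice(1);
      this.yytext += ch;
      this.yyleng++;
      this.match += ch;
      this.matched += ch;
      if (ch === "\n") this.yylineno++;
      return ch;
    },
    // Push one character back and trim it from the token state,
    // mirroring the lines proposed above.
    unput(ch) {
      this._input = ch + this._input;
      this.yytext = this.yytext.slice(0, -1);
      this.yyleng--;
      this.match = this.match.slice(0, -1);
      this.matched = this.matched.slice(0, -1);
      if (ch === "\n") this.yylineno--;
      return this;
    },
  };
}

// Peek at the next character, then put it back.
const lexer = makeLexer("'x");
const c = lexer.input();
lexer.unput(c);
console.log(JSON.stringify(lexer.yytext), lexer.yyleng, JSON.stringify(lexer._input));
// prints: "" 0 "'x"
```

The sketch assumes unput() is only called with the character that input() just returned; it does not guard against unputting more than was consumed.
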
Owner

zaach commented Jun 6, 2012

unput is not intended to be a rewind, which is what your use case seems to call for. It simply puts a character into the input stream, not necessarily the one that was just scanned. Perhaps the prefix "un" is a bit confusing in this context. It's modeled after flex's unput.

@zaach zaach closed this Jun 6, 2012

asselin commented Jun 6, 2012

FLEX actually does pull the character off the matched text. Here's an example to illustrate:

FLEX example:

%{
#include <stdio.h>   /* sprintf, fwrite */
#include <string.h>  /* strlen */

#define STRING  1000
%}

%option noyywrap

%%

'[^'\n]*'   {
    int c = input();

    unput(c);   /* just peeking */
    if (c != '\'') {
        return STRING;
    } else {
        yymore();   /* doubled quote: keep yytext and append the next match */
    }
}

%%

int main(void)
{
    char buf[100];

    int ret = yylex();
    sprintf(buf, "yylex returned %d\nyytext is ", ret);

    fwrite(buf, strlen(buf), 1, yyout);
    fwrite(yytext, yyleng, 1, yyout);
    return 0;
}

Equivalent JISON:

/* description: Parses and executes mathematical expressions. */

/* lexical grammar */
%lex
%options flex
%%

"'"[^'\r\n]*"'"   {
    var c = this.input();

    this.unput(c); /* just peeking */
    if (c != '\'') {
        return 'STRING';
    } else {
        this.more();
    }
}

\s+                     { 
                       ///console.log("whitespace"); 
                        }

<<EOF>>                 return 'EOF'
.                       return 'INVALID'

/lex


%start where_clause

%% /* language grammar */

where_clause:
      STRING EOF  {
          console.log("yytext is", $1);
          return $1;
      }
   ;

Run each with this input (that's 2 single quotes in the middle):

'abc''def'

FLEX Output:

yylex returned 1000
yytext is 'abc''def'

JISON Output:

yytext is 'abc'''def'

(JISON's yytext has 3 single quotes in the middle vs. FLEX which has just the original 2)
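
The difference between the two outputs can be reproduced with a small standalone simulation of the action above. This is only a sketch: `scan` and its `trimOnUnput` flag are hypothetical names, the loop stands in for the lexer, and end of input is handled by returning early instead of modelling EOF.

```javascript
// Simulates the quoted-string rule on the input 'abc''def'.
// trimOnUnput = false models the reported JISON behavior;
// trimOnUnput = true models the FLEX-like behavior.
function scan(trimOnUnput) {
  let rest = "'abc''def'";
  let yytext = "";
  const rule = /^'[^'\r\n]*'/;
  for (;;) {
    const m = rule.exec(rest);
    if (!m) return yytext;
    yytext += m[0];
    rest = rest.slice(m[0].length);
    if (rest === "") return yytext; // nothing left to peek at

    // input(): consume one character and append it to yytext.
    const c = rest[0];
    rest = rest.slice(1);
    yytext += c;

    // unput(c): push the character back, optionally trimming yytext.
    rest = c + rest;
    if (trimOnUnput) yytext = yytext.slice(0, -1);

    if (c !== "'") return yytext;
    // Otherwise more(): keep yytext and append the next match.
  }
}

console.log(scan(false)); // prints: 'abc'''def'
console.log(scan(true));  // prints: 'abc''def'
```

Without trimming, the peeked quote stays in yytext and is then matched again as the opening quote of the next piece, which is where the third quote comes from.
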

@zaach zaach reopened this Jun 7, 2012

Owner

zaach commented Jun 7, 2012

Interesting -- thanks for the test case.

Owner

zaach commented Jun 14, 2012

@asselin What happens in cases where more text is unput than was originally matched?

asselin commented Jun 14, 2012

yytext would be garbage at that point, but otherwise it's valid. Here's an example from the FLEX manual that does exactly that (see http://flex.sourceforge.net/manual/Actions.html#Actions).

unput(c) puts the character c back onto the input stream. It will be the next character scanned. The following action will take the current token and cause it to be rescanned enclosed in parentheses.

     {
     int i;
     /* Copy yytext because unput() trashes yytext */
     char *yycopy = strdup( yytext );
     unput( ')' );
     for ( i = yyleng - 1; i >= 0; --i )
         unput( yycopy[i] );
     unput( '(' );
     free( yycopy );
     }

Note that since each unput() puts the given character back at the beginning of the input stream, pushing back strings must be done back-to-front.

An important potential problem when using unput() is that if you are using %pointer (the default), a call to unput() destroys the contents of yytext, starting with its rightmost character and devouring one character to the left with each call. If you need the value of yytext preserved after a call to unput() (as in the above example), you must either first copy it elsewhere, or build your scanner using %array instead (see Matching).

Finally, note that you cannot put back `EOF' to attempt to mark the input stream with an end-of-file.
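
The back-to-front rule from the manual excerpt can be sketched in JavaScript on a plain string standing in for the input stream. `unput` and `rescanInParens` here are hypothetical helpers written for this illustration, not Jison or flex APIs.

```javascript
// Push one character onto the front of the remaining input.
function unput(stream, ch) {
  return ch + stream;
}

// Re-queue the current token wrapped in parentheses.
// Each unput lands at the front, so characters go in last-to-first.
function rescanInParens(token, rest) {
  const copy = token; // keep a copy, as the manual's example does
  rest = unput(rest, ")");
  for (let i = copy.length - 1; i >= 0; --i) {
    rest = unput(rest, copy[i]);
  }
  rest = unput(rest, "(");
  return rest;
}

console.log(rescanInParens("abc", " + 1")); // prints: (abc) + 1
```
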

Owner

zaach commented Jun 14, 2012

Right, I'm wondering what happens to yyleng, yylineno, and other location information. From the example it looks like yyleng is unaffected by unput. It would be strange if they could go further back than the original match.
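
One possible policy for that case, sketched purely as an assumption (it is not taken from flex or from the commit referenced below): trim only as much as the current match contains, so the counters can never go back past the start of the match. `unputClamped` and the shape of the state object are hypothetical.

```javascript
// Hypothetical clamping policy: pushing back more text than was matched
// empties yytext but never drives yyleng below zero.
function unputClamped(state, text) {
  const trimmed = Math.min(text.length, state.yytext.length);
  const removed = state.yytext.slice(state.yytext.length - trimmed);
  // Only newlines that were actually part of the match are undone.
  const newlines = (removed.match(/\n/g) || []).length;
  return {
    input: text + state.input,
    yytext: state.yytext.slice(0, state.yytext.length - trimmed),
    yyleng: state.yyleng - trimmed,
    yylineno: state.yylineno - newlines,
  };
}

const before = { input: "", yytext: "ab", yyleng: 2, yylineno: 0 };
console.log(unputClamped(before, "(ab)"));
// yytext is "" and yyleng is 0, even though four characters were pushed back
```
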

zaach added a commit that referenced this issue Jun 15, 2012

@zaach zaach closed this Jun 16, 2012
