
unput doesn't work correctly #104

Closed
asselin opened this Issue June 05, 2012 · 6 comments

Andre Asselin

unput() puts a character back in the stream but doesn't fix up yytext and the related match state, so the same character appears in the result twice. unput() needs something like this added to it:

    // Drop the pushed-back character from the current match state.
    this.yytext = this.yytext.substr(0, this.yytext.length-1);
    this.yyleng--;
    this.match = this.match.substr(0, this.match.length-1);
    this.matched = this.matched.substr(0, this.matched.length-1);
    // Undo the line count if the character was a newline.
    var lines = ch.match(/\n/);
    if (lines) this.yylineno--;
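
The double-character effect, and what the trim changes, can be reproduced with a toy lexer. This is only a sketch under assumptions: `makeLexer`, `unputNaive`, `unputTrim`, and `demo` are made-up names, and the object imitates just the fields named above rather than jison's actual lexer.

```javascript
// Toy lexer state (hypothetical; not jison's real implementation).
function makeLexer(input) {
  return {
    _input: input,
    yytext: '',
    yyleng: 0,
    yylineno: 0,
    // Consume one character and append it to the current match.
    input: function () {
      var ch = this._input.charAt(0);
      this.yytext += ch;
      this.yyleng = this.yytext.length;
      if (ch === '\n') this.yylineno++;
      this._input = this._input.slice(1);
      return ch;
    },
    // Push a character back without touching yytext (the reported behaviour).
    unputNaive: function (ch) {
      this._input = ch + this._input;
      return this;
    },
    // Push a character back and trim one character off yytext (the proposal).
    unputTrim: function (ch) {
      this._input = ch + this._input;
      this.yytext = this.yytext.slice(0, -1);
      this.yyleng = this.yytext.length;
      if (ch === '\n' && this.yylineno > 0) this.yylineno--;
      return this;
    }
  };
}

// Peek at the next character, put it back, then scan it again.
function demo(method) {
  var lexer = makeLexer('ab');
  var c = lexer.input(); // consumes 'a'
  lexer[method](c);      // pushes 'a' back
  lexer.input();         // scans 'a' a second time
  return lexer.yytext;
}

console.log(demo('unputNaive')); // 'aa'
console.log(demo('unputTrim'));  // 'a'
```

With the naive version the peeked character is counted once by the peek and once by the rescan; trimming on unput removes the first copy.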
Zach Carter
Owner
zaach commented June 05, 2012

unput is not intended to be a rewind, which seems to be what your use case calls for. It simply puts a character into the input stream, not necessarily the one that was just scanned. Perhaps the prefix "un" is a bit confusing in this context. It's modeled after flex's unput.

Zach Carter zaach closed this June 05, 2012
Andre Asselin

FLEX actually does pull the character off of the matched stream. Here's an example to illustrate:

FLEX example:

%{

#define STRING  1000

%}

%option noyywrap

%%

'[^'\n]*'   {
    int c = input();

    unput(c);   /* just peeking */
    if(c != '\'') {
        return STRING;
    } else
        yymore();
}

%%

int main(int ac, char **av)
{
    char buf[100];

    int ret = yylex();
    sprintf(buf, "yylex returned %d\nyytext is ", ret);

    fwrite(buf, strlen(buf), 1, yyout);
    fwrite(yytext, yyleng, 1, yyout);
    return 0;
}

Equivalent JISON

/* description: Parses and executes mathematical expressions. */

/* lexical grammar */
%lex
%options flex
%%

"'"[^'\r\n]*"'"   {
      var c = this.input();

      this.unput(c); /* just peeking */
      if (c != '\'') {
         return 'STRING';
      } else
         this.more();
}

\s+                     { 
                       ///console.log("whitespace"); 
                        }

<<EOF>>                 return 'EOF'
.                       return 'INVALID'

/lex


%start where_clause

%% /* language grammar */

where_clause:
      STRING EOF  {
                      console.log("yytext is", $1);
                      return $1;
                  }
   ;

Run each with this input (that's 2 single quotes in the middle):

'abc''def'

FLEX Output:

yylex returned 1000
yytext is 'abc''def'

JISON Output:

yytext is 'abc'''def'

(JISON's yytext has 3 single quotes in the middle vs. FLEX which has just the original 2)

Zach Carter zaach reopened this June 06, 2012
Zach Carter
Owner
zaach commented June 06, 2012

Interesting -- thanks for the test case.

Zach Carter
Owner
zaach commented June 13, 2012

@asselin What happens in cases where more text is unput than was originally matched?

Andre Asselin

yytext would be garbage at that point, but otherwise it's valid. Here's an example from the FLEX manual that does exactly that (see http://flex.sourceforge.net/manual/Actions.html#Actions).

unput(c) puts the character c back onto the input stream. It will be the next character scanned. The following action will take the current token and cause it to be rescanned enclosed in parentheses.

     {
     int i;
     /* Copy yytext because unput() trashes yytext */
     char *yycopy = strdup( yytext );
     unput( ')' );
     for ( i = yyleng - 1; i >= 0; --i )
         unput( yycopy[i] );
     unput( '(' );
     free( yycopy );
     }

Note that since each unput() puts the given character back at the beginning of the input stream, pushing back strings must be done back-to-front.

An important potential problem when using unput() is that if you are using %pointer (the default), a call to unput() destroys the contents of yytext, starting with its rightmost character and devouring one character to the left with each call. If you need the value of yytext preserved after a call to unput() (as in the above example), you must either first copy it elsewhere, or build your scanner using %array instead (see Matching).

Finally, note that you cannot put back `EOF' to attempt to mark the input stream with an end-of-file.
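
The back-to-front rule from the manual excerpt carries over directly to JavaScript. The sketch below assumes a made-up `makeStream` object whose `unput` simply prepends one character to the pending input; it is not flex's or jison's internal representation, and `rescanParenthesized` is a hypothetical helper name.

```javascript
// Hypothetical stand-in for a scanner's pending input.
function makeStream(rest) {
  return {
    rest: rest,
    // Each call places ch at the very front of the remaining input.
    unput: function (ch) {
      this.rest = ch + this.rest;
    }
  };
}

// Push the token back wrapped in parentheses, mirroring the manual's action.
function rescanParenthesized(stream, token) {
  var copy = String(token); // copy first, since unput may clobber yytext
  stream.unput(')');
  // Last character first, so the token ends up in its original order.
  for (var i = copy.length - 1; i >= 0; --i) {
    stream.unput(copy.charAt(i));
  }
  stream.unput('(');
  return stream.rest;
}

console.log(rescanParenthesized(makeStream(' + 1'), 'foo')); // '(foo) + 1'
```

Pushing the characters front-to-back instead would leave the token reversed in the input.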

Zach Carter
Owner
zaach commented June 14, 2012

Right, I'm wondering what happens to yyleng, yylineno, and other location information. From the example it looks like yyleng is unaffected by unput. It would be strange if they could go further back than the original match.
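
One possible policy for that case, purely as an illustration: trim at most as many characters as yytext currently holds, and only roll yylineno back for newlines that were actually removed from yytext. `unputClamped` and the plain `state` object are assumptions made for this sketch; this is not necessarily what flex does, nor what the eventual jison fix does.

```javascript
// Hypothetical helper: apply a pushback of `text` to a plain state object
// with the fields { input, yytext, yyleng, yylineno }.
function unputClamped(state, text) {
  // Never trim more than the current match holds.
  var trimmed = Math.min(text.length, state.yytext.length);
  var keep = state.yytext.length - trimmed;
  var removed = state.yytext.slice(keep);

  state.input = text + state.input;
  state.yytext = state.yytext.slice(0, keep);
  state.yyleng = state.yytext.length;

  // Only newlines that left yytext move the line counter back.
  var newlines = removed.split('\n').length - 1;
  state.yylineno = Math.max(0, state.yylineno - newlines);
  return state;
}

var state = { input: '', yytext: 'a\nb', yyleng: 3, yylineno: 1 };
unputClamped(state, '(a\nb)'); // pushes back more than was matched
console.log(state.yytext === '', state.yyleng, state.yylineno); // true 0 0
```

Under this policy the location fields stop at the start of the original match instead of going negative.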

Zach Carter zaach referenced this issue from a commit June 10, 2012
Zach Carter Fix unput - issue #104 6c14a2b
Zach Carter zaach closed this June 16, 2012