Case insensitive literal not working with backreferences #216

Closed
mingodad opened this issue Jun 20, 2022 · 8 comments

Comments

@mingodad
Contributor

See the discussion here and the examples tested on the cpp-peglib playground.

@yhirose
Owner

yhirose commented Jun 20, 2022

@mingodad, could you put the smallest possible PEG grammar here, so that I can reproduce it on my machine easily? Thanks!

@ChrisHixon

I'm seeing various corruption in the error message with this grammar on the playground:

ROOT          <- CONTENT !.
CONTENT       <- (ELEMENT / TEXT)*
ELEMENT       <- $(STAG CONTENT ETAG)
STAG          <- '<'  < $tag<TAGNAME> > '>'
ETAG          <- '</' < $tag > '>'
TAGNAME <- 'a' / 'b'i
TEXT          <- (![<] .)+

Input: <a>foo</A>

On Firefox, the error I'm currently seeing with the above grammar/input:

1:9 syntax error, unexpected 'A', expecting 'd tota�� % success fail definition 13 4 '.

It seems more likely to happen if i is added to the literals in TAGNAME, but I've seen corruption in simpler cases. Minor edits to TAGNAME change the corruption, even things like altering the number of spaces. I see corruption in both Chromium and Firefox, even after refreshing, clearing cookies and local data, etc.

The command line lint seems to always show the error I believe is the proper error (with lots of variations on the TAGNAME): 1:9: syntax error, unexpected 'A', expecting 'a'.

I'll see if I can narrow it down to simpler grammar any...

@ChrisHixon

This is about as simple as I can get it and still see consistent corruption:

ROOT          <- CONTENT !.
CONTENT       <- (ELEMENT / TEXT)*
ELEMENT       <- $(STAG CONTENT ETAG)
STAG          <- '<'  < $tag<"a"> > '>'
ETAG          <- '</' < $tag > '>'
TEXT          <- (![<] .)+

Input: <a>foo</A>
Most of the time the error is: 1:9 syntax error, unexpected 'A', expecting 'd '.
Occasionally: 1:9 syntax error, unexpected 'A', expecting 's) i'.
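
For readers following along, here is a minimal hand-written sketch in Python of the reduced grammar above. It only illustrates why the uncorrupted message should read "expecting 'a'". It is not cpp-peglib's implementation: the class name `TagParser`, the furthest-failure bookkeeping, and the single-line assumption for the `1:` prefix are all assumptions made for this illustration. The message format mimics the command-line lint output quoted earlier.

```python
class TagParser:
    """Hand-written sketch of the reduced grammar; not cpp-peglib's code."""

    def __init__(self, text):
        self.text = text
        self.fail_pos = -1    # furthest offset at which a literal failed
        self.expected = None  # the literal that was expected at that offset

    def literal(self, pos, lit):
        # Match `lit` exactly (case-sensitively) at `pos`.
        if self.text.startswith(lit, pos):
            return pos + len(lit)
        if pos > self.fail_pos:
            self.fail_pos, self.expected = pos, lit
        return None

    def element(self, pos):
        # ELEMENT <- $(STAG CONTENT ETAG); `tag` is local to this element.
        pos = self.literal(pos, "<")
        if pos is None:
            return None
        start = pos
        pos = self.literal(pos, "a")      # $tag<"a">
        if pos is None:
            return None
        tag = self.text[start:pos]        # the captured string
        pos = self.literal(pos, ">")
        if pos is None:
            return None
        pos = self.content(pos)
        pos = self.literal(pos, "</")
        if pos is None:
            return None
        pos = self.literal(pos, tag)      # $tag: exact-match backreference
        if pos is None:
            return None
        return self.literal(pos, ">")

    def text_run(self, pos):
        # TEXT <- (![<] .)+
        end = pos
        while end < len(self.text) and self.text[end] != "<":
            end += 1
        return end if end > pos else None

    def content(self, pos):
        # CONTENT <- (ELEMENT / TEXT)*
        while True:
            nxt = self.element(pos)
            if nxt is None:
                nxt = self.text_run(pos)
            if nxt is None:
                return pos
            pos = nxt

    def parse(self):
        # ROOT <- CONTENT !.   (single-line input assumed, so the line is 1)
        if self.content(0) == len(self.text):
            return "ok"
        col = self.fail_pos + 1
        found = self.text[self.fail_pos:self.fail_pos + 1] or "<EOF>"
        return (f"1:{col}: syntax error, unexpected '{found}', "
                f"expecting '{self.expected}'.")


print(TagParser("<a>foo</a>").parse())  # ok
print(TagParser("<a>foo</A>").parse())  # 1:9: syntax error, unexpected 'A', expecting 'a'.
```

The furthest failing literal is the backreference at column 9, where the captured `a` is compared with `A`, which is why `'a'` is the expected token in the uncorrupted message.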

yhirose added the bug label Jun 25, 2022
@yhirose
Owner

yhirose commented Jun 25, 2022

@ChrisHixon, thanks for the problem report. I fixed it at 3c2a53c.

yhirose removed the bug label Jun 25, 2022
@yhirose
Owner

yhirose commented Jun 25, 2022

@mingodad, I would like to make sure I understand what you are mentioning here.

The current cpp-peglib backreference behavior is an exact match against the captured string, the same as a backreference in a regular expression.

[image: screenshot of a regular expression example]

If you are suggesting that this example should succeed, I am not sure that is correct. Could you explain more clearly?
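
The regular-expression comparison can be tried out with a short sketch. The Python code below is an illustration only: the pattern and the helper name `tags_match` are assumptions, not taken from the screenshot or from cpp-peglib. By default the backreference `\1` must repeat the captured text exactly. In Python's `re` module, case-insensitive matching of the backreference is available only by opting in with the `IGNORECASE` flag.

```python
import re

# Hypothetical pattern for illustration: capture a tag name, then require
# the same text again in the closing tag via the backreference \1.
PATTERN = r"<([A-Za-z]+)>[^<]*</\1>"


def tags_match(text, ignore_case=False):
    """Return True if the opening and closing tag names match."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(PATTERN, text, flags) is not None


print(tags_match("<a>foo</a>"))                    # True
print(tags_match("<a>foo</A>"))                    # False: \1 is an exact match
print(tags_match("<a>foo</A>", ignore_case=True))  # True: opt-in flag
```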

@mingodad
Contributor Author

After seeing you show it with a regex, I can see your point.
Also, on the same topic, it would be nice to have case-insensitive character classes, [a-z]i, for grammars where identifiers are case insensitive (SQL, Pascal, ...).

@mingodad
Contributor Author

Here is an example on peggy playground https://peggyjs.org/online.html (also implemented here https://github.com/mingodad/peg):

start = name_char+ 
name_char =
	 [a-z0-9$_]i* [ \t\n]

Input:

one
Two
One
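
As a rough analogue of the proposed `[...]i` operator, the Python sketch below applies case-insensitivity to the character class alone, using a scoped inline flag. It is an approximation under stated assumptions: the names `NAME_CHAR_CI`, `NAME_CHAR_CS`, and `accepts` are invented for this illustration, the sample input is assumed to end with a newline, and regex matching is not identical to PEG matching in general (the two agree here because the class and the whitespace set do not overlap).

```python
import re

# Assumed regex analogue of the Peggy rule `[a-z0-9$_]i* [ \t\n]`.
# (?i:...) limits case-insensitivity to the character class only.
NAME_CHAR_CI = r"(?i:[a-z0-9$_])*[ \t\n]"
# The same rule without the `i` suffix, for comparison.
NAME_CHAR_CS = r"[a-z0-9$_]*[ \t\n]"


def accepts(rule, text):
    """Emulate `start = name_char+` by requiring one or more repetitions."""
    return re.fullmatch(f"(?:{rule})+", text) is not None


SAMPLE = "one\nTwo\nOne\n"  # trailing newline is an assumption

print(accepts(NAME_CHAR_CI, SAMPLE))  # True
print(accepts(NAME_CHAR_CS, SAMPLE))  # False: 'T' and 'O' are rejected
```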

@yhirose
Owner

yhirose commented Jun 25, 2022

@mingodad, thanks for the response. I'll close this issue. Could you open a separate issue for the [...]i operator?

yhirose closed this as completed Jun 25, 2022