C-99 comment rule flawed #609
You're right - I think gobbling any trailing stars fixes the regex
"/*"([^*]|"*"[^/])*"*"+"/"
or, to make it easier to read with whitespace ignored,
(?x: "/*" ( [^*] | "*"[^/*] )* "*"+"/" )
I'll make the change if you think this is right.
Zartaj
I think this would fail with /* text ** more text */
The intermediate ** will not match.
If the trailing-context / operator worked, e.g. "*"/[^/], the solution would be trivial.
Mark
Regards
Mark Ogden
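The two forms above differ in the intermediate class ("*"[^/] versus "*"[^/*]), and that difference is exactly what Mark's counterexample probes. A minimal sketch for checking both, assuming a POSIX system with <regex.h>: the flex patterns are hand-translated into POSIX extended regular expressions ("/*" becomes /\*, "*" becomes \*) and anchored so that only a whole-string match counts, standing in for flex matching the entire comment. full_match and check_proposed_fixes are hypothetical names, not part of flex.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

// Hand-translated to POSIX ERE from the two forms in the message above.
static const char UNSPACED_FORM[] = "/\\*([^*]|\\*[^/])*\\*+/";  // uses "*"[^/]
static const char SPACED_FORM[]   = "/\\*([^*]|\\*[^/*])*\\*+/"; // uses "*"[^/*]

// Hypothetical helper: 1 if `ere` matches all of `text`, 0 if it does not,
// -1 if the anchored pattern cannot be built or compiled.
static int full_match(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

void check_proposed_fixes(void)
{
    // Both forms accept a run of stars before the closing slash.
    assert(full_match(UNSPACED_FORM, "/* some text **/") == 1);
    assert(full_match(SPACED_FORM, "/* some text **/") == 1);

    // Mark's counterexample: "*"[^/*] cannot step over the inner "**".
    assert(full_match(UNSPACED_FORM, "/* text ** more text */") == 1);
    assert(full_match(SPACED_FORM, "/* text ** more text */") == 0);

    // "*"[^/] can take a star that belongs to an inner star-slash, so a
    // comment followed by code and a stray star-slash is accepted whole.
    assert(full_match(UNSPACED_FORM, "/* **/ x */") == 1);
    assert(full_match(SPACED_FORM, "/* **/ x */") == 0);
}
```

Neither form gets all three inputs right: a real C comment ends at the first */, so "/* **/ x */" should not match as one comment, yet the [^/*] form is the one that rejects it and also the one that rejects Mark's valid example.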
Sorry - that had a typo - I corrected it above - it should have been
This accepts
But it has another problem - stemming from the original regex - where it accepts invalid comments like
This is because for the intermediate
I feel this cannot be done with Flex regex - it seems to require some lookahead to avoid prematurely consuming the star from a comment end delimiter */
I'm actually surprised that every single basic regex solution I found online is wrong! Some of these posts are decades old.
The Flex doc also has a FAQ on matching C comments - there it only has a couple of example patterns that are clearly labelled wrong, and it doesn't purport to offer a working regex - that's probably what needs to be done for this section too.
The trailing context "*"/[^/] you tried can't be used inside group parentheses - that's why you got the error - I've always avoided using it for this and other limitations.
Zartaj
I guess this is one case where the non-greedy glob works.
(?s:"/*".*?"*/")
Mark
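What the non-greedy idea expresses, ending the comment at the first */, is easy to write by hand; as far as I know, flex's pattern syntax documents no lazy quantifier, so a scanner action or helper is the usual alternative. A minimal sketch, with c_comment_length and LIT_LEN as hypothetical names: the loop needs only one bit of state (was the previous body character a star?), so the set of comments it accepts is regular, and a plain regular expression without lookahead can in principle describe it.

```c
#include <assert.h>
#include <stddef.h>

// Length of a string literal, without its terminating NUL.
#define LIT_LEN(lit) ((long)(sizeof(lit) - 1))

// Hypothetical helper. If `s` starts with a C comment, return its length
// including both delimiters; return 0 if `s` does not start with "/*",
// and -1 if the input ends before the comment is closed.
long c_comment_length(const char *s)
{
    int after_star = 0;  // 1 if the previous body character was '*'

    if (s[0] != '/' || s[1] != '*')
        return 0;
    for (size_t i = 2; s[i] != '\0'; i++) {
        if (s[i] == '/' && after_star)
            return (long)(i + 1);  // closed by the first star-slash
        after_star = (s[i] == '*');
    }
    return -1;  // end of input inside the comment
}

void check_comment_scanner(void)
{
    assert(c_comment_length("/**/") == 4);
    assert(c_comment_length("/*/") == -1);  // the opening star does not count
    assert(c_comment_length("/* some text **/") == LIT_LEN("/* some text **/"));
    assert(c_comment_length("/* text ** more text */") == LIT_LEN("/* text ** more text */"));
    // Ends at the first star-slash rather than running on to a later one.
    assert(c_comment_length("/* a */ x /* b */") == LIT_LEN("/* a */"));
    assert(c_comment_length("int x;") == 0);
}
```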
Zartaj
Just had a thought
(?xs:"/*" ([^/*]* | "*" )* "*/")
Might work, as the */ is longer than "*"
Mark
Oops
The first part should be [^*], otherwise / will not be allowed.
Mark
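With the [^*] correction, the body ([^*] | "*")* accepts every character, so the pattern accepts any text that starts with /* and ends with */. Because flex always takes the longest match, the length of the closing */ does not help: a match running from one comment to a later one is longer. A minimal sketch, assuming a POSIX system with <regex.h>; the pattern is hand-translated to a POSIX extended regular expression and anchored to the whole string, and full_match and check_any_body are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

// Hand-translated to POSIX ERE from (?xs:"/*" ([^*] | "*")* "*/").
static const char ANY_BODY[] = "/\\*([^*]|\\*)*\\*/";

// Hypothetical helper: 1 if `ere` matches all of `text`, 0 if it does not,
// -1 if the anchored pattern cannot be built or compiled.
static int full_match(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

void check_any_body(void)
{
    assert(full_match(ANY_BODY, "/* some text **/") == 1);
    // Two comments and the code between them also form one match, so a
    // longest-match scanner such as flex would consume all of it.
    assert(full_match(ANY_BODY, "/* a */ x = 1; /* b */") == 1);
}
```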
Zartaj
I found a solution on the internet:
(?xs: "/*" ([^*] | "*"+[^*/])* "*"+"/")
Mark
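In this pattern, "*"+[^*/] lets a run of stars inside the body continue only when it is followed by something other than a slash or another star, so the body can never consume the star of a closing delimiter, and the final "*"+"/" absorbs any run of stars before the slash. A minimal sketch that checks it against the cases raised in this thread, assuming a POSIX system with <regex.h>; the pattern is hand-translated to a POSIX extended regular expression and anchored to the whole string, and full_match and check_final_pattern are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

// Hand-translated to POSIX ERE from "/*"([^*]|"*"+[^*/])*"*"+"/".
static const char C_COMMENT[] = "/\\*([^*]|\\*+[^*/])*\\*+/";

// Hypothetical helper: 1 if `ere` matches all of `text`, 0 if it does not,
// -1 if the anchored pattern cannot be built or compiled.
static int full_match(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

void check_final_pattern(void)
{
    assert(full_match(C_COMMENT, "/**/") == 1);
    assert(full_match(C_COMMENT, "/***/") == 1);
    assert(full_match(C_COMMENT, "/* some text **/") == 1);        // the original report
    assert(full_match(C_COMMENT, "/* text ** more text */") == 1); // Mark's counterexample
    assert(full_match(C_COMMENT, "/* **/ x */") == 0);  // comment ends at the first star-slash
    assert(full_match(C_COMMENT, "/* a */ x /* b */") == 0);
    assert(full_match(C_COMMENT, "/*/") == 0);
}
```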
Yep - this works - now the middle part of the regex
Can test with grep
and multiline comments
Incidentally the dotall flag
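Nothing in the final pattern uses ".", so multi-line comments should not depend on a dotall flag: in flex, a negated character class such as [^*] matches a newline unless \n is explicitly listed in the class, and POSIX regcomp without REG_NEWLINE likewise treats newline as an ordinary character. A minimal sketch for the multi-line case, assuming a POSIX system with <regex.h>; the pattern is hand-translated to a POSIX extended regular expression and anchored to the whole string, and full_match and check_multiline are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

// Hand-translated to POSIX ERE from "/*"([^*]|"*"+[^*/])*"*"+"/".
static const char C_COMMENT[] = "/\\*([^*]|\\*+[^*/])*\\*+/";

// Hypothetical helper: 1 if `ere` matches all of `text`, 0 if it does not,
// -1 if the anchored pattern cannot be built or compiled.
static int full_match(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    // No REG_NEWLINE: [^*] may match '\n', as flex's [^*] does.
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

void check_multiline(void)
{
    assert(full_match(C_COMMENT, "/*\n * line one\n * line two\n */") == 1);
    assert(full_match(C_COMMENT, "/**\n * doc comment\n **/") == 1);
    assert(full_match(C_COMMENT, "/* a **\n b */") == 1);
    // A newline does not let the body run past a closing star-slash.
    assert(full_match(C_COMMENT, "/* a */\nint x; /* b */") == 0);
}
```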
The C++ comment regex doesn't account for newline escapes in the comment body either - it fails for
Adding another match for escaped newlines after comment start fixes it
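In C, a backslash at the end of a line splices that line onto the next one before comments are recognised, so a // comment whose line ends in a backslash continues on the following line. The documented "/"(\\\n)*"/"[^\n]* only allows such splices between the two slashes. The sketch below compares it with one hypothetical way of also allowing them in the body, (\\\n|[^\n])*; it is an illustration under that assumption, not necessarily the pattern adopted in #614. It assumes a POSIX system with <regex.h>, hand-translates both rules to POSIX extended regular expressions anchored to the whole string, and full_match, SPLICE_LINE_COMMENT and check_line_comments are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

// In these C strings, "\\\\\n" is the ERE \\ (a literal backslash)
// followed by a literal newline, i.e. one backslash-newline splice.
static const char DOC_LINE_COMMENT[]    = "/(\\\\\n)*/[^\n]*";
static const char SPLICE_LINE_COMMENT[] = "/(\\\\\n)*/(\\\\\n|[^\n])*"; // hypothetical

// Hypothetical helper: 1 if `ere` matches all of `text`, 0 if it does not,
// -1 if the anchored pattern cannot be built or compiled.
static int full_match(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

void check_line_comments(void)
{
    assert(full_match(DOC_LINE_COMMENT, "// plain") == 1);
    assert(full_match(DOC_LINE_COMMENT, "/\\\n/ split opener") == 1);
    // Splice inside the body: the documented rule stops at the newline.
    assert(full_match(DOC_LINE_COMMENT, "// body \\\ncontinued") == 0);
    assert(full_match(SPLICE_LINE_COMMENT, "// body \\\ncontinued") == 1);
    // A newline without a preceding backslash still ends the comment.
    assert(full_match(SPLICE_LINE_COMMENT, "// a\nint x;") == 0);
}
```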
So the full correct regex for C and C++ comments is
or
Fix in PR #614.
Fixed by #614.
In section A.4.3 of the documentation, the C99 comment pattern is given as
("/*"([^*]|"*"[^/])*"*/")|("/"(\\\n)*"/"[^\n]*)
The first part of this, for recognising /* ... */ comments, appears to be incorrect. Specifically,
/* some text **/
would fail, as the ** would match "*"[^/], preventing the */ from matching.
Note: I tried inserting a / between the "*" and the [^/], i.e. to check for a single * not followed by a /:
("/*"([^*]|"*"/[^/])*"*/")
but flex generates an unrecognized rule error.
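The reported failure is easy to reproduce outside flex. A minimal sketch, assuming a POSIX system with <regex.h>: the first alternative of the documented rule is hand-translated into a POSIX extended regular expression ("/*" becomes /\*, "*" becomes \*) and anchored so that only a whole-string match counts, standing in for flex matching the entire comment. full_match and check_documented_rule are hypothetical names, not part of flex.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

// Hand-translated to POSIX ERE from the first alternative of the
// documented rule, "/*"([^*]|"*"[^/])*"*/".
static const char DOC_C_COMMENT[] = "/\\*([^*]|\\*[^/])*\\*/";

// Hypothetical helper: 1 if `ere` matches all of `text`, 0 if it does not,
// -1 if the anchored pattern cannot be built or compiled.
static int full_match(const char *ere, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

void check_documented_rule(void)
{
    assert(full_match(DOC_C_COMMENT, "/* ok */") == 1);
    assert(full_match(DOC_C_COMMENT, "/* a ** b */") == 1);
    // The reported failure: "*"[^/] takes the "**", leaving only "/" for "*/".
    assert(full_match(DOC_C_COMMENT, "/* some text **/") == 0);
}
```

A ** in the middle of a comment is fine; only a run of two or more stars directly before the closing slash trips the rule.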