-
-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix suffixed-+'s on regex's #2536
Conversation
well, maybe this was actually intended to be something else? i can't find a reference in libstdc++ for this kind of style being recognised or meaning something special.. |
I will wait for @Digitalone1 opinion on this as she is the one that worked the most with the regex strings. At least on Arch Linux and |
it's handled by the c++ stdlib (libc++/libstdc++/...), and yea, the gcc libstdc++ handles this fine. i guess i am curious on how it actually works with libstdc++ at all myself, even if this fix is correct, because it strikes me as bizarre :D |
Download the latest artifacts for this pull request: |
the however, the c++ standard's ecmascript syntax does not specify possessive quantifiers; as far as i can tell, they exist in PCRE and in java, as well as in python (from version 3.11), perl, ruby, and boost, but not in standard c++ or javascript this means they should be removed, but probably with care, as the meaning can change subtly i guess libstdc++ implements them as an extension but fails to mention anywhere that it does |
more findings: i just tested this with libstdc++ and it does not actually implement the semantics of possessive quantifiers; testing my so this might as well be a bug in libstdc++, as they don't mention the extension anywhere (and perhaps @Digitalone1 implemented it with the assumption that they match possessively, but they do not) |
I'm against this. Possessive quantifiers are more efficient and safer because if they are not respected, the regex fails faster. More info here:
This is because they prevent backtracking. More info here.
I know that C++ reference does not mention them and that's surprising to me (I only found it here), but it works for most, if not all, Linux distributions and it's good to have them. A specific patch for Chimera could be made as it was made for @wwmm you decide if this is worth to merge. The problem supporting other compilers is that maybe today it's only regex (and also std::from_chars), but tomorrow can be another thing that forces us to change plans. Don't know if it's worth to do this if most of Linux distributions work out of the box with libstdc++ in an easier and more efficient way. |
@Digitalone1 libstdc++ literally does not implement them; the semantics are identical to ordinary, just check it |
this is not possessive semantics: https://ideone.com/j75Rdp the libstdc++ documentation does not mention the existence of any such extension, therefore it can safely be concluded that the only reason that syntax seemingly works is that the implementation has a bug, not that the implementation implements the extension |
Already tested with debug messages on the command line when I introduced them. https://github.com/wwmm/easyeffects/blob/master/src/equalizer_ui.cpp#L428-L431
We try to use C++20. gcc is now on 13.2.1 |
at least with gcc 12 it still does not match possessively |
gcc 13: https://godbolt.org/z/1WYK417Mz |
additionally, i'm grepping the libstdc++ codebase and there are zero references to any such extension existing in it |
made an upstream report too: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111129 |
@Digitalone1 GCC upstream says:
|
That's interesting. I will retest it on my system and let you know the result. |
I don't know how you tested it, but that
Doesn't matter, the std::regex semantics are unchanged except for the addition of the |
That's weird. See https://en.cppreference.com/w/cpp/regex/regex_search
|
yes, because the semantics are not possessive, so it returns a match basically, adding a + to a quantifier with libstdc++ is identical to as if there was no +, as this is a gcc bug,and the easyeffects code is relying on a specific bug to work, despite there being no change in behavior over correct code (without the pluses) |
GCC shouldn't even compile the regex, because it's not valid. But we incorrectly accept the invalid regex and also say it matches. I'm not sure what we do, maybe we just completely ignore the extra Edit: But as qp66 says, there's no benefit to using an invalid regex that happens to "work" with one implementation due to a bug (which will get fixed), rather than just using a correct regex that works with all implementations. |
I'm pretty sure it was returning |
I tested it another time and I confirm it returns true on the following std::string s = "aaa";
std::smatch matches;
auto res = std::regex_search(s, matches, std::regex("(a++a)")); So it was my mistake. I'm sorry. This PR is okay to me. @q66 @nekopsykose Also the lines here can be removed, thanks: https://github.com/wwmm/easyeffects/blob/master/src/equalizer_ui.cpp#L428-L431 |
these are not valid, and libc++ fails on these with: libc++abi: terminating due to uncaught exception of type std::__1::regex_error: One of *?+{ was not preceded by a valid regular expression. it happens to work on libstdc++, but the intention is the same without the + and works with both
5680bec
to
f8ce8f2
Compare
It returned false with GCC 4.8 when Since GCC 4.9.0 (the first release with working |
It was 2-3 years ago, could not be a version so low. I misunderstood the results of the tests and the compiler was not failing adding the Anyway, in 2023 not having this feature in C++ is a bit awkward. PHP has it from years. |
I think PHP uses PCRE for its regex engine. C++ supports ecmascript and posix regexes, and neither of those supports that feature. |
So if I want to optimize a regex avoiding the backtracking, is there no way to do it on C++? At least atomic groups are supported? |
You want to optimize a regex in C++, don't use std::regex. PCRE2 can be used in C++. |
How? With an external library? It's doable, but I don't like to add an extra dependency on this project. Let's keep it simple, even if it's not optimized... |
these are not valid, and libc++ fails on these with:
libc++abi: terminating due to uncaught exception of type std::__1::regex_error: One of *?+{ was not preceded by a valid regular expression.
it happens to work on libstdc++, but the intention is the same without the + and works with both
part of #2535