Fix VRE_capture() interface wrt more matching groups than group count #3725
9a2fa48 to bc7b5cf
On a related question: Shouldn't callers of |
Another related question: Or should |
pcre2 has the means to get that info from a compiled regex, so VRE could expose it.
Not sure what you mean by this. If the caller doesn't want to capture more than N groups, well then don't. One way to ensure that is to write the regex so that it has no more than N captures. Do you have something in mind like "tell pcre2 to never capture more than N groups, because I'll never need more"? pcre2 has a lot of bells and whistles, but I don't recall that one. There are so many ways to configure pcre2 that I could easily be wrong. But I think it's likely that Philip Hazel reasoned that if you don't want more than N captures, then you wouldn't have more in your regex in the first place.
The point is to fail the compilation if the pattern contains more than N groups, rather than failing later for a match.
Got it. That's certainly possible and should be easy, same reason as for the other point: pcre2 can tell you how many groups there are for a compiled regex.
About that, VCL Because of this hard limit, I'm tempted to say querying the number of groups should be a pcre2 operation not available in VRE, accessible via |
bc7b5cf to 1f15164
After the last comment from @Dridi I have now added a compile-time check for a maximum of 9 capture groups: As explained by @Dridi, we have no facility in varnish-cache to use more capture groups, so we should not accept them at compile time. As a side effect, this could also avoid potential memory consumption issues when matching. For the case where capture groups are not used for capturing, a hint is output with the error message:
ping @slimhazard
Has it been possible in the past to have more than 9 capturing groups in a regex without any problem, as long as you don't go higher than 9 in any use of regsub()? That's a marginal case, of course. I'm just wondering if this could be someone's breaking change, given that the number of groups has never been checked as an error condition before.
@slimhazard yes, this is a breaking change, thus the tip in the error message.
c60d4d7 to 21ec451
I think it is too heavy-handed to limit the number of capturing groups at regex compile time, and it goes against the spirit of 6503249. Not everyone knows that a group doesn't have to capture:
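For readers unfamiliar with the distinction: in PCRE syntax, (?:...) groups without capturing, so (?:a|b)(c) has one capturing group rather than two. Below is a minimal sketch of counting only the capturing groups in a pattern. The helper count_capturing_groups() is hypothetical and is not part of VRE or pcre2. It is deliberately simplified: it handles backslash escapes and character classes, ignores \Q...\E quoting, and treats every "(?" or "(*" construct as non-capturing, so named groups are not counted. Real code would rather ask pcre2 for the count of a compiled pattern via pcre2_pattern_info() with PCRE2_INFO_CAPTURECOUNT.

```c
#include <stddef.h>

/*
 * Hypothetical helper, NOT part of VRE or pcre2.
 * Counts "(" that open a capturing group in a PCRE-style pattern.
 * Simplifications (assumptions of this sketch):
 *  - any "(?" or "(*" construct is treated as non-capturing,
 *    so named groups such as (?<name>...) are not counted
 *  - \Q...\E quoting is not understood
 */
static unsigned
count_capturing_groups(const char *pattern)
{
    unsigned n = 0;
    int in_class = 0;
    const char *p;

    for (p = pattern; *p != '\0'; p++) {
        if (*p == '\\') {
            /* skip the escaped character, e.g. "\(" */
            if (p[1] != '\0')
                p++;
            continue;
        }
        if (in_class) {
            /* parentheses inside [...] are literals */
            if (*p == ']')
                in_class = 0;
            continue;
        }
        if (*p == '[') {
            in_class = 1;
            /* a ']' directly after '[' or '[^' is a literal */
            if (p[1] == '^')
                p++;
            if (p[1] == ']')
                p++;
            continue;
        }
        if (*p == '(' && p[1] != '?' && p[1] != '*')
            n++;
    }
    return (n);
}
```

With this sketch, rewriting (a|b)(c) as (?:a|b)(c) lowers the count from 2 to 1, which is the kind of rewrite that keeps a pattern with many groups under a fixed capture limit.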
My comment about the 10 capturing groups was regarding our In its current state, the new VRE API should be an improvement in that regard, not a regression. Now regarding the original problem, the way you initially tried to solve it in 9a2fa48 is probably better if you:
Now that I have a better understanding, I think the original patch is better. That makes |
d0783b6 to f529fdc
d861038 to 03b0478
in order to signal potential regsub() failures at compile time.
The new VRE_capture() interface did not provide a way to detect the case where the number of matching subexpressions was higher than the group count passed. This patch basically restores the original pcre (pcre1 if you will) behaviour of returning a number higher than the number of groups to signal this case. Alternatively, if we wanted to have the pcre2-like semantics (pcre2 returns 0 for this case), we would basically need to make the VRE_capture() interface identical to vre_capture() and allow passing a count pointer; otherwise we would have no way to return the number of matches.
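The return-value contract described above can be sketched as follows. This is an illustration only, not the VRE code: struct span, capture_sketch() and the flat offset array are hypothetical names chosen for the example. The function copies at most count groups but returns the full number of matched groups, so a caller can detect the overflow case by comparing the return value with count.

```c
#include <stddef.h>

/* Hypothetical type for this sketch: half-open offset range [b, e). */
struct span {
    size_t b;
    size_t e;
};

/*
 * Hypothetical stand-in for the contract, NOT the VRE implementation.
 * `ovector` holds `matched` offset pairs as a matcher would report them
 * (pair 0 is the whole match). At most `count` pairs are copied into
 * `groups`, but the total number of matched pairs is returned, so a
 * return value greater than `count` signals that groups were dropped.
 */
static int
capture_sketch(const size_t *ovector, int matched,
    struct span *groups, int count)
{
    int i;
    int n = matched < count ? matched : count;

    for (i = 0; i < n; i++) {
        groups[i].b = ovector[2 * i];
        groups[i].e = ovector[2 * i + 1];
    }
    return (matched);
}
```

A caller that passed room for 2 groups and gets 4 back knows that two groups did not fit, without needing a separate count pointer argument.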
PCRE2_ERROR_TOO_MANY_CAPTURES was added in April 2019, which, apparently, is too new for some of our circ^Wbuild boxes.
03b0478 to ef8e75c
Related: Working on a recent addition to the re vmod, the question arose if the VRE API could be changed to allow this use case. Unless I overlook something, in addition to this ticket, only a JIT option argument would be needed in @Dridi I think I might also have understood a reason for PCRE2 to use offsets for captures: They are a natural choice when relocating the subject (you can just keep them unaltered if you do). |
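To illustrate the point about offsets and relocation: a capture stored as a pair of offsets stays valid when the subject is copied to another buffer, whereas a capture stored as pointers would have to be rebased. A minimal sketch, where struct off_span and group_text() are hypothetical names for this example and not part of VRE or pcre2:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a capture as half-open offsets [b, e) into a subject. */
struct off_span {
    size_t b;
    size_t e;
};

/*
 * Hypothetical helper: copy the text of a capture into `out` as a
 * NUL-terminated string. Returns the length, or -1 if the span is
 * malformed or does not fit. The span is interpreted relative to
 * whatever `subject` pointer is passed in, so the same span works
 * for any copy of the subject.
 */
static int
group_text(const char *subject, struct off_span s, char *out, size_t outlen)
{
    size_t len;

    if (s.e < s.b)
        return (-1);
    len = s.e - s.b;
    if (len + 1 > outlen)
        return (-1);
    memcpy(out, subject + s.b, len);
    out[len] = '\0';
    return ((int)len);
}
```

After copying the subject elsewhere, calling group_text() with the new buffer and the unaltered span yields the same text; nothing about the capture itself needs to change.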
The new VRE_capture() interface does not provide a way to detect the case where the number of matching subexpressions is higher than the group count passed.
This patch basically restores the original pcre (pcre1 if you will) behaviour of returning a number higher than the number of groups to signal this case.
Alternatively, if we wanted to have the pcre2-like semantics ... we would basically need to make the VRE_capture() interface identical to vre_capture() and allow passing a count pointer; otherwise we would have no way to return the number of matches.
ping @slimhazard
ref: https://gitlab.com/uplex/varnish/libvmod-re/-/merge_requests/1