Move to pcre2 #3559

dridi · 2021-03-22T15:06:27Z

Check that it's OK for all the platforms we care about.

ingvarha · 2021-05-18T15:02:54Z

Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1938985

It was brittle because we would figure out at build time whether pcre was itself built with JIT compilation support, which could differ from the pcre that would be used at run time.

With pcre2 we wouldn't run into the recursion limit and fail the match in r1576:

    ---- s1 EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559

Otherwise we get it from either libvarnish or libvarnishapi. Refs varnishcache#3559

It is interpreted as "T" by pcre but pcre2 is stricter and treats it as a syntax error. While there may be a bit flag to ignore unknown escape sequences, it is probably worth hardening instead. Refs varnishcache#3559

We might want to do something about it once pcre2 is in place, but it should be noted that concurrent matches on a vre_t could compete over the limits we can tweak. Refs varnishcache#3559

The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if I hadn't renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the earlier retirement of VRE_has_jit, I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It focuses on source code migration, and none of the RST documentation (like the aforementioned release notes) was updated to reflect the change. This applies in particular to the installation documentation.

The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path.

In VRE_exec() we have the same problem: we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int, so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation.

Unlike struct vre_limits, which is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec(), which can be executed from a critical path.

Closes varnishcache#3559
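The copy/conversion step mentioned above could look roughly like the sketch below. It is not the actual VRE code: `sketch_copy_ovector()` and `SKETCH_UNSET` are hypothetical names, plain `size_t` stands in for PCRE2_SIZE so the example builds without the PCRE2 headers, and the all-ones "unset" marker is assumed to play the role of PCRE2_UNSET.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed marker for a group that did not match, in the spirit of PCRE2_UNSET. */
#define SKETCH_UNSET ((size_t)-1)

/*
 * Hypothetical helper: narrow `pairs` (start, end) offset pairs from a
 * size_t vector into a caller-supplied int vector of dst_len entries,
 * as an int-based ovector signature would require.
 *
 * Returns the number of pairs copied, or -1 if an offset does not fit
 * in an int. Unset offsets are stored as -1.
 */
static int
sketch_copy_ovector(const size_t *src, size_t pairs, int *dst, size_t dst_len)
{
	size_t i, n;

	/* Never write more pairs than the destination can hold. */
	n = dst_len / 2;
	if (pairs < n)
		n = pairs;

	for (i = 0; i < n * 2; i++) {
		if (src[i] == SKETCH_UNSET)
			dst[i] = -1;
		else if (src[i] > (size_t)INT_MAX)
			return (-1);
		else
			dst[i] = (int)src[i];
	}
	return ((int)n);
}
```

In a real wrapper the source vector would presumably come from the pcre2_match_data object after the match, which is where the extra allocation discussed above comes in; the sketch only covers the narrowing copy.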

dridi · 2021-05-19T18:22:03Z

#3616

dridi · 2021-05-19T18:23:02Z

Feedback from testing on Fedora is welcome.

Otherwise we get it from either libvarnish or libvarnishapi. Refs #3559

It is interpreted as "T" by pcre but pcre2 is stricter and treats it as a syntax error. While there may be a bit flag to ignore unknown escape sequences, it is probably worth hardening instead. Refs #3559

We might want to do something about it once pcre2 is in place, but it should be noted that concurrent matches on a vre_t could compete over the limits we can tweak. Refs #3559

... to make our SunOS vtest build happy again

    ld: fatal: file /opt/local/lib/libpcre.so: wrong ELF class: ELFCLASS64
    ld: fatal: file processing errors. No output written to vjsn_test
    collect2: error: ld returned 1 exit status

The issue here was the order of the -L and -l arguments when a (32-bit) version of a library needs to be found first in an overridden LDPATH.

Ref 12bbe31
Ref #3559