-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move to pcre2 #3559
Labels
Comments
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 19, 2021
It was brittle because we would figure at build time whether pcre was itself built with JIT compilation support, which could be different than the pcre that would be used at run time. With pcre2 we wouldn't run into the recursion limit and fail the match in r1576: ---- s1 EXPECT req.http.found (1) == "<undef>" failed So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 19, 2021
Otherwise we get it from either libvarnish or libvarnishapi. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 19, 2021
It is interpreted as "T" by pcre but pcre2 is stricter and treats it as a syntax error. While there may be a bit flag to ignore unknown escape sequences, it is probably worth hardening instead. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 19, 2021
We might want to do something about it once pcre2 is in place, but it should be noted that concurrent matches on a vre_t could compete over the limits we can tweak. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 19, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration, and non of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
Feedback from testing on Fedora is welcome. |
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 20, 2021
It is interpreted as "T" by pcre but pcre2 is stricter and treats it as a syntax error. While there may be a bit flag to ignore unknown escape sequences, it is probably worth hardening instead. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 20, 2021
We might want to do something about it once pcre2 is in place, but it should be noted that concurrent matches on a vre_t could compete over the limits we can tweak. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 20, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration, and non of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
May 20, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
nigoroll
pushed a commit
that referenced
this issue
Jun 10, 2021
Otherwise we get it from either libvarnish or libvarnishapi. Refs #3559
nigoroll
pushed a commit
that referenced
this issue
Jun 10, 2021
It is interpreted as "T" by pcre but pcre2 is stricter and treats it as a syntax error. While there may be a bit flag to ignore unknown escape sequences, it is probably worth hardening instead. Refs #3559
nigoroll
pushed a commit
that referenced
this issue
Jun 10, 2021
We might want to do something about it once pcre2 is in place, but it should be noted that concurrent matches on a vre_t could compete over the limits we can tweak. Refs #3559
nigoroll
added a commit
that referenced
this issue
Jun 10, 2021
... to make our SunOS vtest build happy again ld: fatal: file /opt/local/lib/libpcre.so: wrong ELF class: ELFCLASS64 ld: fatal: file processing errors. No output written to vjsn_test collect2: error: ld returned 1 exit status The issue here was order of the -L -l arguments when a (32bit) version of a library needs to be found first in an overridden LDPATH. Ref 12bbe31 Ref #3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 10, 2021
It was brittle because we would figure at build time whether pcre was itself built with JIT compilation support, which could be different than the pcre that would be used at run time. With pcre2 we wouldn't run into the recursion limit and fail the match in r1576: ---- s1 EXPECT req.http.found (1) == "<undef>" failed So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 10, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 10, 2021
It was brittle because we would figure at build time whether pcre was itself built with JIT compilation support, which could be different than the pcre that would be used at run time. With pcre2 we wouldn't run into the recursion limit and fail the match in r1576: ---- s1 EXPECT req.http.found (1) == "<undef>" failed So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 10, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
nigoroll
pushed a commit
to dridi/varnish-cache
that referenced
this issue
Jun 10, 2021
It was brittle because we would figure at build time whether pcre was itself built with JIT compilation support, which could be different than the pcre that would be used at run time. With pcre2 we wouldn't run into the recursion limit and fail the match in r1576: ---- s1 EXPECT req.http.found (1) == "<undef>" failed So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it. Refs varnishcache#3559
nigoroll
pushed a commit
to dridi/varnish-cache
that referenced
this issue
Jun 10, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 14, 2021
It was brittle because we would figure at build time whether pcre was itself built with JIT compilation support, which could be different than the pcre that would be used at run time. With pcre2 we wouldn't run into the recursion limit and fail the match in r1576: ---- s1 EXPECT req.http.found (1) == "<undef>" failed So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 14, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 15, 2021
It was brittle because we would figure at build time whether pcre was itself built with JIT compilation support, which could be different than the pcre that would be used at run time. With pcre2 we wouldn't run into the recursion limit and fail the match in r1576: ---- s1 EXPECT req.http.found (1) == "<undef>" failed So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 15, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 16, 2021
It was brittle because we would figure at build time whether pcre was itself built with JIT compilation support, which could be different than the pcre that would be used at run time. With pcre2 we wouldn't run into the recursion limit and fail the match in r1576: ---- s1 EXPECT req.http.found (1) == "<undef>" failed So we can also remove the undocumented `pcre_jit` feature check from varnishtest and the single disabled test that was making use of it. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 16, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if I hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 16, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the source tree, performing the PCRE2 migration until it compiles, and fixing migration mistakes until the test suite passes. This migration could have been free of breaking changes if I hadn't renamed the following parameters: - pcre_match_limit to pcre2_match_limit - pcre_match_limit_recursion to pcre2_depth_limit I decided to make it clear that moving to PCRE2 was not anecdotal and a good-enough reason to break things. For this reason, in addition to the VRE_has_jit earlier retirement I introduced one more breakage in struct vre_limits by renaming the match_recursion field to depth. Speaking of breaking changes, PCRE2 seems to be stricter and rejects unknown escape sequences. There may be other breaking changes to the syntax and it should be noted in the release notes that VCL may fail to compile because of that. We should however be able to count on helpful error messages from libvcc. There are outstanding issues with this migration that need scrutiny. It focuses on source code migration and none of the RST documentation (like the aforementioned release notes) were updated to reflect the change. In particular, installation documentation. The other outstanding issues are related to correctness and performance: the ban code doesn't care about what matched, only whether a subject matched or not. PCRE2 insists on allocating a pcre2_match_data object to store the "ovector" where the match offsets are written. The solution for bans was to share a dummy pcre2_match_data, but it might be unsafe. This was done to avoid heap allocation in what is essentially a critical path. In VRE_exec() we have the same problem, we can't take the caller's ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse, it is no longer a vector of int so in order to preserve VRE_exec()'s signature a copy/conversion of the PCRE2_SIZE vector is needed in addition to the pcre2_match_data allocation. Unlike struct vre_limit that is primarily used by our parameter machinery, I decided to preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside of VRE. I am however not sure we can avoid the heap allocation that now occurs in VRE_exec() that can be executed from a critical path. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 16, 2021
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 16, 2021
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 16, 2021
A lot of complexity goes away but we technically break VCL. That was already the case when it comes to unknown escape sequences failing to compile, but now invalid escape sequences in a replacement string will also fail to match. The big bad breakage is that pcre2 allows back references to capture groups with the dollar sign in a replacement string, instead of the backslash character in the old homemade VRT_regsub(). It means that VCL referring to capture groups with sequences like "\0" or "\1" will now fail hard. What should happen in libvcc is a compile time check of a regsub when both the regex and replacement string arguments are constants, which should cover the majority of cases. With this Tadd() goes away with its bogus handling of negative lengths. On the VRE side, the sole consumer of VRE_NOTEMPTY is gone. The sole consumer of an ovector is also gone. This means that we can get rid of both. For the ovector, a VRE_match() function shall replace VRE_exec() and only concern itself with the result of a match similarly to how the ban code currently uses a dummy pcre2_match_data. Regarding VRE_NOTEMPTY, VRE_match() may take an options argument like VRE_exec() does today to keep the door open for supporting a match option in the future even though we don't need any now. Refs varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 16, 2021
With the introduction of VRE_sub() many VRE_exec() use cases went away, including the start offset. The subject string length is now optional and zero can be passed to signal that the subject string is terminated with a null character. Performing this migration, I noticed that the match could fail if there isn't enough space so the dummy match_data now has an arbitrary ovector size of 30 like it used to. Refs varnishcache#3559
Merged
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 28, 2021
Now that VRE is the only regular expression API we use, we can migrate its backend to pcre2. The existing 'pcre_*' parameters are also renamed to reflect this migration, and 'pcre_match_limit_recursion' gets special treatment and is renamed to pcre2_depth_limit. This creates an additional API breakage in VRE: the `match_recursion` field in `struct vre_limits` is also renamed. One last breakage is the removal of `VRE_has_jit` used by only one undocumented varnishtest feature, and the pcre_jit feature is only used by one test case that no longer fails. The pcre jit compilation feature was broken anyway: sealing it at compile time will not reflect what VRE actually links to. Once we have a test case needing the jit feature, we can introduce a better API for that check. There is one outstanding performance problem, the ovector that was previously allocated on the stack now needs to be allocated from the heap. It might be possible to implement a pcre2 context to fix that or maybe pool them, but for now we have heap allocations on the critical path. The VRE_sub() function makes sure to make a single ovector allocation (technically a pcre2_match_data allocation) since it's the only one guaranteed to loop on a single regular expression for the `regsuball()` use case. Closes varnishcache#3616 Closes varnishcache#3559
Merged
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jun 30, 2021
Now that VRE is the only regular expression API we use, we can migrate its backend to pcre2. The existing 'pcre_*' parameters are also renamed to reflect this migration, and 'pcre_match_limit_recursion' gets special treatment and is renamed to pcre2_depth_limit. This creates an additional API breakage in VRE: the `match_recursion` field in `struct vre_limits` is also renamed. One last breakage is the removal of `VRE_has_jit` used by only one undocumented varnishtest feature, and the pcre_jit feature is only used by one test case that no longer fails. The pcre jit compilation feature was broken anyway: sealing it at compile time will not reflect what VRE actually links to. Once we have a test case needing the jit feature, we can introduce a better API for that check. There is one outstanding performance problem, the ovector that was previously allocated on the stack now needs to be allocated from the heap. It might be possible to implement a pcre2 context to fix that or maybe pool them, but for now we have heap allocations on the critical path. The VRE_sub() function makes sure to make a single ovector allocation (technically a pcre2_match_data allocation) since it's the only one guaranteed to loop on a single regular expression for the `regsuball()` use case. Closes varnishcache#3616 Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jul 5, 2021
Now that VRE is the only regular expression API we use, we can migrate its backend to pcre2. The existing 'pcre_*' parameters are also renamed to reflect this migration, and 'pcre_match_limit_recursion' gets special treatment and is renamed to pcre2_depth_limit. This creates an additional API breakage in VRE: the `match_recursion` field in `struct vre_limits` is also renamed. One last breakage is the removal of `VRE_has_jit` used by only one undocumented varnishtest feature, and the pcre_jit feature is only used by one test case that no longer fails. The pcre jit compilation feature was broken anyway: sealing it at compile time will not reflect what VRE actually links to. Once we have a test case needing the jit feature, we can introduce a better API for that check. There is one outstanding performance problem, the ovector that was previously allocated on the stack now needs to be allocated from the heap. It might be possible to implement a pcre2 context to fix that or maybe pool them, but for now we have heap allocations on the critical path. The VRE_sub() function makes sure to make a single ovector allocation (technically a pcre2_match_data allocation) since it's the only one guaranteed to loop on a single regular expression for the `regsuball()` use case. Closes varnishcache#3616 Closes varnishcache#3559
dridi
added a commit
to dridi/varnish-cache
that referenced
this issue
Jul 6, 2021
Now that VRE is the only regular expression API we use, we can migrate its backend to pcre2. The existing 'pcre_*' parameters are also renamed to reflect this migration, and 'pcre_match_limit_recursion' gets special treatment and is renamed to pcre2_depth_limit. This creates an additional API breakage in VRE: the `match_recursion` field in `struct vre_limits` is also renamed. One last breakage is the removal of `VRE_has_jit` used by only one undocumented varnishtest feature, and the pcre_jit feature is only used by one test case that no longer fails. The pcre jit compilation feature was broken anyway: sealing it at compile time will not reflect what VRE actually links to. Once we have a test case needing the jit feature, we can introduce a better API for that check. There is one outstanding performance problem, the ovector that was previously allocated on the stack now needs to be allocated from the heap. It might be possible to implement a pcre2 context to fix that or maybe pool them, but for now we have heap allocations on the critical path. The VRE_sub() function makes sure to make a single ovector allocation (technically a pcre2_match_data allocation) since it's the only one guaranteed to loop on a single regular expression for the `regsuball()` use case. On the documentation front, the SmartOS installation instructions are hidden for lack of a pcre2 package. Closes varnishcache#3616 Closes varnishcache#3559
dridi
added a commit
that referenced
this issue
Jul 6, 2021
Now that VRE is the only regular expression API we use, we can migrate its backend to pcre2. The existing 'pcre_*' parameters are also renamed to reflect this migration, and 'pcre_match_limit_recursion' gets special treatment and is renamed to pcre2_depth_limit. This creates an additional API breakage in VRE: the `match_recursion` field in `struct vre_limits` is also renamed. One last breakage is the removal of `VRE_has_jit` used by only one undocumented varnishtest feature, and the pcre_jit feature is only used by one test case that no longer fails. The pcre jit compilation feature was broken anyway: sealing it at compile time will not reflect what VRE actually links to. Once we have a test case needing the jit feature, we can introduce a better API for that check. There is one outstanding performance problem, the ovector that was previously allocated on the stack now needs to be allocated from the heap. It might be possible to implement a pcre2 context to fix that or maybe pool them, but for now we have heap allocations on the critical path. The VRE_sub() function makes sure to make a single ovector allocation (technically a pcre2_match_data allocation) since it's the only one guaranteed to loop on a single regular expression for the `regsuball()` use case. On the documentation front, the SmartOS installation instructions are hidden for lack of a pcre2 package. Closes #3616 Closes #3559
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Check that it's OK for all the platforms we care about.
The text was updated successfully, but these errors were encountered: