Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to pcre2 #3559

Closed
dridi opened this issue Mar 22, 2021 · 3 comments · Fixed by #3635
Closed

Move to pcre2 #3559

dridi opened this issue Mar 22, 2021 · 3 comments · Fixed by #3635

Comments

@dridi
Copy link
Member

dridi commented Mar 22, 2021

Check that it's OK for all the platforms we care about.

@ingvarha
Copy link
Contributor

dridi added a commit to dridi/varnish-cache that referenced this issue May 19, 2021
It was brittle because we would figure at build time whether pcre was
itself built with JIT compilation support, which could be different than
the pcre that would be used at run time. With pcre2 we wouldn't run into
the recursion limit and fail the match in r1576:

    ---- s1    EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from
varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue May 19, 2021
Otherwise we get it from either libvarnish or libvarnishapi.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue May 19, 2021
It is interpreted as "T" by pcre but pcre2 is stricter and treats it as
a syntax error. While there may be a bit flag to ignore unknown escape
sequences, it is probably worth hardening instead.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue May 19, 2021
We might want to do something about it once pcre2 is in place, but it
should be noted that concurrent matches on a vre_t could compete over
the limits we can tweak.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue May 19, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration, and non of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
@dridi
Copy link
Member Author

dridi commented May 19, 2021

#3616

@dridi
Copy link
Member Author

dridi commented May 19, 2021

Feedback from testing on Fedora is welcome.

dridi added a commit to dridi/varnish-cache that referenced this issue May 20, 2021
It is interpreted as "T" by pcre but pcre2 is stricter and treats it as
a syntax error. While there may be a bit flag to ignore unknown escape
sequences, it is probably worth hardening instead.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue May 20, 2021
We might want to do something about it once pcre2 is in place, but it
should be noted that concurrent matches on a vre_t could compete over
the limits we can tweak.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue May 20, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration, and non of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue May 20, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
nigoroll pushed a commit that referenced this issue Jun 10, 2021
Otherwise we get it from either libvarnish or libvarnishapi.

Refs #3559
nigoroll pushed a commit that referenced this issue Jun 10, 2021
It is interpreted as "T" by pcre but pcre2 is stricter and treats it as
a syntax error. While there may be a bit flag to ignore unknown escape
sequences, it is probably worth hardening instead.

Refs #3559
nigoroll pushed a commit that referenced this issue Jun 10, 2021
We might want to do something about it once pcre2 is in place, but it
should be noted that concurrent matches on a vre_t could compete over
the limits we can tweak.

Refs #3559
nigoroll added a commit that referenced this issue Jun 10, 2021
... to make our SunOS vtest build happy again

ld: fatal: file /opt/local/lib/libpcre.so: wrong ELF class: ELFCLASS64
ld: fatal: file processing errors. No output written to vjsn_test
collect2: error: ld returned 1 exit status

The issue here was order of the -L -l arguments when a (32bit) version
of a library needs to be found first in an overridden LDPATH.

Ref 12bbe31
Ref #3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 10, 2021
It was brittle because we would figure at build time whether pcre was
itself built with JIT compilation support, which could be different than
the pcre that would be used at run time. With pcre2 we wouldn't run into
the recursion limit and fail the match in r1576:

    ---- s1    EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from
varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 10, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 10, 2021
It was brittle because we would figure at build time whether pcre was
itself built with JIT compilation support, which could be different than
the pcre that would be used at run time. With pcre2 we wouldn't run into
the recursion limit and fail the match in r1576:

    ---- s1    EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from
varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 10, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
nigoroll pushed a commit to dridi/varnish-cache that referenced this issue Jun 10, 2021
It was brittle because we would figure at build time whether pcre was
itself built with JIT compilation support, which could be different than
the pcre that would be used at run time. With pcre2 we wouldn't run into
the recursion limit and fail the match in r1576:

    ---- s1    EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from
varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559
nigoroll pushed a commit to dridi/varnish-cache that referenced this issue Jun 10, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 14, 2021
It was brittle because we would figure at build time whether pcre was
itself built with JIT compilation support, which could be different than
the pcre that would be used at run time. With pcre2 we wouldn't run into
the recursion limit and fail the match in r1576:

    ---- s1    EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from
varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 14, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 15, 2021
It was brittle because we would figure at build time whether pcre was
itself built with JIT compilation support, which could be different than
the pcre that would be used at run time. With pcre2 we wouldn't run into
the recursion limit and fail the match in r1576:

    ---- s1    EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from
varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 15, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 16, 2021
It was brittle because we would figure at build time whether pcre was
itself built with JIT compilation support, which could be different than
the pcre that would be used at run time. With pcre2 we wouldn't run into
the recursion limit and fail the match in r1576:

    ---- s1    EXPECT req.http.found (1) == "<undef>" failed

So we can also remove the undocumented `pcre_jit` feature check from
varnishtest and the single disabled test that was making use of it.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 16, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if I hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 16, 2021
The process can be summarized as running `sed 's/pcre/\02/gi'` in the
source tree, performing the PCRE2 migration until it compiles, and
fixing migration mistakes until the test suite passes.

This migration could have been free of breaking changes if I hadn't
renamed the following parameters:

- pcre_match_limit to pcre2_match_limit
- pcre_match_limit_recursion to pcre2_depth_limit

I decided to make it clear that moving to PCRE2 was not anecdotal and a
good-enough reason to break things. For this reason, in addition to the
VRE_has_jit earlier retirement I introduced one more breakage in struct
vre_limits by renaming the match_recursion field to depth.

Speaking of breaking changes, PCRE2 seems to be stricter and rejects
unknown escape sequences. There may be other breaking changes to the
syntax and it should be noted in the release notes that VCL may fail to
compile because of that. We should however be able to count on helpful
error messages from libvcc.

There are outstanding issues with this migration that need scrutiny. It
focuses on source code migration and none of the RST documentation (like
the aforementioned release notes) were updated to reflect the change. In
particular, installation documentation.

The other outstanding issues are related to correctness and performance:
the ban code doesn't care about what matched, only whether a subject
matched or not. PCRE2 insists on allocating a pcre2_match_data object to
store the "ovector" where the match offsets are written. The solution
for bans was to share a dummy pcre2_match_data, but it might be unsafe.
This was done to avoid heap allocation in what is essentially a critical
path.

In VRE_exec() we have the same problem, we can't take the caller's
ovector and we need to pass a pcre2_match_data to pcre2_match(). Worse,
it is no longer a vector of int so in order to preserve VRE_exec()'s
signature a copy/conversion of the PCRE2_SIZE vector is needed in
addition to the pcre2_match_data allocation. Unlike struct vre_limit
that is primarily used by our parameter machinery, I decided to
preserve VRE_exec()'s API and avoid leaking the PCRE2_SIZE type outside
of VRE.

I am however not sure we can avoid the heap allocation that now occurs
in VRE_exec() that can be executed from a critical path.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 16, 2021
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 16, 2021
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 16, 2021
A lot of complexity goes away but we technically break VCL. That was
already the case when it comes to unknown escape sequences failing to
compile, but now invalid escape sequences in a replacement string will
also fail to match.

The big bad breakage is that pcre2 allows back references to capture
groups with the dollar sign in a replacement string, instead of the
backslash character in the old homemade VRT_regsub(). It means that VCL
referring to capture groups with sequences like "\0" or "\1" will now
fail hard.

What should happen in libvcc is a compile time check of a regsub when
both the regex and replacement string arguments are constants, which
should cover the majority of cases.

With this Tadd() goes away with its bogus handling of negative lengths.

On the VRE side, the sole consumer of VRE_NOTEMPTY is gone. The sole
consumer of an ovector is also gone. This means that we can get rid of
both.

For the ovector, a VRE_match() function shall replace VRE_exec() and
only concern itself with the result of a match similarly to how the ban
code currently uses a dummy pcre2_match_data.

Regarding VRE_NOTEMPTY, VRE_match() may take an options argument like
VRE_exec() does today to keep the door open for supporting a match
option in the future even though we don't need any now.

Refs varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 16, 2021
With the introduction of VRE_sub() many VRE_exec() use cases went away,
including the start offset. The subject string length is now optional
and zero can be passed to signal that the subject string is terminated
with a null character.

Performing this migration, I noticed that the match could fail if there
isn't enough space so the dummy match_data now has an arbitrary ovector
size of 30 like it used to.

Refs varnishcache#3559
@dridi dridi mentioned this issue Jun 19, 2021
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 28, 2021
Now that VRE is the only regular expression API we use, we can migrate
its backend to pcre2. The existing 'pcre_*' parameters are also renamed
to reflect this migration, and 'pcre_match_limit_recursion' gets special
treatment and is renamed to pcre2_depth_limit.

This creates an additional API breakage in VRE: the `match_recursion`
field in `struct vre_limits` is also renamed. One last breakage is the
removal of `VRE_has_jit` used by only one undocumented varnishtest
feature, and the pcre_jit feature is only used by one test case that no
longer fails.

The pcre jit compilation feature was broken anyway: sealing it at
compile time will not reflect what VRE actually links to. Once we have
a test case needing the jit feature, we can introduce a better API for
that check.

There is one outstanding performance problem, the ovector that was
previously allocated on the stack now needs to be allocated from the
heap. It might be possible to implement a pcre2 context to fix that or
maybe pool them, but for now we have heap allocations on the critical
path. The VRE_sub() function makes sure to make a single ovector
allocation (technically a pcre2_match_data allocation) since it's the
only one guaranteed to loop on a single regular expression for the
`regsuball()` use case.

Closes varnishcache#3616
Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jun 30, 2021
Now that VRE is the only regular expression API we use, we can migrate
its backend to pcre2. The existing 'pcre_*' parameters are also renamed
to reflect this migration, and 'pcre_match_limit_recursion' gets special
treatment and is renamed to pcre2_depth_limit.

This creates an additional API breakage in VRE: the `match_recursion`
field in `struct vre_limits` is also renamed. One last breakage is the
removal of `VRE_has_jit` used by only one undocumented varnishtest
feature, and the pcre_jit feature is only used by one test case that no
longer fails.

The pcre jit compilation feature was broken anyway: sealing it at
compile time will not reflect what VRE actually links to. Once we have
a test case needing the jit feature, we can introduce a better API for
that check.

There is one outstanding performance problem, the ovector that was
previously allocated on the stack now needs to be allocated from the
heap. It might be possible to implement a pcre2 context to fix that or
maybe pool them, but for now we have heap allocations on the critical
path. The VRE_sub() function makes sure to make a single ovector
allocation (technically a pcre2_match_data allocation) since it's the
only one guaranteed to loop on a single regular expression for the
`regsuball()` use case.

Closes varnishcache#3616
Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 5, 2021
Now that VRE is the only regular expression API we use, we can migrate
its backend to pcre2. The existing 'pcre_*' parameters are also renamed
to reflect this migration, and 'pcre_match_limit_recursion' gets special
treatment and is renamed to pcre2_depth_limit.

This creates an additional API breakage in VRE: the `match_recursion`
field in `struct vre_limits` is also renamed. One last breakage is the
removal of `VRE_has_jit` used by only one undocumented varnishtest
feature, and the pcre_jit feature is only used by one test case that no
longer fails.

The pcre jit compilation feature was broken anyway: sealing it at
compile time will not reflect what VRE actually links to. Once we have
a test case needing the jit feature, we can introduce a better API for
that check.

There is one outstanding performance problem, the ovector that was
previously allocated on the stack now needs to be allocated from the
heap. It might be possible to implement a pcre2 context to fix that or
maybe pool them, but for now we have heap allocations on the critical
path. The VRE_sub() function makes sure to make a single ovector
allocation (technically a pcre2_match_data allocation) since it's the
only one guaranteed to loop on a single regular expression for the
`regsuball()` use case.

Closes varnishcache#3616
Closes varnishcache#3559
dridi added a commit to dridi/varnish-cache that referenced this issue Jul 6, 2021
Now that VRE is the only regular expression API we use, we can migrate
its backend to pcre2. The existing 'pcre_*' parameters are also renamed
to reflect this migration, and 'pcre_match_limit_recursion' gets special
treatment and is renamed to pcre2_depth_limit.

This creates an additional API breakage in VRE: the `match_recursion`
field in `struct vre_limits` is also renamed. One last breakage is the
removal of `VRE_has_jit` used by only one undocumented varnishtest
feature, and the pcre_jit feature is only used by one test case that no
longer fails.

The pcre jit compilation feature was broken anyway: sealing it at
compile time will not reflect what VRE actually links to. Once we have
a test case needing the jit feature, we can introduce a better API for
that check.

There is one outstanding performance problem, the ovector that was
previously allocated on the stack now needs to be allocated from the
heap. It might be possible to implement a pcre2 context to fix that or
maybe pool them, but for now we have heap allocations on the critical
path. The VRE_sub() function makes sure to make a single ovector
allocation (technically a pcre2_match_data allocation) since it's the
only one guaranteed to loop on a single regular expression for the
`regsuball()` use case.

On the documentation front, the SmartOS installation instructions are
hidden for lack of a pcre2 package.

Closes varnishcache#3616
Closes varnishcache#3559
dridi added a commit that referenced this issue Jul 6, 2021
Now that VRE is the only regular expression API we use, we can migrate
its backend to pcre2. The existing 'pcre_*' parameters are also renamed
to reflect this migration, and 'pcre_match_limit_recursion' gets special
treatment and is renamed to pcre2_depth_limit.

This creates an additional API breakage in VRE: the `match_recursion`
field in `struct vre_limits` is also renamed. One last breakage is the
removal of `VRE_has_jit` used by only one undocumented varnishtest
feature, and the pcre_jit feature is only used by one test case that no
longer fails.

The pcre jit compilation feature was broken anyway: sealing it at
compile time will not reflect what VRE actually links to. Once we have
a test case needing the jit feature, we can introduce a better API for
that check.

There is one outstanding performance problem, the ovector that was
previously allocated on the stack now needs to be allocated from the
heap. It might be possible to implement a pcre2 context to fix that or
maybe pool them, but for now we have heap allocations on the critical
path. The VRE_sub() function makes sure to make a single ovector
allocation (technically a pcre2_match_data allocation) since it's the
only one guaranteed to loop on a single regular expression for the
`regsuball()` use case.

On the documentation front, the SmartOS installation instructions are
hidden for lack of a pcre2 package.

Closes #3616
Closes #3559
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants