WS_Reset() and overflown workspaces #3194

nigoroll · 2020-01-20T13:29:24Z

noticed here: b12af68

We use workspace overflows to signal to bail out for example after a failing VRT_SetHdr(). This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing.
We use the WS_Snapshot() ... WS_Reset() pattern as some kind of second order workspace allocation where the called code itself uses WS_Reserve().
This combination is problematic because WS_Reset() calls ws_ClearOverflow(ws), potentially clearing the overflow bit from a previous relevant failure.

I think we should simply forbid WS_Snapshot() on overflown workspaces, mandating prior calls of WS_Overflowed() for the above mentioned pattern.

Feedback welcome

The text was updated successfully, but these errors were encountered:

nigoroll · 2020-01-20T13:43:50Z

this patch to c00071.vtc demos the issue. Notice that the vtc already failed in the wrong place (while pushing processors rather than in the regsub()), which remained unnoticed.

diff --git a/bin/varnishtest/tests/c00071.vtc b/bin/varnishtest/tests/c00071.vtc
index 7e3c156ee..d4c17de1b 100644
--- a/bin/varnishtest/tests/c00071.vtc
+++ b/bin/varnishtest/tests/c00071.vtc
@@ -13,6 +13,7 @@ server s1 {
 
 varnish v1 -vcl+backend {
        import vtc;
+       import std;
        sub vcl_deliver {
                vtc.workspace_alloc(client, -192);
 
@@ -21,7 +22,8 @@ varnish v1 -vcl+backend {
                        vtc.workspace_alloc(client, -10);
                }
                else if (req.url ~ "/baz") {
-                       set resp.http.x-foo = regsub(req.url, "baz", "baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaz");
+                       set resp.http.x-foo = regsub(req.url, "baz", "baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaz");
+                       std.log("dummy");
                }
                set resp.http.x-of = vtc.workspace_overflowed(client);
        }

result:

**** c2   http[ 0] |HTTP/1.1
**** c2   http[ 1] |200
**** c2   http[ 2] |OK
**** c2   http[ 3] |Content-Length: 0
**** c2   http[ 4] |Date: Mon, 20 Jan 2020 13:41:33 GMT
**** c2   http[ 5] |X-Varnish: 1006
**** c2   http[ 6] |Age: 0
**** c2   http[ 7] |Via: 1.1 varnish (Varnish/6.3)
**** c2   http[ 8] |x-of: false
**** c2   http[ 9] |Accept-Ranges: bytes
**** c2   http[10] |Connection: keep-alive
**** c2   bodylen = 0
**   c2   === expect resp.status == 500
---- c2   EXPECT resp.status (200) == "500" failed

nigoroll · 2020-01-20T17:50:42Z

FTR, I just noticed that we already had a similar issue in #2645

Since 8baf4a6 we lost information about an overflowed workspace by calling std.log(), which resulted from use of the workspace for construction of a contiguous string from the argument constituents. Since then, we have changed the interface to STRANDS, but this issue remained. We now solve the case for real by pushing the string concatenation down to VSL: New versions of VSL and VSLb (coded by example of VSLv() and VSLbt()) take a strands argument and create a log record without additional copy overhead. These solve varnishcache#3194 for std.log(), make logging more efficient and, in particular, allow use of std.log() in low workspace conditions (because they do not require any). Also improve test coverage for std.log() Ref varnishcache#3194

We add checks for cases where a WS_Reset() could reset the overflowed workspace marker to ensure that the workspace is not already overflowed before using it. See also varnishcache#3196 for std.log() This concludes the fix for varnishcache#3194 together with the above (except for one h2 case, maybe)

Since 8baf4a6 we lost information about an overflowed workspace by calling std.log(), which resulted from use of the workspace for construction of a contiguous string from the argument constituents. Since then, we have changed the interface to STRANDS, but this issue remained. We now solve the case for real by pushing the string concatenation down to VSL: New versions of VSL and VSLb (coded by example of VSLv() and VSLbt()) take a strands argument and create a log record without additional copy overhead. These solve varnishcache#3194 for std.log(), make logging more efficient and, in particular, allow use of std.log() in low workspace conditions (because they do not require any). Also improve test coverage for std.log() Ref varnishcache#3194

It now bails out due to a workspace overflow which it acually did trigger all the time, but that remained unnoticed due to varnishcache#3194 The test to multiply the cookie header 128 times (8x in the first regex, 16x in the second) was not realistic even for "modern times", so I do not think we should have workspaces sized by such an example.

Since 8baf4a6 we lost information about an overflowed workspace by calling std.log(), which resulted from use of the workspace for construction of a contiguous string from the argument constituents. Since then, we have changed the interface to STRANDS, but this issue remained. We now solve the case for real by pushing the string concatenation down to VSL: New versions of VSL and VSLb (coded by example of VSLv() and VSLbt()) take a strands argument and create a log record without additional copy overhead. These solve varnishcache#3194 for std.log(), make logging more efficient and, in particular, allow use of std.log() in low workspace conditions (because they do not require any). Also improve test coverage for std.log() Ref varnishcache#3194

It now bails out due to a workspace overflow which it acually did trigger all the time, but that remained unnoticed due to varnishcache#3194 The test to multiply the cookie header 128 times (8x in the first regex, 16x in the second) was not realistic even for "modern times", so I do not think we should have workspaces sized by such an example.

We add checks for cases where a WS_Reset() could reset the overflowed workspace marker to ensure that the workspace is not already overflowed before using it. See also varnishcache#3196 for std.log() This concludes the fix for varnishcache#3194 together with the above (except for one h2 case, maybe)

Ref #3194

We use workspace overflows to signal to bail out for example after a failing `VRT_SetHdr()`. This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing. We use the `WS_Snapshot() ... WS_Reset()` pattern as some kind of second order workspace allocation where the called code itself uses `WS_Reserve()`. With this usage patter, `WS_Reset()` called `ws_ClearOverflow(ws)`, potentially clearing the overflow bit from a previous relevant failure. `WS_Snapshot()` now returns a magic value which gets recognized by `WS_Reset()` to ensure that the overflowed marker gets preserved or even restored. In this case, the workspace does not actually get reset. Fixes varnishcache#3194

We use workspace overflows to signal to bail out for example after a failing `VRT_SetHdr()`. This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing. We use the `WS_Snapshot() ... WS_Reset()` pattern as some kind of second order workspace allocation where the called code itself uses `WS_Reserve()`. With this usage pattern, `WS_Reset()` called `ws_ClearOverflow(ws)`, potentially clearing the overflow bit from a previous relevant failure. `WS_Snapshot()` now returns a magic value which gets recognized by `WS_Reset()` to ensure that the overflowed marker gets preserved or even restored. In this case, the workspace does not actually get reset. Fixes varnishcache#3194

We use workspace overflows to signal to bail out for example after a failing `VRT_SetHdr()`. This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing. We use the `WS_Snapshot() ... WS_Reset()` pattern as some kind of second order workspace allocation where the called code itself uses `WS_Reserve()`. With this usage pattern, `WS_Reset()` called `ws_ClearOverflow(ws)`, potentially clearing the overflow bit from a previous relevant failure. `WS_Snapshot()` now returns a magic value which gets recognized by `WS_Reset()` to ensure that the overflowed marker is still present. In this case, the workspace does not actually get reset. Fixes varnishcache#3194

We add checks for cases where a WS_Reset() could reset the overflowed workspace marker to ensure that the workspace is not already overflowed before using it. See also varnishcache#3196 for std.log() This concludes the fix for varnishcache#3194 together with the above (except for one h2 case, maybe)

We use workspace overflows to signal to bail out for example after a failing `VRT_SetHdr()`. This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing. We use the `WS_Snapshot() ... WS_Reset()` pattern as some kind of second order workspace allocation where the called code itself uses `WS_Reserve()`. With this usage pattern, `WS_Reset()` called `ws_ClearOverflow(ws)`, potentially clearing the overflow bit from a previous relevant failure. `WS_Snapshot()` now returns a magic value which gets recognized by `WS_Reset()` to ensure that the overflowed marker is still present. In this case, the workspace does not actually get reset. Fixes varnishcache#3194

We add checks for cases where a WS_Reset() could reset the overflowed workspace marker to ensure that the workspace is not already overflowed before using it. See also varnishcache#3196 for std.log() This concludes the fix for varnishcache#3194 together with the above (except for one h2 case, maybe)

We use workspace overflows to signal to bail out for example after a failing `VRT_SetHdr()`. This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing. We use the `WS_Snapshot() ... WS_Reset()` pattern as some kind of second order workspace allocation where the called code itself uses `WS_Reserve()`. With this usage pattern, `WS_Reset()` called `ws_ClearOverflow(ws)`, potentially clearing the overflow bit from a previous relevant failure. `WS_Snapshot()` now returns a magic value which gets recognized by `WS_Reset()` to ensure that the overflowed marker is still present. In this case, the workspace does not actually get reset. Fixes varnishcache#3194

This is an improvement over the second last commit (e.g. ed970de) based on feedback to varnishcache#3202: We now avoid any other unintended clears of the overflow bit by splitting two functions: * WS_Rollback() is now what WS_Reset() used to be: It clears overflows and accepts the zero cookie for a reset-to-start It is only intended for use within varnishd and is thus declared in cache_varnishd.h * WS_Reset() does not touch the overflow bit any longer, ensuring that a once-overflowed workspace stays overflowed With this in place, the magic snap_overflowed cookie would not be needed any more, but it still serves two good purposes: - better debugging and - a safety measure against passing a cookie from an already overflowed workspace to WS_Rollback() Fixes varnishcache#3194

We use workspace overflows to signal to bail out for example after a failing `VRT_SetHdr()`. This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing. We use the `WS_Snapshot() ... WS_Reset()` pattern as some kind of second order workspace allocation where the called code itself uses `WS_Reserve()`. With this usage pattern, `WS_Reset()` called `ws_ClearOverflow(ws)`, potentially clearing the overflow bit from a previous relevant failure. We now avoid any other unintended clears of the overflow bit by splitting two functions: * WS_Rollback() is now what WS_Reset() used to be: It clears overflows and accepts the zero cookie for a reset-to-start It is only intended for use within varnishd and is thus declared in cache_varnishd.h * WS_Reset() does not touch the overflow bit any longer, ensuring that a once-overflowed workspace stays overflowed `WS_Snapshot()` now returns a magic value which gets recognized by `WS_Reset()` to ensure that the overflowed marker is still present. This serves two purposes: - better debugging and - a safety measure against passing a cookie from an already overflowed workspace to WS_Rollback() Fixes varnishcache#3194

We use workspace overflows to signal to bail out for example after a failing `VRT_SetHdr()`. This is a guarantee that if some serious issue occurred during processing, we rather send an error downstream than an incomplete response or the result of incomplete processing. We use the `WS_Snapshot() ... WS_Reset()` pattern as some kind of second order workspace allocation where the called code itself uses `WS_Reserve()`. With this usage pattern, `WS_Reset()` called `ws_ClearOverflow(ws)`, potentially clearing the overflow bit from a previous relevant failure. We now avoid any other unintended clears of the overflow bit by splitting two functions: * WS_Rollback() is now what WS_Reset() used to be: It clears overflows and accepts the zero cookie for a reset-to-start It is only intended for use within varnishd and is thus declared in cache_varnishd.h * WS_Reset() does not touch the overflow bit any longer, ensuring that a once-overflowed workspace stays overflowed `WS_Snapshot()` now returns a magic value which gets recognized by `WS_Reset()` to ensure that the overflowed marker is still present. This serves two purposes: - better debugging and - a safety measure against passing a cookie from an already overflowed workspace to WS_Rollback() Fixes #3194

nigoroll added the a=feedback please label Jan 20, 2020

nigoroll mentioned this issue Jan 20, 2020

Add a strands interface to VSL and use it for std.log #3196

Merged

nigoroll added b=bug c=varnishd r=6.0 r=6.1 r=trunk and removed a=feedback please labels Jan 20, 2020

nigoroll mentioned this issue Jan 20, 2020

address most other overflow vs. WS_Reset() cases #3197

Closed

nigoroll added the a=has_PR label Jan 20, 2020

bsdphk added a=Bugwash Today and removed a=Bugwash Today labels Jan 27, 2020

nigoroll added a commit that referenced this issue Jan 27, 2020

Improve workspace overflow test

b9c5cb5

Ref #3194

nigoroll mentioned this issue Jan 27, 2020

An overflowed workspace must remain overflowed after WS_Reset() #3202

Merged

nigoroll removed the a=has_PR label Feb 6, 2020

nigoroll self-assigned this Feb 13, 2020

dridi closed this as completed in #3202 Mar 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WS_Reset() and overflown workspaces #3194

WS_Reset() and overflown workspaces #3194

nigoroll commented Jan 20, 2020

nigoroll commented Jan 20, 2020

nigoroll commented Jan 20, 2020

WS_Reset() and overflown workspaces #3194

WS_Reset() and overflown workspaces #3194

Comments

nigoroll commented Jan 20, 2020

nigoroll commented Jan 20, 2020

nigoroll commented Jan 20, 2020