Guard against OA_GZIPBITS streaming race #2904
Conversation
Nice catch, @rezan. Here's the vtc without the annoying 503: https://gist.github.com/nigoroll/75468b9a0848dca31a743d001ae01c2c
If accepted, the same technique should be applied to all the vend.h macros. Fixes varnishcache#2904
#2909 contains a suggestion for how to fix the root cause.
The fix proposed in #2909 is a no-go: the code does not ensure atomicity in the face of optimizing compilers, and the vend.h methods are specifically meant to work on unaligned storage. To fix this, we need some kind of reliable indicator that an attribute has in fact been set. oa->present may be that indicator, or if not, could become it.
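For illustration, a minimal sketch of what such an indicator could look like, assuming a hypothetical "present" flag published with C11 release/acquire semantics. The structure and names below are invented for this example, not the actual Varnish attribute layout:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, invented for this example: the attribute payload that
 * vend.h-style encoders write byte by byte, plus a separate "present" flag. */
struct attr_slot {
	uint8_t		data[32];	/* unaligned payload, written non-atomically */
	atomic_uint	present;	/* 0 = unset, 1 = fully written */
};

/* Writer: finish all the byte stores first, then publish with a release store. */
static void
attr_publish(struct attr_slot *a, const uint8_t *src, size_t len)
{
	memcpy(a->data, src, len);
	atomic_store_explicit(&a->present, 1, memory_order_release);
}

/* Reader: only touch the payload after an acquire load has seen the flag. */
static int
attr_read(struct attr_slot *a, uint8_t *dst, size_t len)
{
	if (atomic_load_explicit(&a->present, memory_order_acquire) == 0)
		return (0);	/* attribute not set yet; caller must not trust data */
	memcpy(dst, a->data, len);
	return (1);
}
```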
Do we want to do anything here? The PR basically forces a gzip stream when streaming. Not sure if we got anywhere in proving that the …
We do not hold a reference, so the magic can be unstable.
Turns out the magic check on the boc was racy too, in that it can be zeroed away at any time. I discussed this with @mbgrydeland, and we decided to just remove the check and continue to stream if the boc is non-NULL. The other option would be to grab a reference.
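Roughly the shape of that change, sketched with stand-in types (this is not the actual diff; struct boc below and its magic value are placeholders):

```c
#include <stddef.h>

/* Stand-in for the real struct boc; the magic value is a placeholder. */
struct boc {
	unsigned	magic;
#define BOC_MAGIC	0x12345678	/* placeholder, not the real magic */
	/* ... */
};

static void
deliver_sketch(const struct boc *boc)
{
	/*
	 * Racy form: CHECK_OBJ_NOTNULL(boc, BOC_MAGIC) asserts on boc->magic,
	 * but without holding a reference the fetch side may zero that magic
	 * at any moment, so the assertion can fire spuriously.
	 *
	 * Safer form per the discussion above: only test the pointer we
	 * already hold and keep streaming while it is non-NULL.
	 */
	if (boc == NULL)
		return;
	/* ... continue streaming the busy object ... */
}
```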
Under load, client c4 from g00005.vtc may fail with a 200 response instead of the expected 206 partial response. There is a window during which we might still see a boc, but because c4 sets beresp.do_stream to false, the fetch has to be over. To close this race we can instead reference the boc as suggested in #2904 and keep track of the boc state. Refs #3235
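A hedged sketch of the "reference the boc" idea, with stand-in types and helpers. The real reference helpers live in the Varnish tree (presumably HSH_RefBoc()/HSH_DerefBoc(), whose exact signatures are not reproduced here), and the state names are assumptions:

```c
#include <stddef.h>

/* Stand-ins invented for this sketch; the real reference helpers and state
 * enum live in the Varnish tree and their exact shapes are not reproduced. */
enum boc_state { BOS_STREAM, BOS_FINISHED, BOS_FAILED };

struct boc	{ enum boc_state state; /* ... */ };
struct objcore	{ struct boc *boc; /* ... */ };

static struct boc *
take_boc_ref(struct objcore *oc)	/* stand-in for HSH_RefBoc() */
{
	return (oc->boc);
}

static void
drop_boc_ref(struct objcore *oc)	/* stand-in for HSH_DerefBoc() */
{
	(void)oc;
}

/*
 * The idea from the commit message: take a reference early so the boc cannot
 * disappear underneath us, and decide chunked vs. Content-Length delivery
 * from the boc state observed while holding that reference.
 */
static int
needs_chunked_delivery(struct objcore *oc)
{
	struct boc *boc;
	int busy;

	boc = take_boc_ref(oc);
	if (boc == NULL)
		return (0);	/* fetch is finished; the length is final */
	busy = (boc->state < BOS_FINISHED);
	drop_boc_ref(oc);
	return (busy);
}
```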
Ran into an issue where Varnish reports an incorrect Content-Length when streaming a backend response (gzip) and running that through a gunzip VDP. Turns out that OA_GZIPBITS is being used unsafely. The patch simply forces Varnish to do a chunked response when streaming. Reproducer VTC is here.
Just to explain: when we write the gunzip length of 3167 in varnish-cache/bin/varnishd/cache/cache_gzip.c (line 402 in 4709fae), we end up in varnish-cache/include/vend.h (lines 103 to 106 in 4709fae).
And because we are writing single bytes, the delivery thread can incorrectly read 3167 as 3072 or 95, which would be the good value missing 8 of its bits.
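To make the torn read concrete, here is a small self-contained sketch of the byte-at-a-time big-endian store that the vend.h encoders perform (paraphrased, not copied from vend.h). 3167 is 0x0c5f, so a reader that catches only the high byte sees 0x0c00 = 3072, and one that catches only the low byte sees 0x005f = 95:

```c
#include <stdint.h>

/* Paraphrase of the vend.h encoding style: the value is stored one byte at a
 * time into (possibly unaligned) memory, with no atomicity guarantee. */
static void
be16enc_bytewise(uint8_t *p, uint16_t u)
{
	p[0] = (uint8_t)(u >> 8);	/* first store: 0x0c */
	p[1] = (uint8_t)(u & 0xff);	/* second store: 0x5f */
}

static uint16_t
be16dec_bytewise(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

/*
 * With the bytes initially zero and a writer calling
 * be16enc_bytewise(p, 3167), a delivery thread calling be16dec_bytewise(p)
 * concurrently can observe:
 *
 *   0     - neither byte written yet
 *   3072  - only p[0] (0x0c) seen, the good value missing its low 8 bits
 *   95    - only p[1] (0x5f) seen, e.g. the reader loaded p[0] before the
 *           writer stored it and p[1] after
 *   3167  - both bytes seen
 */
```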
I talked this over with @mbgrydeland and we ran into a few issues. First, there is no good VTC for this, because a VTC will either pass with the race in place, or it can possibly fail with the patch in place, due to the way a boc is grabbed late in the client delivery. Basically, the VTC checks that a gunzip streamed response has no Content-Length, but there are cases where this is both true and false, before and after the patch. So no VTC, unless everyone is ok with a racy VTC.
The other, possibly bigger, issue is that when we cache miss, get a boc, but it is cleared by the time we grab it in transmit, we could have stale memory in cache, which could in theory give us a partial read. I believe this is due to not hitting a memory barrier anywhere in this code path. I left this issue for Martin to explain (and fix), as it represents a much bigger issue than this patch can handle.
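As a generic illustration of that last point (plain C11, not the Varnish code path): if the "data is there" signal and the payload stores are not ordered by a release/acquire pair, a reader can observe the signal and still copy stale bytes.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative globals, not Varnish structures. */
static uint8_t		payload[2];	/* written by the fetch thread */
static atomic_int	ready;		/* observed by the delivery thread */

/* Writer without ordering: the plain stores to payload[] are allowed to
 * become visible to another CPU after the relaxed store to ready. */
static void
writer_racy(void)
{
	payload[0] = 0x0c;
	payload[1] = 0x5f;
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: even after seeing ready == 1, the payload bytes it copies may be
 * stale, i.e. a zeroed or partial value, because nothing orders these loads
 * against the writer's stores. A release store paired with an acquire load
 * (as in the earlier sketch) closes this window. */
static int
reader_racy(uint8_t out[2])
{
	if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
		return (0);
	out[0] = payload[0];
	out[1] = payload[1];
	return (1);
}
```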