Inconsistent client response properties when a VFP fails during streaming #2918

Closed
slimhazard opened this issue Feb 23, 2019 · 2 comments

@slimhazard

I'm still struggling a bit with the matter raised in #2903, which I initially believed was a bug in Varnish. As it turned out, it was a consequence of streaming -- if a backend fetch fails after the stream has begun, so that the HTTP status and headers have already been sent in the client response, then of course it is too late to change the client resp.status or anything else in the headers.

That resulted in this test, to verify what happens when a VFP returns an error during streaming. As it turns out, the test fails intermittently.

The same thing can be tested with standard Varnish:

varnishtest "test streaming and error handling"

server s1 {
	rxreq
	txresp -nolen -hdr "Content-Encoding: gzip" -hdr "Content-Length: 4"
	sendhex "de ad be ef"
	expect_close
} -start

varnish v1 -vcl+backend {
	sub vcl_backend_response {
		set beresp.do_stream = true;
		set beresp.do_gunzip = true;
	}
} -start

# Streaming is enabled by default, so when the error is detected, both
# resp.status == 200 and Transfer-Encoding:chunked have been sent,
# with an empty client response body. So the connection closes just
# after sending the client response headers. (cf. Varnish issue #2903)
client c1 {
	txreq
	rxresphdrs
	expect resp.status == 200
	expect resp.http.Transfer-Encoding == "chunked"
	expect_close
} -run

This test passes most of the time, but not always -- I'm seeing it pass about 65%-80% of the time. When it fails, a number of different things may have happened:

  • T-E is chunked, but a chunk length "0" makes it into the response body, so the socket close does not happen right after resp headers are received. Note that if resp.status==200, T-E is chunked, and the body has one empty chunk, then the client response looks like a normal, empty response, with no sign that there was any error at all.

  • Content-Length: 0, so the T-E expectation fails. As with the "one empty chunk" response, this also looks like a normally empty response, with no indication of error.

  • resp.status == 503 and a Guru Meditation, so the response doesn't appear to have been streamed at all.
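The outcomes above (the expected truncated response plus the three failure variants) can be told apart from the raw bytes a client receives before the connection closes. The following is only a rough sketch in Python, not part of Varnish or varnishtest; the function name `classify_response` and the outcome labels are hypothetical, and it assumes the whole response up to the close has been captured in one buffer.

```python
def classify_response(raw: bytes) -> str:
    """Classify a raw HTTP/1.1 response captured up to connection close.

    The labels returned here are illustrative names for the outcomes
    described in this issue; they are not Varnish terminology.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # The header block itself was never completed.
        return "incomplete-headers"

    lines = head.split(b"\r\n")
    status = int(lines[0].split()[1].decode("ascii"))

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip().lower()

    if status == 503:
        # Not streamed: the client sees a regular error response.
        return "error-503"

    if headers.get(b"transfer-encoding") == b"chunked":
        if body == b"":
            # Closed right after the headers, no terminating chunk:
            # the only 200 variant a client can recognize as broken.
            return "truncated-chunked"
        if body == b"0\r\n\r\n":
            # A single terminating chunk: looks like a valid empty body.
            return "empty-chunked"
        return "other"

    if headers.get(b"content-length") == b"0":
        # Also looks like a valid empty response.
        return "empty-content-length"

    return "other"
```

Under these assumptions, only `truncated-chunked` and `error-503` give the client any sign of failure; `empty-chunked` and `empty-content-length` are byte-for-byte what a genuinely empty 200 response would look like.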

Since streaming is turned on by default, most users are likely to get a broken client response when a VFP reports failure, but (mostly) not resp.status==503, as is typically the case for Varnish when a fetch fails. A broken 200 response might make a user think there's something wrong with Varnish (as I did originally).

So I'd like to have a way to describe, document and verify what happens to a client response in the error case, so that users can recognize it. But nothing seems to be reliable; even worse, the client response may appear to be normal and empty, as described above, so clients can't know to ignore the response.

The only advisory that seems to be reliable is "monitor FetchError in the log", since the error is always reported there. Is there anything reliable we can say about the client response?

@bsdphk

bsdphk commented Feb 25, 2019

Bugwash:

We don't have a well-defined path for the failing VFP to signal "This failed" to the transport, so that H1 can shut the socket and H2/H3 can fail only that stream.

Not ready for ticket, @slimhazard to move to VIP

@bsdphk bsdphk closed this as completed Mar 4, 2019