http/2 traffic stopping with hitch/varnish 5.2.0 #2431

Closed
justnx opened this Issue Sep 18, 2017 · 8 comments

justnx commented Sep 18, 2017

I'm not certain whether I've encountered a bug or a design flaw. The final Varnish 5.2.0 release didn't solve my problems.

The sess_fail counter climbs to 33804924 within seconds when I enable alpn-protos with h2 support in hitch. Watching the logs, I only see HTTP/2 requests being logged, so I assume HTTP/1 requests can't get through anymore.

(screenshot: session counters)
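
One way to watch the counter from the shell (assuming varnishstat's -1 and -f field filter, which should be available on 5.2):

varnishstat -1 -f MAIN.sess_fail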

The effect reminds me of issues I ran into with haproxy SSL offloading in front of nginx (solution here: https://ispire.me/http2-ssl-offloading-with-haproxy-and-nginx/) when I was pointing/mixing traffic at the wrong endpoint protocol (h1 traffic to an h2 listener or vice versa).

I run hitch with these settings:

Hitch:
frontend = "[PUBLIC IP]:443+/etc/hitch/certs/certfile.pem"
backend = "[127.0.0.1]:6086"
alpn-protos = "h2, http/1.1"
write-proxy-v2 = on # Write PROXY header

Varnish 5.2.0:
-a PUBLICIP:80 -a 127.0.0.1:6086,PROXY
-T 127.0.0.1:6082
-f /etc/varnish/myvcl.vcl
-p feature=+http2
-p thread_pool_add_delay=2
-p thread_pools=2
-p thread_pool_min=200
-p thread_pool_max=4000
-p syslog_cli_traffic=off
-t 120
-S /etc/varnish/secret
-s malloc,3G"

What I still don't understand is how Varnish will distinguish between the PUBLICIP:80 listener (plain HTTP traffic, HTTP/1 only, coming directly to Varnish) and the internal 127.0.0.1:6086,PROXY listener (HTTPS traffic, HTTP/1 + HTTP/2 with feature=+http2) coming over TCP from Hitch, which only listens for SSL connections.

My current idea is to replace hitch with haproxy as the SSL offloader and do the same with varnish that I already do in my haproxy > nginx-only setups (see the howto linked above), but for that I need a separate per-listener protocol parameter, h1+PROXY / h2+PROXY, to point each protocol at the right listener.

So a definition that sets the protocol per listener would be necessary to make such a setup possible:

PublicIP:80 -> Varnish (non-SSL) -> Nginx 10.0.10.2:80
PublicIP:443 -> Haproxy (SSL offloading), PROXY protocol v1, http/1.1 -> Varnish 127.0.0.1:6443 -> Nginx 10.0.10.2:6443 (proxy_protocol)
PublicIP:443 -> Haproxy (SSL offloading), PROXY protocol v1, h2 -> Varnish 127.0.0.1:6086 -> Nginx 10.0.10.2:6086 (proxy_protocol)

haproxy config:
frontend varnish-ssl
bind PublicIP:443 ssl crt /etc/haproxy/certs/combined.yourcert.pem alpn h2,http/1.1
mode tcp
use_backend varnish-ssl-http2 if { ssl_fc_alpn -i h2 }
default_backend varnish-ssl-http1-fallback

backend varnish-ssl-http2
        mode                    tcp
        option                  tcpka
        server                  varnish-ssl-h2 127.0.0.1:6086 check-send-proxy inter 2500 maxconn 10000

backend varnish-ssl-http1-fallback
        mode                    tcp
        option                  tcpka
        server                  varnish-ssl-h1 127.0.0.1:6443 check-send-proxy inter 2500 maxconn 10000

This is how the listener definitions would look in nginx:
listen 10.0.10.2:6086 http2 proxy_protocol; # haproxy TCP SSL termination + HTTP/2
listen 10.0.10.2:6443 proxy_protocol;       # haproxy TCP SSL termination for HTTP/1.1 and lower
listen 10.0.10.2:80;          # HTTP only Traffic (non SSL)

So in varnish I would need something like this:
-a PUBLICIP:80 -a 127.0.0.1:6443,h1+PROXY -a 127.0.0.1:6086,h2+PROXY

And finally, to point each connection to the right backend, extend the varnish VCL with this:

# needs "import std;" at the top of the VCL
if (std.port(local.ip) == 6086) {
    set req.backend_hint = nginx_ssl_h2;
} elseif (std.port(local.ip) == 6443) {
    set req.backend_hint = nginx_ssl_h1;
} else {
    set req.backend_hint = nginx_h1;
}

backend nginx_ssl_h1 {
   .host = "10.0.10.2";
   .port = "6443";
   .proxy_header = 1;
}

backend nginx_ssl_h2 {
   .host = "10.0.10.2";
   .port = "6086";
   .proxy_header = 1;
}

backend nginx_h1 {
   .host = "10.0.10.2";
   .port = "80";
} 

A varnish listen definition without h1/h2 or PROXY (non-SSL) should enable the HTTP layer, while a listen definition with h1/h2 or PROXY (SSL only) should enable the raw TCP socket layer.

Might this be the proper way, or does varnish have another approach to doing this?

daghf commented Sep 18, 2017

Hi

I wonder if the issue you are seeing is related to the -p thread_pool_add_delay=2 setting. At some point in the past we changed the semantics here and the value is now in seconds and not milliseconds.

If you're on a non-ancient Linux system, you could probably just remove it (letting it fall back to the default of 0), or you could try setting it to 0.002.

The issue you are seeing may also be related to #2418. Could you for now please try a new setting for thread_pool_add_delay, and then get back to us on how that works out?
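
To be concrete about the unit change (you can double-check the live value with varnishadm param.show thread_pool_add_delay), the old millisecond-style value expressed in seconds would be:

-p thread_pool_add_delay=0.002

or you can drop the flag entirely and leave it at the default of 0.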

daghf added the c=H/2 label Sep 18, 2017

justnx commented Sep 18, 2017

@daghf
All my servers run Debian Jessie (8.9).
The website I host serves around 530 million requests / 7 million unique visitors per month.

I removed thread_pool_add_delay completely as you suggested. Now it takes a few more seconds, but then everything still gets stuck.

If I disable alpn-protos = "h2, http/1.1" again, it just runs normally.

Another side effect I've had since 5.2 is that varnishncsa stops logging (it doesn't complete the log line) and runs at 100% CPU load. With h2 enabled it does this all the time, and restarting varnishncsa doesn't help.
Only after disabling alpn-protos, and restarting the varnishncsa service once it has gotten stuck the first or second time, does it keep doing its job continuously again.

The custom varnishncsa daemon options I use are:
-a -w /storage/logs/varnish/varnishncsa_access.log -F '%{Host}i %{X-Real}i %l %u %t \"%m %U %H\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %{Varnish:time_firstbyte}x'

daghf commented Sep 18, 2017

@justnx

Thanks for the follow-up. I'd say this looks very much like #2418.

I haven't seen the varnishncsa issue before, but I'll see if I can reproduce.

justnx commented Oct 9, 2017

Any news yet?
Btw. the varnishncsa 100% CPU usage issue still happens multiple times a day while sess_fail is rising.

daghf commented Oct 10, 2017

Hey

I think the varnishncsa issue is due to the logging in our current H/2 implementation. For now could you try adding

-p vsl_mask=-H2TxBody,-H2TxHdr,-H2RxBody,-H2RxHdr

to your varnishd command line, and then see if varnishncsa CPU consumption improves?

Improving the log bits for h/2 is on my list, and I plan to make these records enabled only via a special debug bit.
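
If you want to confirm that it really is these records flooding the log before masking them, you could watch for the tags directly; assuming repeated -i tag filters work in your varnishlog, something like:

varnishlog -i H2TxHdr -i H2TxBody -i H2RxHdr -i H2RxBody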

justnx commented Oct 10, 2017

I stopped using h/2 support until it's fixed, so currently this occurs with plain h/1.
The varnishncsa CPU consumption goes hand in hand with it no longer logging any lines. The moment it occurs, it hangs and doesn't finish writing the last log line. It looks more like a buffer overflow.

justnx commented Oct 14, 2017

I noticed that you made changes to the systemd varnish.service, raising LimitMEMLOCK=82000 to LimitMEMLOCK=85983232.

I have changed that in my custom varnish.service file now.
Could memlock being set too low produce such an effect with varnishncsa?
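
In case it helps anyone else, the same limit can also be applied as a systemd drop-in instead of a full custom unit file (the path below is just an example location):

# /etc/systemd/system/varnish.service.d/limits.conf
[Service]
LimitMEMLOCK=85983232

followed by a systemctl daemon-reload and a restart of varnish.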

daghf added the c=varnishncsa label and removed the c=H/2 label Mar 16, 2018

bsdphk added the a=bugwash label Oct 4, 2018

daghf commented Oct 8, 2018

This issue seemed to pivot to a potential varnishncsa issue that we haven't seen elsewhere, or that is likely OBE by now.

Timing this out.
