Health probes fail when HTTP response does not contain reason phrase #2069

Closed
philipseidel opened this Issue Sep 1, 2016 · 5 comments

Comments

@philipseidel

philipseidel commented Sep 1, 2016

Expected Behavior

When an HTTP server responds with a valid HTTP status code and no reason phrase, a Varnish health probe should consider the response valid.

Current Behavior

Some application servers are starting to drop support for the reason phrase in an HTTP response. The spring-boot project is one example: spring-projects/spring-boot#6548. These changes result in failed health probes, since the status line does not contain a reason phrase: HTTP/1.1 200 vs. HTTP/1.1 200 OK.

Possible Solution

Accept HTTP responses that don't contain a reason phrase as valid, since RFC 2616 states that "The client is not required to examine or display the Reason-Phrase." I am not sure I agree with this, but that is how some projects are interpreting the spec.
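The lenient acceptance proposed above can be sketched as a status-line parser that treats the reason phrase as optional. This is an illustrative sketch, not Varnish's actual parser; the function name is my own:

```python
import re

# Lenient status-line parser: accepts both "HTTP/1.1 200 OK" and
# "HTTP/1.1 200" (missing reason phrase). Per the HTTP grammar the
# reason phrase may be empty, and some servers now omit it entirely.
STATUS_LINE = re.compile(r"^HTTP/(\d)\.(\d) (\d{3})(?: (.*))?$")

def parse_status_line(line):
    m = STATUS_LINE.match(line)
    if m is None:
        raise ValueError("malformed status line: %r" % line)
    major, minor, code, reason = m.groups()
    return (int(major), int(minor)), int(code), reason or ""

# Both forms parse to status 200:
print(parse_status_line("HTTP/1.1 200 OK"))  # ((1, 1), 200, 'OK')
print(parse_status_line("HTTP/1.1 200"))     # ((1, 1), 200, '')
```

A strict parser that unconditionally expects a second space after the status code would reject the second form, which is exactly the failure mode described in this issue.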

Steps to Reproduce (for bugs)

  • Create a text file test.html with the following content:

HTTP/1.1 200

Hello World!

  • Run a simple netcat server: while true ; do nc -l 80 < test.html ; done
  • Create a simple VCL:

backend default {
  .host = "127.0.0.1";
  .port = "http";
  .probe = {
    .url = "/";
    .window = 8;
    .threshold = 3;
    .initial = 3;
  }
}

  • Reload/start Varnish.
  • Watch the backend health statistics (watch 'varnishadm debug.health') and notice that the Happy row shows the backend as - rather than H.
  • Update the text file to include a reason phrase:

HTTP/1.1 200 WOOT

Hello World!

  • The backend statistics should now report H.
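As a point of comparison outside Varnish, the netcat repro above can be reproduced self-contained: Python's http.client happily tolerates the missing reason phrase and reports an empty reason instead. The helper name and port selection are my own:

```python
import socket
import threading
from http.client import HTTPConnection

# Minimal server that, like the netcat repro above, answers with a
# status line lacking a reason phrase ("HTTP/1.1 200").
def serve_once(sock):
    conn, _ = sock.accept()
    conn.recv(1024)  # read (and ignore) the request; fine for localhost
    conn.sendall(b"HTTP/1.1 200\r\nContent-Length: 12\r\n\r\nHello World!")
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # pick a free ephemeral port
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

c = HTTPConnection("127.0.0.1", port)
c.request("GET", "/")
resp = c.getresponse()
print(resp.status, repr(resp.reason), resp.read().decode())
# 200 '' Hello World!
```

A lenient client sees status 200 with an empty reason; a strict parser rejects the response outright, which is what the Varnish 3.0 probe did.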

Context

It would seem that some projects interpret the spec as though the reason phrase is not required, since the client shouldn't be required to examine it. If things continue in this direction, it will be hard to upgrade components that run behind Varnish and follow this pattern.

Your Environment

  • Version used: 3.0.7
  • Operating System and version: Red Hat Enterprise Linux Server release 6.7 (Santiago)
@Dridi


Member

Dridi commented Sep 1, 2016

Varnish 3.0 is EOL, and 4.0 will be in two weeks. You should upgrade to 4.1, as this won't be fixed.

@Dridi Dridi closed this Sep 1, 2016

@Dridi Dridi added the wontfix label Sep 1, 2016

@philipseidel


philipseidel commented Sep 1, 2016

Is it an issue in 4.1?

@Dridi


Dridi commented Sep 1, 2016

It seems to work fine with 4.1.3, this test case passes:

varnishtest "Test for #2069"

server s1 -repeat 20 {
    rxreq
    send "HTTP/1.1 200\r\n\r\nHello World!"
    accept
} -start

varnish v1 -vcl {
    probe start_sick {
        .interval = 50ms;
        .initial = 0;
    }

    backend get_healthy {
        .host = "${s1_addr}";
        .port = "${s1_port}";
        .probe = start_sick;
    }

    backend stay_sick {
        .host = "${bad_ip}";
        .port = "9090";
        .probe = start_sick;
    }

    sub vcl_recv {
        set req.backend_hint = stay_sick;
    }
} -start

delay 1

varnish v1 -expect VBE.vcl1.get_healthy.happy > 0
varnish v1 -expect VBE.vcl1.stay_sick.happy == 0

Dridi added a commit that referenced this issue Sep 12, 2016

Dridi added a commit that referenced this issue Apr 10, 2017

Ignore missing reason in probe responses
This is a cherry-pick of 3271c51 that
fixed the issue for Varnish 3.0 since it wasn't a problem in the 4.1
series then. This is only a back-port of the regression test since the
referenced commits already fixed the regression.

Refs 5e4d6c8
Refs 9f42262
Refs #2069
@behinam


behinam commented May 7, 2017

I have a problem with health checks for 3 web servers.
I have a Varnish server with 3 backends, all running Apache. Everything is OK: the Varnish server caches everything I need and the connections work. I want to monitor the health of the web servers so that, in case of a failure, Varnish does not send requests to the failed web server. The problem is that when I enable the probe for all backends I get a 503 error! If I enable it on one or two backends everything is OK, but when I enable it for all 3 backends I get the 503 error.
Here's the Varnish configuration for backends and health checking:

vcl 4.0;

import directors;

probe backend_healthcheck {
    .url = "/";
    .timeout = 34 ms;
    .window = 5;
    .threshold = 3;
    .interval = 1s;
}

backend web1 {
    .host = "192.168.1.16";
    .port = "8080";
    .probe = backend_healthcheck;
}

backend web2 {
    .host = "192.168.1.18";
    .port = "8080";
    .probe = backend_healthcheck;
}

backend web3 {
    .host = "192.168.1.20";
    .port = "8080";
    .probe = backend_healthcheck;
}

sub vcl_init {
    new apache = directors.round_robin();
    apache.add_backend(web1);
    apache.add_backend(web2);
    apache.add_backend(web3);
}
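A side note on the configuration above (an observation, not a confirmed diagnosis): the round_robin director created in vcl_init is never selected anywhere, and a 34 ms probe timeout is very tight; if the probes time out, all three backends are marked sick and Varnish answers 503. A minimal sketch of routing requests through the director, assuming the vcl_init above:

```
sub vcl_recv {
    # Route client requests through the round-robin director created
    # in vcl_init; without this, the director is never used.
    set req.backend_hint = apache.backend();
}
```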

hermunn added a commit that referenced this issue May 21, 2017

Ignore missing reason in probe responses
This is a cherry-pick of 3271c51 that
fixed the issue for Varnish 3.0 since it wasn't a problem in the 4.1
series then. This is only a back-port of the regression test since the
referenced commits already fixed the regression.

Refs 5e4d6c8
Refs 9f42262
Refs #2069
@hermunn


Contributor

hermunn commented May 21, 2017

Backport review: The commit 6768c37 was backported as 9367a63.

ibreger added a commit to thomsonreuters/varnish-cache that referenced this issue Jun 27, 2017

Ignore missing reason in probe responses
This is a cherry-pick of 3271c51 that
fixed the issue for Varnish 3.0 since it wasn't a problem in the 4.1
series then. This is only a back-port of the regression test since the
referenced commits already fixed the regression.

Refs 5e4d6c8
Refs 9f42262
Refs #2069

wmfgerrit pushed a commit to wikimedia/operations-debs-varnish4 that referenced this issue Jun 29, 2017

4.1.7-1wm1: new upstream, new counters
Package varnish 4.1.7, add counters for transient storage.

Introduce a new counter for shortlived objects creation,
cache_shortlived, and another one for uncacheable responses,
cache_uncacheable. They should provide insights when it comes to
monitoring transient storage usage.

Changes in 4.1.7:

 - Correctly honor nuke_limit parameter
   varnishcache/varnish-cache#1764
 - Prevent storage backends name collisions
   varnishcache/varnish-cache#2321
 - varnishstat -1 -f field inclusion glob doesn't allow VBE backend fields
   varnishcache/varnish-cache#2022
 - Health probes fail when HTTP response does not contain reason phrase
   varnishcache/varnish-cache#2069
 - "varnishstat -f MAIN.sess_conn -1" produces empty output
   varnishcache/varnish-cache#2118
 - Remember to reset workspace
   varnishcache/varnish-cache#2219
 - Rework and fix varnishstat counter filtering
   varnishcache/varnish-cache#2320
 - Docfix: Only root can jail
   varnishcache/varnish-cache#2329
 - Don't panic on a null ban
 - Add extra locking to protect the pools list and refcounts
 - Add -vsl_catchup to varnishtest
 - Add record-prefix support to varnishncsa

Bug: T164768
Ref: https://github.com/varnishcache/varnish-cache/blob/4.1/doc/changes.rst#varnish-cache-417-2017-06-28
Change-Id: I8a8f3a8103feb83b1a55a6788ea6c5d12963b4f5