nuke_limit is not honored #1764

Closed
bsdphk opened this Issue Mar 4, 2016 · 8 comments

@bsdphk
Contributor

bsdphk commented Mar 4, 2016

Old ticket imported from Trac:

nuke_limit doesn't seem to have any effect anymore. It looks like stv_alloc_obj is called multiple times per object, and only does one allocation, so it never hits nuke_limit.

@gquintard gquintard self-assigned this May 11, 2016

@gquintard
Contributor

gquintard commented May 11, 2016
nuke_limit is not honored anymore. relevant vtc:

varnishtest "Test nuke_limit"

server s1 {
    # First consume (almost) all of the storage
    rxreq
    expect req.url == /url1
    txresp -bodylen 200000

    rxreq
    expect req.url == /url2
    txresp -bodylen 200000

    rxreq
    expect req.url == /url3
    txresp -bodylen 200000

    rxreq
    expect req.url == /url4
    txresp -bodylen 200000

    rxreq
    expect req.url == /url5
    txresp -bodylen 1000000
} -start

varnish v1  -arg "-smalloc,1M" -arg "-p nuke_limit=3" -vcl+backend {
    sub vcl_backend_response {
        set beresp.do_stream = false;
    }
} -start


client c1 {
    txreq -url /url1
    rxresp
    expect resp.status == 200

    txreq -url /url2
    rxresp
    expect resp.status == 200

    txreq -url /url3
    rxresp
    expect resp.status == 200

    txreq -url /url4
    rxresp
    expect resp.status == 200

    txreq -url /url5
    rxresp
    expect resp.status == 503
} -run

@fgsch fgsch self-assigned this Oct 17, 2016

@bsdphk bsdphk closed this in 9d73cd1 Feb 27, 2017

mbgrydeland added a commit that referenced this issue Jun 15, 2017

Make param::nuke_limit a total count of nukes allowed for each
object creation.

Fixes #1764

Conflicts:
	bin/varnishd/cache/cache_fetch.c
	bin/varnishd/storage/stevedore.c
	bin/varnishd/storage/storage.h
	bin/varnishd/storage/storage_lru.c
	bin/varnishd/storage/storage_persistent.c
	bin/varnishd/storage/storage_simple.c
@hermunn
Contributor

hermunn commented Jun 20, 2017
Backport review: This is backported by @mbgrydeland (365e605) and is part of 4.1.7-beta1. For some users this fix will change the behavior of varnish in a significant way, but for most people it will not be noticed.


ibreger added a commit to thomsonreuters/varnish-cache that referenced this issue Jun 27, 2017

Make param::nuke_limit a total count of nukes allowed for each
object creation.

Fixes varnishcache#1764

Conflicts:
	bin/varnishd/cache/cache_fetch.c
	bin/varnishd/storage/stevedore.c
	bin/varnishd/storage/storage.h
	bin/varnishd/storage/storage_lru.c
	bin/varnishd/storage/storage_persistent.c
	bin/varnishd/storage/storage_simple.c

wmfgerrit pushed a commit to wikimedia/operations-debs-varnish4 that referenced this issue Jun 29, 2017

4.1.7-1wm1: new upstream, new counters
Package varnish 4.1.7, add counters for transient storage.

Introduce a new counter for shortlived objects creation,
cache_shortlived, and another one for uncacheable responses,
cache_uncacheable. They should provide insights when it comes to
monitoring transient storage usage.

Changes in 4.1.7:

 - Correctly honor nuke_limit parameter
   varnishcache/varnish-cache#1764
 - Prevent storage backends name collisions
   varnishcache/varnish-cache#2321
 - varnishstat -1 -f field inclusion glob doesn't allow VBE backend fields
   varnishcache/varnish-cache#2022
 - Health probes fail when HTTP response does not contain reason phrase
   varnishcache/varnish-cache#2069
 - "varnishstat -f MAIN.sess_conn -1" produces empty output
   varnishcache/varnish-cache#2118
 - Remember to reset workspace
   varnishcache/varnish-cache#2219
 - Rework and fix varnishstat counter filtering
   varnishcache/varnish-cache#2320
 - Docfix: Only root can jail
   varnishcache/varnish-cache#2329
 - Don't panic on a null ban
 - Add extra locking to protect the pools list and refcounts
 - Add -vsl_catchup to varnishtest
 - Add record-prefix support to varnishncsa

Bug: T164768
Ref: https://github.com/varnishcache/varnish-cache/blob/4.1/doc/changes.rst#varnish-cache-417-2017-06-28
Change-Id: I8a8f3a8103feb83b1a55a6788ea6c5d12963b4f5
@inCre

inCre commented Aug 21, 2017
Hi,

After the security bug we updated our Varnish cluster from 4.1.1 to 4.1.8.

After a few days we noticed a new issue, which I am fairly sure is due to nuked objects.
After a restart of Varnish, everything works fine until MAIN.n_lru_nuked grows larger than 0.
But once MAIN.n_lru_nuked exceeds, say, 100, only chunks of content get cached. For example, video files (FLV in this case) with a size between 100-200 MB each are suddenly served as 1-3 MB instead. This makes the videos stall after a few seconds, of course.

I then saw the following:

The default nuke_limit is 10, and this number is high enough to not
affect most users. However, if you want to make sure that the
behavior is not changed when upgrading, you should set the value much
higher.

Therefore I have these questions:

  1. What happens when the limit of 10 is exceeded?
  2. We have a rather large Varnish cluster and still nuke several thousand objects a day. That has never been a problem, but since 4.1.7 it is. Should we just bump the value to 9999999, or is there a problem with doing that?
  3. Can we disable nuke_limit, e.g. by setting it to 0?

@hermunn
Contributor

hermunn commented Aug 21, 2017
  1. If the nuke limit is reached, Varnish will serve a 503.
  2. Yes, bumping to a really high number is the right thing. 9999999 is probably a good value.
  3. No, zero is not a special value. With 0 there will be no LRU nuking at all.

This is one of very few patches after 4.1 that can affect a running Varnish in a negative way, and it is present in all versions from 4.1.7-beta1 onwards.


@Dridi
Member

Dridi commented Aug 21, 2017
This is a topic for the misc mailing list, but the short story is that nuke_limit is per-transaction while n_lru_nuked is global. @inCre, if you have more questions, please ask them on the mailing list.


@naveen-goswami

naveen-goswami commented Feb 20, 2018
We are also facing the same problem as described by @inCre: Varnish is truncating the transaction and only sending part of the response body to the client. For now this is happening with large objects. Our cache memory is almost full, so I assume Varnish needs to nuke to make space for new objects. Our current nuke_limit is 50. @hermunn mentioned that after reaching nuke_limit we would receive a 503, but this is not happening; instead we receive the error "transfer closed with outstanding read data remaining". We are using Varnish 5.2 at the moment. Any help would be appreciated.


@Dridi
Member

Dridi commented Feb 20, 2018

@naveen-goswami please take this to the mailing list instead. If you don't get a 503, it means that streaming was enabled (the default) and Varnish started the client delivery in parallel with the backend fetch. This is a trade-off between latency and correctness. The solution is to have two storage backends, one for large files and one for small files; this way you won't run into a situation where large files nuke lots of small files to make space.
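A minimal sketch of that split, assuming Varnish 5.1 or later (which supports selecting a stevedore via beresp.storage); the stevedore names, sizes, and the 10 MB threshold are illustrative, not a recommendation:

```vcl
# Start varnishd with two named storage backends, e.g.:
#   varnishd -s small=malloc,1G -s large=malloc,10G

vcl 4.0;

import std;

sub vcl_backend_response {
    # Route objects by size so a large fetch only nukes other
    # large objects, never thousands of small ones.
    if (std.integer(beresp.http.Content-Length, 0) > 10485760) {
        set beresp.storage = storage.large;
    } else {
        set beresp.storage = storage.small;
    }
}
```

Note that responses without a Content-Length header (e.g. chunked) fall through to the small store in this sketch, so the threshold logic may need adjusting for such traffic.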


@naveen-goswami

naveen-goswami commented Feb 20, 2018

Just to update on the issue: increasing nuke_limit to 500 helped us remove those errors. We will take future concerns to the mailing list as mentioned. @Dridi, could you provide any guide that would help us split our storage backends so we can tackle this problem effectively? Our statistics suggest that we would face this problem again when the object size is > 10 MB.

