
Panic with ESI via VEP_Finish -> ObjGetSpace #3502

Closed
nigoroll opened this issue Jan 21, 2021 · 2 comments


nigoroll commented Jan 21, 2021

This was seen during load testing with different configurations on different machines:

Assert error in obj_getmethods(), cache/cache_obj.c line 96:
  Condition((oc->stobj->stevedore) != NULL) not true.
version = varnish-6.3.2 revision 6870fd661a2b42c2e8adad838b5d92a71f27dccd, vrt api = 10.0
ident = Linux,3.10.0-1160.11.1.el7.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 1963.740356 (mono), 1611218259.565571 (real)
Backtrace:
  0x43d8df: /usr/sbin/varnishd() [0x43d8df]
  0x4a3582: /usr/sbin/varnishd(VAS_Fail+0x42) [0x4a3582]
  0x43a2a0: /usr/sbin/varnishd() [0x43a2a0]
  0x43a6ee: /usr/sbin/varnishd(ObjGetSpace+0x1e) [0x43a6ee]
  0x42d411: /usr/sbin/varnishd(VFP_GetStorage+0x61) [0x42d411]
  0x426a0f: /usr/sbin/varnishd() [0x426a0f]
  0x428f20: /usr/sbin/varnishd(VEP_Finish+0xd0) [0x428f20]
  0x425f46: /usr/sbin/varnishd() [0x425f46]
  0x426b75: /usr/sbin/varnishd() [0x426b75]
  0x42d6bd: /usr/sbin/varnishd(VFP_Close+0x2d) [0x42d6bd]

full panic with redacted bits below

Hypothesis on what is going on:

  • Fetch fails (obvious from vfc = ... { failed = 1 ... })
  • vbf_stp_error() calls
        if (oc->stobj->stevedore != NULL) {
                oc->boc->len_so_far = 0;
                ObjFreeObj(bo->wrk, oc);
        }
  • ObjFreeObj() dispatches to sml_objfree(), which clears the stobj:
        memset(oc->stobj, 0, sizeof oc->stobj);
  • when the ESI vfp gets fini'd, we call vfp_esi_fini() -> vfp_esi_end() -> VEP_Finish()
  • which calls the callback vep->cb() aka vfp_vep_callback()
  • which calls VFP_GetStorage() -> ObjGetSpace(), triggering the assertion (due to the ObjFreeObj() in vbf_stp_error())

Trouble is: I fail to reproduce this in a VTC right now, so the hypothesis might be useless.

Panic with redacted bits:

Panic at: Thu, 21 Jan 2021 08:37:41 GMT
Assert error in obj_getmethods(), cache/cache_obj.c line 96:
  Condition((oc->stobj->stevedore) != NULL) not true.
version = varnish-6.3.2 revision 6870fd661a2b42c2e8adad838b5d92a71f27dccd, vrt api = 10.0
ident = Linux,3.10.0-1160.11.1.el7.x86_64,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
now = 1963.740356 (mono), 1611218259.565571 (real)
Backtrace:
  0x43d8df: /usr/sbin/varnishd() [0x43d8df]
  0x4a3582: /usr/sbin/varnishd(VAS_Fail+0x42) [0x4a3582]
  0x43a2a0: /usr/sbin/varnishd() [0x43a2a0]
  0x43a6ee: /usr/sbin/varnishd(ObjGetSpace+0x1e) [0x43a6ee]
  0x42d411: /usr/sbin/varnishd(VFP_GetStorage+0x61) [0x42d411]
  0x426a0f: /usr/sbin/varnishd() [0x426a0f]
  0x428f20: /usr/sbin/varnishd(VEP_Finish+0xd0) [0x428f20]
  0x425f46: /usr/sbin/varnishd() [0x425f46]
  0x426b75: /usr/sbin/varnishd() [0x426b75]
  0x42d6bd: /usr/sbin/varnishd(VFP_Close+0x2d) [0x42d6bd]
thread = (cache-worker)
pthread.attr = {
  guard = 4096,
  stack_bottom = 0x7f38b7c73000,
  stack_top = 0x7f38b7cf3000,
  stack_size = 524288,
}
thr.req = (nil) {
},
thr.busyobj = 0x7f373bdb1020 {
  end = 0x7f373be31000,
  retries = 0,
  sp = 0x7f393ac42020 {
    fd = 851, vxid = 54566919,
    t_open = 1611218243.842591,
    t_idle = 1611218259.389168,
    ws = 0x7f393ac42060 {
      id = "ses",
      {s, f, r, e} = {0x7f393ac420a0, +168, (nil), +344},
    },
    transport = HTTP/1 {
      state = HTTP1::Proc
    }
    client = *redacted* 37042 /var/lib/haproxy/varnish_listen.sock,
  },
  worker = 0x7f38b7cf1bd0 {
    ws = 0x7f38b7cf1c78 {
      id = "wrk",
      {s, f, r, e} = {0x7f38b7cf1380, +0, (nil), +2040},
    },
    VCL::method = BACKEND_RESPONSE,
    VCL::return = deliver,
    VCL::methods = {BACKEND_FETCH, BACKEND_RESPONSE},
  },
  vfc = 0x7f373bdb3be0 {
    failed = 1,
    req = 0x7f373bdb1a70,
    resp = 0x7f373bdb2328,
    wrk = 0x7f38b7cf1bd0,
    oc = 0x7f37bd5e43c0,
    filters = {
      esi_gzip = 0x7f373bdb4d70 {
        priv1 = 0x7f377a8a6180,
        priv2 = 0,
        closed = 0
      },
      gunzip = 0x7f373bdb4d28 {
        priv1 = 0x7f376e983600,
        priv2 = 0,
        closed = 0
      },
      V1F_CHUNKED = 0x7f373bdb4b38 {
        priv1 = 0x7f373bdb41d8,
        priv2 = -1,
        closed = 0
      },
    },
    obj_flags = 0x16,
  },
  filter_list = "gunzip esi_gzip",
  ws = 0x7f373bdb1060 {
    id = "bo",
    {s, f, r, e} = {0x7f373bdb3c28, +4800, (nil), +512976},
  },
  ws_bo = 0x7f373bdb40f0,
  http[bereq] = 0x7f373bdb1a70 {
    ws = 0x7f373bdb1060 {
      [Already dumped, see above]
    },
    hdrs {
      "GET",
      *redacted*,
    },
  },
  http[beresp] = 0x7f373bdb2328 {
    ws = 0x7f373bdb1060 {
      [Already dumped, see above]
    },
    hdrs {
      "HTTP/1.1",
      "200",
      "OK",
      *redacted*,
    },
  },
  objcore[fetch] = 0x7f37bd5e43c0 {
    refcnt = 2,
    flags = {busy, hfm, hfp},
    exp_flags = {},
    boc = 0x7f37f90c2820 {
      refcnt = 2,
      state = req_done,
      vary = (nil),
      stevedore_priv = (nil),
    },
    exp = {1611218259.565343, 1800.000000, 0.000000, 0.000000},
    objhead = 0x7f3708bf1f20,
    stevedore = (nil),
  },
  http_conn = 0x7f373bdb41d8 {
    fd = 735 (@0x7f3816a61ef4),
    doclose = RX_BODY,
    ws = 0x7f373bdb1060 {
      [Already dumped, see above]
    },
    {rxbuf_b, rxbuf_e} = {0x7f373bdb4238, 0x7f373bdb430a},
    {pipeline_b, pipeline_e} = {0x7f373bdb430a, 0x7f373bdb4b31},
    content_length = -1,
    body_status = chunked,
    first_byte_timeout = 4.000000,
    between_bytes_timeout = 1.000000,
  },
  flags = {do_esi, do_gzip, uncacheable},
  director_req = 0x7f39e8666aa0 {
    cli_name = boot.*redacted*,
    admin_health = probe, changed = 1611216467.973877,
    type = backend {
      port = (null),
      hosthdr = 0.0.0.0,
      n_conn = 28,
    },
  },
  director_resp = director_req,
  vcl = {
    name = "boot",
    busy = 1502,
    discard = 0,
    state = auto,
    temp = warm,
    conf = {
      syntax = "41",
      srcname = {
        *redacted*,
      },
    },
  },
},
vmods = {
  std = {0x7f39e850a0f0, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 0.0},
  directors = {0x7f39e850a160, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 0.0},
  re = {0x7f39e850a1d0, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 10.0},
  re2 = {0x7f39e850a2b0, Varnish 6.3.0 0c9a93f1b2c6de49b8c6ec8cefd9d2be50041d79, 10.0},
  uuid = {0x7f39e850a320, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 10.0},
  header = {0x7f39e850a390, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 10.0},
  blob = {0x7f39e850a400, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 0.0},
  var = {0x7f39e850a470, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 10.0},
  selector = {0x7f39e850a4e0, Varnish 6.3.1 6e96ff048692235e64565211a38c41432a26c055, 10.0},
  geoip2 = {0x7f39e850a550, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 0.0},
  *redacted* = {0x7f39e850a5c0, Varnish 6.3.2 6870fd661a2b42c2e8adad838b5d92a71f27dccd, 0.0},
},
nigoroll self-assigned this Jan 22, 2021

nigoroll commented Jan 22, 2021

FYI: I have a reproducer VTC now. It was hard to get to because I misled myself onto the wrong track; one half of my hypothesis was wrong.

nigoroll commented:

Unfortunately, I forgot the 'Fixes #...' in the commit message. Fixed in 50780f3.

nigoroll added a commit that referenced this issue Jan 22, 2021
This is similar to #3502, but the out-of-storage condition happens after
the object has been created successfully.

Use a logexpect to ensure we test as intended.
nigoroll added a commit that referenced this issue Jan 23, 2021
Do not overwrite an already latched errno.

Note that it does not make any difference at the moment because we only
ever use ENOMEM

Ref #3502