Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New timeout nomenclature #3817

Draft
wants to merge 18 commits into
base: master
Choose a base branch
from
Draft

Conversation

dridi
Copy link
Member

@dridi dridi commented Jun 17, 2022

This pull request is loosely based on work started a few years ago with @nigoroll. Loosely because it is not exactly where we left off our drafting effort last time we made progress together, and it adds things I had not previously discussed with him.

Thanks to the alias systems added to runtime parameters and VCL variables, renaming parameters does not break anything until we decide to remove the deprecated aliases.

Extract from the manual

Timeout Parameters
------------------

All timeout parameters follow a consistent naming scheme:

    <subject>_<event>_<type>

The subject may be one of the following:

- ``sess`` (client session)
- ``req`` (client request)
- ``resp`` (client response)
- ``pipe`` (client transaction turning into a pipe)
- ``bereq`` (backend request)
- ``beresp`` (backend response)
- ``backend`` (backend resource)
- ``cli`` (command line session)
- ``thread`` (worker thread)

Common events are:

- ``idle`` (waiting for data to be sent or received)
- ``send`` (complete delivery)
- ``start`` (first byte fetched)
- ``pool`` (time spent in a pool)

Finally, there are two types of timeouts:

- ``timeout`` (definitive failure condition)
- ``interrupt`` (triggered to verify another condition)

This does not really apply to "all" timeouts, only those with "timeout_" or "_timeout" in the name.

Parameter changes

  • first_byte_timeout renamed to beresp_start_timeout
  • between_bytes_timeout renamed to beresp_idle_timeout
  • backend_idle_timeout renamed to backend_pool_timeout
  • cli_timeout renamed to cli_resp_timeout
  • connect_timeout renamed to bereq_connect_timeout
  • idle_send_timeout renamed to resp_idle_interrupt
  • pipe_timeout renamed to pipe_idle_timeout
  • send_timeout renamed to resp_send_timeout
  • timeout_idle renamed to sess_idle_timeout
  • timeout_linger renamed to sess_linger_interrupt

VCL variables changes

  • bereq.between_bytes_timeout renamed to beresp.idle_timeout
  • bereq.first_byte_timeout renamed to beresp.start_timeout
  • sess.idle_send_timeout renamed to resp.idle_interrupt
  • sess.send_timeout renamed to resp.send_timeout
  • sess.timeout_idle renamed to sess.idle_timeout
  • sess.timeout_linger renamed to sess.linger_interrupt

Future timeout considerations

Thanks to this naming scheme we can more easily see what we may be missing. This pull request does not change anything except for parameter and VCL variable names, while offering compatibility aliases. There are known desirable timeouts that could be named like this following the proposed nomenclature:

  • bereq_send_timeout (wanted by both UPLEX and Varnish Software)
  • req_fetch_timeout (wanted by both UPLEX and Varnish Software)
  • beresp_fetch_timeout (Varnish Software use case, would be the equivalent of last_byte_timeout from Varnish Enterprise)
  • pipe_task_timeout (Varnish Software use case)
  • req_task_timeout (UPLEX use case)
  • bereq_task_timeout (UPLEX use case)

There are other aspects that our timeout draft covered, for example #3766, but I think we can start with this alone. The change is already large enough and promises interesting discussions since I don't expect consensus to arise immediately. Huge thanks to @nigoroll and sorry for not following up with you sooner.

@emacneille-at-varnish
Copy link

Note that some folks like to set timeouts on bereq in vcl_backend_fetch on a per-request basis, possibly allowing more time on specific objects that are known to require more computation in the backend. Some examples are here

If new timeouts are added and they aren't accessible via the bereq namespace, would you be able to set them instead in vcl_backend_response? Of course if you are already in vcl_backend_error, you have missed the opportunity to extend a timeout.

@dridi
Copy link
Member Author

dridi commented Jun 21, 2022

Taking the example you link to, we would from this:

import urlplus;

sub vcl_backend_fetch {
        if (urlplus.get_extension() == "zip") {
	        set bereq.last_byte_timeout = 4s;
        }
}

to this:

import urlplus;

sub vcl_backend_fetch {
        if (urlplus.get_extension() == "zip") {
	        set beresp.fetch_timeout = 4s;
        }
}

While you wouldn't have access to most of beresp, you could still set up beresp timeouts in vcl_backend_fetch if a) this pull request was merged and b) this specific timeout was not enterprise-only.

@emacneille-at-varnish
Copy link

Makes sense, thank you!

@dridi
Copy link
Member Author

dridi commented Jun 27, 2022

Rebased against current master to solve conflicts with #3812.

dridi added 18 commits June 27, 2022 23:01
Now that aliases show themselves as such to other tweaks, we'll need the
other tweaks to know about this detail. Moving the tweak up will allow a
new function visible to the others without a prior declaration.

Refs f885637
The way we now pass aliases, we lose the original priv pointer. So we
need to resolve the original parameter in turn.

Fixes f885637
The latter is already long enough with command line options coverage
alone, and with incoming additions to the parameters documentation now
seems a good time for this extraction.
Most timeouts don't comply, and will be migrated gradually with
deprecated aliases to maintain compatibility.

The resulting list of timeouts will also help us notice what we
may be missing and offer a reliable naming convention for new
timeouts.
We could even change the return value to a string and return where p is
when a match is found, or NULL. This way a call site could check whether
the original identifier or an alias was found, and for example issue a
warning.
Otherwise it breaks the param.reset command.
Conversely, the VCL variable `bereq.first_byte_timeout` is replaced by
`beresp.start_timeout`.
And the VCL variable `bereq.between_bytes_timeout` is in turn replaced
by `beresp.idle_timeout`.
And this time, no impact on VCL code, only VRT.
Convenience for parity with VTCP_set_read_timeout().
Exceptionally, one existing VTC was migrated to the new name immediately
since it relied on debug logs containing 'idle_send_timeout'.

Turning the timeout to an interrupt, I tried to update the parameter
description to better convey what it does. This does not change the
current behavior, but I tried to highlight what is wrong with this
timeout and the mismatch between its purpose for HTTP/1 and h2.

Depending on where we look at the documentation, between the parameter
or VCL variables, we get a slightly different description. None of them
are accurate, but the consensus seems to have always been the parameter
description from before h2 or the 'sess.idle_send_timeout' variable.

For session-level timeouts, we may consider an h2_send_timeout parameter
in the future to distinguish between socket writes and underlying h2
streams timing out on a condvar. This h2_send_timeout parameter would
map to SO_SNDTIMEO for h2 client connections.

Refs varnishcache#2927
Note to self, absolutely no test coverage for this parameter.
Everything mentioned in the commit log where idle_send_timeout is
renamed to resp_idle_timeout applies here as well.
They needed to be taken care of simultaneously to avoid hassles related
to their current cohabitation with resp timeouts.
@nigoroll nigoroll self-requested a review July 4, 2022 13:25
@nigoroll
Copy link
Member

nigoroll commented Jul 4, 2022

👍🏽 in general, but I would like to ask about some divergence from our previous collaboration:

* `between_bytes_timeout` renamed to `beresp_idle_timeout`

I think between bytes is a very good, descriptive name. Our last joint compromise seems to have been beresp_read_timeout, I would still suggest that or beresp_between_bytes_timeout

* `connect_timeout` renamed to `bereq_connect_timeout`

should be backend_connect_timeout, IMHO

* `idle_send_timeout` renamed to `resp_idle_interrupt`

We left the collaboration off with resp_write_timeout, which I still find clearer. Why the change? And, again "idle" is used in so many contexts that I really think we should avoid it except for "idle connections" (e.g. with HTTP/1.1 keepalive). That is sess_idle_timeout, which I like

* `pipe_timeout` renamed to `pipe_idle_timeout`

here I am happy with idle because a pipe not transferring data for some time is "idle"

Copy link
Member

@nigoroll nigoroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see top-level comment

dridi added a commit to dridi/varnish-cache that referenced this pull request Jan 25, 2024
This takes care of documenting the meaning of a zero timeout value in
a central location, since none of the timeout parameters have a mention
for the effect it produces.

This is written as if there was a separate varnish-param(7) manual,
which should eventually be enacted. This manual extraction was initially
implemented in the adjacent varnishcache#3817 timeout nomenclature draft.
dridi added a commit to dridi/varnish-cache that referenced this pull request Jan 31, 2024
This takes care of documenting the meaning of a zero timeout value in
a central location, since none of the timeout parameters have a mention
for the effect it produces.

This is written as if there was a separate varnish-param(7) manual,
which should eventually be enacted. This manual extraction was initially
implemented in the adjacent varnishcache#3817 timeout nomenclature draft.
dridi added a commit that referenced this pull request Jan 31, 2024
This takes care of documenting the meaning of a zero timeout value in
a central location, since none of the timeout parameters have a mention
for the effect it produces.

This is written as if there was a separate varnish-param(7) manual,
which should eventually be enacted. This manual extraction was initially
implemented in the adjacent #3817 timeout nomenclature draft.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants