Provisions for regional caching #93

Closed
skef opened this issue May 17, 2022 · 7 comments

@skef
Contributor

skef commented May 17, 2022

In many cases a cache hit on an augmentation set will be very unlikely. However, there are cases where caching, and in particular regional caching (e.g. Akamai), becomes not just relevant but important to performance: for example, when the initial subset is for the home page of a media service and the first augmentation is for the top article, or when the initial subset is for a company's home page and the first augmentation is for a popular tab or for the first area added dynamically when scrolling.

Whether a request can be cached by existing services often depends on the request method (GET may be required) and the URL length. At present the spec is flexible about the request method, and while the parameters are compressed there are no length guarantees, and in some cases the codepoints and indices will be fragmented in a way that takes more bytes to specify.

Additionally, any requirements for caching will typically be understood by the server rather than the client.

One way of addressing this need is to allow the server to respond to a GET or POST request with a different, cache-compatible URL to be used for the actual download of subset or augmentation data. This is similar to a temporary redirect, and in fact one of the existing redirect codes may suffice for this purpose (I unfortunately don't know off-hand).

(This alternative URL could contain a hash value generated from a canonicalized representation of the request, with the map stored on the server side so that the response can be regenerated on a cache miss. That possibility helps illustrate how the mechanism could work but I see no need to constrain the server-side implementation in the spec. Adobe's system has, at times, used a system like this to remain regional-cache-compatible.)
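
As a rough sketch of how such a mapping might work on the server side (the helper names, cache host, and parameter layout below are purely illustrative, not part of the proposal):

```python
# Illustrative sketch only: derive a stable, cache-friendly URL from a
# canonicalized representation of the request and remember the mapping so
# the origin can regenerate the response on a cache miss. All names here
# (canonicalize, REQUEST_MAP, the cache host) are hypothetical.
import hashlib
import json

REQUEST_MAP = {}  # hypothetical server-side store: digest -> canonical request


def canonicalize(request_params: dict) -> str:
    # Sort keys and normalize codepoint lists so that equivalent requests
    # produce byte-identical representations.
    norm = dict(request_params)
    if "codepoints" in norm:
        norm["codepoints"] = sorted(set(norm["codepoints"]))
    return json.dumps(norm, sort_keys=True, separators=(",", ":"))


def cacheable_url(request_params: dict) -> str:
    canonical = canonicalize(request_params)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
    REQUEST_MAP[digest] = canonical  # allows regeneration on a cache miss
    return f"https://cache.example.com/ift/{digest}"
```

The server would answer the original request with a temporary redirect to a URL like this; the regional cache then serves repeat requests for the same canonical parameters without touching the origin.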

I suggest adding a section to the specification that addresses regional caching along the lines above. If we're confident that one of the existing redirect codes suffices then that section might just be informational. If we are not confident then a modest extension to the protocol may be needed.

@garretrieger
Contributor

garretrieger commented May 21, 2022

Thanks, this is a use case that we haven't put a lot of thought into yet. I have a concern with the redirection approach, though, as it introduces an additional round trip, which can be quite costly for latency. Also, we've taken great pains to make the protocol completely stateless (that is, a server isn't required to store state to provide a conforming implementation, though it may keep state if it wants). I'll need to think about this some more and see if there's a way to better accommodate first-request caching without introducing a redirect.

If we do decide to allow for caching via a redirect, then I think we could just update the specification to say that it's allowable for the server to return an HTTP redirect as long as the URL it redirects to contains a valid response, but otherwise make no requirements on how the redirected URL works (for example, the identifiers used in the cache URL).

@garretrieger
Contributor

Ah, the spec already talks about how redirects are handled in https://w3c.github.io/IFT/Overview.html#invalid-server-response. That is, it allows redirects from the server.

So doing caching by redirecting to a cacheable URL is currently supported.

@skef skef mentioned this issue Jun 28, 2022
@skef
Contributor Author

skef commented Jun 28, 2022

I was asked to add more detail to this.

Support for caching may be more important for larger-scale services than some might expect. Adobe's system has been overhauled several times specifically to increase the amount of caching and probably will be overhauled again in the future.

I think there are three general areas of support for caching, the first being what has already been discussed:

  1. Redirects to cached or cacheable URLs. A facility built on redirects has the advantage of server-side logic, which may include "super-setting" (including more code points, and perhaps more features, than strictly requested to reduce both the frequency of subsequent requests and the number of cached files). It has the disadvantages of an extra round trip and server-side state of some duration.
  2. Canonical hash of the parameters as an extra URL parameter: Some caching services can respond directly based on a subset of URL parameters specified by their "client" -- which in this case is the operator of the server. If a hash of the subset parameters were added to every request, it could be used as the cache key (see the sketch after this list). This has the advantage of avoiding the round trip but the disadvantage of only partial support for "super-setting" -- the server side can reduce the frequency of subsequent requests, but requests with "common supersets" are not folded under the same cache key. I'm also not sure it's currently possible to do URL-key-based caching of POST requests, even if the key is always in the URL.
  3. A-la-carte IFT: Let the server influence the client with JavaScript code to pick which IFT features it wants to use and which it wants to do itself, or to influence the request (e.g. with client-side "super-setting"). I've filed issue #103 (A-la-carte IFT) on this subject.
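
To make option 2 concrete, here is a minimal sketch of the kind of client-side hashing I have in mind; the "pk" parameter name and the helpers are hypothetical, not something from the spec:

```python
# Minimal sketch (hypothetical): append a canonical hash of the subset
# parameters to the request URL so a caching service can be configured to
# key on that single query parameter.
import hashlib
import json
from urllib.parse import urlencode


def params_key(subset_params: dict) -> str:
    canonical = json.dumps(subset_params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def request_url(base: str, subset_params: dict) -> str:
    # The server operator configures the CDN to use only "pk" as the cache key.
    return f"{base}?{urlencode({'pk': params_key(subset_params)})}"
```

Two requests with identical parameters share a cache key, but a request for a superset hashes differently, which is the "common supersets are not folded" limitation mentioned above.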

@garretrieger
Contributor

Some good news on this front: I was recently pointed towards a new HTTP method, QUERY (https://httpwg.org/http-extensions/draft-ietf-httpbis-safe-method-w-body.html), which works much better for caching than GET/POST. It encodes the request data like POST but retains the cacheability of GET requests.

I'd like to investigate it a bit more, but I'm strongly considering changing the specification to either use QUERY exclusively or at least make it the recommended method, with GET/POST as fallbacks.

The QUERY spec specifically recommends that normalization be applied to the request body to generate the cache key (https://httpwg.org/http-extensions/draft-ietf-httpbis-safe-method-w-body.html#section-2.1). I think this would solve many of the problems you mentioned. For example, normalization could upgrade the requested codepoint set to a cacheable superset that the backend server is known to serve.
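
As a rough illustration of that kind of normalization (the superset table and helper below are invented for the example, not taken from the QUERY draft or the IFT spec):

```python
# Hypothetical normalization step: round a requested codepoint set up to one
# of a small number of precomputed supersets that the backend is known to
# serve, so that many distinct requests share one cache entry.
SUPERSETS = [
    frozenset(range(0x0020, 0x0080)),                                     # Basic Latin
    frozenset(range(0x0020, 0x0080)) | frozenset(range(0x00A0, 0x0100)),  # + Latin-1
]


def normalize_codepoints(requested: set) -> frozenset:
    for superset in SUPERSETS:  # ordered from smallest to largest
        if requested <= superset:
            return superset
    return frozenset(requested)  # fall back to the exact request
```

Requests for {0x61, 0x62, 0x63} and {0x78, 0x79, 0x7A} would both normalize to the Basic Latin superset and therefore hit the same cache entry even though the raw request bodies differ.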

For the three options you mentioned:

  1. Should be supported in the spec currently.
  2. Not currently supported by the spec, but something that we could definitely consider adding. If we switch to QUERY, that will solve the POST problem.
  3. As mentioned in the a-la-carte issue, a JavaScript API is out of scope for this spec, but it is likely something we want as a separate specification.

@skef
Contributor Author

skef commented Jul 1, 2022

Some good news on this front: I was recently pointed towards a new HTTP method, QUERY

That does look good!

@svgeesus
Contributor

Tagging @martinthomson

@garretrieger
Contributor

The current plan for enabling caching for patch subset will be either:

One related development is the investigation I made into using precompressed brotli metablocks to partially cache portions of the font.

I'm going to close this issue for now as the remaining work will be tracked in the QUERY issue (#127), but please reopen if you think additional changes are needed.
