Add basic support for the Cache-Control directives #1953

bbockelm · 2023-03-11T17:14:10Z

HTTP provides a standard set of commands, in the Cache-Control header, for manipulating the behavior of a cache.

This adds a framework for using these commands and adds preliminary support for the no-store / no-cache directives as requested in #1886 by @jthiltges.

This purposely does not address some of the difficult concurrency issues (how to behave when a client reads from an already-open file handle) or implement max-age; those are intended for follow-up work.

Fixes #1886.

The HTTP `cache-control` header has a well-defined set of potential values. This helper class will allow us to centralize its parsing.

This provides the PFC with the ability to parse the `cache-control` CGI, using the XrdOucCacheDirective helper class, and understand the no-store and no-cache directives. *Note* this handles the simple cases only -- if the file in the cache was already opened by another client then we will serve from that file according to the original open rules.

Fixes xrootd#1886
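As an illustration of the parsing step described above, here is a minimal sketch of how a `cache-control` value could be split into directives. It is not the `XrdOucCacheDirective` interface from this PR: the names `ParsedCacheControl`, `normalizeToken`, and `parseCacheControl` are hypothetical, and only the two directives this PR handles are recognized.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical result type for illustration; not the XrdOucCacheDirective API.
struct ParsedCacheControl {
    bool noCache = false;
    bool noStore = false;
};

// Strip surrounding whitespace and lowercase the token
// (Cache-Control directive names are case-insensitive).
static std::string normalizeToken(std::string s) {
    auto notSpace = [](unsigned char c) { return !std::isspace(c); };
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), notSpace));
    s.erase(std::find_if(s.rbegin(), s.rend(), notSpace).base(), s.end());
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Parse a comma-separated value such as "no-store, max-age=60".
ParsedCacheControl parseCacheControl(const std::string &value) {
    ParsedCacheControl out;
    std::istringstream in(value);
    std::string token;
    while (std::getline(in, token, ',')) {
        token = normalizeToken(token);
        if (token == "no-cache") out.noCache = true;
        else if (token == "no-store") out.noStore = true;
        // Other directives (e.g. max-age) are ignored in this sketch.
    }
    return out;
}
```

Unknown directives are skipped rather than rejected, so a value carrying `max-age` still parses.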

bbockelm · 2023-03-11T17:14:51Z

@osschar - can you please review?

osschar · 2023-03-13T23:10:52Z

CacheDirective part is good ... now I have some comments :)

This implements no-cache and no-store in the same way. From MDN docs:

1. no-cache - re-check file/object on origin; and
2. no-store - do not write to disk what was read in this request.

In this PR both 1. and 2. prevent the XrdPfcFile (for potentially other attached IOs, especially in forwarding mode proxies) from either using the cached data or writing any data -- each request is forwarded to the remote and then not written into the cache (see also note [1] below). The requests are also still broken into cache blocks which is rather unnecessary.
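To make the difference concrete, here is a rough sketch contrasting the two meanings listed above with the behaviour this comment attributes to the PR. The type `RequestPolicy` and both function names are hypothetical and exist only for illustration; nothing here is taken from the PR's code.

```cpp
// Hypothetical description of what a cache does for one request.
struct RequestPolicy {
    bool mayServeFromCache;  // cached data may be used (after any re-check)
    bool mustRecheckOrigin;  // object must be re-checked on the origin first
    bool mayWriteToCache;    // data read for this request may be written to disk
};

// Meaning of the two directives as summarized in the list above.
RequestPolicy policyPerDirectiveMeaning(bool noCache, bool noStore) {
    RequestPolicy p;
    p.mayServeFromCache = true;
    p.mustRecheckOrigin = noCache;
    p.mayWriteToCache = !noStore;
    return p;
}

// Behaviour as described for this PR: either directive forwards every
// request to the remote and writes nothing into the cache.
RequestPolicy policyAsDescribedForThisPr(bool noCache, bool noStore) {
    const bool bypass = noCache || noStore;
    RequestPolicy p;
    p.mayServeFromCache = !bypass;
    p.mustRecheckOrigin = false;  // no separate re-check: data always comes from the remote
    p.mayWriteToCache = !bypass;
    return p;
}
```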

I do not think this should be propagated to XrdPfcFile -- instead, it should be resolved at the level of the IO object by simply forwarding the requests to the remote (through the original IO).

Now, if this is a direct mode proxy, several input connections will get piled onto the same IO and cache commands will only get parsed for the first one. To solve this, we would need to parse the directives beforehand and connect each client to the appropriate cache IO object. In this case (to satisfy functionality introduced in this PR) we would have two: a) standard one; and b) no-cache-or-store one.

Alternatively, if this is possible, case b) could use the memory-cache implementation.

This would satisfy ?no-cache and ?no-cache&no-store (as we can not re-check file state at the origin). For ?no-store we could forward the no-store state along with the ReadRequest.
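The routing proposed above could be sketched roughly as follows. `IoKind`, `ClientRoute`, and `routeClient` are hypothetical names, and the sketch assumes the directives have already been parsed before the client is attached to an IO object.

```cpp
// Hypothetical: the two IO flavours from the proposal above.
enum class IoKind { Standard, NoCacheOrStore };

// Hypothetical routing result for one client connection.
struct ClientRoute {
    IoKind io;            // which IO object the client attaches to
    bool noStoreOnReads;  // forwarded along with each read request
};

ClientRoute routeClient(bool noCache, bool noStore) {
    ClientRoute r;
    if (noCache) {
        // ?no-cache and ?no-cache&no-store: bypass the cache entirely.
        r.io = IoKind::NoCacheOrStore;
        r.noStoreOnReads = false;
    } else {
        // Plain requests and ?no-store share the standard IO;
        // no-store only travels as a per-read flag.
        r.io = IoKind::Standard;
        r.noStoreOnReads = noStore;
    }
    return r;
}
```

With this split, the shared file state is never modified on behalf of a single client.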

As I've been writing this ... it increasingly seems that these extensions make more sense for forwarding mode caches. How were you actually envisioning these to be used? You see my point: an HTTP proxy caches URLs (server+object-name) ... xcache only caches object-names that were, up to now, assumed to be immutable (as HEP data usually is; if you have a new version of the file it is either in another directory or has a different name).

@abh3 if there are URL arguments in the open request -- does this affect selection of IO object in direct mode proxies? I.e., would open("/foo.root?a") and open("/foo.root?b") use the same IO object?

Notes

1. It is unclear how to properly implement no-cache in the XCache case: even if we were to do a stat on some remote replica, the most we could check would be file size -- unless we required checksum support and also stored the checksum in the pfc info file. I assume this is why you implemented it as always requesting data from the remote. Still, with no-cache the data could be allowed to be stored (but could conflict with the existing data in the cache, and that's why you decided to drop it).

bbockelm · 2023-03-13T23:52:08Z

I think you figured out the reason why I implemented no-cache and no-store as the same … no-store behavior should be sufficient to meet the no-cache requirements.

The use I'm thinking of is mostly along the lines of monitoring and debugging. Hence, I don’t mind the fact it’s not highly optimized. Is there anything wrong with still using the same code paths?

osschar · 2023-03-14T08:21:45Z

Ah, monitoring & debugging, makes sense. I'd still prefer to have the no-store flag in the IO, modifying the File state for all connected clients seems like a nasty thing to do, especially for large files that are used frequently.

Let me see with Andy if one could treat requests with certain cache-control options as forwarding-mode requests ... so they get their own IO and we can do things there.

abh3 · 2023-03-14T15:23:58Z

On Mon, 13 Mar 2023, Matevž Tadel wrote:

> @abh3 if there are URL arguments in the open request -- does this affect selection of IO object in direct mode proxies? I.e., would open("/foo.root?a") and open("/foo.root?b") use the same IO object?

We have no special CGI for anything but POSC and space allocation (space name and amount of space). What is being discussed here seems to be spilling across several PRs. I've already commented on one of them. I think, as you alluded, we should be wary of implementing complex logic when there is no obvious use case for it. Also, I do agree, most of this makes sense for forwarding proxies, not ones that have a fixed origin.

abh3 · 2023-03-14T15:26:39Z

Perhaps we can have a brainstorming session during the Workshop. Brian, when are you arriving? Matevz and I will be there the Saturday before and will certainly have Monday, Tuesday, and Thursday evening available.

abh3 · 2023-03-17T20:37:16Z

As to whether CGI arguments affect IO object selection, they do not.
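In terms of the earlier example, this means both opens resolve to the same IO object. Here is a minimal sketch, assuming the lookup behaves as if keyed by the path with the CGI stripped; the mechanism is an assumption, and `ioLookupKey` is a hypothetical helper, not an XRootD function.

```cpp
#include <string>

// Hypothetical helper: the lookup key is the path with any "?cgi" removed.
std::string ioLookupKey(const std::string &openUrl) {
    // find() returns npos when there is no '?', so substr keeps the whole string.
    return openUrl.substr(0, openUrl.find('?'));
}
```

Under that assumption, `ioLookupKey("/foo.root?a")` and `ioLookupKey("/foo.root?b")` both yield `/foo.root`.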

bbockelm · 2023-03-19T00:18:21Z

> Perhaps we can have a brain storming session during the Workshop. Brian when are you arriving?

I should get into town Sunday afternoon. Would be happy to have a brainstorming session then or later on during the week.

I think this should all be solvable, probably much faster in person!

abh3

Obviously, I was reviewing these top down instead of bottom up. I suppose it didn't matter as all these pull requests came in all at once (for all practical purposes). As outlined in the max-age comment, this isn't the architectural approach used with the server. We don't force specific protocol elements all the way through the stack but look at simplifying how that information goes through in a standard way. The current way is to use simple CGI elements, as we can easily transmit them through the whole call stack. It also allows different protocols to use the same code paths. This patch introduces a whole new way of relaying information and that really is not the desired approach within the server. We can discuss this offline whenever you want.
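For illustration only, here is a small sketch of the CGI-style approach described above, where a lower layer reads a value out of the CGI string that travels with the request. `cgiLookup` is a hypothetical stand-alone helper, not the server's own CGI handling, and the `cache-control=no-store` encoding used in the usage note is an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: look up `key` in a "k1=v1&k2=v2" CGI string.
// Returns true and sets `value` if the key is present; a bare key
// without "=" yields an empty value.
bool cgiLookup(const std::string &cgi, const std::string &key, std::string &value) {
    std::istringstream in(cgi);
    std::string pair;
    while (std::getline(in, pair, '&')) {
        const std::string::size_type eq = pair.find('=');
        if (pair.substr(0, eq) == key) {
            value = (eq == std::string::npos) ? std::string() : pair.substr(eq + 1);
            return true;
        }
    }
    return false;
}
```

A call such as `cgiLookup("other=1&cache-control=no-store", "cache-control", v)` would set `v` to `no-store`, and any protocol front end that can append CGI could use the same path.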

bbockelm · 2024-02-13T13:47:32Z

Closing this one out -- Alja has been picking up where I left off in separate PRs.

bbockelm added 3 commits March 11, 2023 11:13

Add helper class for parsing cache-control header

1e20fbd

The HTTP `cache-control` header has a well-defined set of potential values. This helper class will allow us to centralize its parsing.

Add support for the cache-control header

e63f575

Fixes xrootd#1886

bbockelm mentioned this pull request Mar 11, 2023

Implement only-if-cached mode for HTTP #1954

Closed

abh3 requested review from ccaffy, osschar and abh3 March 12, 2023 06:19

bbockelm mentioned this pull request Mar 13, 2023

Implement max-age directive #1957

Closed

abh3 reviewed Apr 14, 2023

bbockelm closed this Feb 13, 2024
