-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic support for the Cache-Control directives #1953
Conversation
The HTTP `cache-control` header has a well-defined set of potential values. This helper class will allow us to central its parsing.
This provides the PFC with the ability to parse the `cache-control` CGI, using the XrdOucCacheDirective helper class, and understand the no-store and no-cache directives. *Note* this handles the simple cases only -- if the file in the cache was already opened by another client then we will serve from that file according to the original open rules.
@osschar - can you please review? |
CacheDirective part is good ... now I have some comments :) This implements no-cache and no-store in the same way. From MDN docs:
In this PR both 1. and 2. prevent the XrdPfcFile (for potentially other attached IOs, especially in forwarding mode proxies) from either using the cached data or writing any data -- each request is forwarded to the remote and then not written into the cache (see also note [1] below). The requests are also still broken into cache blocks which is rather unnecessary. I do not think this should be propagated to XrdPfcFile -- instead, it should be resolved at the level of the IO object by simply forwarding the requests to the remote (through the original IO). Now, if this is a direct mode proxy, several input connections will get piled over on the same IO and cache commands will only get parsed for the first one. To solve this, we would need to parse the directives beforehand and connect each client to appropriate cache IO object. In this case (to satisfy functionality introduced in this PR) we would have two: a) standard one; and b) no-cache-or-store one. Alternatively, if this is possible, case b) could use the memory-cache implementation. This would satisfy ?no-cache and ?no-cache&no-store (as we can not re-check file state at the origin). For ?no-store we could forward the no-store state along with the ReadRequest. As I've been writing this ... it seems increasingly that this extensions make more sense for forwarding mode caches. How were you actually envisioning these to be used? You see my point, http proxy caches urls (server+object-name) ... xcache only caches object-names that were, up to now, assumed to be immutable (as HEP data usually is, if you have a new version of the file it is either in another directory or has a different name). @abh3 if there are URL arguments in the open request -- does this affect selection of IO object in direct mode proxies? I.e., would open("/foo.root?a") and open("/foo.root?b") use the same IO object? Notes
|
I think you figured out the reason why I implemented no-cache and no-store as the same … no-store behavior should be sufficient to meet the no-cache requirements. The use I'm thinking of is mostly along the lines of monitoring and debugging. Hence, I don’t mind the fact it’s not highly optimized. Is there anything wrong with still using the same code paths? |
Ah, monitoring & debugging, makes sense. I'd still prefer to have the no-store flag in the IO, modifying the File state for all connected clients seems like a nasty thing to do, especially for large files that are used frequently. Let me see with Andy if one could treat requests with certain cache-control options as forwarding-mode requests ... so they get their own IO and we can do things there. |
On Mon, 13 Mar 2023, Matev? Tadel wrote:
@abh3 if there are URL arguments in the open request -- does this
affect selection of IO object in direct mode proxies? I.e., would
open("/foo.root?a") and open("/foo.root?b") use the same IO object?
We have no special CGI for anything but POSC and space allocation (space
name and amount of space). What is being discussed here seems to be
spilling across several PR's. I've already commented on one of them. I
think, as you alluded, we should be wary of implementing complex logic
when there is no obvious use case for it. Also, I do agree, most of this
makes sense for forwardng proxies not ones that have a fixed origin.
#### Notes
…
1. It is unclear how to properly implement no-cache in XCache case: even if we were to do stat on some remote replica the most we could check would be file-size -- unless we required a checksum support and also stored it in the pfc info file. I assume this is why you implemented it as always requesting data from the remote. Still, with no-cache it could still be allowed to be stored (but could conflict with the existing data in the cache and that's why you decided to drop it).
--
Reply to this email directly or view it on GitHub:
#1953 (comment)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
|
Perhaps we can have a brain storming session dring the Workshop. Brian
when are you arriving? Matevz and I will be there the Saturday before and
will certainly have Monday, Tuesday, and Thrsday evening available.
…On Tue, 14 Mar 2023, Matev? Tadel wrote:
Ah, monitoring & debugging, makes sense. I'd still prefer to have the no-store flag in the IO, modifying the File state for all connected clients seems like a nasty thing to do, especially for large files that are used frequently.
Let me see with Andy if one could treat requests with certain cache-control options as forwarding-mode requests ... so they get their own IO and we can do things there.
--
Reply to this email directly or view it on GitHub:
#1953 (comment)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
|
As to whether CGI arguments affect IO object selection, they do not. |
I should get into town Sunday afternoon. Would be happy to have a brainstorming session then or later on during the week. I think this should all be solvable, probably much faster in person! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously, I was reviewing these top down instead of bottom up. I suppose it didn't matter as all these pull requests came in all at once (for all practical purposes). As outlined in the max-age comment, this isn't the architectural approach used with the server. We don't force specific protocol elements all the way through the stack but look at simplifying how that information goes through in a standard way. That current way is using simple cgi elements as we can easily transmit them through the whole call stack. It also allows different protocols to use the same code paths. This patch introduces a whole new way of relaying information and that really is not the desired approach within the server. We can discus this offline whenever you want.
Closing this one out -- Alja has been picking up where I left off in separate PRs. |
HTTP provides a standard set of commands, in the
Cache-Control
header, for manipulating the behavior of a cache.This adds a framework for using these commands and adds preliminary support for the
no-store
/no-cache
directives as requested in #1886 by @jthiltges.This purposely does not address some of the difficult concurrency issues (how to behave when a client reads from an already-open file handle) or implement
max-age
; those are intended for follow-up work.Fixes #1886.