Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic support for the Cache-Control directives #1953

Closed
wants to merge 3 commits into from

Conversation

bbockelm
Copy link
Contributor

HTTP provides a standard set of commands, in the Cache-Control header, for manipulating the behavior of a cache.

This adds a framework for using these commands and adds preliminary support for the no-store / no-cache directives as requested in #1886 by @jthiltges.

This purposely does not address some of the difficult concurrency issues (how to behave when a client reads from an already-open file handle) or implement max-age; those are intended for follow-up work.

Fixes #1886.

The HTTP `cache-control` header has a well-defined set of potential
values.  This helper class will allow us to central its parsing.
This provides the PFC with the ability to parse the `cache-control`
CGI, using the XrdOucCacheDirective helper class, and understand
the no-store and no-cache directives.

*Note* this handles the simple cases only -- if the file in the
cache was already opened by another client then we will serve from that
file according to the original open rules.
@bbockelm
Copy link
Contributor Author

@osschar - can you please review?

@osschar
Copy link
Contributor

osschar commented Mar 13, 2023

CacheDirective part is good ... now I have some comments :)

This implements no-cache and no-store in the same way. From MDN docs:

  1. no-cache - re-check file/object on origin; and
  2. no-store - do not write to disk what was read in this request.

In this PR both 1. and 2. prevent the XrdPfcFile (for potentially other attached IOs, especially in forwarding mode proxies) from either using the cached data or writing any data -- each request is forwarded to the remote and then not written into the cache (see also note [1] below). The requests are also still broken into cache blocks which is rather unnecessary.

I do not think this should be propagated to XrdPfcFile -- instead, it should be resolved at the level of the IO object by simply forwarding the requests to the remote (through the original IO).

Now, if this is a direct mode proxy, several input connections will get piled over on the same IO and cache commands will only get parsed for the first one. To solve this, we would need to parse the directives beforehand and connect each client to appropriate cache IO object. In this case (to satisfy functionality introduced in this PR) we would have two: a) standard one; and b) no-cache-or-store one.

Alternatively, if this is possible, case b) could use the memory-cache implementation.

This would satisfy ?no-cache and ?no-cache&no-store (as we can not re-check file state at the origin). For ?no-store we could forward the no-store state along with the ReadRequest.

As I've been writing this ... it seems increasingly that this extensions make more sense for forwarding mode caches. How were you actually envisioning these to be used? You see my point, http proxy caches urls (server+object-name) ... xcache only caches object-names that were, up to now, assumed to be immutable (as HEP data usually is, if you have a new version of the file it is either in another directory or has a different name).

@abh3 if there are URL arguments in the open request -- does this affect selection of IO object in direct mode proxies? I.e., would open("/foo.root?a") and open("/foo.root?b") use the same IO object?

Notes

  1. It is unclear how to properly implement no-cache in XCache case: even if we were to do stat on some remote replica the most we could check would be file-size -- unless we required a checksum support and also stored it in the pfc info file. I assume this is why you implemented it as always requesting data from the remote. Still, with no-cache it could still be allowed to be stored (but could conflict with the existing data in the cache and that's why you decided to drop it).

@bbockelm
Copy link
Contributor Author

I think you figured out the reason why I implemented no-cache and no-store as the same … no-store behavior should be sufficient to meet the no-cache requirements.

The use I'm thinking of is mostly along the lines of monitoring and debugging. Hence, I don’t mind the fact it’s not highly optimized. Is there anything wrong with still using the same code paths?

@osschar
Copy link
Contributor

osschar commented Mar 14, 2023

Ah, monitoring & debugging, makes sense. I'd still prefer to have the no-store flag in the IO, modifying the File state for all connected clients seems like a nasty thing to do, especially for large files that are used frequently.

Let me see with Andy if one could treat requests with certain cache-control options as forwarding-mode requests ... so they get their own IO and we can do things there.

@abh3
Copy link
Member

abh3 commented Mar 14, 2023 via email

@abh3
Copy link
Member

abh3 commented Mar 14, 2023 via email

@abh3
Copy link
Member

abh3 commented Mar 17, 2023

As to whether CGI arguments affect IO object selection, they do not.

@bbockelm
Copy link
Contributor Author

Perhaps we can have a brain storming session during the Workshop. Brian when are you arriving?

I should get into town Sunday afternoon. Would be happy to have a brainstorming session then or later on during the week.

I think this should all be solvable, probably much faster in person!

Copy link
Member

@abh3 abh3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously, I was reviewing these top down instead of bottom up. I suppose it didn't matter as all these pull requests came in all at once (for all practical purposes). As outlined in the max-age comment, this isn't the architectural approach used with the server. We don't force specific protocol elements all the way through the stack but look at simplifying how that information goes through in a standard way. That current way is using simple cgi elements as we can easily transmit them through the whole call stack. It also allows different protocols to use the same code paths. This patch introduces a whole new way of relaying information and that really is not the desired approach within the server. We can discus this offline whenever you want.

@bbockelm
Copy link
Contributor Author

Closing this one out -- Alja has been picking up where I left off in separate PRs.

@bbockelm bbockelm closed this Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include Cache-Control support for XrdHttp to invalidate cached data
3 participants