Prevent caching of large datasets in Global Caches #7
Comments
Some possible options to consider:
|
Caches might want to use the length field in the message to decide whether they will be able to handle that data volume. If that field is missing, they could still decide whether to store the data in the cache after the download has finished. |
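As a rough sketch of the length-based idea above: a cache could check a declared length before downloading, and fall back to the actual downloaded size when none is declared. The limit `MAX_CACHEABLE_BYTES`, the function names, and the assumption that the length sits on entries of a `links` array are all illustrative, not something agreed in this thread.

```python
from typing import Optional

# Hypothetical operator-defined limit; not a value from this discussion.
MAX_CACHEABLE_BYTES = 100 * 1024 * 1024


def declared_length(message: dict) -> Optional[int]:
    """Return the first integer 'length' found on the message links, if any.

    Assumption: the length is carried on the entries of a 'links' array.
    """
    for link in message.get("links", []):
        length = link.get("length")
        if isinstance(length, int):
            return length
    return None


def decide(message: dict, downloaded_bytes: Optional[int] = None) -> str:
    """Return 'store', 'skip', or 'defer' (decide once the download is done)."""
    length = declared_length(message)
    if length is None:
        # No declared length: fall back to the measured size, if known.
        length = downloaded_bytes
    if length is None:
        return "defer"
    return "store" if length <= MAX_CACHEABLE_BYTES else "skip"
```

A cache would call `decide(message)` when the notification arrives and, if the answer is `"defer"`, call it again with `downloaded_bytes` set once the download has finished.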
I vote for option 3 |
Before looking for the right mechanism, I think it makes sense to agree on who decides whether things are cached. Is it the data publisher? In which case putting a flag in the discovery metadata or notification message would work. Is it the cache operator? In which case "message length" could be a useful criterion. The data publisher probably has a better idea of whether the data is real-time or near-real-time. The cache owner is the one impacted by the choice in terms of data down/upload and storage cost. |
And, to have the complete picture, we should also consider the impact on users. |
It also depends on whether limits will be mandated across all global services, or vary between them. In either case:
|
Hi Tom. Useful points. Thinking out loud ...
|
From a user perspective, however, this may imply that they would have to subscribe to origin/a/wis2/... to receive the non-cached data. An alternative to my option 3. above is for the WIS2 Node to:
This way, as a user, I don't really need to care: I subscribe to cache/a/wis2/... and I receive the correct links in the message. That way, the WIS2 Node is in control, the Global Cache follows a simple rule, and the user doesn't need to know about this subtlety. |
That's a neat solution. It means that data consumer wanting core data doesn't have to worry about whether it's on We just need to clearly document the (counter intuitive) situation of when a It also means that the Global Discovery Catalogue can treat all core data the same - adding an additional actionable link pointing to subscription via the BTW - I'm assuming that the Global Discovery Catalogue adds an actionable link for the associated |
"We just need to clearly document the (counter intuitive) situation..." does it really matter? |
@kaiwirt, is this potential solution (don't cache but re-publish me) a good option for you as a Global Cache centre? |
@golfvert - by documentation yes, I meant mention it in the guide - I'm sure that some Global Cache implementers might not get the point unless we're explicit with the reason. |
I think it is OK if caches republish messages without actually downloading and storing the data, leaving the message unmodified except for the topic. Caches could also indicate this in the message, with a flag like on_local_cache: true/false |
Sounds good, I just have one addition... |
As ECMWF will have its WIS2 Node soonish, and its data shouldn't be cached, shall we tentatively endorse:
If that is acceptable, then @tomkralidis can amend the WIS2-notification-message repo accordingly. |
We can add this feature. No objections.
If we agree on this procedure, we should along the same lines agree on what to do with recommended messages. In that case I would prefer the same logic, such that a Global Cache does not download recommended data but republishes the message at cache/a/wis2
|
TT-WISMD 2023-04-12:
Recommendation:
|
ET-W2AT 2023-05-15:
|
A small complement to the summary above:
Item 2. is to make users' life easier. They will keep subscribing to the topic |
Associated PR in wmo-im/wis2-notification-message#46 |
Just for my clarification: do we use the same mechanism for recommended data? The GC receives a message on origin/#, does not download the data, and republishes the (unmodified) message on cache/# |
Not necessarily. |
A producer of core data needs to know in advance whether the data will be cached or not. If core data are not cached, then the producer will have to accommodate data access from an unknown number of consumers, as opposed to one download from the Global Cache. The caching of core data at the Global Caches is a key advantage of the WIS2 architecture from the point of view of a producer of large volumes of such data. |
Decision: An increase in the volume of data has to be announced by the provider in advance to allow the GCs to take the required measures.
===DECISIONS
For core data, add properties.cache (true|false, default=true) to the notification message, as decided by the data producer. |
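To make the decision above concrete, here is a minimal sketch of how a Global Cache might act on `properties.cache`. The names `to_cache_topic` and `handle`, and the `download` callable (which is assumed to fetch the data, store it, and return the URL on the cache), are hypothetical. Rewriting `href` for cached data is also an assumption, based on the "correct links in the message" comment earlier in the thread.

```python
import copy
from typing import Callable, Tuple


def to_cache_topic(origin_topic: str) -> str:
    """Swap the leading 'origin' topic level for 'cache'."""
    head, sep, rest = origin_topic.partition("/")
    if head != "origin":
        raise ValueError(f"not an origin topic: {origin_topic}")
    return "cache" + sep + rest


def handle(origin_topic: str, message: dict,
           download: Callable[[str], str]) -> Tuple[str, dict]:
    """Return the (topic, message) pair the cache would republish.

    properties.cache defaults to true when absent. When it is false, the
    data is not downloaded and the message is republished unmodified.
    """
    out = copy.deepcopy(message)  # never mutate the incoming message
    if out.get("properties", {}).get("cache", True):
        for link in out.get("links", []):
            # Assumption: cached copies are advertised via a rewritten href.
            link["href"] = download(link["href"])
    return to_cache_topic(origin_topic), out
```

Either way the user keeps subscribing to `cache/...` only; the difference is whether the links point to the Global Cache or back to the WIS2 Node.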
Guide section for Global Cache operators is updated. Need to put appropriate text in section for data publishers (section 2.6.3) |
Done. New section for Data Publishers includes this information: "Considerations when providing Core data in WIS2" |
Centres like ECMWF or EUMETSAT will provide very large amounts of core data that nevertheless should not (or may not) be stored in the Global Cache.
In the current approach, there is no mechanism to prevent that kind of data from ending up in the cache.
Right now all data published using messages in the topics
origin/a/wis2/country/centre-id/core/#
will end up being caught by the Global Cache and copied. We should define a way to prevent that default behaviour.
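For illustration only, this is why everything under that topic gets picked up: an MQTT subscription with wildcards matches any core topic, whatever the data volume. The sketch below is a simplified matcher (it ignores `$`-prefixed topics and does not validate the filter). Replacing the `country` and `centre-id` placeholders with `+` is an assumption about how a cache would subscribe, and the example topics are made up.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT topic-filter match supporting '+' and a trailing '#'."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the rest of the topic.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical subscription a Global Cache might hold.
CORE_FILTER = "origin/a/wis2/+/+/core/#"

print(topic_matches(CORE_FILTER, "origin/a/wis2/xx/example-centre/core/some/dataset"))
print(topic_matches(CORE_FILTER, "origin/a/wis2/xx/example-centre/recommended/other"))
```

Nothing in the topic itself distinguishes a small dataset from a very large one, which is why a separate signal is needed.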
updated: 31 May 2023
===DECISIONS