Assignment of topics for multidisciplinary datasets #38
Comments
TT-WISMD 2023-04-12:
|
Can a blob of data be registered under several hierarchies, i.e. multiple times? |
The TT-NWPMD meeting (17.04.2023) suggested that it would be more helpful to allow datasets to be made available, subscribed to and notified under several (multiple) hierarchy topics.
TT-NWPMD would also like to seek further clarification on how the idea of a multidisciplinary token/value will work. |
Having a multidisciplinary definition may lead to providers dumping any data in question to this topic? The lesser evil here could be publishing multiple messages with the same So if a data granule is published that applies to 3 topics, then:
In this manner, we are able to ensure deduplication. This would, however, require WCMP2 |
In the current design, the data_id uniquely identifies the data granule. If a cache receives three messages in three topics with the same data_id, it downloads the data once, republishes the corresponding message and drops the other two messages as duplicates. |
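For illustration, the de-duplication behaviour described in the comment above could be sketched as follows. This is a toy model under stated assumptions, not the actual Global Cache implementation: the class `GlobalCacheSketch`, the message fields `data_id` and `url`, and the topic names are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalCacheSketch:
    """Toy model of data_id based de-duplication (hypothetical, not the real cache)."""

    seen_data_ids: set = field(default_factory=set)
    downloads: list = field(default_factory=list)
    republished: list = field(default_factory=list)

    def on_message(self, topic: str, message: dict) -> bool:
        """Return True if the message was processed, False if dropped as a duplicate."""
        data_id = message["data_id"]
        if data_id in self.seen_data_ids:
            # Same granule already handled, whatever topic it arrived on: drop it.
            return False
        self.seen_data_ids.add(data_id)
        self.downloads.append(message["url"])      # download the data once
        self.republished.append((topic, data_id))  # republish in the first topic only
        return True


cache = GlobalCacheSketch()
msg = {"data_id": "example/granule-001", "url": "https://example.org/granule-001"}

# The same granule announced under three (hypothetical) topics.
results = [cache.on_message(t, msg) for t in ("weather", "climate", "ocean")]
```

In this sketch only the first notification is republished, so subscribers to the other two topics would never be notified. That gap is what the following comments discuss.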
If we go for this option (multiple messages in the currently existing domains with the same data_id), then the Global Cache should republish the message in each topic hierarchy but only download once. Is that doable? How can we have this behaviour while keeping the current anti-duplication of downloads? We also have to remember the purpose of the topic hierarchy. It is not a way to describe the data (that is the job of the metadata) but to allow I therefore wonder if both "requirements" are consistent with the currently agreed purpose of the topic hierarchy. |
We can implement this change if this is agreed on. However, I am not sure this is a good solution. Having different messages in different topics for the same data only increases the number of messages with no additional benefit, in my opinion. If data "fits" into several topics, then I would prefer having a decision on which is the correct topic for that data instead of just sending out multiple messages. |
I suggest waiting before implementing anything! This is still under discussion. As explained in my comment above, I have the feeling (I might be wrong though) that we are taking the topic hierarchy discussion down the wrong path. It may eventually look like a second-class metadata record. We should focus on the official metadata for this kind of information. |
@golfvert the readme of this repository states that: |
I don't think that what I wrote contradicts that statement. The data is classified and categorized. We don't have a single central topic with everything. Then, the question is how far this topic hierarchy should be sufficient to identify the data. |
Proposed decision for the @wmo-im/tt-wismd:
|
@sebvi wants a real-life use case to understand |
related to: wmo-im/wcmp2#94 |
Following the discussion above I struggle to see what I could use the topicHierarchy for. I thought it was for filtering, but if multiple hierarchies are not permitted it won't serve the purpose for many datasets. Furthermore I don't understand the comment above:
I thought it was part of the metadata (https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html#_topic_hierarchy) and that the elements in it are what we use for filtering relevant information. For cryosphere, many of the datasets could also be published using weather or climate etc., and then you would have to resort to Properties/Themes to really sort out what is relevant and just ignore TopicHierarchy. So given the ambiguity for many datasets, I am struggling to see the use case for TopicHierarchy. |
Hi @steingod, |
@steingod the TT-WISMD recently decided to scale down the multiple uses of topic_hierarchy, and thus we will remove it as a property in WCMP2 and as a requirement for the data_id in the notification message. See: wmo-im/wcmp2#95. So now, the topic_hierarchy will only be used to identify a channel for pub/sub. |
Thanks for the update, makes sense to me. Concerning pub/sub I do understand it has a specific meaning, but for this to be useful at the practical level, the implementation requires that it is possible to connect datasets to only one channel, else you would anyway have to subscribe to everything and filter afterwards. Removing it as a requirement from WCMP2 makes sense, but how is the relation of datasets and channels addressed to make it consistent across the community(ies)? |
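The practical concern above (subscribe to one channel, or subscribe to everything and filter afterwards) can be illustrated with a minimal sketch of MQTT-style wildcard matching, where `+` matches one level and `#` matches all remaining levels. The helper `topic_matches` and the `example/...` topic names are assumptions for illustration only and do not reflect the actual WIS2 topic hierarchy.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Check a topic against an MQTT-style subscription filter ('+' and '#')."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == "#":
            # Multi-level wildcard: matches everything from here on.
            return True
        if i >= len(topic_levels):
            # Subscription is deeper than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # Literal level that does not match.
            return False
    # Without '#', both must have the same number of levels.
    return len(sub_levels) == len(topic_levels)


# A precipitation dataset published on a single (hypothetical) channel.
published_topic = "example/weather/precipitation"

weather_sub = topic_matches("example/weather/#", published_topic)
hydrology_sub = topic_matches("example/hydrology/#", published_topic)
catch_all_sub = topic_matches("example/#", published_topic)
```

Under these assumptions a hydrology-only subscriber does not see the notification, while a catch-all subscriber does but must filter afterwards, for example using the discovery metadata.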
The TT-NWPMD meeting (2023.06.13) noted the decision on scaling down the multiple uses of the topic hierarchy. TT-NWPMD asks for further clarification on how to solve the original issue. For a dataset that is multidisciplinary in nature, which topic should it be associated with? Clear and well-documented guidance would be needed to ensure consistency. |
@wmo-im/tt-nwpmd This will probably be the guidance: If a dataset is multidisciplinary in nature, then choose the best fit for the TH. Think of the TH as a key or identifier for notifications on the cache with some basic meaning, but not a full description. More descriptions about other relevant disciplines will go into the WCMP2 metadata record for that dataset. Currently, the TT-WISMD is considering the best approach for this. Please see this comment in issue # 101 wmo-im/wcmp2#101 (comment). |
Adding my thoughts ... We need to treat each domain separately, so "similar data" from, say, 2 earth system domains would need to be published in places on the topic hierarchy. We shouldn't try to conflate. This solution might not be super elegant, but at least it's predictable for data publishers and data consumers. |
Note: currently, notifications for the same data (same data_id) would be considered duplicates by the Global Cache, even if they were published in different topics. Of course, code is patient and it could be extended, but we would increase complexity... At least my first feeling is that publishing multiple times for the same data (with automatic download of the linked data) is not a good idea. But maybe I am just overlooking a simple solution... |
I am afraid that this can start very complex discussions. Example: precipitation is hydrology and weather. Does this mean that we publish precipitation observations on two topics? We should not build too much around topic and make use of the discovery metadata to inform different communities. I think that this needs to be addressed at the WCMP2 level, not in the topic. |
I VERY strongly support Enrico's comment... The topic hierarchy and messages are about knowing that a new dataset is available while providing some filtering capabilities. They are not meant to describe the data or to limit its usage. |
I also strongly oppose that we publish messages for the same data in multiple topics. This only adds complexity without providing much advantage. |
I updated the decision in the first issue comment to reflect the general consensus of this group. |
TT-WISMD 2023-06-22:
|
Each dataset should be published under one discipline. If the data is relevant to other disciplines, this information should be described in the metadata of the data. (see TT-NWPMD meeting on 05.07.2023) |
Should we close the issue as decided? |
We need the decision reflected in documentation in the resulting specification (once wmo-im/wis2-topic-hierarchy#47 is reviewed/merged). |
TT-WISMD 2023-09-12:
|
TT-WISMD 2023-09-25
|
PR in #39. |
Posting a question from @masato-f29 from the NWPMetadata team.
"Sea surface temperature can be included in weather, climate, and oceans. Should sea surface temperature data be tagged with three controlled vocabularies (CV): weather, climate, and ocean? Or should we make it exclusive and propose the additional CV at level 8?"
DECISION
The topic_hierarchy will only be used to identify a channel for pub/sub. When a dataset is applicable under multiple domains, one should choose one domain (that is a best fit) and use the WCMP2 metadata record for further descriptions.
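As a rough sketch of the decision, a dataset carries one best-fit channel for pub/sub while every relevant discipline is recorded in its metadata and used for discovery. The `DatasetRecord` structure, its field names and the topic strings below are hypothetical and are not the WCMP2 schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical, simplified stand-in for a discovery metadata record."""

    identifier: str
    topic: str          # the single best-fit pub/sub channel
    disciplines: tuple  # every relevant discipline, kept in the metadata


def find_by_discipline(records, discipline):
    """Discover datasets through metadata rather than through the topic."""
    return [r.identifier for r in records if discipline in r.disciplines]


# Sea surface temperature: published on one channel, described under three disciplines.
sst = DatasetRecord(
    identifier="example-sst",
    topic="example/ocean/sea-surface-temperature",
    disciplines=("ocean", "weather", "climate"),
)
synop = DatasetRecord(
    identifier="example-synop",
    topic="example/weather/synop",
    disciplines=("weather",),
)
catalogue = [sst, synop]

climate_hits = find_by_discipline(catalogue, "climate")
weather_hits = find_by_discipline(catalogue, "weather")
```

A climate user would find the sea surface temperature dataset through the metadata and then subscribe to its single ocean channel, so no duplicate notifications are needed.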