initial WIS 2.0 metadata/search brainstorming/ideas #1
Further discussion with @efucile (2021-02-15) (cc @petersilva)
https://github.com/wmo-im/GTStoWIS2#conventions better to use the shared repo than my personal one. As discussed with @tomkralidis: the tables from WMO 386 Attachment II-5 are in the GTStoWIS2 folder in JSON format, and are chained together. Somebody should be able to string the tables together to produce one big table of all possible topics. I remember @antje-s doing something akin to that, but it resulted in impractically large tables. I think it would have to be done with a keen appreciation for how all the tables link together; perhaps it is not so large then.
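The idea of stringing the chained tables together can be sketched as a recursive walk over linked JSON tables. The table names, codes, and the `"next"` linkage field below are invented for illustration only; the real tables and their chaining conventions live in the GTStoWIS2 repo.

```python
# Minimal sketch of chaining linked topic tables into one flat list of
# topic paths. SAMPLE_TABLES is a made-up stand-in for the WMO 386
# Attachment II-5 tables; the "next" field is an assumed linkage.
SAMPLE_TABLES = {
    "TableA": {"S": {"desc": "surface data", "next": "TableB"},
               "U": {"desc": "upper-air data", "next": "TableB"}},
    "TableB": {"M": {"desc": "synoptic", "next": None},
               "T": {"desc": "temp", "next": None}},
}

def expand(table_name, tables, prefix=()):
    """Recursively walk linked tables, yielding full topic tuples."""
    for code, entry in tables[table_name].items():
        path = prefix + (code,)
        if entry["next"]:
            yield from expand(entry["next"], tables, path)
        else:
            yield path

topics = sorted("/".join(p) for p in expand("TableA", SAMPLE_TABLES))
print(topics)  # ['S/M', 'S/T', 'U/M', 'U/T']
```

Because each table is visited once per incoming link rather than copied, the supertable size is the product of branch counts along each chain, which is why merging order and linkage structure matter so much for the final size.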
I just went onto my server behind my experimental prototype ( https://hpfx.collab.science.gc.ca/~pas037/WMO_Sketch )
for most countries, the hierarchy in a given hour is relatively simple...
My prototype feed is from UNIDATA, so it is heavily biased toward US data. When I look at the tree:
There is a lot of:
One of the things we were debating is whether it makes sense to have the prediction hour in the topic hierarchy. I think that is more practical, but it runs counter to the idea of metadata being "very granular"... I think the temporal information is too granular for inclusion in the topic tree, but would appreciate other views.
I agree with removal of the forecast hour from the tree. It seems more practical to leave any forecast-hour filtering, if needed, at the client's discretion. In fact, NWP model data are a good candidate for distribution through a service. Once this happens, then instead of sending 100s (1000s) of notifications for each model run (about 100s/1000s of new files), your system will send one notification telling that the service has a new run available. For now, those who want just a subset of forecast hours will discard 100s/1000s of small notifications a few times per day.
note... the number of notifications is not changed... it is just that all of the outputs will be under the same topic, with different file names. You will subscribe to the KWEC/model-regional/0-90n/0-90w (aka the Atlantic Ocean above the equator-ish...) and there will be a file for each hour published under the same topic. For what it is worth, in operational forecasting, the 6-hour forecast is available before the 12-hour, then the 18-hour, etc... so one announcement for the entire run would be unsuitable for real-time use, as it could delay transmission for... usually up to an hour or so (I don't know about other countries, but in Canada, the "regional" (adaptive grid over North America) run is about 45 minutes long, and the global (analogous to ECMWF guidance) is around 90 minutes long.) There are also more localized grids that have similar performance profiles to the regional (aka HRDPS.)
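Client-side filtering of forecast hours under a single topic, as described above, could look something like the sketch below. The topic layout and the `_PT006H`-style filename convention are assumptions for illustration, not an actual WIS 2.0 or GTStoWIS2 convention.

```python
import re

# Sketch: a subscriber to one topic (e.g. KWEC/model-regional/0-90n/0-90w)
# receives one notification per forecast hour and keeps only the hours it
# wants. The filename pattern "_PT<hhh>H" is a made-up convention.
WANTED_HOURS = {0, 6, 12}

def wanted(filename):
    """Keep only notifications whose filename encodes a wanted hour."""
    m = re.search(r"_PT(\d{3})H", filename)
    return bool(m) and int(m.group(1)) in WANTED_HOURS

notifications = [
    "gem_regional_PT006H.grib2",
    "gem_regional_PT018H.grib2",
    "gem_regional_PT012H.grib2",
]
print([f for f in notifications if wanted(f)])
# ['gem_regional_PT006H.grib2', 'gem_regional_PT012H.grib2']
```

The cost of this approach is exactly what the thread describes: the client discards a few hundred small notifications per day, but each forecast hour still flows in real time as it becomes available, rather than waiting for the whole run to finish.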
Sorry, I did not check how this particular model is distributed. If it's one file per forecast hour, then it is fine. My point was that some models produce hundreds of files per run. It is not a problem to filter out that number of notifications on the client, but it would still be nicer if the service just sent a notification when some new logical set of data becomes available. But we have diverged.
Regarding the granularity mentioned by @tomkralidis: in WIS 1.0 we have one metadata record per GTS bulletin, but many of them are logically from the same category - e.g. surface observations from a certain part of the world. So in the proposed topic hierarchy they will naturally form sub-trees and subscribing will be easier.
I think this is intimately related to: wmo-im/GTStoWIS2#9
thanks @josusky. IMO we want a higher level of granularity so the WIS 2.0 catalogue does not become a bulletin search API, but a yellow pages so one can find/bind accordingly.
If the generated 'supertable' is too large, can we describe the tables in question (C1, C2, C3, C6, C7, etc.) and their relationships? Perhaps this is described at https://github.com/wmo-im/GTStoWIS2#conventions ?
Summary of the table linkages from WMO 386 Volume I Attachment II-5:
If we drop hours, then Tables C4 and C5 disappear; how big is a supertable? In the GTStoWIS2 module, @antje-s has already merged all of the B tables into one TableB that is about 400 lines or so. TableA could be merged into TableB for about 4*26=104 entries... so about 504 for a hypothetical TableAB. So the total of a single recursive JSON array merging all the tables into one big one is: 504+33000+308+242+91=34145. Then there are 6000 known origin codes (CCCC) out of 15K known airports that could, in theory, originate such products.
35K is tractable (this is the size of NASA GCMD, for example). Can we have a workflow that auto-generates the supertable from the smaller tables (I suppose it is easier to manage that way as well)?
I made the code to do this in the issue009 branch on GTStoWIS2. You can clone and reproduce it... it's around 277KB (only 17000 entries in the end... some math might have been wrong) with the tables in their current state. I had to add the D1 and D2 tables, which were missing. Also, there are some cases where there is a comparison to do (ii < 49, for example) where only the threshold is included... so it might be wrong for those cases. It is unclear to me how it can be used for now.
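The threshold cases mentioned above (e.g. "ii < 49") would need to be expanded into explicit codes during supertable generation. A hedged sketch, assuming a simple "field operator value" rule format that is not the actual GTStoWIS2 representation:

```python
# Sketch: expand a comparison rule like "ii < 49" into the explicit
# list of two-digit ii codes it covers. The rule syntax here is an
# assumption for illustration; real table entries may differ.
def expand_threshold(rule):
    """Turn a comparison rule into the explicit list of ii values."""
    field, op, value = rule.split()
    value = int(value)
    if op == "<":
        return [f"{i:02d}" for i in range(1, value)]
    if op == ">=":
        return [f"{i:02d}" for i in range(value, 100)]
    raise ValueError(f"unsupported operator: {op}")

codes = expand_threshold("ii < 49")
print(len(codes), codes[0], codes[-1])  # 48 01 48
```

Enumerating the thresholds this way would let the generated supertable stay a plain lookup structure, at the cost of a few dozen extra entries per comparison rule.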
Some comments on the ideas from above: Metadata Standards, Catalogue options, Definitive WIS catalogue.
@wmo-im/tt-wismd / @wmo-im/tt-wigosmd in relation to WIS 2.0 and the metadata search demonstration project, notes from initial discussion with @6a6d74 (2020-12-15).
Note that these are initial ideas only for discussion with ET-Metadata. Please review and provide your thoughts and perspectives here, thanks.
Drivers
Metadata Standards
Harvesting
Catalogue options
The browser as the catalogue
Definitive WIS catalogue
Guidance and support to members