Gather 'driving'/'ancillary' data into a single 'inputs' item #15
Each input must now feature a 'kind' key alongside 'units' in the definition dictionary for inputs. The 'kind' takes one of three values: 'dynamic', 'static', or 'climatologic'. A 'dynamic' input features a time dimension whose resolution matches the component's; a 'static' input does not feature a time dimension; and a 'climatologic' input features a time dimension that represents the frequency of the climatology covered by the data (e.g. 'monthly', 'seasonal', 'daily', etc.). If the input is of 'climatologic' kind, it must be accompanied by a 'frequency' key, whose value can be a timedelta object or one of the three strings 'seasonal', 'monthly', or 'day_of_year'. If no 'kind' is provided, it is assumed to be 'dynamic'. This is notably useful to keep the `DataComponent` directly compatible with this new definition for input data.
Model developers must populate the various class attributes (dictionaries) 'inputs_info', 'parameters_info', etc. Up until now, there was no check that these contained sufficient information for their component to function properly. E.g. all items in these dictionaries must feature 'units', and more specific cases now exist for 'inputs_info'. So this commit adds a new method to the `Component` class, which performs those checks on the component's definition.
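The checks described in these two commit messages can be sketched roughly as follows. This is an illustration only: the function name `check_inputs_info` and the choice of `ValueError` are assumptions, not the method actually added to `Component`.

```python
# Hypothetical sketch of the definition checks; names are illustrative.
VALID_KINDS = ('dynamic', 'static', 'climatologic')


def check_inputs_info(inputs_info):
    """Validate an inputs definition and return a copy with defaults filled in."""
    checked = {}
    for name, info in inputs_info.items():
        # Every input must declare its units.
        if 'units' not in info:
            raise ValueError(f"input '{name}' is missing 'units'")
        # A missing 'kind' is assumed to be 'dynamic'.
        kind = info.get('kind', 'dynamic')
        if kind not in VALID_KINDS:
            raise ValueError(f"input '{name}' has invalid kind '{kind}'")
        # A climatologic input must also declare its 'frequency'.
        if kind == 'climatologic' and 'frequency' not in info:
            raise ValueError(f"input '{name}' is missing 'frequency'")
        checked[name] = {**info, 'kind': kind}
    return checked
```

Returning a filled-in copy, rather than mutating the class attribute, is a design choice of this sketch; it keeps the default 'dynamic' kind explicit for any later checks.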
'ancillary_c' for surface layer remains a 'static' kind of input (i.e. no time dimension), while 'ancillary_b' becomes a 'climatologic' kind of input (i.e. featuring a time dimension unrelated to the component's resolution, 'monthly' here, for the 12 months covering a year of climatology data). This is a new type of input data which was not supported before. All other inputs, whose names feature 'driving', are considered a 'dynamic' kind of input.
The netCDF variable names for the altitude/latitude/longitude bounds in the file created with `cf-python` were arbitrary (e.g. 'bounds', 'bounds1', 'bounds2'), so this commit renames them with more explicit names altitude_bounds, etc.
This looks like it implements the ideas we discussed over the last week or two. I think uniting the "ancillary" and "driving" data is a good idea as there are a number of areas where the distinction blurs.
Thank you @rich-HJ. I wonder whether we need to implement tighter time checks for the 'climatologic' kind of data? At the moment, it checks for the right number of values available along the time dimension, but nothing else. For example, when The alternative to tighter checks would be to choose a standard (i.e. default order/start of the year) for seasonal/monthly for HJ, and document it as a requirement somewhere. Maybe this is a non-problem and datasets out there always follow the same order for the seasons, and the same start for a year of climatology? As far as I could see in the CF-conventions, nothing is enforced in that regard, but since 'time' and 'time_bounds' are required for the climatology data, they do not need a standard.
I think it is fine to assume seasonal is the meteorological definition: DJF, MAM, JJA and SON. If people want a different definition, they should have the expertise to implement it.
As for calendar: do we need to allow met, hyd? I would stay with calendar to begin with and add options if there is great demand.
This sounds reasonable to me (i.e. expecting meteorological seasons and calendar year).
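As a small illustration of the convention agreed on here, a month can be mapped to its meteorological season as below. The helper name `season_of_month` is hypothetical and not part of the framework.

```python
# Meteorological seasons, in the order listed in the discussion above.
SEASONS = ('DJF', 'MAM', 'JJA', 'SON')


def season_of_month(month):
    """Return the meteorological season for a month number (1 to 12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    # December (12) wraps to 0 so that it falls in DJF with January and February.
    return SEASONS[(month % 12) // 3]
```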
Another question. To support other frequencies (e.g. the MODIS 10-day LAI), I've added support for a But this whole process assumes a 'gregorian' calendar (because this is what So maybe asking for an integer in place of a timedelta is better? But e.g. Not sure what is best here, and what we should support.
Timedelta assumes a gregorian calendar, which is not flexible, and risks being inconsistent with the calendar of the component. So for now, there will only be support for a custom integer value, corresponding to the number of sub-periods for which climatologic values are required in the calendar year. It is imperfect as well, but at least it drops the explicit assumption of a specific calendar.
I dropped the support for timedelta for now. I replaced it with support for an integer value if 'seasonal', 'monthly', or 'day_of_year' are not enough. The framework will check that the length of the time dimension in the dataset corresponds to this integer value.
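A rough sketch of the length check described here is given below. The function names are hypothetical, and the value of 366 for 'day_of_year' is an assumption of this sketch, since the thread does not state it.

```python
# Expected number of sub-periods per calendar year for the named frequencies.
# ASSUMPTION: 366 for 'day_of_year' is a guess; the thread does not give it.
EXPECTED_LENGTHS = {'seasonal': 4, 'monthly': 12, 'day_of_year': 366}


def expected_time_length(frequency):
    """Return the expected length of the time dimension for a frequency."""
    if isinstance(frequency, str) and frequency in EXPECTED_LENGTHS:
        return EXPECTED_LENGTHS[frequency]
    # A custom integer is taken as the number of sub-periods in the year
    # (bool is excluded because it is a subclass of int).
    if isinstance(frequency, int) and not isinstance(frequency, bool):
        if frequency < 1:
            raise ValueError("integer frequency must be at least 1")
        return frequency
    raise ValueError(f"unsupported frequency: {frequency!r}")


def check_time_length(frequency, time_length):
    """Return True if the dataset's time dimension has the expected length."""
    return time_length == expected_time_length(frequency)
```

For instance, a dataset providing 36 values per year would be declared with `'frequency': 36` and pass the check only if its time dimension has length 36.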
resolve #7
The definition of a `Component` used to distinguish between 'driving_data' and 'ancillary_data', but this distinction was rather ambiguous (where would climatology data fit in?), community-specific (mostly UM world?), and limited (no time dimension allowed for ancillary).

A component is now defined by just one item 'inputs' for the data given to it. Each input must be given a 'units' metadata (as was already the case) and a 'kind' metadata (newly added). The 'kind' can be:

- 'dynamic': data given on the component's `SpaceDomain`, and for every time step of the component's `TimeDomain` (i.e. both time and space dimensions are expected for the data array)
- 'static': data given on the component's `SpaceDomain` (i.e. only space dimensions are expected for the data array)
- 'climatologic': data given on the component's `SpaceDomain`, and for a given number of sub-periods in a year period (i.e. both time and space dimensions expected for the data array, but length of time dimension not equal to number of time steps); sub-periods are defined in an additional 'frequency' metadata, e.g. 'seasonal', 'monthly', 'day_of_year', timedelta(days=7), etc.

The definition of a component's inputs would look like this:
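A minimal sketch of such a definition, assuming the class attribute is named `inputs_info` as in the commit message. The input name 'driving_a' and all units shown are hypothetical placeholders.

```python
# Hypothetical inputs definition; names and units are placeholders.
inputs_info = {
    # Time dimension matching the component's time steps.
    'driving_a': {
        'units': 'kg m-2 s-1',
        'kind': 'dynamic',
    },
    # Time dimension of length 12, one value per month of the climatology.
    'ancillary_b': {
        'units': '1',
        'kind': 'climatologic',
        'frequency': 'monthly',
    },
    # Space dimensions only.
    'ancillary_c': {
        'units': 'm',
        'kind': 'static',
    },
}
```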
The distinction of 'inputs' into kinds allows for some checks on the compatibility between the data given and what the component needs. For 'dynamic' a full space and time check can be done, for 'static' a space check can be done (a time dimension may or may not exist, but if it does, it must be of size one), and for 'climatologic' a space check can be done alongside a check on the length of the time dimension compared to the expectation.
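The per-kind checks can be sketched on array shapes alone, as below. This assumes that time is the leading dimension and the space dimensions are trailing; the function and parameter names are hypothetical.

```python
def check_input_shape(kind, data_shape, space_shape,
                      n_time_steps=None, n_sub_periods=None):
    """Return True if a data array's shape is compatible with its kind.

    ASSUMPTION: time (if present) is the first dimension, followed by
    the space dimensions.
    """
    data_shape, space_shape = tuple(data_shape), tuple(space_shape)
    if not space_shape:
        raise ValueError("space_shape must have at least one dimension")
    leading = data_shape[:-len(space_shape)]
    trailing = data_shape[-len(space_shape):]
    # Space check, common to all kinds.
    if trailing != space_shape:
        return False
    if kind == 'static':
        # A time dimension may be absent, or present with size one.
        return leading in ((), (1,))
    if kind == 'dynamic':
        # Full time check against the component's number of time steps.
        return leading == (n_time_steps,)
    if kind == 'climatologic':
        # Only the length of the time dimension is checked.
        return leading == (n_sub_periods,)
    raise ValueError(f"unknown kind: {kind!r}")
```

A real check for 'dynamic' data would presumably also compare the time coordinates themselves, not just their number; the sketch only compares lengths.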