Gather 'driving'/'ancillary' data into a single 'inputs' item #15

Merged
merged 12 commits into from Dec 2, 2020

ThibHlln
Member

@ThibHlln ThibHlln commented Nov 25, 2020

resolve #7

The definition of a Component used to distinguish between 'driving_data' and 'ancillary_data', but this distinction was rather ambiguous (where would climatology data fit in?), community-specific (mostly UM world?), and limited (no time dimension allowed for ancillary).

A component is now defined by just one item 'inputs' for the data given to it. Each input must be given a 'units' metadata (as was already the case) and a 'kind' metadata (newly added). The 'kind' can be:

  • 'dynamic': data for every spatial element of component's SpaceDomain, and for every time step of component's TimeDomain (i.e. both time and space dimensions are expected for the data array)
  • 'static': data for every spatial element of component's SpaceDomain (i.e. only space dimensions are expected for the data array)
  • 'climatologic': data for every spatial element of component's SpaceDomain, and for a given number of sub-periods in a year period (i.e. both time and space dimensions expected for the data array, but length of time dimension not equal to number of time steps) – sub-periods defined in an additional 'frequency' metadata, e.g. 'seasonal', 'monthly', 'day_of_year', timedelta(days=7), etc.

The definition of a component's inputs would look like this:

inputs_info = {
    'rainfall': {
        'units': 'kg m-2 s-1',
        'kind': 'dynamic'
    },
    'elevation': {
        'units': 'm above sea level',
        'kind': 'static'
    },
    'leaf_area_index': {
        'units': '1',
        'kind': 'climatologic',
        'frequency': 'monthly'
    }
}

The distinction of 'inputs' into kinds allows for some checks on the compatibility between the data given and what the component needs. For 'dynamic' a full space and time check can be done, for 'static' a space check can be done (a time dimension may or may not exist, but if it does, it must be of size one), and for 'climatologic' a space check can be done alongside a check on the length of the time dimension compared to the expectation.
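As a rough illustration of the per-kind checks described above, here is a minimal sketch reduced to a comparison of array shapes. The function name `check_input_shape` and its parameters are hypothetical, and it assumes that the time dimension, when present, is the leading axis; the actual checks in the framework may well compare coordinates rather than just lengths.

```python
def check_input_shape(kind, data_shape, space_shape, n_time_steps,
                      n_sub_periods=None):
    """Return True if data_shape is compatible with the given kind of input.

    Hypothetical helper: assumes time (when present) is the leading axis.
    """
    data_shape = tuple(data_shape)
    space_shape = tuple(space_shape)

    if kind == 'dynamic':
        # both time and space dimensions must match the component
        return data_shape == (n_time_steps,) + space_shape
    if kind == 'static':
        # no time dimension, or a time dimension of size one
        return data_shape in (space_shape, (1,) + space_shape)
    if kind == 'climatologic':
        # space must match, and time must hold one value per sub-period
        if n_sub_periods is None:
            raise ValueError("n_sub_periods is required for 'climatologic'")
        return data_shape == (n_sub_periods,) + space_shape
    raise ValueError(f"unknown kind: {kind!r}")
```

For a component with a 3 x 4 grid and 10 time steps, `check_input_shape('static', (1, 3, 4), (3, 4), 10)` passes, while a 'dynamic' input of shape `(12, 3, 4)` does not.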

Thibault Hallouin added 8 commits November 24, 2020 16:15
Each input must now feature a 'kind' key alongside 'units' in the definition dictionary for inputs.

The 'kind' must take one of three values: 'dynamic', 'static', or 'climatologic'. 'dynamic' corresponds to data featuring a time dimension whose resolution matches the component's, 'static' corresponds to data without a time dimension, and 'climatologic' corresponds to data featuring a time dimension that represents the frequency of the climatology covered by the data (e.g. 'monthly', 'seasonal', 'daily', etc.). If the input is of 'climatologic' kind, it must be accompanied by a 'frequency' key, and the corresponding value can be a timedelta object, or one of the three strings: 'seasonal', 'monthly', or 'day_of_year'.

If no 'kind' is provided, it is assumed to be 'dynamic'. This is notably useful to keep the `DataComponent` directly compatible with this new definition of input data.
Model developers must populate the various class attributes (dictionaries) 'inputs_info', 'parameters_info', etc.

Up until now, there was no check on the presence of sufficient information for the proper functioning of their component. E.g. all items in these dictionaries must feature 'units', and more specific cases exist for 'inputs_info' now.

So this commit adds a new method for `Component` class, which does those checks in the component's definition.
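A minimal sketch of what such a definition check could look like for 'inputs_info' is given below. It is not the method added by the commit: the function name `check_inputs_info` and the constants are hypothetical, and the accepted 'frequency' values are assumed to be the ones this pull request ends up supporting (one of the three strings, or a positive integer).

```python
KINDS = ('dynamic', 'static', 'climatologic')
FREQUENCIES = ('seasonal', 'monthly', 'day_of_year')


def check_inputs_info(inputs_info):
    """Raise an error if an input definition is incomplete or invalid.

    Hypothetical helper, not the framework's actual method.
    """
    for name, info in inputs_info.items():
        # every item must feature 'units'
        if 'units' not in info:
            raise KeyError(f"'units' missing for input '{name}'")

        # a missing 'kind' is assumed to be 'dynamic'
        kind = info.get('kind', 'dynamic')
        if kind not in KINDS:
            raise ValueError(f"invalid kind {kind!r} for input '{name}'")

        # 'climatologic' inputs must also feature a valid 'frequency'
        if kind == 'climatologic':
            if 'frequency' not in info:
                raise KeyError(f"'frequency' missing for input '{name}'")
            freq = info['frequency']
            is_count = (isinstance(freq, int)
                        and not isinstance(freq, bool)
                        and freq > 0)
            if freq not in FREQUENCIES and not is_count:
                raise ValueError(
                    f"invalid frequency {freq!r} for input '{name}'"
                )
```

Calling `check_inputs_info(inputs_info)` on the example definition from the pull request description returns without raising.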
'ancillary_c' for the surface layer remains a 'static' kind of input (i.e. no time dimension), while 'ancillary_b' becomes a 'climatologic' kind of input (i.e. featuring a time dimension unrelated to the component's resolution: 'monthly' for the 12 months covering a year of climatology data). This is a new type of input data which was not supported before. All other inputs, whose names feature 'driving', are considered a 'dynamic' kind of input.
The netCDF variable names for the altitude/latitude/longitude bounds in the file created with `cf-python` were arbitrary (e.g. 'bounds', 'bounds1', 'bounds2'), so this commit renames them with more explicit names altitude_bounds, etc.
Collaborator

@rich-HJ rich-HJ left a comment

This looks like it implements the ideas we discussed over the last week or two. I think uniting the "ancillary" and "driving" data is a good idea as there are a number of areas where the distinction blurs.

@ThibHlln
Member Author

Thank you @rich-HJ

I wonder whether we need to implement tighter time checks for 'climatologic' kind of data? At the moment, it checks for the right number of values available along the time dimension, but nothing else.

For example, when 'frequency': 'seasonal', it will check for 4 values available, but it does not check if it is MAM-JJA-SON-DJF, or another order. Likewise, when 'frequency': 'monthly', it will check for 12 values, but it does not check if a calendar year, a meteorological year, or a hydrological year is considered.

The alternative to tighter checks would be to choose a standard (i.e. default order/start of the year) for seasonal/monthly for HJ, and document it as a requirement somewhere. Maybe this is a non-problem and datasets out there are always following the same order for the seasons, and the same start for a year of climatology? As far as I could see in the CF-conventions, nothing is enforced in that regard, but since 'time' and 'time_bounds' are required for the climatology data, they do not need a standard.

@rich-HJ
Collaborator

rich-HJ commented Nov 26, 2020

I think it is fine to assume seasonal is the meteorological definition: DJF, MAM, JJA and SON. If people want a different definition, they should have the expertise to implement it.

@rich-HJ
Collaborator

rich-HJ commented Nov 26, 2020

As for the calendar: do we need to allow meteorological and hydrological years? I would stay with the calendar year to begin with and add options if there is great demand.

@ThibHlln
Member Author

This sounds reasonable to me (i.e. expecting meteorological seasons and calendar year).
I am going to document that in the docstring of Component for the argument dataset.

@ThibHlln
Member Author

ThibHlln commented Nov 26, 2020

Another question.

To support other frequencies (e.g. the MODIS 10-day LAI), I've added support for a datetime.timedelta in frequency. This infers the length of the time dimension by using the floor division of 366 days by this timedelta (giving the number of full sub-periods of length timedelta), and then by adding one if the remainder of the division is not 0 (to cover the last sub-period of length less than timedelta).

But this whole process assumes a 'gregorian' calendar (because this is what datetime is based on). But the TimeDomain of the component could be in another calendar, which is not very consistent.

So maybe asking for an integer in place of a timedelta is better? But e.g. timedelta(days=10) should be 37 in a gregorian calendar, but only 36 in a 360-day calendar.

Not sure what is best here, and what we should support.
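To make the arithmetic in this comment concrete, here is a small sketch of the inference described (floor division of the year length by the timedelta, plus one if there is a remainder). The function name `n_sub_periods` and its `year_length` parameter are hypothetical; the 360-day case is emulated simply by passing a shorter year length, which is only a stand-in for a proper non-gregorian calendar.

```python
from datetime import timedelta


def n_sub_periods(period, year_length=timedelta(days=366)):
    """Infer the number of sub-periods of length `period` in a year.

    Hypothetical helper: counts the full sub-periods, then adds one
    if a shorter sub-period is left over at the end of the year.
    """
    full, remainder = divmod(year_length, period)
    # a zero-length timedelta is falsy, so this adds 1 only if days remain
    return full + 1 if remainder else full


# gregorian-based inference, as done with datetime
print(n_sub_periods(timedelta(days=10)))
# same period, but for a 360-day year
print(n_sub_periods(timedelta(days=10), year_length=timedelta(days=360)))
```

The two calls give 37 and 36 respectively, which is the inconsistency raised above: the expected length depends on the calendar, not only on the timedelta.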

Timedelta assumes a gregorian calendar, which is not flexible, and risks being inconsistent with the calendar of the component. So for now, there will only be support for a custom integer value, corresponding to the number of sub-periods for which climatologic values are required in the calendar year. It is imperfect as well, but at least it drops the explicit assumption of a specific calendar.
@ThibHlln
Member Author

ThibHlln commented Dec 2, 2020

I dropped the support for timedelta for now. I replaced it with support for an integer value if 'seasonal', 'monthly', or 'day_of_year' are not enough. The framework will check that the length of the time dimension in the dataset corresponds to this integer value.
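A short sketch of how the expected length could be resolved from 'frequency' after this change is given below. The names `EXPECTED_LENGTHS` and `expected_time_length` are hypothetical. The values 4 and 12 come from the discussion above; the value 366 for 'day_of_year' is an assumption made for illustration only and is not confirmed anywhere in this thread.

```python
# expected number of values along the time dimension per named frequency
EXPECTED_LENGTHS = {
    'seasonal': 4,       # DJF, MAM, JJA, SON
    'monthly': 12,       # calendar year
    'day_of_year': 366,  # ASSUMPTION: not stated in this pull request
}


def expected_time_length(frequency):
    """Return the expected length of the time dimension (hypothetical)."""
    if isinstance(frequency, str):
        if frequency not in EXPECTED_LENGTHS:
            raise ValueError(f"unsupported frequency: {frequency!r}")
        return EXPECTED_LENGTHS[frequency]
    # a custom integer gives the number of sub-periods directly
    if (isinstance(frequency, int) and not isinstance(frequency, bool)
            and frequency > 0):
        return frequency
    raise TypeError("frequency must be a string or a positive integer")
```

With this, a 10-day product in a 360-day calendar would be declared with `'frequency': 36`, and the dataset passes only if its time dimension has exactly 36 values.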

@ThibHlln ThibHlln merged commit 53a16e9 into unifhy-org:dev Dec 2, 2020
@ThibHlln ThibHlln self-assigned this Dec 3, 2020
@ThibHlln ThibHlln deleted the gather-input-data branch April 28, 2021 08:39
Development

Successfully merging this pull request may close these issues.

Add support for input time series/climatology/time-invariant in place of driving/ancillary
2 participants