Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions to produce accurate time bounds #418

Merged
merged 17 commits into from Apr 20, 2023

Conversation

pochedls
Copy link
Collaborator

@pochedls pochedls commented Feb 24, 2023

Description

PR is intended to reproduce CDAT functionality (e.g., setAxisTimeBoundsMonthly()) to produce accurate time bounds for datasets with missing bounds.

As cleanup tasks, this PR will also a) remove the width parameter from generic bound creation routines, b) only autogenerate bounds for lat/lon (others axes, e.g., time, must be specified).

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@github-actions github-actions bot added the type: enhancement New enhancement request label Feb 24, 2023
@codecov-commenter
Copy link

codecov-commenter commented Feb 24, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (157f6cf) 100.00% compared to head (c28b227) 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #418    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           14        14            
  Lines         1308      1425   +117     
==========================================
+ Hits          1308      1425   +117     
Impacted Files Coverage Δ
xcdat/spatial.py 100.00% <ø> (ø)
xcdat/bounds.py 100.00% <100.00%> (ø)
xcdat/dataset.py 100.00% <100.00%> (ø)
xcdat/regridder/accessor.py 100.00% <100.00%> (ø)
xcdat/regridder/grid.py 100.00% <100.00%> (ø)
xcdat/regridder/regrid2.py 100.00% <100.00%> (ø)
xcdat/regridder/xesmf.py 100.00% <100.00%> (ø)
xcdat/temporal.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@tomvothecoder
Copy link
Collaborator

This is an awesome turnaround! Thanks for getting to this so fast Steve. Feel free to tag me when it is ready.

I can do code review and high-level usability testing, and maybe @lee1043 can a more in-depth usability testing if he's available?

@lee1043
Copy link
Collaborator

lee1043 commented Feb 27, 2023

@pochedls thank you for the PR! @tomvothecoder yes I can also take a look. I will work on it and keep you all posted.

@pochedls
Copy link
Collaborator Author

@pochedls thank you for the PR! @tomvothecoder yes I can also take a look. I will work on it and keep you all posted.

This PR is not yet ready for review.

@lee1043
Copy link
Collaborator

lee1043 commented Feb 27, 2023

This PR is not yet ready for review.

@pochedls Okay, let us know when it's ready.

@tomvothecoder tomvothecoder added this to In progress in Next Release (v0.5.0) via automation Feb 27, 2023
@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Mar 1, 2023

Hey @pochedls I'm following about using TemporalAccessor._infer_freq() in this PR.

In the current implementation, self._set_data_var_attrs() needs to be called first before calling _infer_freq() because this method requires the self.dim class attribute (below)

xcdat/xcdat/temporal.py

Lines 673 to 674 in 1efb807

time_coords = self._dataset[self.dim]
min_delta = pd.to_timedelta(np.diff(time_coords).min(), unit="ns")

A possible workaround is to add an optional dim kwarg that is used if it is set instead of self.dim. We can also make infer_freq a public method (if we think it is useful, otherwise keeping it internal is fine).

ds.temporal.infer_freq(dim="time")

Copy link
Collaborator Author

@pochedls pochedls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to be a summary of the changes made to help with the review.

Major changes

  • xcdat no longer adds temporal bounds by default (it now only adds lat/lon by default). To add time (or height) bounds you need to include T in the new bound_axes argument in open_dataset functionality (e.g., bounds_axes=[‘X’, ‘Y’, ’T’]) or the axes argument in add_missing_bounds.
  • Functionality to add bounds now handles temporal bounds as a special case (see trace below); we set the time bounds to the expected start/end of the time period (e.g., the start/end of the month for monthly data).
  • A number of functions of the form get_FREQ_time_bounds are added. They are wrapped by get_time_bounds which is intended to call the correct get_FREQ_time_bounds function.
  • infer_freq is now a function (rather than a temporal accessor method)
  • Added a new _month_add function to add N months to a cftime object (N can be negative for subtraction)

API Changes

  • Modified
    • open_dataset now adds optional bounds_axes argument [dataset.py]
    • open_mfdataset now adds optional bounds_axes argument [dataset.py]
    • postprocess_dataset now adds required bounds_axes argument [dataset.py]
    • _infer_freq(time_coord) now requires the time coordinate axis and is no longer an accessor method [temporal.py]
    • add_missing_bounds remove width argument and accept optional axes argument [bounds.py]
    • add_bounds remove width argument [bounds.py]
    • _create_bounds remove width argument [bounds.py]
  • New
    • _month_add(times, delta, calendar) [temporal.py]
    • get_yearly_time_bounds(time) [bounds.py]
    • get_monthly_time_bounds(time) [bounds.py]
    • get_daily_time_bounds(time, frequency=1) [bounds.py]
    • get_time_bounds(time) [bounds.py]

Trace

open_dataset / open_mfdataset
postprocess_dataset
    add_missing_bounds
      _create_bounds
        get_time_bounds
         get_yearly_time_bounds / get_monthly_time_bounds / get_daily_time_bounds
           _infer_freq / _month_add

  • Add optional bounds_axes argument to open dataset functionality: open_dataset(..., bounds_axes=[..., "T"]) or open_mfdataset(..., bounds_axes=[..., "T"]), which call _postprocess_dataset(..., bounds_axes=[..., "T"]). By default bounds_axes=["X", "Y"]. The rest of this "trace" assumes "T" is included in bounds_axes.
  • Post-processing routine calls add_missing_bounds, which now only operates on the bounds_axes passed into the function (rather than all X/Y/T/Z coordinate variables in dataset).
  • add_missing_bounds calls _create_bounds. If the coordinate variable being operated on is time, the routine will call the new get_time_bounds function to create the time bounds (instead of using the generic bounds creation logic).
  • get_time_bounds calls the refactored _infer_freq. Based on the inferred frequency either get_yearly_time_bounds, get_monthly_time_bounds, or get_daily_time_bounds is called. The function then creates and returns a time coordinate data array.

xcdat/bounds.py

  • Modify add_missing_bounds such that width is no longer an argument (always 0.5) and an optional axes argument is added with [“X”, “Y”] set as the default.
  • Modify add_bounds to remove width argument and to accept bounds as an optional argument. If bounds are passed in it will add these bounds to the dataset rather than generating new bounds.
  • Modify _create_bounds to remove width argument. Modifies functionality: if coord_var is a temporal axis the routine now calls get_time_bounds to compute the temporal bounds.
  • Add get_yearly_time_bounds, get_monthly_time_bounds, and get_daily_time_bounds based on CDAT functionality. These functions create time bounds for annual, monthly, and daily time series. get_daily_time_bounds will also calculate sub-daily (but not sub-hourly) time bounds if a frequency argument is supplied.
  • Add get_time_bounds function to call the correct get_FREQ_time_bounds function based on the refactored _infer_freq function.

xcdat/dataset.py

  • open_dataset, open_mfdataset, and _postprocess_dataset are modified to accept an optional bounds_axes argument, which defines the axes for which xcdat ensures there are bounds

xcdat/temporal

  • Add _month_add function to add N months to cftime objects
  • Change _infer_freq to a function, which now requires time_coords as an argument
  • Update average to pass in the now required time coordinate axis argument to _infer_freq

tests/fixtures.py

  • Add sub-hourly, hourly, daily, and yearly time axes and bounds
  • Refactor generate_dataset so that inputs to xr.Dataset are decided with conditional and then xr.Dataset is called with those inputs (reduces code length a little)
  • Add generate_dataset_by_frequency to generate datasets of varying frequencies in order to test functionality to add temporal bounds. I initially added a frequency argument to generate_dataset, but it did not pass the pre-commit hooks (one of them complained the function was overly complex: Flake8 C901).

tests/test_bounds.py

  • add test_time_bounds_not_added_to_the_dataset_if_not_specified to ensure that time bounds are not added by default when calling add_missing_bounds
  • Fixed test_returns_bounds_for_dataset_with_time_coords_as_cftime_objects test (which was not setting time bounds correctly)
  • Add test_get_monthly_bounds, test_get_yearly_bounds, test_get_daily_bounds, test_get_hourly_bounds which check that they each compute the correct time bounds.
  • Add test_generate_monthly_bounds_for_eom_set_true, which is the same as above, except the end_of_month flag for get_monthly_time_bounds is set to True.
  • Add test_generic_get_time_bounds_function which checks to ensure get_time_bounds adds the correct time bounds to the dataset for yearly, monthly, daily, and hourly data. Also ensures that sub-hourly data throws a ValueError.

tests/test_dataset.py

  • Change test_adds_missing_lat_and_lon_bounds to test_adds_missing_lat_and_lon_bounds_does_not_add_time_bounds to ensure that time bounds are not added by default when opening a dataset without time bounds
  • Add test_adds_missing_lat_and_lon_and_time_bounds to ensure time bounds are added when T is included in the bounds_axes argument in open_dataset.

tests/test_temporal.py

  • Add test_month_add_plus_minus function to test addition / subtraction of months with _month_add

@pochedls
Copy link
Collaborator Author

pochedls commented Mar 6, 2023

One unresolved aspect of this PR is that I needed to add # type: ignore in several get_XYZ_time_bounds() functions, e.g., here.

@tomvothecoder
Copy link
Collaborator

Hey @pochedls, thanks for your excellent PR summary above. It is really helpful.

I am walking through your code changes and will do some usability testing. @lee1043 can you do more in-depth usability testing as well?

I am hoping to get this PR merged for the v0.5.0 release (around mid next week).

xcdat/bounds.py Outdated Show resolved Hide resolved
xcdat/bounds.py Outdated Show resolved Hide resolved
@lee1043
Copy link
Collaborator

lee1043 commented Mar 11, 2023

Thank you, @pochedls, for making this PR. Could you help me learning how to use this new capability in following scenario? A sample snippet of codes would be very helpful.

  • Open dataset that has monthly frequency and single variable. Each time step stored at the beginning of month (1/1, 2/1, 3/1, ...)
  • I want to set bounds as ([1/1-1/31, 2/1-2/28, 3/1-3/31, ...])

I guess some combination of create_bounds and get_monthly_time_bounds need to be used, but I am yet to have a good idea how to use them. I think we may want to rename get_monthly_time_bounds to set_monthly_time_bounds or create_monthly_time_bounds because get_monthly_time_bounds gives me an impression that I might retrieve time axis from the original dataset.

@pochedls
Copy link
Collaborator Author

pochedls commented Mar 11, 2023

Thank you, @pochedls, for making this PR. Could you help me learning how to use this new capability in following scenario? A sample snippet of codes would be very helpful.

* Open dataset that has monthly frequency and single variable. Each time step stored at the beginning of month (1/1, 2/1, 3/1, ...)

* I want to set bounds as ([1/1-1/31, 2/1-2/28, 3/1-3/31, ...])

I guess some combination of create_bounds and get_monthly_time_bounds need to be used, but I am yet to have a good idea how to use them. I think we may want to rename get_monthly_time_bounds to set_monthly_time_bounds or create_monthly_time_bounds because get_monthly_time_bounds gives me an impression that I might retrieve time axis from the original dataset.

I think there are two ways to do this (90% sure – I don't have this PR open on my machine right now):

  1. On open: ds = xc.open_dataset(..., bounds_axes=['X', 'Y', 'T'])
  2. After open: ds = ds.bounds.add_missing_bounds(axes=['X', 'Y', 'T'])

@tomvothecoder also suggested renaming get_time_bounds functions to create_time_bounds (I agree with both of you).

xcdat/bounds.py Outdated Show resolved Hide resolved
@lee1043
Copy link
Collaborator

lee1043 commented Mar 11, 2023

  1. On open: ds = xc.open_dataset(..., bounds_axes=['X', 'Y', 'T'])
  2. After open: ds = ds.bounds.add_missing_bounds(axes=['X', 'Y', 'T'])

@pochedls In the above my example scenario how do I know add_missing_bounds set time bounds as begin/end of month instead of -15 and +15 days from the time of each time step?

@pochedls
Copy link
Collaborator Author

pochedls commented Mar 11, 2023

  1. On open: ds = xc.open_dataset(..., bounds_axes=['X', 'Y', 'T'])
  2. After open: ds = ds.bounds.add_missing_bounds(axes=['X', 'Y', 'T'])

@pochedls In the above my example scenario how do I know add_missing_bounds set time bounds as begin/end of month instead of -15 and +15 days from the time of each time step?

add_missing_bounds effectively calls get_time_bounds (to be renamed create_time_bounds), which infers that the frequency is monthly and then calls get_monthly_time_bounds (to be renamed create_monthly_time_bounds). So the bounds functionality now is intended to figure out the appropriate time bounds based on frequency (rather than doing the +/- 15 day stuff). Do you think the +/- 15 day stuff should be preserved?

@lee1043
Copy link
Collaborator

lee1043 commented Mar 11, 2023

add_missing_bounds effectively calls get_time_bounds (to be renamed create_time_bounds), which infers that the frequency is monthly and then calls get_monthly_time_bounds (to be renamed create_monthly_time_bounds). So the bounds functionality now is intended to figure out the appropriate time bounds based on frequency (rather than doing the +/- 15 day stuff). Do you think the +/- 15 day stuff should be preserved?

@pochedls thank you for explaining. I was thinking add_missing_bounds add for example time_bnds data array when it was not existing in the original dataset, but add_missing_bounds will update bounds information as well. Is this correct?

@pochedls
Copy link
Collaborator Author

add_missing_bounds effectively calls get_time_bounds (to be renamed create_time_bounds), which infers that the frequency is monthly and then calls get_monthly_time_bounds (to be renamed create_monthly_time_bounds). So the bounds functionality now is intended to figure out the appropriate time bounds based on frequency (rather than doing the +/- 15 day stuff). Do you think the +/- 15 day stuff should be preserved?

@pochedls thank you for explaining. I was thinking add_missing_bounds add for example time_bnds data array when it was not existing in the original dataset, but add_missing_bounds will update bounds information as well. Is this correct?

No – I don't think add_missing_bounds replaces bounds if they are present in the dataset. To do that you'd have to drop the existing bounds first.

@github-actions github-actions bot added the type: docs Updates to documentation label Mar 14, 2023
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my initial review for this PR with some changes (link to diff with latest commit vs. previous commit).

I'm exploring the potential conversion of the create functions to accessor methods in a separate review.

Summary of changes:

  • Rename get time bounds functions to create
  • Update docstrings and comments for create time bounds functions
  • Add TODO comment to raise exception when freq cannot be inferred in _infer_freq
  • Add create time bounds functions to top-level xcdat module __init__.py and API references api.rst

xcdat/dataset.py Outdated
Comment on lines 43 to 44
add_bounds: bool = True,
bounds_axes: Optional[List[str]] = ["X", "Y"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about combining add_bounds and bounds_axes into a single parameter (e.g., add_bounds = ["X, "Y"], also accepts None)?

I'll need to think about this more to see if it is confusing and/or makes the API less intuitive compared to having two separate parameters.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be okay with this (assuming it defaults to ["X", "Y"]).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be considered a breaking change IMO. I assume most users keep the default add_bounds=True argument and xCDAT is not at v1.0.0 yet so we can kinda get away with it (for now).

I'll take a look at this in PR #434.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pochedls and @lee1043, I removed bounds_axes and refactored add_bounds to accept a list axes string, None, and False.

I kept False as an option for API backwards compatibility. I'm certain people have been using add_bounds=False to fix metadata issues and then calling add_bounds() or add_missing_bounds().

Refer to this commit diff for more info: c28b227#diff-e5b713e4bf50c61361496f77fdadcf9d3ffc2fcb39c4c953932cd780ae951e55

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view this as another improvement to the API I originally proposed. Thanks.

xcdat/bounds.py Outdated Show resolved Hide resolved
tests/test_bounds.py Outdated Show resolved Hide resolved
tests/test_bounds.py Outdated Show resolved Hide resolved
@lee1043
Copy link
Collaborator

lee1043 commented Apr 19, 2023

Hey @lee1043, is ready for your usability testing. Thanks!

Thank you @tomvothecoder @pochedls! I will work on this tomorrow.

xcdat/bounds.py Outdated Show resolved Hide resolved
xcdat/bounds.py Outdated Show resolved Hide resolved
xcdat/bounds.py Outdated Show resolved Hide resolved
@lee1043
Copy link
Collaborator

lee1043 commented Apr 19, 2023

@tomvothecoder @pochedls I confirm the add_time_bounds function works as expected and the documentation is very helpful! Many thanks for this PR! I agree to merge this PR to the main.

pochedls and others added 15 commits April 20, 2023 13:56
…ds; update add_bounds to accept user specified bounds
…ime_bounds function. Update documentation. Add sub-hourly, sub-daily, daily, and yearly datasets in test fixtures. Add unit tests.
- Add requirements for using temporal accessor class
- Update docstrings and comments for `create` time bounds functions
- Add TODO comment to raise exception when freq cannot be inferred in `_infer_freq`
- Add `create` time bounds functions to top-level xcdat module `__init__.py` and API references `api.rst`
Co-authored-by: Stephen Po-Chedley <pochedley@gmail.com>
@tomvothecoder tomvothecoder force-pushed the feature/412-accurate-time-bounds branch from edd3985 to 1004880 Compare April 20, 2023 20:56
@tomvothecoder
Copy link
Collaborator

Thank you @pochedls for your work and @lee1043 for validating.

I am merging this milestone PR!

@tomvothecoder tomvothecoder merged commit cbef082 into main Apr 20, 2023
5 checks passed
Next Release (v0.6.0) automation moved this from In progress to Done Apr 20, 2023
@tomvothecoder tomvothecoder deleted the feature/412-accurate-time-bounds branch April 20, 2023 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: docs Updates to documentation type: enhancement New enhancement request
Projects
No open projects
4 participants