Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multi-file dataset spatial average orientation and weights when lon bounds span prime meridian #495

Merged
merged 4 commits into from Jun 13, 2023

Conversation

pochedls
Copy link
Collaborator

@pochedls pochedls commented Jun 5, 2023

Description

In #494, .spatial.get_weights and .spatial.average() do not work for multifile datasets when the longitude bounds span the prime meridian. This PR updates that logic to fix issues.

Issues and root causes:

The next issue was that the longitude bounds wrap around a prime meridian in this dataset. This creates problems, because weights are determined by the difference between the longitude bound values (and a prime meridian introduces a discontinuity, e.g., [359.6, 0.35] in this case). The logic to correct this does not work for multifile datasets (which led to the error xarray can't set arrays with multiple array indices to dask yet. which came from a function _force_domain_order_low_to_high).

I also noticed that the _force_domain_order_low_to_high could produce incorrect weights for datasets that wrap a prime meridian (this didn't occur because of the above issues).

I've addressed the multifile dataset issue and the prime meridian issues in PR #495, which will hopefully be included in our next release.

Originally posted by @pochedls in #494 (comment)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@github-actions github-actions bot added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Jun 5, 2023
Comment on lines +281 to +282
with pytest.raises(ValueError):
self.ds.spatial._get_longitude_weights(domain_bounds, region_bounds=None)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test here is much simpler than has been included in past iterations.

It basically checks to see if there is more than one longitude bound in which the lower bound (i.e., bound[0]) is greater than the upper bound (i.e., bounds[1]). This can happen once for longitude (when spanning a prime meridian) but not more than once for rectilinear datasets. If it happens more than once, a ValueError is thrown.

Comment on lines -284 to -289
# The logic for generating longitude weights depends on the
# bounds being ordered such that d_bounds[:, 0] < d_bounds[:, 1].
# They are re-ordered (if need be) for the purpose of creating
# weights.
d_bounds = self._force_domain_order_low_to_high(d_bounds)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary for latitude because we use the absolute difference of the domain bounds. So the logic was updated in the _get_longitude_weights() function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this also caused the error

xarray can't set arrays with multiple array indices to dask yet.

because d_bounds were not loaded / copied.

Comment on lines -327 to -353
def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray):
"""Reorders the ``domain_bounds`` low-to-high.

This method ensures all lower bound values are less than the upper bound
values (``domain_bounds[:, 1] < domain_bounds[:, 1]``).

Parameters
----------
domain_bounds: xr.DataArray
The bounds of an axis.

Returns
------
xr.DataArray
The bounds of an axis (re-ordered if applicable).
"""
index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]

if len(index_bad_cells) > 0:
new_domain_bounds = domain_bounds.copy()
new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1]
new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0]

return new_domain_bounds

return domain_bounds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed and replaced with a simpler consistency check and existing logic.

xcdat/spatial.py Outdated
Comment on lines 421 to 428
# check if there is more than one potential prime meridian cell
# there should only be one for rectilinear data
pmcells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]
if len(pmcells) > 1:
raise ValueError(
"More than one longitude bound is out of order. Only one bound \n\
spanning the prime meridian is permitted"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the consistency check to ensure there is only one instance in which domain_bounds[:, 1] - domain_bounds[:, 0] (this can happen for grid cells that span a prime meridian, but only once for rectilinear data).

xcdat/spatial.py Outdated
Comment on lines 429 to 437
# convert longitude bounds to 360 degree domain
d_bounds: xr.DataArray = self._swap_lon_axis(d_bounds, to=360) # type: ignore
# check for bounds spanning prime meridian
p_meridian_index = _get_prime_meridian_index(d_bounds)
# if bounds span a prime meridian, ensure bounds are organized to span 0-360
# (without losing weight)
if p_meridian_index is not None:
d_bounds = _align_lon_bounds_to_360(d_bounds, p_meridian_index)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled these lines of code out of the if region_bounds is not None block (below). This logic should apply whether there is a region_bound provided or not, because it helps to ensure the longitude bounds span 0 to 360 and are linear (high-to-low) which is important during the weights calculation.

Copy link
Collaborator Author

@pochedls pochedls Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally put in this if statement because the user could pass in region bounds in (-180, 180) or (0, 360). If the user passed in region bounds, we converted both the domain and region bounds to a (0, 360) domain and then handled prime meridian issues.

The issue prompting this shows that for some datasets, we need to deal with the prime meridian issue (even if there are no region bounds). This doesn't occur in CMIP data (I think because CF-conventions state that bounds should be increasing). If bounds are increasing, there is no need for this complex logic.

@tomvothecoder tomvothecoder changed the title bugfix for #494 Fix multi-file dataset spatial average orientation and weights when lon bounds span prime meridian Jun 7, 2023
@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Jun 7, 2023

FYI I updated the PR title because it is used as the default commit message for the PR with "Squash and merge".
I also updated the description with a comment from #494 that covers the issue and causes.

Comment on lines +183 to +200
def test_spatial_average_for_domain_wrapping_p_meridian_non_cf_conventions(
self,
):
ds = self.ds.copy()

# get spatial average for original dataset
ref = ds.spatial.average("ts").ts

# change first bound from -0.9375 to 359.0625
lon_bnds = ds.lon_bnds.copy()
lon_bnds[0, 0] = 359.0625
ds["lon_bnds"] = lon_bnds

# check spatial average with new (bad) bound
result = ds.spatial.average("ts").ts

assert result.identical(ref)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test that non-CF compliant bounds still lead to correct spatial averaging.

@pochedls
Copy link
Collaborator Author

pochedls commented Jun 8, 2023

I think this can be reviewed now @tomvothecoder. I think I understand the spatial averaging issues we've had in the past (see #500).

@codecov-commenter
Copy link

codecov-commenter commented Jun 8, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (ef0085b) 100.00% compared to head (93dbca6) 100.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #495   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        14           
  Lines         1431      1425    -6     
=========================================
- Hits          1431      1425    -6     
Impacted Files Coverage Δ
xcdat/spatial.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pochedls, your changes make sense to me. I added a few minor code suggestions. If you looped over datasets for validation like your normally do and the results checked out, feel free to merge. Thanks!

xcdat/spatial.py Outdated Show resolved Hide resolved
xcdat/spatial.py Outdated Show resolved Hide resolved
@pochedls pochedls merged commit 44e0e3b into main Jun 13, 2023
5 checks passed
@pochedls pochedls deleted the bug/494_fix_spatial_lon_bound_logic branch June 13, 2023 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
3 participants