Fix multi-file dataset spatial average orientation and weights when lon bounds span prime meridian #495
with pytest.raises(ValueError):
    self.ds.spatial._get_longitude_weights(domain_bounds, region_bounds=None)
The test here is much simpler than the ones included in past iterations.
It checks whether more than one longitude cell has a lower bound (i.e., bounds[:, 0]) that is greater than its upper bound (i.e., bounds[:, 1]). This can happen once for longitude (when a cell spans the prime meridian), but not more than once for rectilinear datasets. If it happens more than once, a ValueError is raised.
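A minimal standalone sketch of that consistency check, assuming the bounds arrive as an (n, 2) array of [lower, upper] pairs. The function name check_single_wrapped_lon_cell is hypothetical and not part of xcdat's API.

```python
import numpy as np


def check_single_wrapped_lon_cell(lon_bnds):
    """Hypothetical helper: return the indices of cells whose lower bound
    exceeds the upper bound, raising if there is more than one such cell."""
    bounds = np.asarray(lon_bnds, dtype=float)
    # A negative width means the cell is "out of order", e.g. [359.0625, 0.9375].
    out_of_order = np.flatnonzero(bounds[:, 1] - bounds[:, 0] < 0)
    if out_of_order.size > 1:
        raise ValueError(
            "More than one longitude bound is out of order. Only one bound "
            "spanning the prime meridian is permitted."
        )
    return out_of_order


# One cell wraps across the prime meridian: allowed.
print(check_single_wrapped_lon_cell([[359.0625, 0.9375], [0.9375, 2.8125]]))

# Two out-of-order cells cannot both span the prime meridian: rejected.
try:
    check_single_wrapped_lon_cell([[10.0, 5.0], [20.0, 15.0]])
except ValueError as err:
    print("ValueError:", err)
```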
# The logic for generating longitude weights depends on the
# bounds being ordered such that d_bounds[:, 0] < d_bounds[:, 1].
# They are re-ordered (if need be) for the purpose of creating
# weights.
d_bounds = self._force_domain_order_low_to_high(d_bounds)
This re-ordering is not necessary for latitude because the latitude weights use the absolute difference of the domain bounds, so the logic was updated only in the _get_longitude_weights() function.
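A small numeric sketch of why the absolute difference is enough for latitude but not for a longitude cell that wraps across 0/360. It assumes latitude weights proportional to the difference of sin(latitude) at the bounds (the usual area weighting); the helper names are hypothetical.

```python
import numpy as np


def lat_weights(lat_bnds):
    """Hypothetical helper: area weights for latitude bands.

    abs() makes the result independent of whether each pair is
    [low, high] or [high, low].
    """
    rad = np.deg2rad(np.asarray(lat_bnds, dtype=float))
    return np.abs(np.sin(rad[:, 1]) - np.sin(rad[:, 0]))


def abs_lon_widths(lon_bnds):
    """Hypothetical helper: naive longitude widths using abs()."""
    bounds = np.asarray(lon_bnds, dtype=float)
    return np.abs(bounds[:, 1] - bounds[:, 0])


# Reversing each latitude pair does not change the weights.
print(lat_weights([[-90.0, 0.0], [0.0, 90.0]]))
print(lat_weights([[0.0, -90.0], [90.0, 0.0]]))

# A cell spanning the prime meridian is only 1.875 degrees wide, but abs()
# reports 358.125 because it cannot tell "wrapped" from "reversed".
print(abs_lon_widths([[359.0625, 0.9375]]))
```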
Note that this also caused the error "xarray can't set arrays with multiple array indices to dask yet." because d_bounds were not loaded or copied.
def _force_domain_order_low_to_high(self, domain_bounds: xr.DataArray):
    """Reorders the ``domain_bounds`` low-to-high.

    This method ensures all lower bound values are less than the upper bound
    values (``domain_bounds[:, 0] < domain_bounds[:, 1]``).

    Parameters
    ----------
    domain_bounds: xr.DataArray
        The bounds of an axis.

    Returns
    -------
    xr.DataArray
        The bounds of an axis (re-ordered if applicable).
    """
    # Indices of cells whose lower bound is greater than the upper bound.
    index_bad_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]

    if len(index_bad_cells) > 0:
        # Swap the lower and upper bounds of the out-of-order cells.
        new_domain_bounds = domain_bounds.copy()
        new_domain_bounds[index_bad_cells, 0] = domain_bounds[index_bad_cells, 1]
        new_domain_bounds[index_bad_cells, 1] = domain_bounds[index_bad_cells, 0]

        return new_domain_bounds

    return domain_bounds
This is removed and replaced with a simpler consistency check and existing logic.
xcdat/spatial.py
# Check if there is more than one potential prime meridian cell;
# there should only be one for rectilinear data.
pmcells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]
if len(pmcells) > 1:
    raise ValueError(
        "More than one longitude bound is out of order. Only one bound "
        "spanning the prime meridian is permitted."
    )
This is the consistency check to ensure there is at most one cell in which domain_bounds[:, 1] - domain_bounds[:, 0] is negative (this can happen for a grid cell that spans the prime meridian, but only once for rectilinear data).
xcdat/spatial.py
# convert longitude bounds to 360 degree domain
d_bounds: xr.DataArray = self._swap_lon_axis(d_bounds, to=360)  # type: ignore
# check for bounds spanning prime meridian
p_meridian_index = _get_prime_meridian_index(d_bounds)
# if bounds span a prime meridian, ensure bounds are organized to span 0-360
# (without losing weight)
if p_meridian_index is not None:
    d_bounds = _align_lon_bounds_to_360(d_bounds, p_meridian_index)
I pulled these lines of code out of the if region_bounds is not None block (below). This logic should apply whether or not region_bounds are provided, because it helps ensure the longitude bounds span 0 to 360 and are ordered low-to-high, which is important during the weights calculation.
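A minimal sketch of the "align to 0-360 without losing weight" idea, assuming the bounds have already been converted to the 0-360 domain. The wrapped cell is split into [lower, 360] plus an extra [0, upper] piece, widths are computed, and the extra piece is folded back into the original cell. The helper names are hypothetical and only loosely mirror _get_prime_meridian_index() and _align_lon_bounds_to_360(); this is not their actual implementation.

```python
import numpy as np


def split_prime_meridian_cell(lon_bnds):
    """Hypothetical helper: split the single cell with lower > upper.

    Returns (aligned_bounds, index_of_split_cell_or_None). The extra
    [0, upper] piece is appended as the last row.
    """
    bounds = np.asarray(lon_bnds, dtype=float)
    wrapped = np.flatnonzero(bounds[:, 0] > bounds[:, 1])
    if wrapped.size == 0:
        return bounds, None
    if wrapped.size > 1:
        raise ValueError("Only one cell may span the prime meridian.")

    i = int(wrapped[0])
    lower, upper = bounds[i]
    aligned = bounds.copy()
    aligned[i] = [lower, 360.0]  # part of the cell from its lower bound up to 360
    extra = np.array([[0.0, upper]])  # part of the cell from 0 up to its upper bound
    return np.vstack([aligned, extra]), i


def lon_weights(lon_bnds):
    """Hypothetical helper: cell widths in degrees, one per original cell."""
    aligned, i = split_prime_meridian_cell(lon_bnds)
    widths = aligned[:, 1] - aligned[:, 0]
    if i is not None:
        # Fold the extra piece back into the wrapped cell so no weight is lost.
        widths[i] += widths[-1]
        widths = widths[:-1]
    return widths


print(lon_weights([[359.0625, 0.9375], [0.9375, 2.8125]]))  # [1.875 1.875]
```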
This was originally put in this if statement because the user could pass in region bounds in either (-180, 180) or (0, 360). If the user passed in region bounds, we converted both the domain and region bounds to a (0, 360) domain and then handled prime meridian issues.
The issue that prompted this PR shows that for some datasets, we need to deal with the prime meridian issue even if there are no region bounds. This doesn't occur in CMIP data (I think because the CF conventions state that bounds should be increasing). If bounds are increasing, there is no need for this complex logic.
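As a small illustration of why the prime meridian handling used to live in the region-bounds branch: converting a region given in (-180, 180) to (0, 360) with a modulo can itself produce a pair whose lower value exceeds its upper value. The helper below is a hypothetical stand-in that only uses np.mod; it is not xcdat's _swap_lon_axis().

```python
import numpy as np


def to_360(lon):
    """Hypothetical helper: map longitudes from (-180, 180) to [0, 360)."""
    return np.mod(np.asarray(lon, dtype=float), 360.0)


region = to_360([-30.0, 30.0])
print(region)
# The converted region now "wraps": its lower value exceeds its upper value.
print(region[0] > region[1])  # True
```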
FYI, I updated the PR title because it is used as the default commit message for the PR with "Squash and merge".
def test_spatial_average_for_domain_wrapping_p_meridian_non_cf_conventions(
    self,
):
    ds = self.ds.copy()

    # get spatial average for original dataset
    ref = ds.spatial.average("ts").ts

    # change first bound from -0.9375 to 359.0625
    lon_bnds = ds.lon_bnds.copy()
    lon_bnds[0, 0] = 359.0625
    ds["lon_bnds"] = lon_bnds

    # check spatial average with new (bad) bound
    result = ds.spatial.average("ts").ts

    assert result.identical(ref)
Test that non-CF compliant bounds still lead to correct spatial averaging.
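A self-contained sketch of the same idea as the test above, using plain xarray rather than the xcdat accessor: a longitude-weighted mean is computed once with CF-style bounds and once with the first lower bound moved from -45 to 315, and the two results should match. The wrap-aware width rule (add 360 to a negative width) and the helper names are assumptions for illustration, not xcdat's implementation.

```python
import numpy as np
import xarray as xr


def wrap_aware_lon_widths(lon_bnds):
    """Hypothetical helper: width of each longitude cell in degrees.

    A negative width means the cell spans the prime meridian, so 360 is added.
    """
    width = lon_bnds[:, 1] - lon_bnds[:, 0]
    return xr.where(width < 0, width + 360.0, width)


def lon_weighted_mean(ds, var):
    """Hypothetical helper: mean over "lon" weighted by cell width."""
    weights = wrap_aware_lon_widths(ds["lon_bnds"])
    return ds[var].weighted(weights).mean("lon")


ds = xr.Dataset(
    {
        "ts": ("lon", [1.0, 2.0, 3.0, 4.0]),
        "lon_bnds": (
            ("lon", "bnds"),
            [[-45.0, 45.0], [45.0, 135.0], [135.0, 225.0], [225.0, 315.0]],
        ),
    },
    coords={"lon": [0.0, 90.0, 180.0, 270.0]},
)
ref = lon_weighted_mean(ds, "ts")

# Non-CF bounds: move the first lower bound from -45 to 315 (same cell, wrapped).
lon_bnds = ds["lon_bnds"].copy()
lon_bnds[0, 0] = 315.0
ds_bad = ds.copy()
ds_bad["lon_bnds"] = lon_bnds
result = lon_weighted_mean(ds_bad, "ts")

print(float(ref), float(result))  # 2.5 2.5
```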
I think this can be reviewed now, @tomvothecoder. I think I understand the spatial averaging issues we've had in the past (see #500).
Codecov Report
@@            Coverage Diff            @@
##              main     #495    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           14        14
  Lines         1431      1425     -6
=========================================
- Hits          1431      1425     -6
Hey @pochedls, your changes make sense to me. I added a few minor code suggestions. If you looped over datasets for validation like you normally do and the results checked out, feel free to merge. Thanks!
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Description
In #494, .spatial.get_weights() and .spatial.average() do not work for multi-file datasets when the longitude bounds span the prime meridian. This PR updates that logic to fix these issues.

Issues and root causes:
- .spatial.average(lat_bnds=(a,b) does not appear to be working properly #494

Checklist
If applicable: