-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update get_axis_coord()
to interpret more keys
#262
Conversation
get_coord_var()
to intrepret more keysget_coord_var()
to interpret more keys
Codecov Report
@@ Coverage Diff @@
## main #262 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 9 14 +5
Lines 742 1133 +391
==========================================
+ Hits 742 1133 +391
Continue to review full report at Codecov.
|
Hi @pochedls @durack1, this PR is ready for review. It relates to the GH issues you opened (#215 and #260). In the Files changed tab, you can ignore the test files unless you're interested in them. I wasn't able to tag you as a reviewer @durack1 since you're not in the xCDAT GH org. I just sent an invite for you to join. |
xcdat/axis.py
Outdated
def get_coord_var(dataset: xr.Dataset, axis: CFAxisName): | ||
"""Gets a coordinate variable for an axis in the dataset. | ||
|
||
This function loops through the possible keys that can be interpreted by | ||
``cf_xarray`` and returns the first match. The keys that can be intrepreted | ||
for a coordinate variable include CF-compliant (1) ``"axis"`` attr, | ||
(2) ``"standard_name"`` attr, and the (3) common short name for the dimension | ||
and coordinates variable. | ||
|
||
For example, ``cf_xarray`` can map the Y axis to the coordinate variable | ||
"lat" using at least one of the following keys: | ||
|
||
1. ``axis="Y"`` | ||
2. ``standard_name="latitude"`` | ||
3. Dimension and coordinate name for latitude is "lat" | ||
|
||
Parameters | ||
---------- | ||
dataset : xr.Dataset | ||
The dataset. | ||
axis : CFAxisName | ||
The CF-compliant axis name. | ||
|
||
Returns | ||
------- | ||
xr.DataArray | ||
The coordinate variable. | ||
|
||
Raises | ||
------ | ||
KeyError | ||
If the coordinate variable was not found in the dataset. | ||
|
||
Notes | ||
----- | ||
Refer to [1]_ for a list of CF-compliant "axis" and "standard_name" attr | ||
names that can be interpreted by ``cf_xarray``. | ||
|
||
References | ||
---------- | ||
|
||
.. [1] https://cf-xarray.readthedocs.io/en/latest/coord_axes.html#axes-and-coordinates | ||
""" | ||
keys = CF_NAME_MAP[axis] | ||
coord_var = None | ||
|
||
for key in keys: | ||
try: | ||
coord_var = dataset.cf[key] | ||
break | ||
except KeyError: | ||
pass | ||
|
||
if coord_var is None: | ||
raise KeyError( | ||
f"A coordinate variable for the {axis} axis was not found in the dataset. " | ||
"Make sure the coordinate variable exists in the dataset and either the " | ||
"'axis' or 'standard_name' attr is set, or the name of the dim and " | ||
"coords is the common short name (e.g.,'lat')." | ||
) | ||
return coord_var | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function for review
xcdat/axis.py
Outdated
# https://cf-xarray.readthedocs.io/en/latest/coord_axes.html#axis-names | ||
CFAxisName = Literal["X", "Y", "T"] | ||
# https://cf-xarray.readthedocs.io/en/latest/coord_axes.html#coordinate-names | ||
CFStandardName = Literal["latitude", "longitude", "time"] | ||
ShortName = Literal["lat", "lon"] | ||
|
||
# The key is the accepted value for method and function arguments, and the | ||
# values are the CF-compliant axis and standard names that are interpreted in | ||
# the dataset. | ||
CF_NAME_MAP: Dict[CFAxisName, List[Union[CFAxisName, CFStandardName, ShortName]]] = { | ||
"X": ["X", "longitude", "lon"], | ||
"Y": ["Y", "latitude", "lat"], | ||
"T": ["T", "time"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new axis coordinate mapping table structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be expanded for "Z" after PR #164 is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR code changes
I was first seeing if this PR resolves the code snippet I outlined here, but I am getting the same error in |
Thanks @pochedls, I failed to update the spatial averaging methods to agnostically interpret the dimension names for the spatial axes. Here's the commit that fixes #215: 436f29b I tested it using the code snippet from that PR and it is now working.
|
@tomvothecoder a noob question, if I just checkout this branch, should that allow me to start testing against these edge case files that I've encountered? (or do I have to checkout more than this #262 branch alone) Attempting to follow along, it seems that #262 (comment) covers a similar question, no? |
Exactly, you just need to setup the Python development environment for this repository and you can test the branch in a Jupyter Notebook or Python script/console. Guide to get that setup: https://xcdat.readthedocs.io/en/latest/contributing.html#local-development |
@durack1 You might be behind on commits if you are still seeing errors that were resolved in previous comments, so I think pulling the latest commits might fix them. |
@tomvothecoder looks like I am up-to-date:
And I am still hitting a similar issue (as #215) (xcdat_dev) bash-4.2$ python
Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xcdat
>>> print(xcdat.__version__)
0.2.0
>>> f = "/p/css03/esgf_publish/CMIP6/PMIP/IPSL/IPSL-CM6A-LR/midPliocene-eoi400/r1i1p1f1/AERmonZ/o3/grz/v20190118/o3_AERmonZ_IPSL-CM6A-LR_midPliocene-eoi400_r1i1p1f1_grz_185001-204912.nc"
>>> from xcdat import open_dataset as od
>>> xH = od(f)
Traceback (most recent call last):
File "~/git/xcdat/xcdat/bounds.py", line 178, in get_bounds
bounds_key = coord_var.attrs["bounds"]
KeyError: 'bounds'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "~/git/xcdat/xcdat/bounds.py", line 145, in add_missing_bounds
self.get_bounds(axis)
File "~/git/xcdat/xcdat/bounds.py", line 180, in get_bounds
raise KeyError(
KeyError: "The coordinate variable lat has no 'bounds' attr. Make sure the 'bounds' attr is set and the bounds data var exists"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "~/git/xcdat/xcdat/dataset.py", line 93, in open_dataset
ds = _postprocess_dataset(ds, data_var, center_times, add_bounds, lon_orient)
File "~/git/xcdat/xcdat/dataset.py", line 466, in _postprocess_dataset
dataset = dataset.bounds.add_missing_bounds()
File "~/git/xcdat/xcdat/bounds.py", line 147, in add_missing_bounds
self._dataset = self._add_bounds(axis, width)
File "~/git/xcdat/xcdat/bounds.py", line 307, in _add_bounds
and "degree" in coord_var.attrs["units"]
KeyError: 'units' |
@durack1 What is happening with that specific dataset is:
Related code: Lines 305 to 311 in c678c8b
Since the I pushed a commit, so we should be good to go for this specific issue. |
To follow up on the previous comment, I am working on handling when the lat Prototype code:
|
@tomvothecoder great, so c678c8b doesn't seem to have been added to this branch, right? And while we are finding edge cases, here are 3 others. I can open these successfully using cdms2 but hit a problem with /p/css03/esgf_publish/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r2i1p1f3/SIday/sitemptop/gr/v20191218/sitemptop_SIday_HadGEM3-GC31-MM_historical_r2i1p1f3_gr_19900101-19941230.nc
/p/css03/esgf_publish/CMIP6/PAMIP/CCCma/CanESM5/pdSST-futBKSeasSIC/r41i1p2f1/Amon/ua/gn/v20190429/ua_Amon_CanESM5_pdSST-futBKSeasSIC_r41i1p2f1_gn_200004-200105.nc
/p/css03/esgf_publish/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/r3i1p1f1/SImon/siu/gn/v20210517/siu_SImon_EC-Earth3_ssp245_r3i1p1f1_gn_203301-203312.nc And this single file is garbled netcdf that has no data, but does have a valid header, so a very edgey case indeed ( /p/css03/esgf_publish/CMIP6/FAFMIP/MPI-M/MPI-ESM1-2-LR/faf-passiveheat/r1i1p1f1/Omon/epcalc100/gn/v20210901/epcalc100_Omon_MPI-ESM1-2-LR_faf-passiveheat_r1i1p1f1_gn_187001-188912.nc I'll have a look to see if any other obvious problems were identified in my CMIP6 iteration code |
@durack1 - I'll let @tomvothecoder decide what is best, but I think based on our recent experience documenting issues with the regridder, I would suggest opening new tickets for these examples unless they are the same underlying issue. This will help in diagnosing / resolving the issues quickly and ensuring that the issues are documented in a way that is searchable (and may allow us to understand why changes were made). |
@pochedls the 3 files all problems with the time coord, and 2 lines of code in The Happy to wedge these wherever they are useful, new issues are fine but will be duplicative |
I just pushed code to handle the Lines 306 to 323 in a51e2dd
Thanks @durack1 for documenting these issues. I would normally agree with @pochedls that new opening up new issues would be good so that the scope of the conversation stays related to the PR. However, it seems like your issues are related to #261, which is a PR independent of this one. I would post your comments over there if that's correct (instead of creating duplicate issues as you mentioned). Also, I think this PR is in good shape after addressing the issues in the previous comments. I'll do a final review of this PR and merge, unless there are any other issues you've come across. This PR and #261 are blockers for v0.3.0, which I'm hoping to release tomorrow (or Monday). |
@tomvothecoder, @pochedls no problem, the 3 files with time coord issues have been copied across into #261 |
lon: xr.DataArray = get_axis_coord(ds, "X") | ||
lon_bounds: xr.DataArray = dataset.bounds.get_bounds("X") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure it is okay to drop the .copy()
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped .copy()
in this function and meant to add .copy()
to get_bounds()
. Thanks for the reminder.
Lines 185 to 193 in 51627fd
try: | |
bounds_var = self._dataset[bounds_key].copy() | |
except KeyError: | |
raise KeyError( | |
f"Bounds were not found for the coordinate variable '{coord_var.name}'. " | |
"Add bounds with `Dataset.bounds.add_bounds()`." | |
) | |
return bounds_var |
When .copy()
is and isn't needed:
# Coordinate variable
# .copy() not needed, this creates a deep copy
lat = ds.lat
lat.data[0] = None
# Data variable
# .copy() needed, otherwise a shallow copy is created (ref by memory)
lat_bnds = ds.lat_bnds.copy()
lat_bnds.data[0] = None
84a5883
to
23a29cf
Compare
- Rename `get_coord_var()` to `get_axis_coord()` - Update `add_missing_bounds()`, `add_bounds()`, and `get_bounds()` logic to use updated `get_coord_var()` - Replace `GENERIC_AXIS_MAP` with `CF_NAME_MAP` - Add `CFAxisName`, `CFStandardName`, and `ShortName` type aliases - Update variable names in `_add_bounds()` for clarity - Move `add_bounds` kwarg in `open_dataset()` and `open_mfdataset()` to a higher position due to importance - Update spatial averaging to agnostically get dim names - Add `get_axis_dim()` - Fix `_align_lon_bounds_to_360` not getting the correct dimension name - Update name of attr `self._dim_name` to `self._dim` in `TemporalAccessor` - Add handles for lat ` "units" ` attr in `_add_bounds()` - Update "is_generated" attr to "xcdat_bounds"
- Update references to `.cf.get_bounds()` to `.bounds.get_bounds()` - Update `grid.create_grid()` to set the `"bounds"` attr on the coord var to avoid `add_missing_bounds()` to attempt to add bounds for global mean grids
23a29cf
to
f399789
Compare
get_coord_var()
to interpret more keysget_axis_coord()
to interpret more keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
try: | ||
axis_bnds = self._ds.bounds.get_bounds(axis.name) | ||
bounds_var = self._ds.bounds.get_bounds(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The accepted argument for bounds.get_bounds()
is now the cf-compliant axis name ("X", "Y", "T", "Z").
You'll notice a bunch of these were updated.
try: | ||
axis = self._ds.cf[name] | ||
except KeyError: | ||
raise KeyError( | ||
f"{standard_name} axis could not be correctly identified in the Dataset" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is already implemented by get_coord_var()
so I reused that function here.
The standard_name
kwarg isn't necessary because get_axis_coord()
tries to find the coord var using axis
, standard_name
, or dim name (e.g., "lat"). Refer to axis.CF_NAME_MAP
.
try: | ||
lat_bnds = source_grid.cf.get_bounds("lat") | ||
lat_bnds = source_grid.bounds.get_bounds("Y") | ||
except KeyError: | ||
pass | ||
else: | ||
target[lat_bnds.name] = lat_bnds.copy() | ||
|
||
try: | ||
lon_bnds = source_grid.cf.get_bounds("lon") | ||
lon_bnds = source_grid.bounds.get_bounds("X") | ||
except KeyError: | ||
pass | ||
else: | ||
target[lon_bnds.name] = lon_bnds.copy() | ||
|
||
for dim_name in source.cf.axes: | ||
try: | ||
source_bnds = source.cf.get_bounds(dim_name) | ||
source_bnds = source.bounds.get_bounds(dim_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced the API call to .cf.get_bounds()
with bounds.get_bounds()
because our API checks if the coordinate variable exists (e.g., "lat") and its "bounds"
attr is set to the bounds data var (e.g., "lat_bnds").
If both of those conditions aren't meant, it will raise an error.
lat = get_axis_coord(grid, "Y") | ||
lat_data = np.array([(lat[0] + lat[-1]) / 2.0]) | ||
|
||
lat_bnds = grid.cf.get_bounds("lat") | ||
lat_bnds = grid.bounds.get_bounds("Y") | ||
lat_bnds = np.array([[lat_bnds[0, 0], lat_bnds[-1, 1]]]) | ||
|
||
lon = grid.cf["lon"] | ||
lon = get_axis_coord(grid, "X") | ||
lon_data = np.array([(lon[0] + lon[-1]) / 2.0]) | ||
|
||
lon_bnds = grid.cf.get_bounds("lon") | ||
lon_bnds = grid.bounds.get_bounds("X") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replace calls such as grid.cf["lat"]
with get_axis_coord()
because sometimes the name of the coordinate variable might not be "lat".
get_axis_coord()
tries to find the coord var using axis
, standard_name
, or dim name (e.g., "lat").
@@ -493,6 +495,7 @@ def create_grid( | |||
lon_bnds = lon_bnds.copy() | |||
|
|||
data_vars["lon_bnds"] = lon_bnds | |||
lon.attrs["bounds"] = lon_bnds.name | |||
|
|||
grid = xr.Dataset(data_vars=data_vars, coords={"lat": lat, "lon": lon}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "bounds"
attr for a coord var must be set to the name of the bounds data var. Otherwise, the subsequent call to grid.bounds.add_missing_bounds()
will attempt to add bounds even if they already exist because it thinks the bounds don't exist.
If this "bounds"
attribute is not set, it breaks the test below.
Lines 533 to 543 in e366c9a
def test_global_mean_grid(self): | |
source_grid = grid.create_grid( | |
np.array([-80, -40, 0, 40, 80]), np.array([0, 45, 90, 180, 270, 360]) | |
) | |
mean_grid = grid.create_global_mean_grid(source_grid) | |
assert np.all(mean_grid.lat == np.array([0.0])) | |
assert np.all(mean_grid.lat_bnds == np.array([[-90, 90]])) | |
assert np.all(mean_grid.lon == np.array([180.0])) | |
assert np.all(mean_grid.lon_bnds == np.array([[-22.5, 405]])) |
Reasons why:
- Bounds already exist but
add_missing_bounds()
doesn't see it because the"bounds"
attr is not set _add_bounds()
breaks because there is only 1 coordinate point forlat
andlon
(conditional below)
What it should be doing instead:
- Ignore
add_missing_bounds()
altogether
Lines 266 to 270 in e366c9a
# Validate coordinate shape and dimensions | |
if da_coord.ndim != 1: | |
raise ValueError("Cannot generate bounds for multidimensional coordinates.") | |
if da_coord.shape[0] <= 1: | |
raise ValueError("Cannot generate bounds for a coordinate of length <= 1.") |
@@ -144,7 +144,7 @@ def add_missing_bounds(self, width: float = 0.5) -> xr.Dataset: | |||
try: | |||
self.get_bounds(axis) | |||
except KeyError: | |||
self._dataset = self._add_bounds(axis, width) | |||
self._dataset = self.add_bounds(axis, width) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I accidentally referenced the wrong method.
ds_no_lat = ds.drop_vars(["lat"]) | ||
ds_no_lat = ds.drop_dims(["lat"]) | ||
|
||
with pytest.raises(KeyError): | ||
ds_no_lat.regridder.grid | ||
|
||
ds_no_lon = ds.drop_vars(["lon"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to drop the axis's dimension because its dimension can still exist without coordinates.
By (our) definition, the axis doesn't exist if the corresponding dimension does not exist.
@@ -21,6 +21,7 @@ | |||
"X": ["X", "longitude", "lon"], | |||
"Y": ["Y", "latitude", "lat"], | |||
"T": ["T", "time"], | |||
"Z": ["Z", "height", "pressure"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the axis coordinate mapping table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
- xesmf=0.6.2 | ||
- esmpy=8.2.0 | ||
- numba=0.53.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to add this in #44 so the readthedocs build failed
Thanks Jason! |
Description
Summary of Changes
get_axis_coord()
to interpret more keysget_coord_var()
toget_axis_coord()
add_missing_bounds()
,add_bounds()
, andget_bounds()
logic to use updatedget_axis_coord()
GENERIC_AXIS_MAP
withCF_NAME_MAP
CFAxisName
,CFStandardName
, andShortName
type aliases_add_bounds()
for clarityadd_bounds
kwarg inopen_dataset()
andopen_mfdataset()
to a higher position due to importanceget_axis_coord()
.cf.get_bounds()
to.bounds.get_bounds()
grid.create_grid()
to set the"bounds"
attr on the coord var to avoidadd_missing_bounds()
to attempt to add bounds for global mean gridsget_axis_dim()
_align_lon_bounds_to_360
to use this functionself._dim_name
toself._dim
inTemporalAccessor
"units"
attr in_add_bounds()
Checklist
If applicable: