Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update get_axis_coord() to interpret more keys #262

Merged
merged 4 commits into from
Jun 24, 2022

Conversation

tomvothecoder
Copy link
Collaborator

@tomvothecoder tomvothecoder commented Jun 22, 2022

Description

Summary of Changes

  • Closes [Feature]: Improve error messages when attempting retrieve coords/coord bounds for an axis #260
  • Closes [Bug]: open_dataset error with "Y" axis bounds #215
  • Update get_axis_coord() to interpret more keys
    • Rename get_coord_var() to get_axis_coord()
    • Update add_missing_bounds(), add_bounds(), and get_bounds() logic to use updated get_axis_coord()
    • Replace GENERIC_AXIS_MAP with CF_NAME_MAP
    • Add CFAxisName, CFStandardName, and ShortName type aliases
    • Update variable names in _add_bounds() for clarity
    • Move add_bounds kwarg in open_dataset() and open_mfdataset() to a higher position due to importance
  • Update regridder APIs to use get_axis_coord()
    • Update references to .cf.get_bounds() to .bounds.get_bounds()
    • Update grid.create_grid() to set the "bounds" attr on the coord var to avoid add_missing_bounds() to attempt to add bounds for global mean grids
  • Add get_axis_dim()
    • Update spatial averaging APIs to use this function
    • Update _align_lon_bounds_to_360 to use this function
  • Misc Refactor
    • Update name of attr self._dim_name to self._dim in TemporalAccessor
    • Update "is_generated" attr to "xcdat_bounds"
    • Add handles for lat "units" attr in _add_bounds()

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder self-assigned this Jun 22, 2022
@tomvothecoder tomvothecoder changed the title Update get_coord_var() to intrepret more keys Update get_coord_var() to interpret more keys Jun 22, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jun 22, 2022

Codecov Report

Merging #262 (288ef1c) into main (1fdc8a9) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              main      #262    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files            9        14     +5     
  Lines          742      1133   +391     
==========================================
+ Hits           742      1133   +391     
Impacted Files Coverage Δ
xcdat/utils.py 100.00% <ø> (ø)
xcdat/__init__.py 100.00% <100.00%> (ø)
xcdat/axis.py 100.00% <100.00%> (ø)
xcdat/bounds.py 100.00% <100.00%> (ø)
xcdat/dataset.py 100.00% <100.00%> (ø)
xcdat/regridder/__init__.py 100.00% <100.00%> (ø)
xcdat/regridder/accessor.py 100.00% <100.00%> (ø)
xcdat/regridder/base.py 100.00% <100.00%> (ø)
xcdat/regridder/grid.py 100.00% <100.00%> (ø)
xcdat/regridder/regrid2.py 100.00% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e366c9a...288ef1c. Read the comment docs.

@tomvothecoder tomvothecoder requested review from pochedls and removed request for pochedls June 22, 2022 17:58
@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Jun 22, 2022

Hi @pochedls @durack1, this PR is ready for review. It relates to the GH issues you opened (#215 and #260).

In the Files changed tab, you can ignore the test files unless you're interested in them.

I wasn't able to tag you as a reviewer @durack1 since you're not in the xCDAT GH org. I just sent an invite for you to join.

xcdat/axis.py Outdated
Comment on lines 27 to 89
def get_coord_var(dataset: xr.Dataset, axis: CFAxisName):
"""Gets a coordinate variable for an axis in the dataset.

This function loops through the possible keys that can be interpreted by
``cf_xarray`` and returns the first match. The keys that can be intrepreted
for a coordinate variable include CF-compliant (1) ``"axis"`` attr,
(2) ``"standard_name"`` attr, and the (3) common short name for the dimension
and coordinates variable.

For example, ``cf_xarray`` can map the Y axis to the coordinate variable
"lat" using at least one of the following keys:

1. ``axis="Y"``
2. ``standard_name="latitude"``
3. Dimension and coordinate name for latitude is "lat"

Parameters
----------
dataset : xr.Dataset
The dataset.
axis : CFAxisName
The CF-compliant axis name.

Returns
-------
xr.DataArray
The coordinate variable.

Raises
------
KeyError
If the coordinate variable was not found in the dataset.

Notes
-----
Refer to [1]_ for a list of CF-compliant "axis" and "standard_name" attr
names that can be interpreted by ``cf_xarray``.

References
----------

.. [1] https://cf-xarray.readthedocs.io/en/latest/coord_axes.html#axes-and-coordinates
"""
keys = CF_NAME_MAP[axis]
coord_var = None

for key in keys:
try:
coord_var = dataset.cf[key]
break
except KeyError:
pass

if coord_var is None:
raise KeyError(
f"A coordinate variable for the {axis} axis was not found in the dataset. "
"Make sure the coordinate variable exists in the dataset and either the "
"'axis' or 'standard_name' attr is set, or the name of the dim and "
"coords is the common short name (e.g.,'lat')."
)
return coord_var


Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function for review

xcdat/axis.py Outdated
Comment on lines 11 to 23
# https://cf-xarray.readthedocs.io/en/latest/coord_axes.html#axis-names
CFAxisName = Literal["X", "Y", "T"]
# https://cf-xarray.readthedocs.io/en/latest/coord_axes.html#coordinate-names
CFStandardName = Literal["latitude", "longitude", "time"]
ShortName = Literal["lat", "lon"]

# The key is the accepted value for method and function arguments, and the
# values are the CF-compliant axis and standard names that are interpreted in
# the dataset.
CF_NAME_MAP: Dict[CFAxisName, List[Union[CFAxisName, CFStandardName, ShortName]]] = {
"X": ["X", "longitude", "lon"],
"Y": ["Y", "latitude", "lat"],
"T": ["T", "time"],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new axis coordinate mapping table structure

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be expanded for "Z" after PR #164 is merged.

docs/api.rst Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR code changes

tests/test_bounds.py Outdated Show resolved Hide resolved
tests/test_axis.py Outdated Show resolved Hide resolved
xcdat/axis.py Outdated Show resolved Hide resolved
tests/test_bounds.py Outdated Show resolved Hide resolved
xcdat/bounds.py Outdated Show resolved Hide resolved
xcdat/bounds.py Outdated Show resolved Hide resolved
xcdat/bounds.py Outdated Show resolved Hide resolved
docs/index.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
@pochedls
Copy link
Collaborator

I was first seeing if this PR resolves the code snippet I outlined here, but I am getting the same error in spatial._validate_axis_arg. Am I doing something wrong or should this PR not resolve that specific issue?

@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Jun 22, 2022

Thanks @pochedls, I failed to update the spatial averaging methods to agnostically interpret the dimension names for the spatial axes.

Here's the commit that fixes #215: 436f29b

I tested it using the code snippet from that PR and it is now working.

import xcdat

fn = "/p/user_pub/climate_work/pochedley1/surface/gistemp1200_GHCNv4_ERSSTv5.nc"
ds = xcdat.open_dataset(fn)
dsa = ds.spatial.average(data_var="tempanomaly")

print(dsa)

<xarray.Dataset>
Dimensions:      (lat: 90, lon: 180, time: 1706, nv: 2, bnds: 2)
Coordinates:
  * lat          (lat) float32 -89.0 -87.0 -85.0 -83.0 ... 83.0 85.0 87.0 89.0
  * lon          (lon) float32 -179.0 -177.0 -175.0 -173.0 ... 175.0 177.0 179.0
  * time         (time) datetime64[ns] 1880-01-15 1880-02-15 ... 2022-02-15
Dimensions without coordinates: nv, bnds
Data variables:
    time_bnds    (time, nv) datetime64[ns] ...
    tempanomaly  (time) float32 -0.1809 -0.265 -0.09773 ... 0.8567 0.9067 0.8949
    lon_bnds     (lon, bnds) float32 -180.0 -178.0 -178.0 ... 178.0 178.0 180.0
    lat_bnds     (lat, bnds) float32 -90.0 -88.0 -88.0 -86.0 ... 88.0 88.0 90.0
Attributes:
    title:        GISTEMP Surface Temperature Analysis
    institution:  NASA Goddard Institute for Space Studies
    source:       http://data.giss.nasa.gov/gistemp/
    Conventions:  CF-1.6
    history:      Created 2022-03-11 09:08:31 by SBBX_to_nc 2.0 - ILAND=1200,...

xcdat/spatial.py Outdated Show resolved Hide resolved
xcdat/spatial.py Outdated Show resolved Hide resolved
@durack1
Copy link
Collaborator

durack1 commented Jun 23, 2022

@tomvothecoder a noob question, if I just checkout this branch, should that allow me to start testing against these edge case files that I've encountered? (or do I have to checkout more than this #262 branch alone)

Attempting to follow along, it seems that #262 (comment) covers a similar question, no?

@tomvothecoder
Copy link
Collaborator Author

@tomvothecoder a noob question, if I just checkout this branch, should that allow me to start testing against these edge case files that I've encountered? (or do I have to checkout more than this #262 branch alone)

Exactly, you just need to setup the Python development environment for this repository and you can test the branch in a Jupyter Notebook or Python script/console.

Guide to get that setup: https://xcdat.readthedocs.io/en/latest/contributing.html#local-development

@durack1
Copy link
Collaborator

durack1 commented Jun 23, 2022

Exactly, you just need to setup the Python development environment

Done that, and checked out this branch #262, but I still get an error so am wondering if I also need to pull the changes from the 436f29b commit?

@tomvothecoder
Copy link
Collaborator Author

@durack1 You might be behind on commits if you are still seeing errors that were resolved in previous comments, so I think pulling the latest commits might fix them.

@durack1
Copy link
Collaborator

durack1 commented Jun 23, 2022

@tomvothecoder looks like I am up-to-date:

(xcdat_dev) bash-4.2$ git pull
FIPS mode initialized
Updating f9f43f7..7bfdf56

And I am still hitting a similar issue (as #215)

(xcdat_dev) bash-4.2$ python
Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xcdat
>>> print(xcdat.__version__)
0.2.0
>>> f = "/p/css03/esgf_publish/CMIP6/PMIP/IPSL/IPSL-CM6A-LR/midPliocene-eoi400/r1i1p1f1/AERmonZ/o3/grz/v20190118/o3_AERmonZ_IPSL-CM6A-LR_midPliocene-eoi400_r1i1p1f1_grz_185001-204912.nc"
>>> from xcdat import open_dataset as od
>>> xH = od(f)
Traceback (most recent call last):
  File "~/git/xcdat/xcdat/bounds.py", line 178, in get_bounds
    bounds_key = coord_var.attrs["bounds"]
KeyError: 'bounds'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/git/xcdat/xcdat/bounds.py", line 145, in add_missing_bounds
    self.get_bounds(axis)
  File "~/git/xcdat/xcdat/bounds.py", line 180, in get_bounds
    raise KeyError(
KeyError: "The coordinate variable lat has no 'bounds' attr. Make sure the 'bounds' attr is set and the bounds data var exists"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/git/xcdat/xcdat/dataset.py", line 93, in open_dataset
    ds = _postprocess_dataset(ds, data_var, center_times, add_bounds, lon_orient)
  File "~/git/xcdat/xcdat/dataset.py", line 466, in _postprocess_dataset
    dataset = dataset.bounds.add_missing_bounds()
  File "~/git/xcdat/xcdat/bounds.py", line 147, in add_missing_bounds
    self._dataset = self._add_bounds(axis, width)
  File "~/git/xcdat/xcdat/bounds.py", line 307, in _add_bounds
    and "degree" in coord_var.attrs["units"]
KeyError: 'units'

@tomvothecoder
Copy link
Collaborator Author

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/git/xcdat/xcdat/dataset.py", line 93, in open_dataset
    ds = _postprocess_dataset(ds, data_var, center_times, add_bounds, lon_orient)
  File "~/git/xcdat/xcdat/dataset.py", line 466, in _postprocess_dataset
    dataset = dataset.bounds.add_missing_bounds()
  File "~/git/xcdat/xcdat/bounds.py", line 147, in add_missing_bounds
    self._dataset = self._add_bounds(axis, width)
  File "~/git/xcdat/xcdat/bounds.py", line 307, in _add_bounds
    and "degree" in coord_var.attrs["units"]
KeyError: 'units'

@durack1 What is happening with that specific dataset is:

  1. Attempt to open the dataset (add_bounds defaults to True to add missing bounds)
  2. Lat coordinate variable is missing a "units" attr
  3. Attempt to clip the newly generated lat bounds in _add_bounds()
  4. Breaks because the "units" attr is missing

Related code:

xcdat/xcdat/bounds.py

Lines 305 to 311 in c678c8b

# Clip latitude bounds at (-90, 90)
if (
da_coord.name in ("lat", "latitude", "grid_latitude")
and "degree" in da_coord.attrs["units"]
):
if (da_coord >= -90).all() and (da_coord <= 90).all():
np.clip(bounds, -90, 90, out=bounds)

Since the "units" is usually "degree", I removed that part of the conditional ( and "degree" in da_coord.attrs["units"]).

I pushed a commit, so we should be good to go for this specific issue.

@tomvothecoder
Copy link
Collaborator Author

To follow up on the previous comment, I am working on handling when the lat "units" attr is not set or is something other than degrees.

Prototype code:

        # Clip latitude bounds at (-90, 90)
        if coord_var.name in ("lat", "latitude", "grid_latitude"):
            units = coord_var.get("units")

            if units is None:
                ds[coord_var.name].attrs["units"] = "degree"
                logger.warning(
                    f"The '{coord_var.name}' coordinate variable is missing "
                    "a 'units' attribute. Assuming 'units' is 'degree'."
                )
            elif "degree" not in units.lower():
                raise ValueError(
                    "The {coord_var.name} coord variable has a 'units' attribute that "
                    "is not in degrees."
                )

            if (coord_var >= -90).all() and (coord_var <= 90).all():
                np.clip(bounds, -90, 90, out=bounds)

@durack1
Copy link
Collaborator

durack1 commented Jun 23, 2022

@tomvothecoder great, so c678c8b doesn't seem to have been added to this branch, right?

And while we are finding edge cases, here are 3 others. I can open these successfully using cdms2 but hit a problem with xcdat/xcdat/dataset.py, line 84, in open_dataset

/p/css03/esgf_publish/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r2i1p1f3/SIday/sitemptop/gr/v20191218/sitemptop_SIday_HadGEM3-GC31-MM_historical_r2i1p1f3_gr_19900101-19941230.nc
/p/css03/esgf_publish/CMIP6/PAMIP/CCCma/CanESM5/pdSST-futBKSeasSIC/r41i1p2f1/Amon/ua/gn/v20190429/ua_Amon_CanESM5_pdSST-futBKSeasSIC_r41i1p2f1_gn_200004-200105.nc
/p/css03/esgf_publish/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/r3i1p1f1/SImon/siu/gn/v20210517/siu_SImon_EC-Earth3_ssp245_r3i1p1f1_gn_203301-203312.nc

And this single file is garbled netcdf that has no data, but does have a valid header, so a very edgey case indeed (cdms2 also has a problem; xcdat/xcdat/dataset.py, line 90, in open_dataset):

/p/css03/esgf_publish/CMIP6/FAFMIP/MPI-M/MPI-ESM1-2-LR/faf-passiveheat/r1i1p1f1/Omon/epcalc100/gn/v20210901/epcalc100_Omon_MPI-ESM1-2-LR_faf-passiveheat_r1i1p1f1_gn_187001-188912.nc

I'll have a look to see if any other obvious problems were identified in my CMIP6 iteration code

@pochedls
Copy link
Collaborator

And while we are finding edge cases, here are 3 others.

@durack1 - I'll let @tomvothecoder decide what is best, but I think based on our recent experience documenting issues with the regridder, I would suggest opening new tickets for these examples unless they are the same underlying issue.

This will help in diagnosing / resolving the issues quickly and ensuring that the issues are documented in a way that is searchable (and may allow us to understand why changes were made).

@durack1
Copy link
Collaborator

durack1 commented Jun 23, 2022

unless they are the same underlying issue

@pochedls the 3 files all problems with the time coord, and 2 lines of code in xcdat/xcdat/dataset.py, line 402-404, in _has_cf_compliant_time.

The FAFMIP edge case could be ignored, but the resulting traceback is not all that helpful in indicating it's an errored file.

Happy to wedge these wherever they are useful, new issues are fine but will be duplicative

@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Jun 23, 2022

I just pushed code to handle the "units" attr for the lat coordinate variable in _add_bounds():

xcdat/xcdat/bounds.py

Lines 306 to 323 in a51e2dd

# Clip latitude bounds at (-90, 90)
if coord_var.name in ("lat", "latitude", "grid_latitude"):
units = coord_var.attrs.get("units")
if units is None:
coord_var.attrs["units"] = "degrees_north"
logger.warning(
f"The '{coord_var.name}' coordinate variable is missing "
"a 'units' attribute. Assuming 'units' is 'degrees_north'."
)
elif "degree" not in units.lower():
raise ValueError(
f"The {coord_var.name} coord variable has a 'units' attribute that "
"is not in degrees."
)
if (coord_var >= -90).all() and (coord_var <= 90).all():
np.clip(bounds, -90, 90, out=bounds)


And while we are finding edge cases, here are 3 others.

@durack1 - I'll let @tomvothecoder decide what is best, but I think based on our recent experience documenting issues with the regridder, I would suggest opening new tickets for these examples unless they are the same underlying issue.

This will help in diagnosing / resolving the issues quickly and ensuring that the issues are documented in a way that is searchable (and may allow us to understand why changes were made).

unless they are the same underlying issue

@pochedls the 3 files all problems with the time coord, and 2 lines of code in xcdat/xcdat/dataset.py, line 402-404, in _has_cf_compliant_time.

The FAFMIP edge case could be ignored, but the resulting traceback is not all that helpful in indicating it's an errored file.

Happy to wedge these wherever they are useful, new issues are fine but will be duplicative

Thanks @durack1 for documenting these issues. I would normally agree with @pochedls that new opening up new issues would be good so that the scope of the conversation stays related to the PR. However, it seems like your issues are related to #261, which is a PR independent of this one. I would post your comments over there if that's correct (instead of creating duplicate issues as you mentioned).


Also, I think this PR is in good shape after addressing the issues in the previous comments. I'll do a final review of this PR and merge, unless there are any other issues you've come across. This PR and #261 are blockers for v0.3.0, which I'm hoping to release tomorrow (or Monday).

@durack1
Copy link
Collaborator

durack1 commented Jun 23, 2022

@tomvothecoder, @pochedls no problem, the 3 files with time coord issues have been copied across into #261

Comment on lines +271 to +274
lon: xr.DataArray = get_axis_coord(ds, "X")
lon_bounds: xr.DataArray = dataset.bounds.get_bounds("X")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure it is okay to drop the .copy() here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped .copy() in this function and meant to add .copy() to get_bounds(). Thanks for the reminder.

xcdat/xcdat/bounds.py

Lines 185 to 193 in 51627fd

try:
bounds_var = self._dataset[bounds_key].copy()
except KeyError:
raise KeyError(
f"Bounds were not found for the coordinate variable '{coord_var.name}'. "
"Add bounds with `Dataset.bounds.add_bounds()`."
)
return bounds_var

When .copy() is and isn't needed:

# Coordinate variable
# .copy() not needed, this creates a deep copy
lat = ds.lat
lat.data[0] = None

# Data variable
# .copy() needed, otherwise a shallow copy is created (ref by memory)
lat_bnds = ds.lat_bnds.copy()
lat_bnds.data[0] = None

- Rename `get_coord_var()` to `get_axis_coord()`
- Update `add_missing_bounds()`, `add_bounds()`, and `get_bounds()` logic to use updated `get_coord_var()`
- Replace `GENERIC_AXIS_MAP` with `CF_NAME_MAP`
- Add `CFAxisName`, `CFStandardName`, and `ShortName` type aliases
- Update variable names in `_add_bounds()` for clarity
- Move `add_bounds` kwarg in `open_dataset()` and `open_mfdataset()` to a higher position due to importance
- Update spatial averaging to agnostically get dim names
- Add `get_axis_dim()`
- Fix `_align_lon_bounds_to_360` not getting the correct dimension name
- Update name of attr `self._dim_name` to `self._dim` in `TemporalAccessor`
- Add handles for lat ` "units" ` attr in `_add_bounds()`
- Update "is_generated" attr to "xcdat_bounds"
- Update references to `.cf.get_bounds()` to `.bounds.get_bounds()`
- Update `grid.create_grid()` to set the `"bounds"` attr on the coord var to avoid `add_missing_bounds()` to attempt to add bounds for global mean grids
@tomvothecoder tomvothecoder changed the title Update get_coord_var() to interpret more keys Update get_axis_coord() to interpret more keys Jun 24, 2022
Copy link
Collaborator Author

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jasonb5, I rebased this branch on top of main which includes PR #44. Can you review these changes before I merge?


try:
axis_bnds = self._ds.bounds.get_bounds(axis.name)
bounds_var = self._ds.bounds.get_bounds(name)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The accepted argument for bounds.get_bounds() is now the cf-compliant axis name ("X", "Y", "T", "Z").

You'll notice a bunch of these were updated.

Comment on lines -82 to -87
try:
axis = self._ds.cf[name]
except KeyError:
raise KeyError(
f"{standard_name} axis could not be correctly identified in the Dataset"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is already implemented by get_coord_var() so I reused that function here.

The standard_name kwarg isn't necessary because get_axis_coord() tries to find the coord var using axis, standard_name, or dim name (e.g., "lat"). Refer to axis.CF_NAME_MAP.

Comment on lines 35 to +51
try:
lat_bnds = source_grid.cf.get_bounds("lat")
lat_bnds = source_grid.bounds.get_bounds("Y")
except KeyError:
pass
else:
target[lat_bnds.name] = lat_bnds.copy()

try:
lon_bnds = source_grid.cf.get_bounds("lon")
lon_bnds = source_grid.bounds.get_bounds("X")
except KeyError:
pass
else:
target[lon_bnds.name] = lon_bnds.copy()

for dim_name in source.cf.axes:
try:
source_bnds = source.cf.get_bounds(dim_name)
source_bnds = source.bounds.get_bounds(dim_name)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the API call to .cf.get_bounds() with bounds.get_bounds() because our API checks if the coordinate variable exists (e.g., "lat") and its "bounds" attr is set to the bounds data var (e.g., "lat_bnds").

If both of those conditions aren't meant, it will raise an error.

Comment on lines +381 to +390
lat = get_axis_coord(grid, "Y")
lat_data = np.array([(lat[0] + lat[-1]) / 2.0])

lat_bnds = grid.cf.get_bounds("lat")
lat_bnds = grid.bounds.get_bounds("Y")
lat_bnds = np.array([[lat_bnds[0, 0], lat_bnds[-1, 1]]])

lon = grid.cf["lon"]
lon = get_axis_coord(grid, "X")
lon_data = np.array([(lon[0] + lon[-1]) / 2.0])

lon_bnds = grid.cf.get_bounds("lon")
lon_bnds = grid.bounds.get_bounds("X")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replace calls such as grid.cf["lat"] with get_axis_coord() because sometimes the name of the coordinate variable might not be "lat".

get_axis_coord() tries to find the coord var using axis, standard_name, or dim name (e.g., "lat").

@@ -493,6 +495,7 @@ def create_grid(
lon_bnds = lon_bnds.copy()

data_vars["lon_bnds"] = lon_bnds
lon.attrs["bounds"] = lon_bnds.name

grid = xr.Dataset(data_vars=data_vars, coords={"lat": lat, "lon": lon})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "bounds" attr for a coord var must be set to the name of the bounds data var. Otherwise, the subsequent call to grid.bounds.add_missing_bounds() will attempt to add bounds even if they already exist because it thinks the bounds don't exist.

If this "bounds" attribute is not set, it breaks the test below.

xcdat/tests/test_regrid.py

Lines 533 to 543 in e366c9a

def test_global_mean_grid(self):
source_grid = grid.create_grid(
np.array([-80, -40, 0, 40, 80]), np.array([0, 45, 90, 180, 270, 360])
)
mean_grid = grid.create_global_mean_grid(source_grid)
assert np.all(mean_grid.lat == np.array([0.0]))
assert np.all(mean_grid.lat_bnds == np.array([[-90, 90]]))
assert np.all(mean_grid.lon == np.array([180.0]))
assert np.all(mean_grid.lon_bnds == np.array([[-22.5, 405]]))

Reasons why:

  • Bounds already exist but add_missing_bounds() doesn't see it because the "bounds" attr is not set
  • _add_bounds() breaks because there is only 1 coordinate point for lat and lon (conditional below)

What it should be doing instead:

  • Ignore add_missing_bounds() altogether

xcdat/xcdat/bounds.py

Lines 266 to 270 in e366c9a

# Validate coordinate shape and dimensions
if da_coord.ndim != 1:
raise ValueError("Cannot generate bounds for multidimensional coordinates.")
if da_coord.shape[0] <= 1:
raise ValueError("Cannot generate bounds for a coordinate of length <= 1.")

@@ -144,7 +144,7 @@ def add_missing_bounds(self, width: float = 0.5) -> xr.Dataset:
try:
self.get_bounds(axis)
except KeyError:
self._dataset = self._add_bounds(axis, width)
self._dataset = self.add_bounds(axis, width)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accidentally referenced the wrong method.

Comment on lines -570 to -575
ds_no_lat = ds.drop_vars(["lat"])
ds_no_lat = ds.drop_dims(["lat"])

with pytest.raises(KeyError):
ds_no_lat.regridder.grid

ds_no_lon = ds.drop_vars(["lon"])
Copy link
Collaborator Author

@tomvothecoder tomvothecoder Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to drop the axis's dimension because its dimension can still exist without coordinates.

By (our) definition, the axis doesn't exist if the corresponding dimension does not exist.

@@ -21,6 +21,7 @@
"X": ["X", "longitude", "lon"],
"Y": ["Y", "latitude", "lat"],
"T": ["T", "time"],
"Z": ["Z", "height", "pressure"],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the axis coordinate mapping table

Copy link
Collaborator

@jasonb5 jasonb5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +18 to +20
- xesmf=0.6.2
- esmpy=8.2.0
- numba=0.53.1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to add this in #44 so the readthedocs build failed

@tomvothecoder
Copy link
Collaborator Author

LGTM

Thanks Jason!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New enhancement request
Projects
None yet
5 participants