Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Failure to decode non-CF compliant time #261

Closed
pochedls opened this issue Jun 21, 2022 · 4 comments · Fixed by #263
Closed

[Bug]: Failure to decode non-CF compliant time #261

pochedls opened this issue Jun 21, 2022 · 4 comments · Fixed by #263
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@pochedls
Copy link
Collaborator

pochedls commented Jun 21, 2022

What happened?

I tried opening a dataset with non-CF compliant time units. We don't currently have code to process this particular type of non-CF compliant time axis so I got an error.

What did you expect to happen?

Ideally, we could change open_dataset (and open_mfdataset) to attempt to decode_non_cf_time and if it fails, simply do not decode the time axis, e.g., here:

        if cf_compliant_time is False:
            # XCDAT handles decoding time values with non-CF units.
            ds = xr.open_dataset(path, decode_times=False, **kwargs)
            try:
                 ds = decode_non_cf_time(ds)
            except:
                 # some log message to note that decoding failed

[Note that there would also need to be some error handling in _has_cf_compliant_time]

Minimal Complete Verifiable Example

fn = '/p/user_pub/climate_work/pochedley1/surface/best.nc'
ds = xcdat.open_dataset(fn)

Relevant log output

~/code/xcdat/xcdat/dataset.py in _split_time_units_attr(units_attr)
    612         raise KeyError("No 'units' attribute found for the dataset's time coordinates.")
    613 
--> 614     units, reference_date = units_attr.split(" since ")
    615 
    616     return units, reference_date

ValueError: not enough values to unpack (expected 2, got 1)

This error results because we were expecting units of the form "X since Y", but the units are year A.D. (time axis has decimal years, i.e., 1850.041667, 1850.125, ...).

Anything else we need to know?

Should I attempt a PR?

Environment

Recent-ish xcdat dev environment.

@pochedls pochedls added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Jun 21, 2022
@tomvothecoder
Copy link
Collaborator

Notes from meeting

  • Use try and except to catch this ValueError, and don't attempt to decode the coordinates
    • Although most likely uncommon, there might be cases where other types of non-CF compliant time units are used. We want to avoid having to handle all of these cases internally.
  • Instead, the user should open the dataset, fix the time units, and call decode_non_cf_time_units() to decode time units
  • Temporal averaging requires decoded time units because it uses xarray.core.accessor_dt.DatetimeAccessor in the lines below:

    xcdat/xcdat/temporal.py

    Lines 1106 to 1123 in 350ac8b

    # Use the TIME_GROUPS dictionary to determine which components
    # are needed to form the labeled time coordinates.
    for component in TIME_GROUPS[self._mode][self._freq]:
    df[component] = time_coords[f"{self._dim_name}.{component}"].values
    # The season frequency requires additional datetime components for
    # processing, which are later removed before time coordinates are
    # labeled for grouping. These components weren't included in the
    # `TIME_GROUPS` dictionary for the "season" frequency because
    # `TIME_GROUPS` represents the final grouping labels.
    if self._freq == "season":
    if self._mode in ["climatology", "departures"]:
    df["year"] = time_coords[f"{self._dim_name}.year"].values
    df["month"] = time_coords[f"{self._dim_name}.month"].values
    if self._mode == "group_average":
    df["month"] = time_coords[f"{self._dim_name}.month"].values

    start_year, end_year = (ds.time.dt.year.values[0], ds.time.dt.year.values[-1])

@durack1
Copy link
Collaborator

durack1 commented Jun 23, 2022

Here are 3 additional issues with time coords. These can be opened cdms2 but hit a problem with xcdat/xcdat/dataset.py, line 84, in open_dataset and xcdat/xcdat/dataset.py, lines 402-404, in _has_cf_compliant_time

/p/css03/esgf_publish/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/historical/r2i1p1f3/SIday/sitemptop/gr/v20191218/sitemptop_SIday_HadGEM3-GC31-MM_historical_r2i1p1f3_gr_19900101-19941230.nc
/p/css03/esgf_publish/CMIP6/PAMIP/CCCma/CanESM5/pdSST-futBKSeasSIC/r41i1p2f1/Amon/ua/gn/v20190429/ua_Amon_CanESM5_pdSST-futBKSeasSIC_r41i1p2f1_gn_200004-200105.nc
/p/css03/esgf_publish/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/r3i1p1f1/SImon/siu/gn/v20210517/siu_SImon_EC-Earth3_ssp245_r3i1p1f1_gn_203301-203312.nc

@pochedls
Copy link
Collaborator Author

These are file corruption issues that appear unrelated to this issue and would need further diagnosis.

For the first file, I get an error opening this using xcdat.open_dataset(fn, decode_times=False). I also get HDF or similar errors with ncdump -v time ... and accessing the time and/or time_bnds variables with cdms2 and netCDF4.

For the second file, the time_bnds variable is problematic/corrupted (again HDF error with cdms2, xarray, xcdat, and netCDF4).

The third one yields HDF errors when reading in the variable siu (again in cdms2, xarray, xcdat, ncdump, and netCDF4).

@durack1
Copy link
Collaborator

durack1 commented Jun 24, 2022

@pochedls yep, you're right all three are garbled. My code validated these using ncdump but somehow it seems neither that code, nor the publisher sha256 validation caught 2 of the 3 problems (the HadGEM3-GC31-MM is an upstream issue, faithfully synda'd).

I've just issued some errata logs, and these datasets have been flagged with @sashakames, so will be added to purge lists if that makes sense.

I'll spend some more time reviewing apparent "edge cases" in the future so as not to waste folks' time

tomvothecoder pushed a commit that referenced this issue Jun 24, 2022
pochedls added a commit that referenced this issue Jun 24, 2022
* attempt at #261

* adding some documentation

* Add PR review fixes

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants