[Bug]: add_bounds() breaks when time coordinates are in cftime objects instead of datetime #240

Closed
2 tasks done
tomvothecoder opened this issue May 26, 2022 · 0 comments · Fixed by #241

tomvothecoder commented May 26, 2022

Bug Report Criteria

  • Bug is not related to a data quality issue(s) beyond the scope of xCDAT
  • Bug is not related to core xarray APIs (please open an issue in the xarray repo if it is)

What happened?

cftime datetime objects are used to represent time coordinates for non-standard calendars (e.g., 360-day, noleap) and non-CF-compliant units ("months", "years"). Unlike datetime.datetime objects, cftime datetime objects (e.g., cftime.datetime, cftime.DatetimeNoLeap) don't support arithmetic involving timedelta64[ns], ints, floats, etc.; they only support arithmetic with timedelta objects.

The formula for the lower and upper bounds of each coordinate point subtracts from and adds to the coordinate, respectively (example below). Because the diffs array has a dtype of timedelta64[ns], the operation fails for cftime coordinates (refer to the MCVE and log output).

xcdat/xcdat/bounds.py

Lines 255 to 263 in 112eb58

# Add beginning and end points to account for lower and upper bounds.
diffs = np.insert(diffs, 0, diffs[0])
diffs = np.append(diffs, diffs[-1])
# Get lower and upper bounds by using the width relative to nearest point.
# Transpose both bound arrays into a 2D array.
lower_bounds = da_coord - diffs[:-1] * width
upper_bounds = da_coord + diffs[1:] * (1 - width)
bounds = np.array([lower_bounds, upper_bounds]).transpose()
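The failure can be reproduced with NumPy alone. The sketch below is an illustration rather than xCDAT code: it assumes that plain `datetime.datetime` objects in an object-dtype array are a fair stand-in for cftime objects, since xarray also stores cftime objects in object-dtype arrays. The names `coords`, `diffs`, and `error` are made up for the example.

```python
import datetime

import numpy as np

# Stand-in for a cftime time axis (assumption for illustration): an
# object-dtype array, which is also how cftime objects are stored.
coords = np.array(
    [datetime.datetime(2000, 1, 16), datetime.datetime(2000, 2, 15)],
    dtype=object,
)

# Differences stored with the dtype described in this issue.
diffs = np.array([30, 30], dtype="timedelta64[D]").astype("timedelta64[ns]")

error = None
try:
    # Same shape of expression as `da_coord - diffs[:-1] * width`.
    coords - diffs * 0.5
except TypeError as err:  # NumPy's ufunc resolution errors subclass TypeError
    error = err

print(type(error).__name__)
```

NumPy has no `subtract` loop for an object-dtype operand paired with a `timedelta64` operand, so it never gets as far as asking the individual objects whether they support the operation.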

Instead of subtracting diffs as a np.array with a dtype of timedelta64[ns], we have to subtract using timedelta objects. This can be achieved by converting the array with pd.to_timedelta(diffs).

        # Add beginning and end points to account for lower and upper bounds.
        # `diffs` is an np.array with dtype "timedelta64[ns]".
        diffs = np.insert(diffs, 0, diffs[0])
        diffs = np.append(diffs, diffs[-1])

        # In xarray and xCDAT, `cftime` objects are used to represent time
        # coordinates for non-standard calendars (360-day, noleap) and
        # non-CF-compliant units ("months", "years"), instead of `datetime`
        # objects. `cftime` objects only support arithmetic using `timedelta`
        # objects, so the values of `diffs` must be cast to `timedelta`.
        # FIXME: This line produces the warning: python3.9/site-packages/pandas
        # /core/arrays/datetimelike.py:1189: PerformanceWarning:
        # Adding/subtracting object-dtype array to TimedeltaArray not
        # vectorized
        diffs = pd.to_timedelta(diffs)
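As a rough sketch of how the converted differences could feed the bounds formula, the example below goes one step further than the snippet above and calls `.to_pytimedelta()` on the result of `pd.to_timedelta`, producing an object-dtype array of `datetime.timedelta` values. This is a variation for illustration, not the change made in xCDAT. As an assumption, `datetime.datetime` objects in an object-dtype array stand in for cftime objects, and `coords`, `py_diffs`, and the sample dates are made up.

```python
import datetime

import numpy as np
import pandas as pd

# Stand-in time axis (assumption): object-dtype array of datetime objects.
coords = np.array(
    [
        datetime.datetime(2000, 1, 16),
        datetime.datetime(2000, 2, 15),
        datetime.datetime(2000, 3, 16),
    ],
    dtype=object,
)

# Differences between consecutive points, stored as timedelta64[ns].
diffs = np.array([30, 30], dtype="timedelta64[D]").astype("timedelta64[ns]")

# Add beginning and end points to account for lower and upper bounds.
diffs = np.insert(diffs, 0, diffs[0])
diffs = np.append(diffs, diffs[-1])

# Convert to an object-dtype array of `datetime.timedelta` objects so the
# arithmetic below is applied element by element by NumPy's object loops.
py_diffs = pd.to_timedelta(diffs).to_pytimedelta()

width = 0.5
lower_bounds = coords - py_diffs[:-1] * width
upper_bounds = coords + py_diffs[1:] * (1 - width)
bounds = np.array([lower_bounds, upper_bounds]).transpose()

print(bounds.shape)
```

Because both operands are object-dtype NumPy arrays here, the subtraction and addition fall back to each element's own `-` and `+` operators. That is the behavior cftime objects need, at the cost of not being vectorized.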

Related issue: Unidata/cftime#198

What did you expect to happen?

Bounds are generated regardless of the datetime object type used to represent time coordinates

Minimal Complete Verifiable Example

import xcdat

dataset_links = [
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_187001_189412.nc",
    "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_189501_191912.nc",
]

ds = xcdat.open_mfdataset(dataset_links)

# Drop the existing time bounds to demonstrate adding new bounds
ds = ds.drop_vars("time_bnds")

# Breaks here
ds = ds.bounds.add_bounds("time")

Relevant log output

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vo13/miniconda3/envs/xcdat_dev/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3397, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_9974/1848296045.py", line 1, in <cell line: 1>
    ds_new.bounds.add_bounds("time")
  File "/home/vo13/XCDAT/xcdat/xcdat/bounds.py", line 207, in add_bounds
    dataset = self._add_bounds(axis, width)
  File "/home/vo13/XCDAT/xcdat/xcdat/bounds.py", line 262, in _add_bounds
    lower_bounds = da_coord - diffs[:-1] * width
  File "/home/vo13/miniconda3/envs/xcdat_dev/lib/python3.9/site-packages/xarray/core/_typed_ops.py", line 209, in __sub__
    return self._binary_op(other, operator.sub)
  File "/home/vo13/miniconda3/envs/xcdat_dev/lib/python3.9/site-packages/xarray/core/dataarray.py", line 3098, in _binary_op
    f(self.variable, other_variable)
  File "/home/vo13/miniconda3/envs/xcdat_dev/lib/python3.9/site-packages/xarray/core/_typed_ops.py", line 399, in __sub__
    return self._binary_op(other, operator.sub)
  File "/home/vo13/miniconda3/envs/xcdat_dev/lib/python3.9/site-packages/xarray/core/variable.py", line 2467, in _binary_op
    f(self_data, other_data) if not reflexive else f(other_data, self_data)
numpy.core._exceptions._UFuncBinaryResolutionError: ufunc 'subtract' cannot use operands with types dtype('O') and dtype('<m8[ns]')

Anything else we need to know?

Related code:
https://github.com/Unidata/cftime/blob/dc75368cd02bbcd1352dbecfef10404a58683f94/src/cftime/_cftime.pyx#L1020-L1021

https://github.com/Unidata/cftime/blob/dc75368cd02bbcd1352dbecfef10404a58683f94/src/cftime/_cftime.pyx#L439-L472

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.45.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.3
scipy: 1.8.1
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.03.0
distributed: 2022.3.0
matplotlib: 3.5.1
cartopy: 0.20.1
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.3.0
sphinx: 4.4.0

@tomvothecoder tomvothecoder added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label May 26, 2022
@tomvothecoder tomvothecoder self-assigned this May 26, 2022