[Refactor]: Possibly refactor our logic for converting numeric time values to datetime #310

Closed
tomvothecoder opened this issue Aug 11, 2022 · 4 comments
Labels: type: enhancement (New enhancement request)

@tomvothecoder
Collaborator

tomvothecoder commented Aug 11, 2022

I found some existing functions that convert numeric time values to datetime objects (which we need specifically for decoding non-CF compliant time units). We might be able to refactor our implementation of this function, or even remove it entirely and use one of these instead.

Our implementation: _get_cftime_coords()

xcdat/xcdat/dataset.py

Lines 694 to 733 in 4c51879

def _get_cftime_coords(
    ref_date: str, offsets: np.ndarray, calendar: str, units: str
) -> np.ndarray:
    """Get an array of `cftime` coordinates starting from a reference date.

    Parameters
    ----------
    ref_date : str
        The starting reference date.
    offsets : np.ndarray
        An array of numerically encoded time offsets from the reference date.
    calendar : str
        The CF calendar type supported by ``cftime``. This includes "noleap",
        "360_day", "365_day", "366_day", "gregorian", "proleptic_gregorian",
        "julian", "all_leap", and "standard".
    units : str
        The time units.

    Returns
    -------
    np.ndarray
        An array of `cftime` coordinates.
    """
    # Starting from the reference date, create an array of `datetime` objects
    # by adding each offset (a numerically encoded value) to the reference date.
    # The `parse.parse` default is set to datetime(2000, 1, 1), with each
    # component being a placeholder if the value does not exist. For example, 1
    # and 1 are placeholders for month and day if those values don't exist.
    ref_datetime: datetime = parser.parse(ref_date, default=datetime(2000, 1, 1))
    offsets = np.array(
        [ref_datetime + rd.relativedelta(**{units: offset}) for offset in offsets],
        dtype="object",
    )

    # Convert the array of `datetime` objects into `cftime` objects based on
    # the calendar type.
    date_type = get_date_type(calendar)
    coords = convert_times(offsets, date_type=date_type)

    return coords
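
For readers without `dateutil` or `cftime` at hand, the month-offset step in `_get_cftime_coords()` can be approximated with the standard library alone. This is only a rough sketch under stated assumptions: `add_months` and `decode_month_offsets` are hypothetical names (not part of xcdat), offsets are whole months, the reference date is in `YYYY-MM-DD` form, and the result stays a stdlib `datetime` rather than being converted to a `cftime` object.

```python
import calendar
from datetime import datetime


def add_months(ref: datetime, months: int) -> datetime:
    """Shift `ref` by whole calendar months, clamping the day if needed."""
    # Use a zero-based month index so divmod handles the year rollover.
    year_shift, month_index = divmod(ref.month - 1 + months, 12)
    year = ref.year + year_shift
    month = month_index + 1
    # Clamp the day, e.g. Jan 31 + 1 month -> Feb 28 (Feb 29 in a leap year).
    last_day = calendar.monthrange(year, month)[1]
    return ref.replace(year=year, month=month, day=min(ref.day, last_day))


def decode_month_offsets(ref_date: str, offsets: list[int]) -> list[datetime]:
    """Decode 'months since <ref_date>' offsets (hypothetical helper)."""
    ref = datetime.strptime(ref_date, "%Y-%m-%d")
    return [add_months(ref, offset) for offset in offsets]


print(decode_month_offsets("2000-01-01", [0, 1, 12]))
```

The point of the sketch is that a month is not a fixed number of days, so "months since" offsets have to be applied as calendar arithmetic rather than as a constant time delta.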

cftime.num2date

API Docs: https://unidata.github.io/cftime/api.html#cftime.num2date
API Code: https://github.com/Unidata/cftime/blob/76336464e27809c55471f360c46bfbea78787057/src/cftime/_cftime.pyx#L507-L628

def num2date(
    times,
    units,
    calendar='standard',
    only_use_cftime_datetimes=True,
    only_use_python_datetimes=False,
    has_year_zero=None
):
    """
    Return datetime objects given numeric time values. The units
    of the numeric time values are described by the **units** argument
    and the **calendar** keyword. The returned datetime objects represent
    UTC with no time-zone offset, even if the specified

EDIT 9/27/22 This function does not meet our requirements because "months since is allowed only for the 360_day calendar and common_years since is allowed only for the 365_day calendar."

units: a string of the form <time units> since <reference time> describing the time units. <time units> can be days, hours, minutes, seconds, milliseconds or microseconds. <reference time> is the time origin. months since is allowed only for the 360_day calendar and common_years since is allowed only for the 365_day calendar.

--- https://unidata.github.io/cftime/api.html#cftime.num2date
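
The restriction quoted above can be turned into a small check that decides whether a `units` attribute would need custom decoding. This is a sketch based only on the quoted documentation text: `needs_custom_decoding` is a hypothetical helper, it does not call `cftime`, and it assumes lowercase plural unit names exactly as listed in the quote (any aliases `cftime` may accept are not handled).

```python
# Units that the quoted cftime docs list as valid for every calendar.
FIXED_LENGTH_UNITS = {
    "days", "hours", "minutes", "seconds", "milliseconds", "microseconds",
}
# Units that the quoted docs allow only for one specific calendar.
CALENDAR_SPECIFIC_UNITS = {"months": "360_day", "common_years": "365_day"}


def needs_custom_decoding(units_attr: str, calendar: str) -> bool:
    """Return True if `units_attr` falls outside the quoted num2date rules."""
    time_units, sep, _reference = units_attr.partition(" since ")
    if not sep:
        raise ValueError(
            f"expected '<time units> since <reference time>', got {units_attr!r}"
        )
    time_units = time_units.strip().lower()
    if time_units in FIXED_LENGTH_UNITS:
        return False
    # "months"/"common_years" are fine only for their one allowed calendar;
    # anything else (e.g. "years") is treated as unsupported.
    return CALENDAR_SPECIFIC_UNITS.get(time_units) != calendar


print(needs_custom_decoding("months since 1850-01-01", "standard"))  # True
print(needs_custom_decoding("months since 1850-01-01", "360_day"))   # False
print(needs_custom_decoding("days since 1979-1-1 0", "gregorian"))   # False
```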

xr.coding.times.decode_cf_datetime()

This function uses cftime.num2date().

API Code: https://github.com/pydata/xarray/blob/f8fee902360f2330ab8c002d54480d357365c172/xarray/coding/times.py#L253-L302

Docstring


def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
    """Given an array of numeric dates in netCDF format, convert it into a
    numpy array of date time objects.
    For standard (Gregorian) calendars, this function uses vectorized
    operations, which makes it much faster than cftime.num2date. In such a
    case, the returned array will be of type np.datetime64.
    Note that time unit in `units` must not be smaller than microseconds and
    not larger than days.
    See Also
    --------
    cftime.num2date
    """

This one might not work because the docstring says, "Note that time unit in `units` must not be smaller than microseconds and not larger than days."

@tomvothecoder tomvothecoder changed the title Possibly refactor our logic for converting numeric time to datetime Possibly refactor our logic for converting numeric time values to datetime Aug 11, 2022
@tomvothecoder tomvothecoder self-assigned this Aug 11, 2022
@tomvothecoder tomvothecoder changed the title Possibly refactor our logic for converting numeric time values to datetime [Refactor] Possibly refactor our logic for converting numeric time values to datetime Aug 11, 2022
@tomvothecoder tomvothecoder changed the title [Refactor] Possibly refactor our logic for converting numeric time values to datetime [Refactor]: Possibly refactor our logic for converting numeric time values to datetime Aug 11, 2022
@jypeter

jypeter commented Aug 12, 2022

Can you handle all the calendars that can be found in climate models (calendars in CF convention)?

@oliviermarti mentioned problems he had with understanding and working with datetime(64). I don't remember exactly what the problems were.

What I remember is how easy it was to work with the component times and relative times in cdms2 (starting page 111 of cdms5.pdf, available in #170)

Also, being able to easily check the actual dates of a time axis, using the asComponentTime() method helped me check the consistency of datasets countless times! And easily check if the leap years were handled correctly...

>>> import cdms2, vcs
>>> dataf = vcs.sample_data + '/tas_mo.nc'
>>> print(dataf)
/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/share/cdat/sample_data/tas_mo.nc

>>> f_in = cdms2.open(dataf)
>>> v_in = f_in('tas')
>>> time_ax = v_in.getTime()
>>> time_ax.isTime()
True
>>> time_ax
   id: time
   Designated a time axis.
   units:  days since 1979-1-1 0
   Length: 206
   First:  15.5
   Last:   6254.5
   Other axis attributes:
      axis: T
      calendar: gregorian
      realtopology: linear
   Python id:  0x2accc116c370

>>> time_ax[:18]
array([ 15.5,  45. ,  74.5, 105. , 135.5, 166. , 196.5, 227.5, 258. ,
       288.5, 319. , 349.5, 380.5, 410.5, 440.5, 471. , 501.5, 532. ])
>>> time_ax[-18:]
array([5737. , 5767.5, 5798. , 5828.5, 5859.5, 5889. , 5918.5, 5949. ,
       5979.5, 6010. , 6040.5, 6071.5, 6102. , 6132.5, 6163. , 6193.5,
       6224.5, 6254.5])

>>> time_ax.asComponentTime()[:18]
[1979-1-16 12:0:0.0, 1979-2-15 0:0:0.0, 1979-3-16 12:0:0.0, 1979-4-16 0:0:0.0, 1979-5-16 12:0:0.0, 1979-6-16 0:0:0.0, 1979-7-16 12:0:0.0, 1979-8-16 12:0:0.0, 1979-9-16 0:0:0.0, 1979-10-16 12:0:0.0, 1979-11-16 0:0:0.0, 1979-12-16 12:0:0.0, 1980-1-16 12:0:0.0, 1980-2-15 12:0:0.0, 1980-3-16 12:0:0.0, 1980-4-16 0:0:0.0, 1980-5-16 12:0:0.0, 1980-6-16 0:0:0.0]

>>> time_ax.asComponentTime()[-18:]
[1994-9-16 0:0:0.0, 1994-10-16 12:0:0.0, 1994-11-16 0:0:0.0, 1994-12-16 12:0:0.0, 1995-1-16 12:0:0.0, 1995-2-15 0:0:0.0, 1995-3-16 12:0:0.0, 1995-4-16 0:0:0.0, 1995-5-16 12:0:0.0,  1995-6-16 0:0:0.0, 1995-7-16 12:0:0.0, 1995-8-16 12:0:0.0, 1995-9-16 0:0:0.0, 1995-10-16 12:0:0.0, 1995-11-16 0:0:0.0, 1995-12-16 12:0:0.0, 1996-1-16 12:0:0.0, 1996-2-15 12:0:0.0]

>>> f_in.close()
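
For comparison, the same kind of sanity check can be sketched with the standard library when the calendar is the standard Gregorian one, as in this file. This is only an illustration: `as_component_time` is a hypothetical helper, the reference date is hard-coded from the `days since 1979-1-1 0` units shown above, and stdlib `datetime` does not cover calendars such as noleap or 360_day.

```python
from datetime import datetime, timedelta


def as_component_time(offsets_in_days, ref=datetime(1979, 1, 1)):
    """Convert 'days since <ref>' offsets to datetimes (Gregorian only)."""
    return [ref + timedelta(days=float(offset)) for offset in offsets_in_days]


# First two values and the first 1980 value from the time axis above.
for value in as_component_time([15.5, 45.0, 380.5]):
    print(value)
# 1979-01-16 12:00:00
# 1979-02-15 00:00:00
# 1980-01-16 12:00:00
```

These agree with the corresponding `asComponentTime()` entries printed above.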

@durack1
Collaborator

durack1 commented Aug 12, 2022

What I remember is how easy it was to work with the component times and relative times in cdms2 (starting page 111 of cdms5.pdf, available in #170)

@jypeter, I couldn't agree more. I found the comptime objects incredibly useful to use and interpret, and for validating that I was getting what I expected. Having said that, I am unaware of how much of this functionality is reproduced in the cftime library; if it has an equivalent to the comptime approach, that would likely satisfy my needs.

(cdm315) bash-4.2$ python
Python 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdtime as cdt
>>> dir(cdt)
['Calendar360', 'ClimCalendar', 'ClimLeapCalendar', 'Day', 'Days', 'DefaultCalendar', 'GregorianCalendar',
'Hour', 'Hours', 'JulianCalendar', 'Minute', 'Minutes', 'MixedCalendar', 'Month', 'Months', 'NoLeapCalendar',
'Season', 'Seasons', 'Second', 'Seconds', 'StandardCalendar', 'Week', 'Weeks', 'Year', 'Years', '__all__',
'__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__',
'__spec__', '_cdtime', 'abstime', 'c2r', 'compare', 'componenttime', 'compreltime', 'comptime', 'dump',
'error', 'numpy', 'r2c', 'r2r', 'relativetime', 'reltime', 's2c', 's2r']
# example use component time
>>> ct = cdt.comptime(2022,8,12,11,16,00)
>>> ct
2022-8-12 11:16:0.0
>>> print(ct.add(36, cdt.Hours))
2022-8-13 23:15:60.0
# example use relative time
>>> rt = cdt.reltime(11,"days since 2021-8-1 0:0:0")
>>> rt
11.000000 days since 2021-8-1 0:0:0
>>> print(rt.add(1., cdt.Day))
12.000000 days since 2021-8-1 0:0:0

# comptime and reltime classes
>>> dir(ct)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'add', 'cmp',
'sub', 'tocomp', 'tocomponent', 'torel', 'torelative']
>>> dir(rt)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'add', 'cmp', 'sub',
'tocomp', 'tocomponent', 'torel', 'torelative']
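
As a rough point of comparison (not a statement about what cftime offers), the component/relative round trip in the session above can be mimicked with stdlib `datetime` for the standard calendar. `to_relative_days` is a hypothetical helper name; non-standard calendars are out of scope for this sketch.

```python
from datetime import datetime, timedelta


def to_relative_days(when: datetime, ref: datetime) -> float:
    """Express `when` as fractional days since `ref` (like a reltime value)."""
    return (when - ref) / timedelta(days=1)


# Component-time style arithmetic: add 36 hours to 2022-8-12 11:16:00.
ct = datetime(2022, 8, 12, 11, 16, 0)
shifted = ct + timedelta(hours=36)
print(shifted)  # 2022-08-13 23:16:00

# Relative-time style: 2021-8-12 expressed as days since 2021-8-1.
ref = datetime(2021, 8, 1)
rt = to_relative_days(datetime(2021, 8, 12), ref)
print(rt)  # 11.0

# Back to a component time after adding one day to the relative value.
print(ref + timedelta(days=rt + 1))  # 2021-08-13 00:00:00
```

The stdlib result `23:16:00` denotes the same instant as the `23:15:60.0` printed by `ct.add(36, cdt.Hours)` above.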

@pochedls
Collaborator

pochedls commented Aug 12, 2022

This ticket illustrates some issues with pandas, which uses datetime64. For this reason, we've recognized that cftime is a better approach. cftime does support the calendars that adhere to CF conventions. The time axis is pretty human readable via xcdat / xarray:

In [ ]: ds.time
Out[ ]:
<xarray.DataArray 'time' (time: 1032)>
array([cftime.Datetime360Day(2015, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(2015, 2, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(2015, 3, 1, 0, 0, 0, 0, has_year_zero=True), ...,
       cftime.Datetime360Day(2100, 10, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(2100, 11, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(2100, 12, 1, 0, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time     (time) object 2015-01-01 00:00:00 ... 2100-12-01 00:00:00
Attributes:
    bounds:        time_bnds
    axis:          T
    realtopology:  linear

And you can also format the time axis and perform cftime operations.
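
To illustrate why a 360_day axis like the one above does not fit into datetime64 or stdlib `datetime`, here is a minimal pure-Python sketch of 360_day decoding. It is an assumption-laden illustration, not how cftime is implemented: `decode_360_day` is a hypothetical helper that handles only whole-day offsets from January 1 of a reference year.

```python
from datetime import datetime


def decode_360_day(offset_days: int, ref_year: int = 2015) -> tuple[int, int, int]:
    """Decode whole days since <ref_year>-01-01 in a 360_day calendar."""
    # Every year has 360 days and every month has exactly 30 days.
    year_shift, day_of_year = divmod(offset_days, 360)
    month_index, day_index = divmod(day_of_year, 30)
    return ref_year + year_shift, month_index + 1, day_index + 1


print(decode_360_day(0))    # (2015, 1, 1)
print(decode_360_day(59))   # (2015, 2, 30)
print(decode_360_day(360))  # (2016, 1, 1)

# February 30 is a valid 360_day date but not a valid Gregorian date,
# so the stdlib type cannot hold it.
try:
    datetime(2015, 2, 30)
except ValueError as err:
    print(f"stdlib datetime rejects it: {err}")
```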

@tomvothecoder
Collaborator Author

I found that cftime.num2date and xr.coding.times.decode_cf_datetime() have limitations around non-CF compliant units (issue description was updated).

Our implementation of _get_cftime_coords() meets our requirements without reproducing behaviors from other libraries. Closing this issue.

@tomvothecoder tomvothecoder added the type: enhancement New enhancement request label Jan 31, 2024