
better handling of non-cf-compliant time data #263

Merged: 3 commits merged into main from bugfix/261-decode-non-cf-time on Jun 24, 2022

Conversation

@pochedls (Collaborator) commented Jun 23, 2022

Description

xcdat can decode some non-CF-compliant time data, but it cannot handle every case. With this update, xcdat attempts to decode non-CF-compliant time data and, if decoding fails, returns the dataset with the time axis left un-decoded (instead of raising a ValueError).
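To make the behavior concrete, here is a minimal, hypothetical sketch of the fallback (not the actual xcdat implementation; the real `decode_non_cf_time` and `_split_time_units_attr` functions appear in the diffs below, and the attribute handling here is simplified):

```python
import xarray as xr


def _split_time_units_attr(units_attr: str):
    # CF-style time units look like "<units> since <reference date>".
    if " since " not in units_attr:
        raise ValueError(f"Unsupported time units attribute: {units_attr!r}")
    units, ref_date = units_attr.split(" since ", 1)
    return units, ref_date


def decode_non_cf_time(ds: xr.Dataset) -> xr.Dataset:
    units_attr = ds["time"].attrs.get("units", "")
    try:
        units, ref_date = _split_time_units_attr(units_attr)
    except ValueError:
        # Decoding this axis is unsupported; return the original
        # (un-decoded) dataset instead of raising.
        return ds
    # ... decode the supported non-CF cases here using units and ref_date ...
    return ds
```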

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

Comment on lines +115 to +124
tb = []
for t in time_non_cf_unsupported:
    tb.append([t - 1 / 24.0, t + 1 / 24.0])
time_bnds_non_cf_unsupported = xr.DataArray(
    name="time_bnds",
    data=tb,
    coords={"time": time_non_cf_unsupported},
    dims=["time", "bnds"],
    attrs={"is_generated": "True"},
)
@pochedls (Collaborator, Author):

I created test bounds for a dataset that is non-CF-compliant (and not handled by xcdat), but I didn't end up using them in the unit test I created (see below).

Comment on lines 40 to 49
def test_non_cf_compliant_and_unsupported_time_is_not_decoded(self):
    ds = generate_dataset(cf_compliant=False, has_bounds=True, unsupported=True)
    ds.to_netcdf(self.file_path)

    # even though decode_times=True, it should fail to decode unsupported time axis
    result = open_dataset(self.file_path, decode_times=True)
    expected_times = np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0)

    assert np.all(expected_times == result.time.values)

@pochedls (Collaborator, Author) commented Jun 23, 2022:

Unit tests can sometimes be tough to follow, so let me explain what I did here: the generic generate_dataset helper now has an unsupported option (default False) to create a non-CF-compliant time axis that is also unsupported by xcdat.

The expected behavior is for xcdat to return a time axis that is not decoded (since we do not support decoding this time axis).

I tried to generate a time axis such that result['time'].identical would evaluate to True, but I couldn't get this to work (even though the values, metadata, and bounds all appeared to be identical), so I just ensure the time values are the same. Comment if this is problematic.
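For reference, a non-CF-compliant and unsupported axis in this sense is one whose units attribute has no "since" component. A minimal, hypothetical illustration (the units string and attrs here are made up; the real fixture lives in the shared generate_dataset helper):

```python
import numpy as np
import xarray as xr

# Fractional-year time values matching the expected_times in the test above,
# with a units attribute that has no "since" component, so neither
# xarray/cftime nor xcdat can decode it.
time_non_cf_unsupported = xr.DataArray(
    name="time",
    data=np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0),
    dims=["time"],
    attrs={"units": "year A.D.", "axis": "T"},
)
```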

xcdat/dataset.py Outdated
Comment on lines 308 to 316
# if the time axis cannot be split, we do not yet
# support time decoding and we return the original
# dataset
try:
    units, ref_date = _split_time_units_attr(units_attr)
except ValueError:
    return ds
@pochedls (Collaborator, Author) commented Jun 23, 2022:

If there is no "since" in the time units, then the dataset is non-CF-compliant and unsupported, so _split_time_units_attr will raise a ValueError. Instead of continuing to decode the time axis (via this function, decode_non_cf_time), we return the original (non-decoded) dataset.
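Illustrative behavior, using the hypothetical _split_time_units_attr sketch from the description above (the example unit strings are made up, not from the test suite):

```python
# Supported: the attribute splits into units and a reference date.
_split_time_units_attr("months since 1850-01-01")  # -> ("months", "1850-01-01")

# Unsupported: no "since", so a ValueError is raised and decode_non_cf_time
# returns the original, un-decoded dataset.
_split_time_units_attr("year A.D.")  # raises ValueError
```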

xcdat/dataset.py Outdated
Comment on lines 413 to 419
# if the time units attr cannot be split it is not cf_compliant
try:
    units = _split_time_units_attr(time.attrs.get("units"))[0]
except ValueError:
    return False
@pochedls (Collaborator, Author):

My read is that if there is no "since" in the time units, the time axis is not CF-compliant here.

@durack1 (Collaborator) commented Jun 23, 2022:

@pochedls if you want a quick check of CF validity, point your file at https://pumatest.nerc.ac.uk/cgi-bin/cf-checker.pl; it catches all manner of things that aren't obvious from a quick ncdump scan.

@pochedls (Collaborator, Author):

Just a note that I am getting some warnings, but I don't think I caused them:

tests/test_regrid.py::TestRegrid2Regridder::test_output_bounds
tests/test_regrid.py::TestAccessor::test_convenience_methods
/home/pochedley1/code/xcdat/xcdat/regridder/regrid2.py:233: RuntimeWarning: invalid value encountered in true_divide
np.multiply(input_lon_segment, dot_weight).sum(

tests/test_regrid.py::TestXESMFRegridder::test_regrid
/home/pochedley1/bin/anaconda3/envs/xcdat_dev/lib/python3.9/site-packages/numba/np/arraymath.py:3845: DeprecationWarning: np.MachAr is deprecated (NumPy 1.22).
@overload(np.MachAr)

@tomvothecoder (Collaborator) left a review comment:

Overall, it looks great to me!

After you take a look at some of my minor updates in this commit, feel free to merge.

Comment on lines 44 to 48
# even though decode_times=True, it should fail to decode unsupported time axis
result = open_dataset(self.file_path, decode_times=True)
expected_times = np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0)
expected = ds

assert np.all(expected_times == result.time.values)
assert result.identical(expected)
@tomvothecoder (Collaborator):

I updated the assertion statement since we expect the result to be the same as the original dataset.
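As a quick aside on why identical is the right (stricter) check here, a small, self-contained illustration of xarray's equals vs. identical using toy data (not from the test suite):

```python
import numpy as np
import xarray as xr

# Two datasets with the same values but different global attributes.
a = xr.Dataset({"tas": ("time", np.arange(3.0))}, attrs={"source": "a"})
b = xr.Dataset({"tas": ("time", np.arange(3.0))}, attrs={"source": "b"})

print(a.equals(b))     # True  -> dims, coords, and values match
print(a.identical(b))  # False -> identical() also requires matching attributes
```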

    ValueError
        If the time units attribute is not of the form `X since Y`.
    """
    if units_attr is None:
        raise KeyError("No 'units' attribute found for the dataset's time coordinates.")
        raise KeyError("The dataset's time coordinates does not have a 'units' attr.")
@tomvothecoder (Collaborator):

Updated a preexisting KeyError message

@tomvothecoder added the "type: bug", "type: enhancement", and "Priority: High" labels on Jun 24, 2022
@tomvothecoder added this to "In progress" in v0.3.0 via automation on Jun 24, 2022
@tomvothecoder (Collaborator):

Oh yeah, I also rebased this branch on the latest main so there are no merge conflicts.

@pochedls (Collaborator, Author):

> I updated the assertion statement since we expect the result to be the same as the original dataset.

Thank you. This makes a lot of sense, and I wish I had done it this way.

> Oh yeah, I also rebased this branch on the latest main so there are no merge conflicts.

You just needed to do this because you just merged #262, right? Or did I make a mistake?

I will squash and merge shortly. Thank you for the review, @tomvothecoder.

@tomvothecoder (Collaborator):

> Just a note that I am getting some warnings, but I don't think I caused them:
>
> tests/test_regrid.py::TestRegrid2Regridder::test_output_bounds
> tests/test_regrid.py::TestAccessor::test_convenience_methods
> /home/pochedley1/code/xcdat/xcdat/regridder/regrid2.py:233: RuntimeWarning: invalid value encountered in true_divide
> np.multiply(input_lon_segment, dot_weight).sum(
> tests/test_regrid.py::TestXESMFRegridder::test_regrid
> /home/pochedley1/bin/anaconda3/envs/xcdat_dev/lib/python3.9/site-packages/numba/np/arraymath.py:3845: DeprecationWarning: np.MachAr is deprecated (NumPy 1.22).
> @overload(np.MachAr)

This is from numba, which was introduced in PR #44. I'll see if there's a way to fix it in a separate PR.

> Oh yeah, I also rebased this branch on the latest main so there are no merge conflicts.
>
> You just needed to do this because you just merged #262, right? Or did I make a mistake?
>
> I will squash and merge shortly. Thank you for the review, @tomvothecoder.

No mistakes were made; #262 was merged, so main was ahead of this branch. I just needed to rebase on top of the latest main to avoid conflicts.

Thanks for the great work!

@pochedls merged commit 91b1d88 into main on Jun 24, 2022
v0.3.0 automation moved this from In progress to Done on Jun 24, 2022
@pochedls deleted the bugfix/261-decode-non-cf-time branch on June 24, 2022 at 23:37
Labels
type: bug (Inconsistencies or issues which will cause an issue or problem for users or implementors.)
type: enhancement (New enhancement request)
Development

Successfully merging this pull request may close these issues.

[Bug]: Failure to decode non-CF compliant time
3 participants