
better handling of non-cf-compliant time data #263

Merged: 3 commits merged into main from bugfix/261-decode-non-cf-time on Jun 24, 2022

Conversation

@pochedls (Collaborator) commented Jun 23, 2022

Description

xcdat can decode some non-CF-compliant time data, but it cannot handle every case. With this update, xcdat attempts to decode non-CF-compliant time data and, if decoding fails, returns the dataset with the time axis left un-decoded (instead of raising a ValueError).
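To make the behavior concrete, here is a minimal, hypothetical sketch of the fallback (not the actual xcdat implementation; the real `decode_non_cf_time` and `_split_time_units_attr` functions appear in the diffs below, and the attribute handling here is simplified):

```python
import xarray as xr


def _split_time_units_attr(units_attr: str):
    # CF-style time units look like "<units> since <reference date>".
    if " since " not in units_attr:
        raise ValueError(f"Unsupported time units attribute: {units_attr!r}")
    units, ref_date = units_attr.split(" since ", 1)
    return units, ref_date


def decode_non_cf_time(ds: xr.Dataset) -> xr.Dataset:
    units_attr = ds["time"].attrs.get("units", "")
    try:
        units, ref_date = _split_time_units_attr(units_attr)
    except ValueError:
        # Decoding this axis is unsupported; return the original
        # (un-decoded) dataset instead of raising.
        return ds
    # ... decode the supported non-CF cases here using units and ref_date ...
    return ds
```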

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

Comment on lines +115 to +124
tb = []
for t in time_non_cf_unsupported:
    tb.append([t - 1 / 24.0, t + 1 / 24.0])
time_bnds_non_cf_unsupported = xr.DataArray(
    name="time_bnds",
    data=tb,
    coords={"time": time_non_cf_unsupported},
    dims=["time", "bnds"],
    attrs={"is_generated": "True"},
)
@pochedls (Collaborator, Author):

I created test bounds for a dataset that is non-CF-compliant (and not handled by xcdat), but I didn't end up using them in the unit test I created (see below).

Comment on lines 40 to 49
def test_non_cf_compliant_and_unsupported_time_is_not_decoded(self):
    ds = generate_dataset(cf_compliant=False, has_bounds=True, unsupported=True)
    ds.to_netcdf(self.file_path)

    # even though decode_times=True, it should fail to decode unsupported time axis
    result = open_dataset(self.file_path, decode_times=True)
    expected_times = np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0)

    assert np.all(expected_times == result.time.values)

@pochedls (Collaborator, Author) commented Jun 23, 2022:

Unit tests can sometimes be tough to follow, so let me explain what I did here: the generic generate_dataset helper now has an unsupported option (default False) to create a non-CF-compliant time axis that is also unsupported by xcdat.

The expected behavior is for xcdat to return a time axis that is not decoded (since we do not support decoding this time axis).

I tried to generate a time axis such that result['time'].identical would evaluate to True, but I couldn't get this to work (even though the values, metadata, and bounds all appeared to be identical), so I just ensure the time values are the same. Comment if this is problematic.
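For reference, a non-CF-compliant and unsupported axis in this sense is one whose units attribute has no "since" component. A minimal, hypothetical illustration (the units string and attrs here are made up; the real fixture lives in the shared generate_dataset helper):

```python
import numpy as np
import xarray as xr

# Fractional-year time values matching the expected_times in the test above,
# with a units attribute that has no "since" component, so neither
# xarray/cftime nor xcdat can decode it.
time_non_cf_unsupported = xr.DataArray(
    name="time",
    data=np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0),
    dims=["time"],
    attrs={"units": "year A.D.", "axis": "T"},
)
```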

xcdat/dataset.py Outdated
Comment on lines 308 to 316
# if the time axis cannot be split, we do not yet
# support time decoding and we return the original
# dataset
try:
    units, ref_date = _split_time_units_attr(units_attr)
except ValueError:
    return ds
@pochedls (Collaborator, Author) commented Jun 23, 2022:

If there is no "since" in the time units, then the dataset is non-CF-compliant and unsupported, so _split_time_units_attr will raise a ValueError. Instead of continuing to decode the time axis (via this function, decode_non_cf_time), we return the original (non-decoded) dataset.
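Illustrative behavior, using the hypothetical _split_time_units_attr sketch from the description above (the example unit strings are made up, not from the test suite):

```python
# Supported: the attribute splits into units and a reference date.
_split_time_units_attr("months since 1850-01-01")  # -> ("months", "1850-01-01")

# Unsupported: no "since", so a ValueError is raised and decode_non_cf_time
# returns the original, un-decoded dataset.
_split_time_units_attr("year A.D.")  # raises ValueError
```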

xcdat/dataset.py Outdated
Comment on lines 413 to 419
# if the time units attr cannot be split it is not cf_compliant
try:
    units = _split_time_units_attr(time.attrs.get("units"))[0]
except ValueError:
    return False
@pochedls (Collaborator, Author):

My read is that if there is no "since" in the time units, the time axis is not CF-compliant here.

@durack1 (Collaborator) commented Jun 23, 2022:

@pochedls if you want a quick check of CF validity, point your file at https://pumatest.nerc.ac.uk/cgi-bin/cf-checker.pl; it catches all manner of things that aren't obvious from a quick ncdump scan.

@pochedls (Collaborator, Author):

Just a note that I am getting some warnings, but I don't think I caused them:

tests/test_regrid.py::TestRegrid2Regridder::test_output_bounds
tests/test_regrid.py::TestAccessor::test_convenience_methods
/home/pochedley1/code/xcdat/xcdat/regridder/regrid2.py:233: RuntimeWarning: invalid value encountered in true_divide
np.multiply(input_lon_segment, dot_weight).sum(

tests/test_regrid.py::TestXESMFRegridder::test_regrid
/home/pochedley1/bin/anaconda3/envs/xcdat_dev/lib/python3.9/site-packages/numba/np/arraymath.py:3845: DeprecationWarning: np.MachAr is deprecated (NumPy 1.22).
@overload(np.MachAr)

@tomvothecoder (Collaborator) left a review comment:

Overall, it looks great to me!

After you take a look at some of my minor updates in this commit, feel free to merge.

Comment on lines 44 to 48
# even though decode_times=True, it should fail to decode unsupported time axis
result = open_dataset(self.file_path, decode_times=True)
expected_times = np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0)
expected = ds

assert np.all(expected_times == result.time.values)
assert result.identical(expected)
@tomvothecoder (Collaborator):

I updated the assertion statement since we expect the result to be the same as the original dataset.
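As a quick aside on why identical is the right (stricter) check here, a small, self-contained illustration of xarray's equals vs. identical using toy data (not from the test suite):

```python
import numpy as np
import xarray as xr

# Two datasets with the same values but different global attributes.
a = xr.Dataset({"tas": ("time", np.arange(3.0))}, attrs={"source": "a"})
b = xr.Dataset({"tas": ("time", np.arange(3.0))}, attrs={"source": "b"})

print(a.equals(b))     # True  -> dims, coords, and values match
print(a.identical(b))  # False -> identical() also requires matching attributes
```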

    ValueError
        If the time units attribute is not of the form `X since Y`.
    """
    if units_attr is None:
        raise KeyError("No 'units' attribute found for the dataset's time coordinates.")
        raise KeyError("The dataset's time coordinates does not have a 'units' attr.")
@tomvothecoder (Collaborator):

Updated a preexisting KeyError message

@tomvothecoder added the "type: bug", "type: enhancement", and "Priority: High" labels on Jun 24, 2022
@tomvothecoder added this to "In progress" in v0.3.0 via automation on Jun 24, 2022
@tomvothecoder (Collaborator):

Oh yeah, I also rebased this branch on the latest main so there are no merge conflicts.

@pochedls (Collaborator, Author):

> I updated the assertion statement since we expect the result to be the same as the original dataset.

Thank you. This makes a lot of sense, and I wish I had done it this way.

> Oh yeah, I also rebased this branch on the latest main so there are no merge conflicts.

You just needed to do this because you just merged #262, right? Or did I make a mistake?

I will squash and merge shortly. Thank you for the review, @tomvothecoder.

@tomvothecoder (Collaborator):

> Just a note that I am getting some warnings, but I don't think I caused them:
>
> tests/test_regrid.py::TestRegrid2Regridder::test_output_bounds
> tests/test_regrid.py::TestAccessor::test_convenience_methods
> /home/pochedley1/code/xcdat/xcdat/regridder/regrid2.py:233: RuntimeWarning: invalid value encountered in true_divide
> np.multiply(input_lon_segment, dot_weight).sum(
> tests/test_regrid.py::TestXESMFRegridder::test_regrid
> /home/pochedley1/bin/anaconda3/envs/xcdat_dev/lib/python3.9/site-packages/numba/np/arraymath.py:3845: DeprecationWarning: np.MachAr is deprecated (NumPy 1.22).
> @overload(np.MachAr)

This is from numba, which was introduced in PR #44. I'll see if there's a way to fix it in a separate PR.

> Oh yeah, I also rebased this branch on the latest main so there are no merge conflicts.
>
> You just needed to do this because you just merged #262, right? Or did I make a mistake?
>
> I will squash and merge shortly. Thank you for the review, @tomvothecoder.

No mistakes were made; #262 was merged, so main was ahead of this branch. I just needed to rebase on top of the latest main to avoid conflicts.

Thanks for the great work!

@pochedls merged commit 91b1d88 into main on Jun 24, 2022
v0.3.0 automation moved this from In progress to Done on Jun 24, 2022
@pochedls deleted the bugfix/261-decode-non-cf-time branch on June 24, 2022 at 23:37
Labels
type: bug (Inconsistencies or issues which will cause an issue or problem for users or implementors.)
type: enhancement (New enhancement request)
Development

Successfully merging this pull request may close these issues.

[Bug]: Failure to decode non-CF compliant time
3 participants