better handling of non-cf-compliant time data #263
Conversation
tb = []
for t in time_non_cf_unsupported:
    tb.append([t - 1 / 24.0, t + 1 / 24.0])
time_bnds_non_cf_unsupported = xr.DataArray(
    name="time_bnds",
    data=tb,
    coords={"time": time_non_cf_unsupported},
    dims=["time", "bnds"],
    attrs={"is_generated": "True"},
)
I created test bounds for a dataset that is non-cf-compliant (but not handled by xcdat), but I didn't end up using these in the unit test I created (see below).
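For reference, the ± 1/24 offsets above put each bound half a month on either side of a monthly midpoint, since time is expressed in fractional years. A stdlib-only sketch of the same arithmetic (the 15-point range and the 1850 start year are illustrative, taken from the test's expected values):

```python
# Monthly time midpoints in fractional years, mirroring the fixture above
# (the start year and length are illustrative assumptions).
time_non_cf_unsupported = [1850 + (m + 0.5) / 12.0 for m in range(15)]

# Each bound sits 1/24 of a year (half a month) either side of the midpoint.
tb = [[t - 1 / 24.0, t + 1 / 24.0] for t in time_non_cf_unsupported]

# Every bounds pair therefore spans exactly one month (1/12 of a year).
widths = [round(hi - lo, 10) for lo, hi in tb]
print(widths[0])  # 0.0833333333 (one month = 1/12 year)
```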
tests/test_dataset.py
Outdated
def test_non_cf_compliant_and_unsupported_time_is_not_decoded(self):
    ds = generate_dataset(cf_compliant=False, has_bounds=True, unsupported=True)
    ds.to_netcdf(self.file_path)

    # even though decode_times=True, it should fail to decode unsupported time axis
    result = open_dataset(self.file_path, decode_times=True)
    expected_times = np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0)

    assert np.all(expected_times == result.time.values)
Unit tests can sometimes be tough to follow, so let me explain what I did here: the generic `generate_dataset` helper now has an option (`unsupported`, default `False`) to create a non-cf-compliant time axis that is also unsupported by xcdat.
The expected behavior is for xcdat to return a time axis that is not decoded (since we do not support decoding for this time axis).
I tried to generate a time axis such that `result['time'].identical` would evaluate to `True`, but I couldn't get this to work (even though the values, metadata, and bounds all appeared to be identical), so I just ensure the time values are the same. Comment if this is problematic.
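For what it's worth, `identical`-style checks compare metadata as well as values, so any stray attribute or dtype difference breaks them even when the data match. A hypothetical stdlib sketch of the distinction, with plain dicts standing in for DataArrays:

```python
# Plain dicts standing in for DataArrays: same values, one attr missing.
left = {"values": [1850.04, 1850.12], "attrs": {"units": "yr", "is_generated": "True"}}
right = {"values": [1850.04, 1850.12], "attrs": {"units": "yr"}}

def identical(a, b):
    # Strict: values AND metadata must all match (an `identical`-style check).
    return a == b

def values_equal(a, b):
    # Lenient: only the data values must match (what the test settled on).
    return a["values"] == b["values"]

print(identical(left, right))     # False
print(values_equal(left, right))  # True
```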
xcdat/dataset.py
Outdated
# if the time axis cannot be split, we do not yet
# support time decoding and we return the original
# dataset
try:
    units, ref_date = _split_time_units_attr(units_attr)
except ValueError:
    return ds
If there is no "since" in the time units, then the dataset is non-cf-compliant and unsupported. As a result, `_split_time_units_attr` will raise a ValueError. Instead of continuing to decode the time axis (via this function, `decode_non_cf_time`), we return the original (non-decoded) dataset.
xcdat/dataset.py
Outdated
# if the time units attr cannot be split it is not cf_compliant
try:
    units = _split_time_units_attr(time.attrs.get("units"))[0]
except ValueError:
    return False
My read is that if there is no "since" the time axis is not cf-compliant here.
@pochedls if you want a quick check of CF validity, point your file at https://pumatest.nerc.ac.uk/cgi-bin/cf-checker.pl - catches all manner of things that aren't obvious through a quick
Just a note that I am getting some warnings, but I don't think I caused them:
Force-pushed from c678c8b to 1cbc602
Overall, it looks great to me!
After you take a look at some of my minor updates in this commit, feel free to merge.
# even though decode_times=True, it should fail to decode unsupported time axis
result = open_dataset(self.file_path, decode_times=True)
expected_times = np.arange(1850 + 1 / 24.0, 1851 + 3 / 12.0, 1 / 12.0)
expected = ds

assert np.all(expected_times == result.time.values)
assert result.identical(expected)
I updated the assertion statement since we expect the result to be the same as the original dataset.
    ValueError
        If the time units attribute is not of the form `X since Y`.
    """
    if units_attr is None:
-       raise KeyError("No 'units' attribute found for the dataset's time coordinates.")
+       raise KeyError("The dataset's time coordinates does not have a 'units' attr.")
Updated a preexisting KeyError message.
Oh yeah, I also rebased this branch on the latest
Thank you...this makes a lot of sense and I wish this is what I had done.
You just needed to do this because you just merged #262, right? Or did I make a mistake? I will squash and merge shortly. Thank you for the review, @tomvothecoder.
No mistakes were made, #262 was merged so the rebase was needed. Thanks for the great work!
Description
xcdat can decode some non-cf-compliant data, but it cannot handle every case. This update makes xcdat attempt to decode non-cf-compliant data and, if decoding fails, return the non-decoded dataset (instead of raising a ValueError).
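That try-then-fall-back flow can be sketched as follows (all names here are illustrative stand-ins, not xcdat's actual API):

```python
def _decode_non_cf_time_sketch(units_attr, values):
    # Hypothetical stand-in for the decoding step: it needs an
    # "X since Y" units attr to build decoded timestamps.
    units, ref_date = units_attr.split(" since ", 1)
    return [f"{ref_date} + {v} {units}" for v in values]

def open_dataset_sketch(units_attr, values):
    # Per the PR description: attempt to decode, and on failure return
    # the original (non-decoded) values instead of raising.
    try:
        return _decode_non_cf_time_sketch(units_attr, values)
    except ValueError:
        return values

print(open_dataset_sketch("months since 1850-01-01", [0, 1]))
# ['1850-01-01 + 0 months', '1850-01-01 + 1 months']
print(open_dataset_sketch("fractional years", [1850.04, 1850.12]))
# [1850.04, 1850.12]  (returned unchanged)
```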