
[Doc]: Calculating Daily Climatology and Departures from Monthly Time Series needs to use daily input #303

Closed
msahn opened this issue Aug 9, 2022 · 14 comments · Fixed by #350


msahn commented Aug 9, 2022

Describe your documentation update

https://xcdat.readthedocs.io/en/v0.3.0/examples/climatology-and-departures.html#2.4-Daily-Cycle

I think the example calculation of the daily climatology does not work properly because the input data is monthly. The output should have 365 or 366 time steps, but the example output has only 12.
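
A quick way to see the mismatch (a sketch, assuming ds is the monthly dataset opened earlier in the notebook):

# With monthly input there are only 12 unique (month, day) pairs, so the
# "daily" climatology collapses to 12 time steps instead of 365 or 366.
daily_climo = ds.temporal.climatology("tas", freq="day", weighted=True)
print(daily_climo.time.size)  # 12 for monthly input; daily input gives 365/366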

@msahn msahn added the type: docs Updates to documentation label Aug 9, 2022

msahn commented Aug 9, 2022

https://xcdat.readthedocs.io/en/v0.3.0/examples/climatology-and-departures.html#3.4.-Daily-Cycle

I think this part should also be modified to use daily input data.


lee1043 commented Aug 9, 2022

@msahn Nice catch! Agreed.

@chengzhuzhang

Thank you for testing and catching this problem @msahn. @lee1043, I'm not sure the docs on temporal averaging have been reviewed thoroughly; would you please help fix the docs here?

@tomvothecoder

Thank you for reporting this issue @msahn.

> Thank you for testing and catching this problem @msahn. @lee1043, I'm not sure the docs on temporal averaging have been reviewed thoroughly; would you please help fix the docs here?

Yeah, we did a general review of these notebooks so it would be helpful if we had a more thorough review. If @lee1043 can take a look, that would be great.

@lee1043
Copy link
Collaborator

lee1043 commented Aug 10, 2022

@tomvothecoder @chengzhuzhang sure, I will take a look at this.

@lee1043 lee1043 self-assigned this Aug 10, 2022

lee1043 commented Aug 17, 2022

I am experimenting with loading a daily dataset but am getting the error below. @tomvothecoder, any chance this looks familiar to you?

Loading the daily netCDF file:

import xcdat

filepath2 = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/day/tas/gn/v20191115/tas_day_ACCESS-ESM1-5_historical_r1i1p1f1_gn_20000101-20141231.nc"
ds2 = xcdat.open_dataset(filepath2)

# Unit adjust (-273.15, K to C)
ds2["tas"] = ds2.tas - 273.15

ds2

Error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [7], in <cell line: 2>()
      1 # Unit adjust (-273.15, K to C)
----> 2 ds2["tas"] = ds2.tas - 273.15
      4 ds2

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/_typed_ops.py:209, in DataArrayOpsMixin.__sub__(self, other)
    208 def __sub__(self, other):
--> 209     return self._binary_op(other, operator.sub)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/dataarray.py:3098, in DataArray._binary_op(self, other, f, reflexive)
   3094 other_variable = getattr(other, "variable", other)
   3095 other_coords = getattr(other, "coords", None)
   3097 variable = (
-> 3098     f(self.variable, other_variable)
   3099     if not reflexive
   3100     else f(other_variable, self.variable)
   3101 )
   3102 coords, indexes = self.coords._merge_raw(other_coords, reflexive)
   3103 name = self._result_name(other)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/_typed_ops.py:399, in VariableOpsMixin.__sub__(self, other)
    398 def __sub__(self, other):
--> 399     return self._binary_op(other, operator.sub)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:2462, in Variable._binary_op(self, other, f, reflexive)
   2460     other_data, self_data, dims = _broadcast_compat_data(other, self)
   2461 else:
-> 2462     self_data, other_data, dims = _broadcast_compat_data(self, other)
   2463 keep_attrs = _get_keep_attrs(default=False)
   2464 attrs = self._attrs if keep_attrs else None

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:2941, in _broadcast_compat_data(self, other)
   2938     dims = new_self.dims
   2939 else:
   2940     # rely on numpy broadcasting rules
-> 2941     self_data = self.data
   2942     other_data = other
   2943     dims = self.dims

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:339, in Variable.data(self)
    337     return self._data
    338 else:
--> 339     return self.values

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:512, in Variable.values(self)
    509 @property
    510 def values(self):
    511     """The variable's data as a numpy.ndarray"""
--> 512     return _as_array_or_item(self._data)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/variable.py:252, in _as_array_or_item(data)
    238 def _as_array_or_item(data):
    239     """Return the given values as a numpy array, or as an individual item if
    240     it's a 0d datetime64 or timedelta64 array.
    241 
   (...)
    250     TODO: remove this (replace with np.asarray) once these issues are fixed
    251     """
--> 252     data = np.asarray(data)
    253     if data.ndim == 0:
    254         if data.dtype.kind == "M":

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:552, in MemoryCachedArray.__array__(self, dtype)
    551 def __array__(self, dtype=None):
--> 552     self._ensure_cached()
    553     return np.asarray(self.array, dtype=dtype)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:549, in MemoryCachedArray._ensure_cached(self)
    547 def _ensure_cached(self):
    548     if not isinstance(self.array, NumpyIndexingAdapter):
--> 549         self.array = NumpyIndexingAdapter(np.asarray(self.array))

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:522, in CopyOnWriteArray.__array__(self, dtype)
    521 def __array__(self, dtype=None):
--> 522     return np.asarray(self.array, dtype=dtype)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:423, in LazilyIndexedArray.__array__(self, dtype)
    421 def __array__(self, dtype=None):
    422     array = as_indexable(self.array)
--> 423     return np.asarray(array[self.key], dtype=None)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/coding/variables.py:70, in _ElementwiseFunctionArray.__array__(self, dtype)
     69 def __array__(self, dtype=None):
---> 70     return self.func(self.array)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/coding/variables.py:137, in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype)
    133 def _apply_mask(
    134     data: np.ndarray, encoded_fill_values: list, decoded_fill_value: Any, dtype: Any
    135 ) -> np.ndarray:
    136     """Mask all matching values in a NumPy arrays."""
--> 137     data = np.asarray(data, dtype=dtype)
    138     condition = False
    139     for fv in encoded_fill_values:

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:423, in LazilyIndexedArray.__array__(self, dtype)
    421 def __array__(self, dtype=None):
    422     array = as_indexable(self.array)
--> 423     return np.asarray(array[self.key], dtype=None)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:93, in NetCDF4ArrayWrapper.__getitem__(self, key)
     92 def __getitem__(self, key):
---> 93     return indexing.explicit_indexing_adapter(
     94         key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
     95     )

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/core/indexing.py:712, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    690 """Support explicit indexing by delegating to a raw indexing method.
    691 
    692 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
    709 Indexing result, in the form of a duck numpy-array.
    710 """
    711 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 712 result = raw_indexing_method(raw_key.tuple)
    713 if numpy_indices.tuple:
    714     # index the loaded np.ndarray
    715     result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices]

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:106, in NetCDF4ArrayWrapper._getitem(self, key)
    104     with self.datastore.lock:
    105         original_array = self.get_array(needs_lock=False)
--> 106         array = getitem(original_array, key)
    107 except IndexError:
    108     # Catch IndexError in netCDF4 and return a more informative
    109     # error message.  This is most often called when an unsorted
    110     # indexer is used before the data is loaded from disk.
    111     msg = (
    112         "The indexing operation you are attempting to perform "
    113         "is not valid on netCDF4.Variable object. Try loading "
    114         "your data into memory first by calling .load()."
    115     )

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xarray/backends/common.py:64, in robust_getitem(array, key, catch, max_retries, initial_delay)
     62 for n in range(max_retries + 1):
     63     try:
---> 64         return array[key]
     65     except catch:
     66         if n == max_retries:

File src/netCDF4/_netCDF4.pyx:4406, in netCDF4._netCDF4.Variable.__getitem__()

File src/netCDF4/_netCDF4.pyx:5350, in netCDF4._netCDF4.Variable._get()

File src/netCDF4/_netCDF4.pyx:1927, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Authorization failure

@pochedls

I don't think this issue is related to the unit conversion or xcdat: if you open the dataset with xarray (i.e., ds = xr.open_dataset(filepath2).load()), you get the same problem. You may need a password to open this dataset.
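
A minimal reproduction of that check (a sketch; filepath2 is the URL from the comment above):

import xarray as xr

# Opening is lazy, so the failure only surfaces when values are actually
# pulled from the OPeNDAP server, e.g. via .load().
ds = xr.open_dataset(filepath2)
ds.load()  # the same RuntimeError: NetCDF: Authorization failure appears here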

@tomvothecoder

> I am experimenting with loading a daily dataset but am getting the error below. @tomvothecoder, any chance this looks familiar to you?

> I don't think this issue is related to the unit conversion or xcdat: if you open the dataset with xarray (i.e., ds = xr.open_dataset(filepath2).load()), you get the same problem. You may need a password to open this dataset.

When a dataset file is larger than the default OPeNDAP request limit (usually 500 MB), you have to chunk the request to avoid being rate limited.

import xcdat

# The size of this file is approximately 1.45 GB, so we will be chunking our
# request using Dask to avoid hitting the OPeNDAP file size request limit for
# this ESGF node.
ds2 = xcdat.open_dataset(
    "http://esgf-data3.diasjp.net/thredds/dodsC/esg_dataroot/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/day/tas/gn/v20191115/tas_day_ACCESS-ESM1-5_historical_r1i1p1f1_gn_18500101-18991231.nc",
    chunks={"time": "auto"},
)

# Unit adjust (-273.15, K to C)
ds2["tas"] = ds2.tas - 273.15

ds2

Here's an explanation in another xCDAT notebook (scroll down to 3.3 Monthly Averages because the headers are broken): https://xcdat.readthedocs.io/en/latest/examples/temporal-average.html#3.3-Monthly-Averages-(freq=

@tomvothecoder

#277 should be explored to replace the referenced datasets in the demos, since OPeNDAP introduces overhead by requiring chunking for larger datasets.

@lee1043 lee1043 changed the title [Doc]: Calculating Climatology and Departures from Monthly Time Series [Doc]: Calculating Daily Climatology and Departures from Monthly Time Series needs to use daily input Aug 17, 2022

lee1043 commented Aug 17, 2022

@tomvothecoder @pochedls thanks for your comments. @tomvothecoder's solution above of adding chunks in open_dataset solved the error.

After that step, I am getting another error that I suspect comes from leap-year handling (see the last line of the error message: invalid day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)).

@msahn could you share with us (or point to) your code in PMP showing how you handle leap years when computing the daily climatology in your precipitation metrics work?

>> daily_climo = ds2.temporal.climatology("tas", freq="day", weighted=True)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 daily_climo = ds2.temporal.climatology("tas", freq="day", weighted=True)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:477, in TemporalAccessor.climatology(self, data_var, freq, weighted, keep_weights, season_config)
    348 def climatology(
    349     self,
    350     data_var: str,
   (...)
    354     season_config: SeasonConfigInput = DEFAULT_SEASON_CONFIG,
    355 ):
    356     """Returns a Dataset with the climatology of a data variable.
    357 
    358     Parameters
   (...)
    475     }
    476     """
--> 477     return self._averager(
    478         data_var, "climatology", freq, weighted, keep_weights, season_config
    479     )

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:697, in TemporalAccessor._averager(self, data_var, mode, freq, weighted, keep_weights, season_config)
    695     dv = self._average(dv)
    696 elif self._mode in ["group_average", "climatology", "departures"]:
--> 697     dv = self._group_average(dv)
    699 # The original time dimension is dropped from the dataset because
    700 # it becomes obsolete after the data variable is averaged. When the
    701 # averaged data variable is added to the dataset, the new time dimension
    702 # and its associated coordinates are also added.
    703 ds = ds.drop_dims(self._dim)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:921, in TemporalAccessor._group_average(self, data_var)
    918 dv = data_var.copy()
    920 if self._weighted:
--> 921     self._weights = self._get_weights()
    922     dv *= self._weights
    923     dv = self._group_data(dv).sum()  # type: ignore

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:990, in TemporalAccessor._get_weights(self)
    987     time_lengths.load()
    988 time_lengths = time_lengths.astype(np.float64)
--> 990 grouped_time_lengths = self._group_data(time_lengths)
    991 weights: xr.DataArray = grouped_time_lengths / grouped_time_lengths.sum()  # type: ignore
    992 weights.name = f"{self._dim}_wts"

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1022, in TemporalAccessor._group_data(self, data_var)
   1020     dv_gb = dv.groupby(f"{self._dim}.{self._freq}")
   1021 else:
-> 1022     self._labeled_time = self._label_time_coords(dv[self._dim])
   1023     dv.coords[self._labeled_time.name] = self._labeled_time
   1024     dv_gb = dv.groupby(self._labeled_time.name)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1073, in TemporalAccessor._label_time_coords(self, time_coords)
   1029 """Labels time coordinates with a group for grouping.
   1030 
   1031 This methods labels time coordinates for grouping by first extracting
   (...)
   1070 >>> * time     (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00
   1071 """
   1072 df_dt_components: pd.DataFrame = self._get_df_dt_components(time_coords)
-> 1073 dt_objects = self._convert_df_to_dt(df_dt_components)
   1075 time_grouped = xr.DataArray(
   1076     name="_".join(df_dt_components.columns),
   1077     data=dt_objects,
   (...)
   1080     attrs=time_coords[self._dim].attrs,
   1081 )
   1082 time_grouped.encoding = time_coords[self._dim].encoding

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1360, in TemporalAccessor._convert_df_to_dt(self, df)
   1357     if component not in df_new.columns:
   1358         df_new[component] = default_val
-> 1360 dates = [
   1361     self.date_type(year, month, day, hour)
   1362     for year, month, day, hour in zip(
   1363         df_new.year, df_new.month, df_new.day, df_new.hour
   1364     )
   1365 ]
   1367 return np.array(dates)

File /opt/anaconda3/envs/xcdat_dev_20220816/lib/python3.9/site-packages/xcdat-0.3.0-py3.9.egg/xcdat/temporal.py:1361, in <listcomp>(.0)
   1357     if component not in df_new.columns:
   1358         df_new[component] = default_val
   1360 dates = [
-> 1361     self.date_type(year, month, day, hour)
   1362     for year, month, day, hour in zip(
   1363         df_new.year, df_new.month, df_new.day, df_new.hour
   1364     )
   1365 ]
   1367 return np.array(dates)

File src/cftime/_cftime.pyx:2016, in cftime._cftime.DatetimeProlepticGregorian.__init__()

File src/cftime/_cftime.pyx:1145, in cftime._cftime.datetime.__init__()

File src/cftime/_cftime.pyx:1647, in cftime._cftime.assert_valid_date()

ValueError: invalid day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)

Follow-up checking:

>> import cftime
>> cftime.DatetimeProlepticGregorian(0, 2, 29, 0, 0, 0, 0, has_year_zero=True)  # okay (year 0 is a leap year)
cftime.DatetimeProlepticGregorian(0, 2, 29, 0, 0, 0, 0, has_year_zero=True)
>> cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)  # Error (year 1 is not a leap year)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)

File src/cftime/_cftime.pyx:2016, in cftime._cftime.DatetimeProlepticGregorian.__init__()

File src/cftime/_cftime.pyx:1145, in cftime._cftime.datetime.__init__()

File src/cftime/_cftime.pyx:1647, in cftime._cftime.assert_valid_date()

ValueError: invalid day number provided in cftime.DatetimeProlepticGregorian(1, 2, 29, 0, 0, 0, 0, has_year_zero=True)


lee1043 commented Aug 17, 2022

@msahn I think I found your code in PMP that computes the daily climatology. I think you are basically stretching non-leap years to 366 days and inserting a missing value at 2/29 so it can be excluded when calculating the daily climatology, is that correct?
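
For reference, here is a minimal sketch of that idea in plain xarray (my own illustration, not the PMP code). Grouping by a (month, day) label yields 366 groups, and non-leap years simply contribute no sample at 2/29, which is equivalent to inserting a missing value there and excluding it:

import xarray as xr

def daily_climo_366(da: xr.DataArray) -> xr.DataArray:
    # Label each time step with a zero-padded "MM-DD" string; "02-29" only
    # occurs in leap years, so that group averages over leap years alone.
    monthday = xr.DataArray(
        [f"{m:02d}-{d:02d}" for m, d in zip(da.time.dt.month.values, da.time.dt.day.values)],
        dims="time",
        name="monthday",
    )
    # Unweighted 366-step daily climatology.
    return da.groupby(monthday).mean("time")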


lee1043 commented Aug 18, 2022

Following this, it looks like removing 2/29 of leap years prior to the daily climatology calculation might be an easier approach with xCDAT.

# Remove leap days (2/29) prior to computing the daily climatology
ds3 = ds2.sel(time=~((ds2.time.dt.month == 2) & (ds2.time.dt.day == 29)))
daily_climo = ds3.temporal.climatology("tas", freq="day", weighted=True)
daily_climo.tas.shape
(365, 145, 192)

I am not sure whether this step needs to be included in the climatology function, or whether noting it in the documentation would be enough.
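
Presumably the same filter would carry over to the departures section of the notebook (a sketch, assuming the v0.3.0 temporal.departures API):

# Drop leap days, then compute daily departures (anomalies) relative to
# the resulting 365-step daily climatology.
ds3 = ds2.sel(time=~((ds2.time.dt.month == 2) & (ds2.time.dt.day == 29)))
daily_departures = ds3.temporal.departures("tas", freq="day", weighted=True)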


msahn commented Aug 18, 2022

> @msahn I think I found your code in PMP that computes the daily climatology. I think you are basically stretching non-leap years to 366 days and inserting a missing value at 2/29 so it can be excluded when calculating the daily climatology, is that correct?

Yes, that is correct. In this way, I could calculate a 366-day climatology including 2/29.


lee1043 commented Aug 31, 2022

Opened discussion at #332
