
ENH: add iris/sigmet xarray backend #520

Merged
merged 6 commits into from Sep 30, 2021

Conversation

@kmuehlbauer (Member) commented Sep 5, 2021

This PR covers #361.

  • only RAW data files for now
  • open_iris_dataset(filename), open_iris_mfdataset(filename) - reads sweeps into RadarVolume
  • xr.open_dataset(filename, engine="iris", group=1), xr.open_mfdataset(filename, engine="iris", group=1) reads sweep (group) into xr.Dataset

Please refer to function docstring for further explanations.

Update: fixed the group numbers; they are `int`.

@kmuehlbauer kmuehlbauer force-pushed the iris_backend branch 2 times, most recently from c93f029 to 8189ca3 Compare September 5, 2021 10:35
@kmuehlbauer kmuehlbauer marked this pull request as draft September 5, 2021 10:50
@kmuehlbauer (Member Author) commented Sep 5, 2021

Pinging some folks who might be interested in testing this branch. Any comments, suggestions and other reports are very much appreciated. This will need quite a bit more work before merge.

Todo:

  • fix and add tests

@mustafaalicelik @jorahu @aschueth @tsmsalper @ckaradavut

@jorahu (Contributor) commented Sep 6, 2021

Hello Kai!
Thanks for the invitation to test the xarray backend.

I tried the functions you provided in the first post.
`wrl.io.iris.open_iris_dataset(filename)` worked fine and returned the RadarVolume, as expected, although the RadarVolume did not include most of the metadata present in the file. I guess full metadata import is still to be developed in the RadarVolume?
`wrl.io.iris.open_iris_mfdataset(filename)` failed with the following error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in

~/wradlib-iris_backend/wradlib-iris_backend/wradlib/io/iris.py in open_iris_mfdataset(filename_or_obj, group, **kwargs)
   4188 raise_on_missing_xarray_backend()
   4189 kwargs["group"] = group
-> 4190 return open_radar_mfdataset(filename_or_obj, engine="iris", **kwargs)

~/wradlib-iris_backend/wradlib-iris_backend/wradlib/io/xarray.py in open_radar_mfdataset(paths, **kwargs)
   1940 group = kwargs.pop("group", None)
   1941 if group is None:
-> 1942 group = _get_h5group_names(patharr.flat[0], engine)
   1943 elif isinstance(group, str):
   1944 group = [group]

~/wradlib-iris_backend/wradlib-iris_backend/wradlib/io/xarray.py in _get_h5group_names(filename, engine)
   1552 groupname = "sweep"
   1553 else:
-> 1554 raise ValueError(f"wradlib: unknown engine {engine}.")
   1555 with h5netcdf.File(filename, "r", decode_vlen_strings=True) as fh:
   1556 groups = ["/".join(["", grp]) for grp in fh.groups if groupname in grp.lower()]

ValueError: wradlib: unknown engine iris.
```

The second option kind of confused me. Which xarray am I supposed to use here, the normal xarray or wrl.io.xarray?
I could not use the original xarray here, so I went to wrl.io.xarray and used `open_radar_dataset()` (not `open_dataset()`, because wrl.io.xarray does not include an `open_dataset()` function). So I guess `open_radar_dataset()` is the same as the previous `open_iris_dataset()`? Anyway, it also worked as before (`wrl.io.xarray.open_radar_dataset(filename, engine='iris')`). Adding the "group" argument failed with `KeyError: '1'`. But I suspect I am probably not testing the correct things here.
`wrl.io.xarray.open_radar_mfdataset(filename, engine='iris')` failed with the same error as `wrl.io.iris.open_iris_mfdataset(filename)` (as expected, because it calls the same function).

Hope this helps. Maybe you can comment on what I am doing wrong when calling the second options (the xarray options), or why I can't find them.

@kmuehlbauer (Member Author)

@jorahu Thanks for trying! I'm sorry, I should have provided some examples showing how to use it, so I've created a gist with some explanations. Happy testing!

@codecov bot commented Sep 7, 2021

Codecov Report

Merging #520 (1c7ec6a) into main (1d88cf8) will decrease coverage by 0.97%.
The diff coverage is 88.60%.


@@            Coverage Diff             @@
##             main     #520      +/-   ##
==========================================
- Coverage   89.43%   88.46%   -0.98%     
==========================================
  Files          36       36              
  Lines        8592     8874     +282     
==========================================
+ Hits         7684     7850     +166     
- Misses        908     1024     +116     
Flag           Coverage Δ
notebooktests  0.00% <0.00%> (ø)
unittests      88.46% <88.60%> (-0.98%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
wradlib/io/hdf.py 96.59% <ø> (ø)
wradlib/io/netcdf.py 83.49% <ø> (ø)
wradlib/io/radolan.py 96.58% <ø> (ø)
wradlib/io/xarray.py 88.15% <76.66%> (-0.18%) ⬇️
wradlib/io/backends.py 88.88% <89.43%> (+0.09%) ⬆️
wradlib/io/iris.py 75.39% <89.69%> (+2.14%) ⬆️
wradlib/io/dem.py 17.28% <0.00%> (-69.14%) ⬇️
wradlib/georef/raster.py 88.55% <0.00%> (-8.96%) ⬇️
wradlib/vis.py 92.96% <0.00%> (-1.70%) ⬇️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@kmuehlbauer (Member Author)

@jorahu I've fixed some issues and the tests. It should work a bit faster too, since the coordinate data (azimuth, elevation, etc.) is now only decoded once per sweep.

@kmuehlbauer kmuehlbauer marked this pull request as ready for review September 24, 2021 09:30
@kmuehlbauer (Member Author)

This is now ready for merge. It might still have some rough edges though.

@jorahu (Contributor) commented Sep 24, 2021

Sorry for the delayed answer. I was out of office for some time.

Thanks for the detailed notebook; it worked well, except that I was unable to read some files for some reason.
We save our files sweep-by-sweep, so every file includes data for only one elevation. Mostly, both reading volumes (without the 'group' kw) and sweep files (with the 'group' kw) worked well, but some files could only be read with the 'group' kw. The first option gave the following error:

TypeError: The DType <class 'numpy.dtype[void]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is object. The full list of DTypes is: (<class 'numpy.dtype[void]'>, <class 'numpy.dtype[float64]'>)

Got the same error while trying to read RHI files. Other options seemed to work without issues.

But am I missing something about the overall metadata? RAW files include a myriad of metadata, but I could not find it in the xarray datasets (the RadarVolume objects). Is it just not there (yet), or can I just not find it?

@kmuehlbauer (Member Author)

@jorahu Thanks for testing. Are you able to share some of the problematic files? Regarding metadata, it's not there at the moment. Let's see what WMO/CfRadial2 will bring us: https://community.wmo.int/wmo-jet-owr-seminar-series-weather-radar-data-exchange

@jorahu (Contributor) commented Sep 24, 2021

Sent you 2 files by email.

@kmuehlbauer (Member Author)

@jorahu Those files have been very helpful. OK, the issue is that for the ODIM/GAMIC reader I introduced a magic reindexing of the major angle (azimuth or elevation) to overcome issues with jittering values in subsequent timesteps.

Now the problem is that the default tolerance value (0.4) does not work for all datasets, so I added another kwarg for parametrizing it:

vol = wrl.io.open_iris_dataset(filename, reindex_angle=False)

This disables reindexing and returns the data as is. If you give a floating-point value, the angle will be reindexed, under some assumptions (resolution, number of rays, etc.), using a nearest-neighbour search. In some cases this search doesn't find a value within the tolerance and fills the voids with NaN. This triggers the error you are seeing, because the time array is then filled with NaT in some places.
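The reindexing described above can be sketched in a few lines of plain numpy. This is an illustrative toy, not the wradlib implementation; the function name, the regular-grid assumption, and the sample values are all made up here.

```python
import numpy as np

def reindex_angle_sketch(angles, values, num_rays=360, tolerance=0.4):
    """Map rays onto a regular angle grid via nearest-neighbour search.

    Rays with no measured angle within `tolerance` of a grid centre are
    filled with NaN -- the behaviour that produces NaT in the time
    coordinate. (Hypothetical sketch, not wradlib's actual code.)
    """
    res = 360.0 / num_rays
    grid = np.arange(res / 2.0, 360.0, res)  # regular beam centres
    out = np.full(num_rays, np.nan)
    for i, center in enumerate(grid):
        dist = np.abs(angles - center)
        dist = np.minimum(dist, 360.0 - dist)  # wrap-around distance
        j = np.argmin(dist)
        if dist[j] <= tolerance:
            out[i] = values[j]
    return grid, out

# three jittered rays near 0.5, 1.5 and 3.5 deg; the ray near 2.5 deg is missing
angles = np.array([0.6, 1.4, 3.45])
values = np.array([10.0, 11.0, 12.0])
grid, out = reindex_angle_sketch(angles, values)
print(out[:4])  # -> [10. 11. nan 12.]: the unmatched slot is NaN-filled
```

A time coordinate reindexed the same way would carry NaT in the unmatched slots, which is what trips the dtype-promotion error.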

With your data it doesn't work at all for any tolerance value, so there might be another issue/bug in the processing. I'll dig further.

@kmuehlbauer (Member Author)

So, digging brought up some issues with the azimuth stop angles.
plt.plot(data["data"][1]["sweep_data"]["DB_DBT2"]["azi_stop"])
(plot: azi_stop values showing two outliers offset by about 360°)

It looks like 360° has been added to these two outliers, as if a bit were erroneously set in the azi_stop values at these two points.

@kmuehlbauer (Member Author)

I take that back; maybe it is introduced by this part of the code:

https://github.com/kmuehlbauer/wradlib/blob/9ea90392b97e9984d461d93e1e4cb4ae884fc2ff/wradlib/io/iris.py#L3564-L3568

            # calculate beam center values
            zero_index = np.where(azi_stop < azi_start)
            azi_stop[zero_index[0]] += 360
            az = (azi_start + azi_stop) / 2.0
            az[az >= 360] -= 360
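A self-contained run of the snippet above on toy angles (numpy only, made-up values) shows both the intended north-crossing behaviour and how a spurious +360° stop angle would distort the computed beam centre:

```python
import numpy as np

# toy start/stop angles including a ray crossing north (359 -> 0)
azi_start = np.array([358.0, 359.0, 0.0, 1.0])
azi_stop = np.array([359.0, 0.0, 1.0, 2.0])

# beam-centre calculation as in the snippet from iris.py
zero_index = np.where(azi_stop < azi_start)
azi_stop[zero_index[0]] += 360
az = (azi_start + azi_stop) / 2.0
az[az >= 360] -= 360
print(az)  # -> 358.5, 359.5, 0.5, 1.5: the north crossing is handled

# an azi_stop outlier carrying a spurious +360 deg looks just like a
# north crossing, so it passes through and shifts the centre by 180 deg:
bad_center = (80.0 + 441.0) / 2.0  # stop should be 81, not 441
print(bad_center)  # -> 260.5 instead of the expected ~80.5
```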

@kmuehlbauer (Member Author)

The root cause is, finally, some inconsistencies in the azimuth values of both azi_start (left column) and azi_stop (right column):

array([[80.61218262, 81.52404785],
       [82.65563965, 82.11730957],
       [82.58972168, 83.51257324],
       [83.33679199, 85.11657715],
       [85.61096191, 86.51733398],
       [86.59973145, 87.52258301],
       [87.52258301, 88.50036621],
       [88.50036621, 89.50012207],
       [89.59899902, 90.51086426],
       [90.5657959 , 91.49963379]])

Working around this will not be easy. Maybe I have to apply the reindexing to both arrays separately before merging…
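Computing beam centres from the first few of the pairs posted above makes the inconsistency visible: the ray spacing should be a constant ~1°, but it jumps around, and one ray even has its stop angle below its start angle.

```python
import numpy as np

# first four (azi_start, azi_stop) pairs from the array above
pairs = np.array([
    [80.61218262, 81.52404785],
    [82.65563965, 82.11730957],  # stop < start: the suspicious ray
    [82.58972168, 83.51257324],
    [83.33679199, 85.11657715],
])

centers = pairs.mean(axis=1)   # beam centres, ignoring wrap-around
steps = np.diff(centers)
print(steps)  # irregular spacing (~1.32, 0.66, 1.18) instead of ~1 deg
```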

@jorahu (Contributor) commented Sep 25, 2021

Interestingly, I am getting different results. Could it depend on the wradlib version? (The data access was a bit different in the xarray-testing wradlib version.) But the 82nd angle is still weird (stop angle lower than start angle).
(plot: azi_start/azi_stop values; ray 82 has its stop angle below its start angle)

@kmuehlbauer (Member Author)

@jorahu I changed the reader implementation a bit, but the output should be the same.

@kmuehlbauer (Member Author)

Ah, the "0" rays: I just drop them now, as they do not contain any data. That should be fixed by reindexing anyway.

@kmuehlbauer (Member Author)

@jorahu You might test the latest changes in this branch. I think we are approaching a usable state.

> TypeError: The DType <class 'numpy.dtype[void]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is object. The full list of DTypes is: (<class 'numpy.dtype[void]'>, <class 'numpy.dtype[float64]'>)

This error was induced by DB_XHDR. This moment contains a structured array with millisecond-resolution dtime_ms timestamps. Reindexing doesn't work for structured arrays in xarray, as they are not supported; if reindexing is required, DB_XHDR is dropped from the dataset.

The reader checks whether DB_XHDR is available in the file, and if so uses the millisecond-resolution raytimes instead of second resolution.
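Both halves of this can be demonstrated with plain numpy. The structured dtype below is a made-up stand-in for DB_XHDR (the field names and the sweep start time are invented for illustration); the promotion failure is exactly the TypeError quoted above.

```python
import numpy as np

# hypothetical DB_XHDR-like record carrying millisecond ray times
xhdr = np.zeros(3, dtype=[("dtime_ms", "<u4"), ("flags", "<u2")])
xhdr["dtime_ms"] = [250, 500, 750]

# structured (void) dtypes cannot be promoted together with float64,
# which is what NaN-filling reindexing would require:
promotion_error = None
try:
    np.result_type(xhdr.dtype, np.float64)
except TypeError as err:  # DTypePromotionError subclasses TypeError
    promotion_error = err
print(promotion_error is not None)  # -> True

# the millisecond offsets can instead be folded into a datetime64[ms]
# time coordinate (the sweep start time here is a made-up value):
start = np.datetime64("2021-09-24T12:00:00", "ms")
raytimes = start + xhdr["dtime_ms"].astype("timedelta64[ms]")
print(raytimes[0])  # -> 2021-09-24T12:00:00.250
```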

@kmuehlbauer (Member Author)

Final step: split and recreated the commits in a reasonable order.

…lisecond resolution if available, fix `decode_time` to correctly decode milliseconds and timezone, minor fixes
@kmuehlbauer (Member Author)

Another set of minor tweaks. Moved xarray Variable creation from io.iris to io.backends to remove any possible circular dependency. Cleaned up the if-clause for IrisVariableWrapper.

@jorahu Would it be possible to get permission to use the two files (PPI/RHI) as test data via the wradlib-data repository? The XHDR handling is under-tested for lack of test files containing that feature.

@kmuehlbauer kmuehlbauer added this to In progress in xarray Sep 30, 2021
@kmuehlbauer kmuehlbauer mentioned this pull request Sep 30, 2021
14 tasks
@kmuehlbauer kmuehlbauer added this to the 1.11 milestone Sep 30, 2021
@kmuehlbauer (Member Author)

@jorahu I'll merge this into main now, as we've reached a usable state. An example notebook will be added to https://github.com/wradlib/wradlib-notebooks soon. Please open an issue if you find any flaws in the implementation.

The plan is to get this out with wradlib 1.12.

@kmuehlbauer kmuehlbauer modified the milestones: 1.11, 1.12 Sep 30, 2021
@kmuehlbauer kmuehlbauer merged commit 588c612 into wradlib:main Sep 30, 2021
@jorahu (Contributor) commented Sep 30, 2021

Sorry for the delay again. Yes, of course you can use the files. I can provide more if needed (to test the mfdataset/timeseries reading, for example).
I tried the merged version and everything seemed to work: no crashes this time, and even the RHI reading/plotting worked. One thing I noticed: when I read files with different scan setups, it did not give any warnings, it just grew the dimensions. Our reflectivity scan is (360, 833) and our wind scan is (360, 620), so the resulting dataset shape became (360, 1453, XX), and the resulting visualisations did not work as expected. I don't know if it is possible, but maybe it should raise a warning in these cases to notify the user that they are trying to read files with different shapes(?). Although a sweep-by-sweep scan strategy can have a different data shape for every sweep, so this could also be annoying. With 20 consecutive files, the visualisation was totally chopped up in some cases. But I guess more careful filename filtering can easily avoid this problem.
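The shape blow-up described above can be reproduced with plain xarray: when datasets with different range coordinates are concatenated, the default outer join silently unions the coordinates and pads with NaN. The tiny datasets below are made-up stand-ins for the (360, 833) reflectivity and (360, 620) wind scans.

```python
import numpy as np
import xarray as xr

# two toy sweeps whose range bins differ (e.g. different bin spacing)
sweep1 = xr.Dataset(
    {"DBZH": (("azimuth", "range"), np.zeros((4, 3)))},
    coords={"azimuth": [0.5, 1.5, 2.5, 3.5], "range": [0.0, 1.0, 2.0]},
)
sweep2 = xr.Dataset(
    {"DBZH": (("azimuth", "range"), np.ones((4, 3)))},
    coords={"azimuth": [0.5, 1.5, 2.5, 3.5], "range": [0.0, 1.5, 3.0]},
)

# outer alignment unions the range coordinates and fills gaps with NaN,
# without any warning
combined = xr.concat([sweep1, sweep2], dim="time")
print(combined["DBZH"].shape)  # -> (2, 4, 5): 5 range bins instead of 3
```

This is why filtering filenames so that only files with matching scan setups are read together avoids the chopped-up visualisations.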

@kmuehlbauer (Member Author)

@jorahu I'll add the two files for testing to the wradlib-data repo, thanks!

Great to hear that no immediate crashes have been observed. The problem with reading multiple files with different scan setups will need some testing. From your explanation I think this might not be an easy task; it might not even be possible at all. At least your proposed solution of only reading similar files should work well.

For testing the multi-file reader it would be great to have a one- or two-hour dataset, PPI and RHI. It would be nice if there were some precipitation going on over that time. The idea would be to make a dedicated repository (iris-format) below the wradlib GitHub organisation and create a tutorial showing the internals of the iris/sigmet format and how the reader works. Would you be interested in actively collaborating on such an endeavour?

@kmuehlbauer kmuehlbauer moved this from In progress to Done in xarray Nov 15, 2021
@kmuehlbauer kmuehlbauer deleted the iris_backend branch November 17, 2021 08:47