
ENH: add iris/sigmet xarray backend #520

Merged
merged 6 commits into from Sep 30, 2021

Conversation

@kmuehlbauer (Member) commented Sep 5, 2021

This PR covers #361.

  • only RAW data files for now
  • open_iris_dataset(filename), open_iris_mfdataset(filename) - reads sweeps into RadarVolume
  • xr.open_dataset(filename, engine="iris", group=1), xr.open_mfdataset(filename, engine="iris", group=1) reads sweep (group) into xr.Dataset

Please refer to function docstring for further explanations.

Update: fixed the group numbers; they are `int`.

@kmuehlbauer kmuehlbauer force-pushed the iris_backend branch 2 times, most recently from c93f029 to 8189ca3 Compare September 5, 2021 10:35
@kmuehlbauer kmuehlbauer marked this pull request as draft September 5, 2021 10:50
@kmuehlbauer (Member Author) commented Sep 5, 2021

Pinging some folks who might be interested in testing this branch. Any comments, suggestions and other reports are very much appreciated. This will need quite a bit more work before merge.

Todo:

  • fix and add tests

@mustafaalicelik @jorahu @aschueth @tsmsalper @ckaradavut

@jorahu (Contributor) commented Sep 6, 2021

Hello Kai!
Thanks for the invitation to test the xarray backend.

I tried the functions you provided in the first post.
`wrl.io.iris.open_iris_dataset(filename)` worked fine and returned the RadarVolume, as expected, although the RadarVolume did not include most of the metadata present in the file. I guess full metadata import is still to be developed in the RadarVolume?
`wrl.io.iris.open_iris_mfdataset(filename)` failed with the following error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in

~/wradlib-iris_backend/wradlib-iris_backend/wradlib/io/iris.py in open_iris_mfdataset(filename_or_obj, group, **kwargs)
   4188 raise_on_missing_xarray_backend()
   4189 kwargs["group"] = group
-> 4190 return open_radar_mfdataset(filename_or_obj, engine="iris", **kwargs)

~/wradlib-iris_backend/wradlib-iris_backend/wradlib/io/xarray.py in open_radar_mfdataset(paths, **kwargs)
   1940 group = kwargs.pop("group", None)
   1941 if group is None:
-> 1942 group = _get_h5group_names(patharr.flat[0], engine)
   1943 elif isinstance(group, str):
   1944 group = [group]

~/wradlib-iris_backend/wradlib-iris_backend/wradlib/io/xarray.py in _get_h5group_names(filename, engine)
   1552 groupname = "sweep"
   1553 else:
-> 1554 raise ValueError(f"wradlib: unknown engine {engine}.")
   1555 with h5netcdf.File(filename, "r", decode_vlen_strings=True) as fh:
   1556 groups = ["/".join(["", grp]) for grp in fh.groups if groupname in grp.lower()]

ValueError: wradlib: unknown engine iris.
```

The second option kind of confused me. Which xarray am I supposed to use here, the normal xarray or wrl.io.xarray?
I could not use the original xarray here, so I went to wrl.io.xarray and used `open_radar_dataset()` (not `open_dataset()`, because wrl.io.xarray does not include an `open_dataset()` function). So I guess `open_radar_dataset()` is the same as the previous `open_iris_dataset()`? Anyway, it also worked as before (`wrl.io.xarray.open_radar_dataset(filename, engine='iris')`). Adding the "group" argument failed with `KeyError: '1'`. But I suspect I am probably not testing the correct things here.
`wrl.io.xarray.open_radar_mfdataset(filename, engine='iris')` failed with the same error as `wrl.io.iris.open_iris_mfdataset(filename)` (as expected, because it calls the same function).

Hope this helps. Maybe you can comment on what I am doing wrong when calling the second options (the xarray options), or why I can't find them.

@kmuehlbauer (Member Author)

@jorahu Thanks for trying! I'm sorry, I should have provided some examples showing how to use it, so I've created a gist with some explanations. Happy testing!

@codecov bot commented Sep 7, 2021

Codecov Report

Merging #520 (1c7ec6a) into main (1d88cf8) will decrease coverage by 0.97%.
The diff coverage is 88.60%.


@@            Coverage Diff             @@
##             main     #520      +/-   ##
==========================================
- Coverage   89.43%   88.46%   -0.98%     
==========================================
  Files          36       36              
  Lines        8592     8874     +282     
==========================================
+ Hits         7684     7850     +166     
- Misses        908     1024     +116     
Flag           Coverage Δ
notebooktests  0.00% <0.00%> (ø)
unittests      88.46% <88.60%> (-0.98%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
wradlib/io/hdf.py 96.59% <ø> (ø)
wradlib/io/netcdf.py 83.49% <ø> (ø)
wradlib/io/radolan.py 96.58% <ø> (ø)
wradlib/io/xarray.py 88.15% <76.66%> (-0.18%) ⬇️
wradlib/io/backends.py 88.88% <89.43%> (+0.09%) ⬆️
wradlib/io/iris.py 75.39% <89.69%> (+2.14%) ⬆️
wradlib/io/dem.py 17.28% <0.00%> (-69.14%) ⬇️
wradlib/georef/raster.py 88.55% <0.00%> (-8.96%) ⬇️
wradlib/vis.py 92.96% <0.00%> (-1.70%) ⬇️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@kmuehlbauer (Member Author)

@jorahu I've fixed some issues and the tests. It should work a bit faster too, since the coordinate data (azimuth, elevation, etc.) is now only decoded once per sweep.

@kmuehlbauer kmuehlbauer marked this pull request as ready for review September 24, 2021 09:30
@kmuehlbauer (Member Author)

This is now ready for merge. It might still have some rough edges though.

@jorahu (Contributor) commented Sep 24, 2021

Sorry for the delayed answer. I was out of office for some time.

Thanks for the detailed notebook; it worked well, except that I was unable to read some files for some reason.
We save our files sweep-by-sweep, so every file includes data for only one elevation. Mostly, both reading volumes (without the 'group' kw) and sweep files (with the 'group' kw) worked well, but some files could only be read with the 'group' kw. The first option gave the following error:

TypeError: The DType <class 'numpy.dtype[void]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is object. The full list of DTypes is: (<class 'numpy.dtype[void]'>, <class 'numpy.dtype[float64]'>)

Got the same error while trying to read RHI files. Other options seemed to work without issues.

But am I missing something about the overall metadata? RAW files include a myriad of metadata, but I could not find it in the xarray datasets (the RadarVolume objects). Is it just not there (yet), or can I just not find it?

@kmuehlbauer (Member Author)

@jorahu Thanks for testing. Are you able to share some of the problematic files? Regarding metadata, it's not there at the moment. Let's see what WMO/CfRadial2 will bring us: https://community.wmo.int/wmo-jet-owr-seminar-series-weather-radar-data-exchange

@jorahu (Contributor) commented Sep 24, 2021

Sent you 2 files by email.

@kmuehlbauer (Member Author)

@jorahu Those files have been very helpful. OK, the issue is that for the ODIM/GAMIC reader I introduced a magic reindexing of the major angle (azimuth or elevation) to overcome issues with jittering values in subsequent timesteps.

Now the problem is that the default tolerance value (0.4) does not work for all datasets, so I added another kwarg for parametrizing it:

vol = wrl.io.open_iris_dataset(filename, reindex_angle=False)

This disables reindexing and returns the data as is. If you give a floating-point value, the angle will be reindexed, under some assumptions (resolution, number of rays, etc.), using a nearest-neighbour search. In some cases this search doesn't find a value within the tolerance and fills the voids with NaN. This triggers the error you are seeing, because the time array is then filled with NaT in some places.
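The reindexing described above can be sketched in a few lines of plain numpy. This is an illustrative toy, not the wradlib implementation; the function name, the regular-grid assumption, and the sample values are all made up here.

```python
import numpy as np

def reindex_angle_sketch(angles, values, num_rays=360, tolerance=0.4):
    """Map rays onto a regular angle grid via nearest-neighbour search.

    Rays with no measured angle within `tolerance` of a grid centre are
    filled with NaN -- the behaviour that produces NaT in the time
    coordinate. (Hypothetical sketch, not wradlib's actual code.)
    """
    res = 360.0 / num_rays
    grid = np.arange(res / 2.0, 360.0, res)  # regular beam centres
    out = np.full(num_rays, np.nan)
    for i, center in enumerate(grid):
        dist = np.abs(angles - center)
        dist = np.minimum(dist, 360.0 - dist)  # wrap-around distance
        j = np.argmin(dist)
        if dist[j] <= tolerance:
            out[i] = values[j]
    return grid, out

# three jittered rays near 0.5, 1.5 and 3.5 deg; the ray near 2.5 deg is missing
angles = np.array([0.6, 1.4, 3.45])
values = np.array([10.0, 11.0, 12.0])
grid, out = reindex_angle_sketch(angles, values)
print(out[:4])  # -> [10. 11. nan 12.]: the unmatched slot is NaN-filled
```

A time coordinate reindexed the same way would carry NaT in the unmatched slots, which is what trips the dtype-promotion error.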

With your data it doesn't work at all for any tolerance value, so there might be another issue/bug in the processing. I'll dig further.

@kmuehlbauer (Member Author)

So, digging brought up some issues with the azimuth stop angles.
plt.plot(data["data"][1]["sweep_data"]["DB_DBT2"]["azi_stop"])
(plot: azi_stop values showing two outliers offset by about 360°)

It looks like 360° has been added to these two outliers, as if a bit were erroneously set in the azi_stop values at these two points.

@kmuehlbauer (Member Author)

I take that back; maybe it is introduced by this part of the code:

https://github.com/kmuehlbauer/wradlib/blob/9ea90392b97e9984d461d93e1e4cb4ae884fc2ff/wradlib/io/iris.py#L3564-L3568

            # calculate beam center values
            zero_index = np.where(azi_stop < azi_start)
            azi_stop[zero_index[0]] += 360
            az = (azi_start + azi_stop) / 2.0
            az[az >= 360] -= 360
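A self-contained run of the snippet above on toy angles (numpy only, made-up values) shows both the intended north-crossing behaviour and how a spurious +360° stop angle would distort the computed beam centre:

```python
import numpy as np

# toy start/stop angles including a ray crossing north (359 -> 0)
azi_start = np.array([358.0, 359.0, 0.0, 1.0])
azi_stop = np.array([359.0, 0.0, 1.0, 2.0])

# beam-centre calculation as in the snippet from iris.py
zero_index = np.where(azi_stop < azi_start)
azi_stop[zero_index[0]] += 360
az = (azi_start + azi_stop) / 2.0
az[az >= 360] -= 360
print(az)  # -> 358.5, 359.5, 0.5, 1.5: the north crossing is handled

# an azi_stop outlier carrying a spurious +360 deg looks just like a
# north crossing, so it passes through and shifts the centre by 180 deg:
bad_center = (80.0 + 441.0) / 2.0  # stop should be 81, not 441
print(bad_center)  # -> 260.5 instead of the expected ~80.5
```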

@kmuehlbauer (Member Author)

The root cause is, finally, some inconsistencies in the azimuth values of both azi_start (left column) and azi_stop (right column):

array([[80.61218262, 81.52404785],
       [82.65563965, 82.11730957],
       [82.58972168, 83.51257324],
       [83.33679199, 85.11657715],
       [85.61096191, 86.51733398],
       [86.59973145, 87.52258301],
       [87.52258301, 88.50036621],
       [88.50036621, 89.50012207],
       [89.59899902, 90.51086426],
       [90.5657959 , 91.49963379]])

Working around this will not be easy. Maybe I have to apply the reindexing to both arrays separately before merging…
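Computing beam centres from the first few of the pairs posted above makes the inconsistency visible: the ray spacing should be a constant ~1°, but it jumps around, and one ray even has its stop angle below its start angle.

```python
import numpy as np

# first four (azi_start, azi_stop) pairs from the array above
pairs = np.array([
    [80.61218262, 81.52404785],
    [82.65563965, 82.11730957],  # stop < start: the suspicious ray
    [82.58972168, 83.51257324],
    [83.33679199, 85.11657715],
])

centers = pairs.mean(axis=1)   # beam centres, ignoring wrap-around
steps = np.diff(centers)
print(steps)  # irregular spacing (~1.32, 0.66, 1.18) instead of ~1 deg
```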

@jorahu (Contributor) commented Sep 25, 2021

Interestingly, I am getting different results. Could it depend on the wradlib version? (The data access was a bit different in the xarray-testing wradlib version.) But the 82nd angle is still weird (stop angle lower than start angle).
(plot: azi_start/azi_stop values; ray 82 has its stop angle below its start angle)

@kmuehlbauer (Member Author)

@jorahu I changed the reader implementation a bit, but the output should be the same.

@kmuehlbauer (Member Author)

Ah, the "0" rays: I just drop them now, as they do not contain any data. That should be fixed by reindexing anyway.

@kmuehlbauer (Member Author)

@jorahu You might test the latest changes in this branch. I think we are approaching a usable state.

> TypeError: The DType <class 'numpy.dtype[void]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is object. The full list of DTypes is: (<class 'numpy.dtype[void]'>, <class 'numpy.dtype[float64]'>)

This error was induced by DB_XHDR. This moment contains a structured array with millisecond-resolution dtime_ms timestamps. Reindexing doesn't work for structured arrays in xarray, as they are not supported; if reindexing is required, DB_XHDR is dropped from the dataset.

The reader checks whether DB_XHDR is available in the file, and if so uses the millisecond-resolution raytimes instead of second resolution.
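Both halves of this can be demonstrated with plain numpy. The structured dtype below is a made-up stand-in for DB_XHDR (the field names and the sweep start time are invented for illustration); the promotion failure is exactly the TypeError quoted above.

```python
import numpy as np

# hypothetical DB_XHDR-like record carrying millisecond ray times
xhdr = np.zeros(3, dtype=[("dtime_ms", "<u4"), ("flags", "<u2")])
xhdr["dtime_ms"] = [250, 500, 750]

# structured (void) dtypes cannot be promoted together with float64,
# which is what NaN-filling reindexing would require:
promotion_error = None
try:
    np.result_type(xhdr.dtype, np.float64)
except TypeError as err:  # DTypePromotionError subclasses TypeError
    promotion_error = err
print(promotion_error is not None)  # -> True

# the millisecond offsets can instead be folded into a datetime64[ms]
# time coordinate (the sweep start time here is a made-up value):
start = np.datetime64("2021-09-24T12:00:00", "ms")
raytimes = start + xhdr["dtime_ms"].astype("timedelta64[ms]")
print(raytimes[0])  # -> 2021-09-24T12:00:00.250
```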

@kmuehlbauer (Member Author)

Final step: split and recreated the commits in a reasonable order.

…lisecond resolution if available, fix `decode_time` to correctly decode milliseconds and timezone, minor fixes
@kmuehlbauer (Member Author)

Another set of minor tweaks. Moved xarray Variable creation from io.iris to io.backends to remove any possible circular dependency. Cleaned up the if-clause for IrisVariableWrapper.

@jorahu Would it be possible to get permission to use the two files (PPI/RHI) as test data via the wradlib-data repository? The XHDR handling is under-tested for lack of test files containing that feature.

@kmuehlbauer kmuehlbauer added this to In progress in xarray Sep 30, 2021
@kmuehlbauer kmuehlbauer mentioned this pull request Sep 30, 2021
14 tasks
@kmuehlbauer kmuehlbauer added this to the 1.11 milestone Sep 30, 2021
@kmuehlbauer (Member Author)

@jorahu I'll merge this into main now, as we've reached a usable state. An example notebook will be added to https://github.com/wradlib/wradlib-notebooks soon. Please open an issue if you find any flaws in the implementation.

The plan is to get this out with wradlib 1.12.

@kmuehlbauer kmuehlbauer modified the milestones: 1.11, 1.12 Sep 30, 2021
@kmuehlbauer kmuehlbauer merged commit 588c612 into wradlib:main Sep 30, 2021
@jorahu (Contributor) commented Sep 30, 2021

Sorry for the delay again. Yes, of course you can use the files. I can provide more if needed (to test the mfdataset/timeseries reading, for example).
I tried the merged version and everything seemed to work: no crashes this time, and even the RHI reading/plotting worked. One thing I noticed: when I read files with different scan setups, it did not give any warnings, it just grew the dimensions. Our reflectivity scan is (360, 833) and our wind scan is (360, 620), so the resulting dataset shape became (360, 1453, XX), and the resulting visualisations did not work as expected. I don't know if it is possible, but maybe it should raise a warning in these cases to notify the user that they are trying to read files with different shapes(?). Although a sweep-by-sweep scan strategy can have a different data shape for every sweep, so this could also be annoying. With 20 consecutive files, the visualisation was totally chopped up in some cases. But I guess more careful filename filtering can easily avoid this problem.
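The shape blow-up described above can be reproduced with plain xarray: when datasets with different range coordinates are concatenated, the default outer join silently unions the coordinates and pads with NaN. The tiny datasets below are made-up stand-ins for the (360, 833) reflectivity and (360, 620) wind scans.

```python
import numpy as np
import xarray as xr

# two toy sweeps whose range bins differ (e.g. different bin spacing)
sweep1 = xr.Dataset(
    {"DBZH": (("azimuth", "range"), np.zeros((4, 3)))},
    coords={"azimuth": [0.5, 1.5, 2.5, 3.5], "range": [0.0, 1.0, 2.0]},
)
sweep2 = xr.Dataset(
    {"DBZH": (("azimuth", "range"), np.ones((4, 3)))},
    coords={"azimuth": [0.5, 1.5, 2.5, 3.5], "range": [0.0, 1.5, 3.0]},
)

# outer alignment unions the range coordinates and fills gaps with NaN,
# without any warning
combined = xr.concat([sweep1, sweep2], dim="time")
print(combined["DBZH"].shape)  # -> (2, 4, 5): 5 range bins instead of 3
```

This is why filtering filenames so that only files with matching scan setups are read together avoids the chopped-up visualisations.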

@kmuehlbauer (Member Author)

@jorahu I'll add the two files for testing to the wradlib-data repo, thanks!

Great to hear that no immediate crashes have been observed. The problem with reading multiple files with different scan setups will need some testing. From your explanation I think this might not be an easy task; it might not even be possible at all. At least your proposed solution of only reading similar files should work well.

For testing the multi-file reader it would be great to have a one- or two-hour dataset, PPI and RHI. It would be nice if there were some precipitation going on over that time. The idea would be to make a dedicated repository (iris-format) below the wradlib GitHub organisation and create a tutorial showing the internals of the iris/sigmet format and how the reader works. Would you be interested in actively collaborating on such an endeavour?

@kmuehlbauer kmuehlbauer moved this from In progress to Done in xarray Nov 15, 2021
@kmuehlbauer kmuehlbauer deleted the iris_backend branch November 17, 2021 08:47