This notebook is part of the $\omega radlib$ documentation: https://docs.wradlib.org.

Copyright (c) $\omega radlib$ developers.
Distributed under the MIT License. See LICENSE.txt for more info.

# Multi File OdimH5 reader

<div class="alert alert-warning">

**Note** <br>

The following functionality is deprecated. Please use the [xarray backend loaders](fileio/wradlib_xarray_backends.ipynb) instead.
    
</div>

This implementation is based on several classes which are described below.

## Class Overview

### XRadBase

Implements `collections.abc.MutableSequence` for holding sequential data in the derived classes (e.g. sweeps, timeseries, moments).
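
To illustrate what implementing `collections.abc.MutableSequence` involves, here is a minimal sketch. The class name `SequenceBase` and the string items are hypothetical and are not taken from wradlib; the sketch only shows the five abstract methods a subclass has to provide, after which methods like `append` and `reverse` come for free.

```python
from collections.abc import MutableSequence


class SequenceBase(MutableSequence):
    """Hypothetical stand-in for a list-like container base class."""

    def __init__(self, items=None):
        self._seq = list(items or [])

    # The five abstract methods required by MutableSequence.
    def __getitem__(self, index):
        return self._seq[index]

    def __setitem__(self, index, value):
        self._seq[index] = value

    def __delitem__(self, index):
        del self._seq[index]

    def __len__(self):
        return len(self._seq)

    def insert(self, index, value):
        self._seq.insert(index, value)


sweeps = SequenceBase(["sweep_0", "sweep_1"])
sweeps.append("sweep_2")  # append() is inherited from MutableSequence
print(len(sweeps), sweeps[-1])
```

Derived containers built this way can be indexed and iterated like lists, which is what makes expressions such as `vol[0][0][0]` possible.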

### OdimH5GroupAttributeMixin

Implements properties for `XRadMoment`, `XRadSweep`, `XRadTimeSeries` and `XRadVolume` to conveniently access ODIM group metadata, e.g. the `how`, `what` and `where` groups. Other attributes can be accessed via the `attrs` property, and other (sub-)groups can be inspected via the `groups` property.
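
The mixin idea can be sketched as follows. This is not the wradlib implementation: the names `GroupAttributeMixin`, `DemoSweep` and `_group` are hypothetical, a nested dict stands in for an open ODIM group, and the metadata values are made up.

```python
class GroupAttributeMixin:
    """Hypothetical mixin; the host class must provide ``_group`` (a nested dict)."""

    @property
    def attrs(self):
        # Plain attributes are the entries that are not subgroups.
        return {k: v for k, v in self._group.items() if not isinstance(v, dict)}

    @property
    def groups(self):
        # Subgroups are the dict-valued entries.
        return [k for k, v in self._group.items() if isinstance(v, dict)]

    @property
    def how(self):
        return self._group.get("how")  # None if the subgroup is missing

    @property
    def what(self):
        return self._group.get("what")

    @property
    def where(self):
        return self._group.get("where")


class DemoSweep(GroupAttributeMixin):
    def __init__(self, group):
        self._group = group


demo = DemoSweep(
    {
        "where": {"elangle": 25.0, "nrays": 360, "nbins": 800},  # made-up values
        "what": {"product": "SCAN"},
        "note": "made-up attribute",
    }
)
print(demo.groups)
print(demo.where["elangle"])
print(demo.how)
```

Because the properties live in a mixin, any class that exposes a group in the same way gets the same accessors without duplicating code.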

### OdimH5SweepMetaDataMixin

Implements properties for `XRadSweep` to conveniently access ODIM sweep metadata, e.g. `a1gate`, `azimuth`, `nrays`, `nbins` etc.

### XRadMoment

Uses `OdimH5GroupAttributeMixin` to access ODIM metadata. Does not hold any data. Property `data` fetches the moment as xarray DataArray from the parent `XRadSweep`. 

### XRadSweep

Inherits from `XRadBase`, uses `OdimH5GroupAttributeMixin` and `OdimH5SweepMetaDataMixin`. This is the worker class where most of the work happens: it implements methods and properties to retrieve sweep metadata and data. Holds `XRadMoment` objects in its `MutableSequence`. Property `data` is used to load and cache the moments as a combined xarray Dataset. Implements many further properties to inspect metadata.

#### XRadSweepOdim

Inherits from `XRadSweep`. Accounts for ODIM data layout.

#### XRadSweepGamic

Inherits from `XRadSweep`. Accounts for GAMIC data layout.

### XRadTimeSeries

Inherits from `XRadBase`, holds several `XRadSweep` objects in its `MutableSequence`. Property `data` is used to concatenate and cache the sweeps as an xarray Dataset along the time dimension. Implements check methods to quickly get information about the layout of the timeseries data.

### XRadVolume

Inherits from `XRadBase`, holds several `XRadTimeSeries` objects in its `MutableSequence`. Implements a CfRadial2-like `root` property.

## Loading Function

To open ODIMH5 data files, use `wrl.io.open_odim(filename, loader='h5py', **kwargs)` (shown here with the `h5py` loader selected explicitly).

The user can decide which loader to use (`h5py`, `h5netcdf` or `netcdf4`) to open the files for reading. The output should be the same in any case, although the memory footprint can differ quite substantially. The default loader is `netcdf4` if loader isn't specified.

The datasets are subsequently retrieved via `xarray.open_dataset()` in combination with either `xarray.backends.H5NetCDFStore` (for loaders `h5py` and `h5netcdf`) or `xarray.backends.NetCDF4DataStore` (for loader `netcdf4`).
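
The loader-to-store relationship described above can be written down as a small lookup. The helper `store_for_loader` is hypothetical and not part of wradlib; the store classes are kept as strings so the sketch runs without xarray installed.

```python
# Mapping taken from the description above; values are names only.
STORE_FOR_LOADER = {
    "h5py": "xarray.backends.H5NetCDFStore",
    "h5netcdf": "xarray.backends.H5NetCDFStore",
    "netcdf4": "xarray.backends.NetCDF4DataStore",
}


def store_for_loader(loader="netcdf4"):
    """Return the name of the xarray store used for ``loader`` (hypothetical helper)."""
    try:
        return STORE_FOR_LOADER[loader]
    except KeyError:
        raise ValueError(
            f"unknown loader {loader!r}; expected one of {sorted(STORE_FOR_LOADER)}"
        ) from None


print(store_for_loader("h5py"))
print(store_for_loader())  # falls back to the default loader, netcdf4
```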

Possible keyword arguments are:

- `mask_and_scale` *bool* - If True, apply mask and scale to moment data arrays
- `decode_coords` *bool* - If True, decode ODIMH5 coordinates
- `decode_times` *bool* - If True, decode times into datetime objects
- `chunks` *int or dict* - Data loaded as dask array
- `parallel` *bool* - if True, use `dask.delayed` to load moments in parallel

The user is encouraged to experiment with the keyword arguments to find the best trade-off between speed and memory use.

In [None]:
import wradlib as wrl
import warnings

# warnings.filterwarnings('ignore')
import matplotlib.pyplot as pl
import numpy as np
import xarray as xr
import os
import glob

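try:
    # Enable inline plotting when running inside IPython/Jupyter.
    get_ipython().run_line_magic("matplotlib", "inline")
except Exception:
    # Fall back to interactive mode outside of IPython.
    pl.ion()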

In [None]:
import os
import psutil
import gc

process = psutil.Process(os.getpid())

In [None]:
def memory_usage_psutil():
    # return the memory usage in MB
    process = psutil.Process(os.getpid())
    mem = process.memory_full_info().uss / float(1 << 20)
    return mem

In [None]:
def free_memory():
    mem0 = memory_usage_psutil()
    print(gc.collect())
    mem1 = memory_usage_psutil()
    print("Memory freed: {0:.5f} MB".format((mem0 - mem1)))

In [None]:
def check_open_files(full=False):
    proc = psutil.Process()
    print(len(proc.open_files()))
    if full:
        print(proc.open_files())

In [None]:
%%capture
flist = [
    "hdf5/behel/20200207130000.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207130000.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207130000.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207130000.rad.behel.pvol.wrad.scanz.hdf",
    "hdf5/behel/20200207130500.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207130500.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207130500.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207130500.rad.behel.pvol.wrad.scanz.hdf",
    "hdf5/behel/20200207131000.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207131000.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207131000.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207131000.rad.behel.pvol.wrad.scanz.hdf",
    "hdf5/behel/20200207131500.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207131500.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207131500.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207131500.rad.behel.pvol.wrad.scanz.hdf",
    "hdf5/behel/20200207132000.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207132000.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207132000.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207132000.rad.behel.pvol.wrad.scanz.hdf",
    "hdf5/behel/20200207132500.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207132500.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207132500.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207132500.rad.behel.pvol.wrad.scanz.hdf",
    "hdf5/behel/20200207133000.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207133000.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207133000.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207133000.rad.behel.pvol.wrad.scanz.hdf",
    "hdf5/behel/20200207133500.rad.behel.pvol.dbzh.scanz.hdf",
    "hdf5/behel/20200207133500.rad.behel.pvol.rhohv.scanz.hdf",
    "hdf5/behel/20200207133500.rad.behel.pvol.vrad.scanz.hdf",
    "hdf5/behel/20200207133500.rad.behel.pvol.wrad.scanz.hdf",
]
f = [wrl.util.get_wradlib_data_file(f) for f in flist]

In [None]:
mem_start = memory_usage_psutil()
print("Current Memory:", mem_start)

## Check open files

In [None]:
check_open_files()

## Claim Files into class structure

The different files will be opened with `h5netcdf`, `h5py` or `netcdf4`, depending on the `loader` keyword. Only the absolutely necessary metadata (time, elevation) is read from the files and used to align them into the structure.
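
The alignment step can be pictured as grouping by elevation and ordering by time. The sketch below is an assumption about the general idea, not the wradlib code: the `align` function, the file names and the ascending sort order are all hypothetical, and the records stand in for metadata that would be read from each file.

```python
from collections import defaultdict
from datetime import datetime

# Made-up (elevation, time, filename) records.
records = [
    (0.5, datetime(2020, 2, 7, 13, 5), "b.hdf"),
    (25.0, datetime(2020, 2, 7, 13, 0), "c.hdf"),
    (0.5, datetime(2020, 2, 7, 13, 0), "a.hdf"),
    (25.0, datetime(2020, 2, 7, 13, 5), "d.hdf"),
]


def align(records):
    """Group records by elevation, then order each group by time."""
    by_elevation = defaultdict(list)
    for elevation, time, filename in records:
        by_elevation[elevation].append((time, filename))
    # Outer list: one entry per elevation (a "timeseries");
    # inner list: the time-ordered sweeps of that elevation.
    return [
        [name for _, name in sorted(by_elevation[elev])]
        for elev in sorted(by_elevation)
    ]


print(align(records))
```

The nested result mirrors the volume, timeseries, sweep hierarchy that is indexed as `vol[i][j]` later in this notebook.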

Normally `h5py` is the most performant loader for `ODIM` data. But your mileage may vary.

This means that every file is opened only once and its file handle is passed on to the corresponding `XRadSweep`. When an `XRadSweep` is destroyed, its memory becomes available for garbage collection.

Under the hood, `netcdf4` or `h5netcdf` is used to open the data as `xarray.Dataset`, depending on the loader type. All file handling is delegated to xarray: no memory leaks, no need to track file handles.

In [None]:
%%time
vol = wrl.io.xarray_depr.open_odim(f, loader="h5py", chunks={})

## Check open files

In [None]:
check_open_files()

## Overview type and length

In [None]:
print("Volume:", type(vol), len(vol))
print("TimeSeries:", type(vol[0]), len(vol[0]))
print("Sweep:", type(vol[0][0]), len(vol[0][0]))
print("Moment:", type(vol[0][0][0]), vol[0][0][0].quantity)

## Overview Contents (`__repr__()`)

When printing the objects, they tell us a little about themselves and the data they can get access to. 

### Volume

Here we see that it is of type `wradlib.XRadVolume`. It holds 12 sweeps with the shown elevations.

In [None]:
print(vol)

### TimeSeries

Here we see that it is of type `wradlib.XRadTimeSeries`. It holds 8 timesteps with a data layout of 360 azimuths by 800 range bins. The elevation is 25.0 deg.

In [None]:
print(vol[0])

### Sweep

Here we see that it is of type `wradlib.XRadSweepOdim`, which means it is loaded from ODIMH5 standard data. It holds data with a layout of 360 azimuths by 800 range bins. The elevation is 25.0 deg. It consists of the radar moments `DBZH`, `RHOHV`, `VRAD` and `WRAD`.

In [None]:
print(vol[0][0])

### Moment

Here we see that it is of type `wradlib.XRadMoment`. It holds data with a layout of 360 azimuths by 800 range bins. The elevation is 25.0 deg. It is the radar moment `DBZH`.

In [None]:
print(vol[0][0][0])

## Accessing metadata via `OdimH5GroupAttributeMixin`


You can access underlying metadata for every object. The properties `ncpath`, `ncid`, `ncfile` and `filename` give information about the location of the metadata. Properties `groups` and `attrs` give information about attached subgroups and attributes. `how`, `what` and `where` return the contents of the respective ODIMH5-subgroups if available.

As long as the objects are not deleted, the corresponding files remain open and their handles can be used to retrieve data from them.

### Volume

The `OdimH5GroupAttributeMixin` access in `XRadVolume` will retrieve the root-metadata of the first file of the first timeseries, which is the first volume file in most cases.

In [None]:
print("path:", vol.ncpath)
print("  id:", vol.ncid)
print("file:", vol.ncfile)
print("name:", vol.filename)

In [None]:
print(vol.groups)

In [None]:
print(vol.attrs)

In [None]:
print(vol.how)

In [None]:
print(vol.what)

In [None]:
print(vol.where)

### Timeseries

The `OdimH5GroupAttributeMixin` access in `XRadTimeSeries` will retrieve the group-metadata of the first sweep of the selected timeseries.

In [None]:
ts = vol[0]
print("path:", ts.ncpath)
print("  id:", ts.ncid)
print("file:", ts.ncfile)
print("name:", ts.filename)

In [None]:
print(ts.groups)

In [None]:
print(ts.attrs)

In [None]:
print(ts.how)

In [None]:
print(ts.what)

In [None]:
print(ts.where)

### Sweep

The `OdimH5GroupAttributeMixin` access in `XRadSweep` will retrieve the group-metadata of the selected sweep.

In [None]:
swp = vol[0][5]
print("path:", swp.ncpath)
print("  id:", swp.ncid)
print("file:", swp.ncfile)
print("name:", swp.filename)

In [None]:
print(swp.groups)

In [None]:
print(swp.attrs)

In [None]:
print(swp.how)

In [None]:
print(swp.what)

In [None]:
print(swp.where)

### Moment

The `OdimH5GroupAttributeMixin` access in `XRadMoment` will retrieve the group-metadata of the selected moment.

In [None]:
mom = vol[0][0][0]
print("path:", mom.ncpath)
print("  id:", mom.ncid)
print("file:", mom.ncfile)
print("name:", mom.filename)

In [None]:
print(mom.groups)

In [None]:
print(mom.attrs)

In [None]:
print(mom.what)

## CfRadial2 style root object

The XRadVolume object is equipped with a CfRadial2-style `root`-object, where some information can be retrieved. 

In [None]:
vol.root

## Get hold of data using xarray

- The outer class instance `XRadVolume` does not contain a `.data`-property because the volume cannot be represented using xarray. 
- `XRadTimeSeries` `.data` works on the sweep level; it can contain one or more consecutive sweeps.
    It will be created on the fly from the `XRadSweep` `.data` xarray.Dataset objects via concatenation.
- `XRadSweep` `.data` is one single sweep containing multiple radar moments. It is created **and** cached when first accessed.
- `XRadMoment` `.data` is one single moment as xarray DataArray, which is claimed from the parent `XRadSweep`.
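
The create-and-cache behaviour of the sweep and the delegation from moment to parent sweep can be sketched with `functools.cached_property`. This is an illustration under assumptions: `CachedSweep` and `DemoMoment` are hypothetical classes, and a plain dict stands in for the xarray Dataset.

```python
from functools import cached_property


class CachedSweep:
    """Hypothetical sweep that loads its moments once and caches the result."""

    def __init__(self, raw):
        self._raw = raw
        self.loads = 0  # counts how often the expensive load runs

    @cached_property
    def data(self):
        self.loads += 1
        return dict(self._raw)  # stand-in for building an xarray.Dataset


class DemoMoment:
    """Hypothetical moment: holds no data, only a reference to its parent."""

    def __init__(self, parent, quantity):
        self.parent = parent
        self.quantity = quantity

    @property
    def data(self):
        # Fetch the single moment from the parent's cached dataset.
        return self.parent.data[self.quantity]


sweep = CachedSweep({"DBZH": [1.0, 2.0], "VRAD": [0.1, 0.2]})
moment = DemoMoment(sweep, "DBZH")
first = moment.data   # triggers the load
second = moment.data  # served from the cache
print(sweep.loads)
```

This is why a second access in the timing cells that follow is expected to be cheaper than the first.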

### Moment

In [None]:
%%time
print("First Access")
mem0 = memory_usage_psutil()
print(vol[-2][0][0].data)
mem1 = memory_usage_psutil()
print("Memory: {} - {}".format(mem0, mem1))
print("Memory added: {0:.5f} MB".format((mem1 - mem0)))

In [None]:
%%time
print("Second Access")
mem0 = memory_usage_psutil()
print(vol[-2][0][0].data)
mem1 = memory_usage_psutil()
print("Memory: {} - {}".format(mem0, mem1))
print("Memory added: {0:.5f} MB".format((mem1 - mem0)))

### Sweep

In [None]:
%%time
print("First Access")
mem0 = memory_usage_psutil()
print(vol[-1][0].data)
mem1 = memory_usage_psutil()
print("Memory: {} - {}".format(mem0, mem1))
print("Memory added: {0:.5f} MB".format((mem1 - mem0)))

In [None]:
%%time
print("Second Access")
mem0 = memory_usage_psutil()
print(vol[-1][0].data)
mem1 = memory_usage_psutil()
print("Memory: {} - {}".format(mem0, mem1))
print("Memory added: {0:.5f} MB".format((mem1 - mem0)))

### TimeSeries

In [None]:
%%time
print("First Access")
mem0 = memory_usage_psutil()
print(vol[-1].data)
mem1 = memory_usage_psutil()
print("Memory: {} - {}".format(mem0, mem1))
print("Memory added: {0:.5f} MB".format((mem1 - mem0)))

In [None]:
%%time
print("Second Access")
mem0 = memory_usage_psutil()
print(vol[-1].data)
mem1 = memory_usage_psutil()
print("Memory: {} - {}".format(mem0, mem1))
print("Memory added: {0:.5f} MB".format((mem1 - mem0)))

## Plot Data

### Plot Single Sweep

In [None]:
vol[-1].data.pipe(wrl.georef.georeference_dataset).DBZH[0].wradlib.plot()

### Plot same single sweep from Timeseries

In [None]:
vol[-1].data.DBZH[0].plot()

## Exporting Data

Data can be exported to ODIMH5, CfRadial2 and NetCDF4.

### ODIMH5

ODIMH5 can only hold a single volume, not a timeseries, so we have to select the timestep we want to export.

The example shows how to write the volume to an ODIMH5 file, read it back and check for equality.

In [None]:
vol.to_odim("test_odim.h5", timestep=5)

In [None]:
vol1 = wrl.io.open_odim("test_odim.h5")

In [None]:
print(vol[0][5])

In [None]:
print(vol1[0][0])

In [None]:
xr.testing.assert_equal(vol[0][5].data, vol1[0][0].data)

### CfRadial2

CfRadial2 can only hold a single volume, not a timeseries, so we have to select the timestep we want to export.

The example shows how to write the volume to a CfRadial2 file and read it back. Since there is currently no fitting counterpart to `open_odim` for reading CfRadial2 volumes, we resort to the `wradlib.io.CfRadial` reader and compare the underlying numpy arrays. As CfRadial2 data is sorted by time, we have to sort it by azimuth first.
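
Why the sort is needed can be seen with a toy example. The rays below are made up: they are stored in acquisition (time) order for a scan that did not start at azimuth 0, so they must be reordered by azimuth before an element-wise comparison makes sense.

```python
# Made-up rays in time order; the scan started near azimuth 358.5.
rays_by_time = [
    {"azimuth": 358.5, "time": 0},
    {"azimuth": 359.5, "time": 1},
    {"azimuth": 0.5, "time": 2},
    {"azimuth": 1.5, "time": 3},
]

# Reorder by azimuth so the rays line up with azimuth-sorted data.
rays_by_azimuth = sorted(rays_by_time, key=lambda ray: ray["azimuth"])
print([ray["azimuth"] for ray in rays_by_azimuth])
```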

In [None]:
vol.to_cfradial2("test_cfradial2.nc", timestep=5)

In [None]:
vol2 = wrl.io.CfRadial("test_cfradial2.nc", dim0="azimuth")

In [None]:
np.testing.assert_equal(
    vol[0][5].data.DBZH.values, vol2["sweep_0"].sortby("azimuth").DBZH.values
)

### NetCDF4

With this export, the complete volume/timeseries is written to a NetCDF4 file.

The example shows how to write the volume to such a NetCDF4 file and read it back. Since there is currently no fitting counterpart to `open_odim` for reading these NetCDF4 volumes, we resort to the `xarray.open_dataset` reader.
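
The export calls in this notebook pass either an integer (`timestep=5`) or `slice(None, None)` as `timestep`. How these two kinds of selector behave on a plain Python sequence is shown below; the list of time labels is made up, and how wradlib applies the selector internally is not shown here.

```python
# Made-up labels for the eight timesteps of this example.
timesteps = ["13:00", "13:05", "13:10", "13:15", "13:20", "13:25", "13:30", "13:35"]

single = timesteps[5]                      # an integer picks one timestep
everything = timesteps[slice(None, None)]  # equivalent to timesteps[:]

print(single)
print(len(everything))
```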

In [None]:
vol.to_netcdf("test_netcdf.nc", timestep=slice(None, None))

In [None]:
vol3 = xr.open_dataset("test_netcdf.nc", group="sweep_0")

In [None]:
display(vol3)

In [None]:
xr.testing.assert_equal(vol[0][5].data, vol3.isel(time=5))

## Delete object

In [None]:
del mom
del swp
del ts
del vol
del vol1
del vol2
del vol3

## Run Garbage Collection

In [None]:
free_memory()

## Check Memory

In [None]:
mem_end = memory_usage_psutil()
print("Memory: {} - {}".format(mem_start, mem_end))
print("Memory still in use: {0:.5f} MB".format((mem_end - mem_start)))

## Check Open files

No open data files!

In [None]:
check_open_files(True)