# Converting MetObs Toolkit data to xarray Datasets

This notebook demonstrates the `to_xr()` methods of the `Station` and `Dataset` classes from the `metobs_toolkit` package using the built-in demo dataset.

We show:
1. Loading the demo dataset.
2. Converting a single station to an `xarray.Dataset`.
3. Inspecting the structure (dimensions, variables, attributes).
4. Converting the full multi-station dataset to xarray.
5. Exploring selections (e.g. picking observation values vs. labels).


## What is xarray?

[xarray](https://xarray.dev) is a Python library that brings the labeled data concepts of pandas to N-dimensional arrays (NetCDF-style). It enables:
- Named dimensions (e.g. `datetime`, `kind`, `name`)
- Coordinate-based indexing and selection
- Rich metadata via attributes
- Easy export to formats like NetCDF / Zarr / GRIB (with plugins)

It is especially useful for structured time series, gridded data, or any multi-dimensional scientific data.
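
To make the ideas above concrete, here is a minimal sketch that uses synthetic values only, not the toolkit's demo data. The variable name `temp`, the timestamps and the `units` attribute are illustrative assumptions. It builds a one-dimensional `DataArray` with a named `datetime` dimension and selects a time window by coordinate label.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly temperatures (illustrative values, not demo data)
times = pd.date_range("2022-09-01", periods=6, freq="h")
temp = xr.DataArray(
    np.array([18.4, 18.1, 17.9, 17.6, 17.2, 17.0]),
    dims="datetime",
    coords={"datetime": times},
    name="temp",
    attrs={"units": "degrees Celsius"},  # metadata travels with the array
)

# Label-based selection: both slice bounds are inclusive
subset = temp.sel(datetime=slice("2022-09-01 02:00", "2022-09-01 04:00"))
print(subset.values)
print(temp.attrs["units"])
```

The selection is written in terms of timestamps rather than integer positions, which is the main convenience xarray adds over plain NumPy arrays.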

In [2]:
# Imports
import metobs_toolkit
import xarray as xr

In [3]:
# 1. Load the demo dataset into a Dataset object
dataset = metobs_toolkit.Dataset()
dataset.import_data_from_file(
    template_file=metobs_toolkit.demo_template,
    input_metadata_file=metobs_toolkit.demo_metadatafile,
    input_data_file=metobs_toolkit.demo_datafile,
)

print(f"Number of stations: {len(dataset.stations)}")
print("First 5 station names:", [s.name for s in dataset.stations[:5]])

Luchtdruk is present in the datafile, but not found in the template! This column will be ignored.
Neerslagintensiteit is present in the datafile, but not found in the template! This column will be ignored.
Neerslagsom is present in the datafile, but not found in the template! This column will be ignored.
Rukwind is present in the datafile, but not found in the template! This column will be ignored.
Luchtdruk_Zeeniveau is present in the datafile, but not found in the template! This column will be ignored.
Globe Temperatuur is present in the datafile, but not found in the template! This column will be ignored.
The following columns are present in the data file, but not in the template! They are skipped!
 ['Luchtdruk_Zeeniveau', 'Rukwind', 'Neerslagsom', 'Globe Temperatuur', 'Luchtdruk', 'Neerslagintensiteit']
The following columns are found in the metadata, but not in the template and are therefore ignored: 
['benaming', 'sponsor', 'Network', 'stad']


Number of stations: 28
First 5 station names: ['vlinder01', 'vlinder02', 'vlinder03', 'vlinder04', 'vlinder05']


In [4]:
# 2. Pick one station (e.g. 'vlinder05') and run a simple QC check to add labels
station = dataset.get_station('vlinder05')
station.repetitions_check(max_N_repetitions=200)




In [6]:
# 3. Convert the single station to an xarray Dataset
ds_station = station.to_xr()

ds_station


### Structure of the station-level Dataset

For each observed variable (e.g. `temp`, `humidity`, etc.) a DataArray is created with:
- Dimension `kind`: separates 'obs' (values) and 'label' (QC and gap-fill labels)
- Dimension `datetime`: corresponding timestamp

Attributes on each variable include:
- `obstype_name`, `obstype_desc`, `obstype_unit`
- `QC`: dictionary of the quality control checks that were applied
- `GF`: dictionary of the gap-fill methods that were applied
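
The layout described above can be mimicked with a small mock-up. This is a sketch built by hand from synthetic values, not the actual output of `to_xr()`: the helper `mock_variable`, the `'ok'` label string and the unit strings are hypothetical, and only two of the listed attributes are included. It shows how to loop over the variables and read their dimensions and attributes.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2022-09-01", periods=3, freq="h")

def mock_variable(values, name, unit):
    """Hypothetical helper: stack values and labels along a 'kind' dimension."""
    obs = np.array(values, dtype=object)
    labels = np.array(["ok"] * len(values), dtype=object)  # assumed label text
    return xr.DataArray(
        np.stack([obs, labels]),
        dims=("kind", "datetime"),
        coords={"kind": ["obs", "label"], "datetime": times},
        attrs={"obstype_name": name, "obstype_unit": unit},
    )

ds_mock = xr.Dataset(
    {
        "temp": mock_variable([18.2, 18.0, 17.7], "temp", "degrees Celsius"),
        "humidity": mock_variable([61.0, 63.0, 66.0], "humidity", "percent"),
    }
)

# Walk over the variables and print their dimensions and unit attribute
for var_name, da in ds_mock.data_vars.items():
    print(var_name, da.dims, da.attrs["obstype_unit"])
```

The same loop should work on `ds_station`, provided its variables carry the attributes listed above.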

In [7]:
# 4. Inspect one variable (e.g. temperature)
ds_station['temp']


In [10]:
# 5. Inspect the QC labels (kind='label')
labels = ds_station['temp'].sel(kind='label')
labels

In [11]:
# ... or the observation values (kind='obs')
records = ds_station['temp'].sel(kind='obs')
records
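
A common follow-up is to combine the two selections and keep only the observations whose label marks them as valid. The sketch below does this on a synthetic array with the same `kind` / `datetime` layout. The label strings `'ok'` and `'repetitions outlier'` are assumptions for illustration, so check the labels actually present in your own data before filtering.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2022-09-01", periods=4, freq="h")
values = np.array([18.2, 18.2, 17.9, 17.5], dtype=object)
# Assumed label strings; the toolkit's real labels may differ
labels = np.array(["ok", "repetitions outlier", "ok", "ok"], dtype=object)

temp = xr.DataArray(
    np.stack([values, labels]),
    dims=("kind", "datetime"),
    coords={"kind": ["obs", "label"], "datetime": times},
    name="temp",
)

# drop=True removes the scalar 'kind' coordinate left behind by the selection
obs = temp.sel(kind="obs", drop=True).astype(float)
lab = temp.sel(kind="label", drop=True)

# Keep observations labelled 'ok'; everything else becomes NaN
clean = obs.where(lab == "ok")
print(clean.values)
```

Because values and labels share one array in this mock-up, the data has `object` dtype, hence the explicit `astype(float)` before doing numerical work.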

## Converting the full Dataset

We can also call `to_xr()` on a `Dataset` object. The resulting `xarray.Dataset` then has an extra dimension, `name`, that identifies the station.

In [12]:
# 6. Convert the entire collection of stations
ds_all = dataset.to_xr()

ds_all

In [13]:
# 7. Selecting a single station from the multi-station Dataset
ds_one = ds_all.sel(name='vlinder05')
ds_one['temp']


### Dimension summary (multi-station)

- `name`: name of the station
- `kind`: sub-type of the data (e.g. 'obs', 'label', and possibly 'model' if model time series have been added)
- `datetime`: consolidated time axis (union across stations)

If model time series (e.g. ERA5) are imported, an additional internal dimension (e.g. `models`) appears inside the model DataArrays (stacked under `kind='model'`).
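
To illustrate what a consolidated `datetime` axis implies, the sketch below stacks two synthetic stations with partly overlapping time ranges along a new `name` dimension. The station names and values are made up, and `xr.concat` with `join="outer"` is used here as a stand-in; it is not necessarily how `to_xr()` builds the axis internally.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two synthetic stations whose records start one hour apart
t_a = pd.date_range("2022-09-01 00:00", periods=3, freq="h")
t_b = pd.date_range("2022-09-01 01:00", periods=3, freq="h")

a = xr.DataArray([18.0, 17.5, 17.1], dims="datetime",
                 coords={"datetime": t_a}, name="temp")
b = xr.DataArray([16.2, 16.0, 15.8], dims="datetime",
                 coords={"datetime": t_b}, name="temp")

# An outer join takes the union of both time axes
combined = xr.concat(
    [a, b],
    dim=pd.Index(["station_a", "station_b"], name="name"),
    join="outer",
)

print(combined.sizes)
# Timestamps missing for a station are filled with NaN
print(combined.sel(name="station_b").isnull().values)
```

When working with `ds_all`, expect similar NaN padding for stations that do not cover the full time range, and consider `dropna(dim='datetime')` after selecting a single station.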