# NICAM model

## setup

All the dependencies are set up for use with [`pixi`](https://pixi.sh/latest/). You can thus use
```sh
pixi run -e nicam jupyter lab
```
from the project directory to start a local JupyterLab instance with all the dependencies installed.

On a JupyterHub instance, use
```sh
pixi workspace export conda-environment -e nicam env.yml
conda env update -n <env> -f env.yml
```
or
```sh
pip install "xdggs @ git+https://github.com/keewis/xdggs.git@test"
```

In [None]:
from distributed import Client

client = Client()
client

imports

In [None]:
import dask.array as da
import intake
import numpy as np
import xarray as xr
import xdggs  # noqa: F401

## open the data

open the hackathon intake catalog

In [None]:
cat = intake.open_catalog(
    "https://digital-earths-global-hackathon.github.io/catalog/catalog.yaml"
)["online"]
cat

open the model at the highest level, and derive the total wind speed from the wind vector

In [None]:
level = 15
ds = (
    cat.nicam_220m_test(zoom=level, chunks={"cell": "auto"})
    .to_dask()
    .squeeze()
    .assign(windspeed=lambda ds: np.hypot(ds["ss_u10m"], ds["ss_v10m"]))
)
ds
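The wind speed derived above is the magnitude of the horizontal wind vector, `sqrt(u**2 + v**2)`. A minimal scalar sketch of the same formula, using only the standard library (the helper name `wind_speed` is hypothetical and not part of the notebook):

```python
import math


def wind_speed(u: float, v: float) -> float:
    """Magnitude of the horizontal wind vector (u, v)."""
    # math.hypot computes sqrt(u**2 + v**2) without intermediate overflow
    return math.hypot(u, v)


speed = wind_speed(3.0, 4.0)
```

`np.hypot` applies the same operation elementwise, which is why it also works on lazy dask-backed variables.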

construct a (lazy) cell ids coordinate and create a lazy xdggs index (a healpix `"moc"` index, which is based on a multi-order coverage map instead of the standard eager pandas index)

In [None]:
decoded = ds.assign_coords(
    {
        "cell_ids": (
            "cell",
            da.arange(
                12 * 4**level, dtype="uint64", chunks=ds["windspeed"].data.chunks[0][0]
            ),
        )
    }
).dggs.decode(
    {"grid_name": "healpix", "level": level, "indexing_scheme": "nested"},
    index_kind="moc",
)
decoded
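The `12 * 4**level` expression used for the cell ids is the number of cells in a global HEALPix grid: 12 base cells, each split into 4 children per level. A small sketch of why the ids are stored as `uint64` at level 15 (the helper name `healpix_cell_count` is hypothetical):

```python
def healpix_cell_count(level: int) -> int:
    """Number of cells in a global HEALPix grid at the given level."""
    return 12 * 4**level


n_cells = healpix_cell_count(15)

# the largest cell id is n_cells - 1; check which unsigned dtype can hold it
fits_uint32 = n_cells - 1 <= 2**32 - 1
fits_uint64 = n_cells - 1 <= 2**64 - 1
```

With roughly 12.9 billion cells at level 15, the ids exceed the `uint32` range, so a 64-bit integer type is needed.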

Manually subset the dataset, based on a bounding box:

```{note}

The healpix `"moc"` index does not yet support selecting with arrays, only with a single slice. As such, we need to:

1. determine the initial slice for a coarse selection
2. replace the moc index with a pandas index
3. compute the exact subset
```

In [None]:
box = [125.8, 29, 151, 55]
polygon = np.array(
    [
        [box[0], box[1]],
        [box[2], box[1]],
        [box[2], box[3]],
        [box[0], box[3]],
    ]
)

In [None]:
import cdshealpix
from astropy.coordinates import Latitude, Longitude

In [None]:
lon = Longitude(polygon[:, 0], unit="degree")
lat = Latitude(polygon[:, 1], unit="degree")

In [None]:
cell_ids, _, _ = cdshealpix.nested.polygon_search(
    lon, lat, depth=decoded.dggs.grid_info.level, flat=True
)
cell_ids

convert to a slice, as the moc index does not support subsetting by an array right now:

In [None]:
# slice ends are exclusive, so add 1 to keep the cell with the largest id
slice_ = slice(cell_ids.min().item(), cell_ids.max().item() + 1)
slice_

In [None]:
subset = decoded.isel(cell=slice_)
subset
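A single slice between the smallest and largest matching cell id is only a coarse selection: it can include cells that lie outside the polygon. The idea of "coarse slice first, exact subset afterwards" can be sketched in plain Python as follows. This is an illustration under simplifying assumptions (ids stored as a `range`, so position equals cell id), and the helper names `coarse_slice` and `exact_subset` are hypothetical:

```python
def coarse_slice(wanted_ids):
    """Smallest contiguous slice covering all wanted ids (end is exclusive)."""
    return slice(min(wanted_ids), max(wanted_ids) + 1)


def exact_subset(all_ids, wanted_ids):
    """Narrow down with the coarse slice, then keep only the wanted ids."""
    wanted = set(wanted_ids)
    coarse = all_ids[coarse_slice(wanted_ids)]
    return [cell_id for cell_id in coarse if cell_id in wanted]


all_ids = range(12 * 4**1)  # all cell ids of a level 1 grid
wanted = [5, 6, 9, 17]  # made-up ids standing in for a polygon search result

coarse = all_ids[coarse_slice(wanted)]
exact = exact_subset(all_ids, wanted)
```

Here the coarse selection holds 13 cells while only 4 are actually wanted, which is the gap that the exact step would close.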

## downsample

For plotting with `explore`, we need to downsample the data to a size that fits in memory.

compute (chunked) parent cells

The extra call to `chunk` can be dropped once `xdggs` supports dask.

In [None]:
parents = subset.dggs.zoom_to(level=12)
parents

In [None]:
parents_ = parents.chunk(subset.chunksizes)
parents_

Aggregate using a `groupby`-`mean` operation by parent cell ids

In [None]:
grid_info = subset.dggs.grid_info.to_dict() | {"level": 12}
downsampled = (
    subset.assign_coords(parents=parents_)
    .drop_indexes("cell_ids")
    .groupby(parents=xr.groupers.UniqueGrouper(labels=np.unique(parents)))
    .mean()
    .rename_vars({"parents": "cell_ids"})
    .rename_dims({"parents": "cells"})
    .dggs.decode(grid_info)
)
downsampled
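In the nested indexing scheme, each cell has four children whose ids share the parent's id as a bit prefix, so the parent id can be obtained by dropping two bits per level. The sketch below illustrates the aggregation on a toy example using only the standard library. It is not how `xdggs` or `xarray` implement it, and the helper names `parent_id` and `mean_by_parent` are hypothetical:

```python
from collections import defaultdict
from statistics import fmean


def parent_id(cell_id: int, level: int, parent_level: int) -> int:
    """Parent cell id in the nested scheme: drop 2 bits per level."""
    return cell_id >> (2 * (level - parent_level))


def mean_by_parent(values: dict, level: int, parent_level: int) -> dict:
    """Group values by parent cell id and average each group."""
    groups = defaultdict(list)
    for cell_id, value in values.items():
        groups[parent_id(cell_id, level, parent_level)].append(value)
    return {parent: fmean(group) for parent, group in sorted(groups.items())}


# toy data: the first 8 cells of a level 2 grid, with the id as the value
values = {cell_id: float(cell_id) for cell_id in range(8)}
coarse = mean_by_parent(values, level=2, parent_level=1)
```

Going from level 15 to level 12 drops 6 bits, so each parent cell averages over up to `4**3 = 64` children.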

## visualization

load the downsampled data into memory

In [None]:
computed = downsampled["windspeed"].compute().dggs.assign_latlon_coords()
computed

visualize using `explore`

In [None]:
computed.dggs.explore(alpha=0.8)