# Appending to an Icechunk Store with Virtual References

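This notebook demonstrates how to append virtual references to an existing Icechunk store. It writes two days of NOAA OISST data as virtual references, commits them, and then appends two more days along the `time` dimension.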

Make sure the required dependencies are installed before starting (the commented-out `pip install` commands below list them).

In [1]:
# !pip install -e ".[icechunk]"
# !pip install git+https://github.com/mpiannucci/kerchunk@v3
# !pip install fsspec s3fs

In [2]:
!pip show icechunk

Name: icechunk
Version: 0.1.0a7
Summary: Transactional storage engine for Zarr designed for use on cloud object storage
Home-page: https://github.com/earth-mover/icechunk
Author: Earthmover PBC
Author-email: 
License: Apache-2.0
Location: /Users/aimeebarciauskas/github/virtualizarr/venv/lib/python3.12/site-packages
Requires: zarr
Required-by: 


In [3]:
import warnings

import fsspec
import xarray as xr
from icechunk import IcechunkStore, StorageConfig, StoreConfig, VirtualRefConfig

from virtualizarr import open_virtual_dataset

warnings.filterwarnings("ignore", category=UserWarning)

# Before you start

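Identify the dataset you want to use and build the list of files from which the virtual Icechunk store will be generated. Here we use the daily OISST v2.1 (AVHRR) files for August 2024 from NOAA's public `noaa-cdr-sea-surface-temp-optimum-interpolation-pds` bucket.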

In [4]:
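fs = fsspec.filesystem("s3", anon=True)

# List the daily OISST files for August 2024 (public bucket, anonymous access).
oisst_files = fs.glob(
    "s3://noaa-cdr-sea-surface-temp-optimum-interpolation-pds/data/v2.1/avhrr/202408/oisst-avhrr-v02r01.*.nc"
)

# `fs.glob` returns paths without the "s3://" prefix, so add it back,
# then sort so the files are in date order.
oisst_files = sorted(["s3://" + f for f in oisst_files])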
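Assuming the dates in these filenames are zero-padded `YYYYMMDD` stamps, sorting the URLs as strings also puts them in chronological order. Before writing or appending batches of files, it can help to confirm there are no missing or repeated days. The sketch below is a hypothetical helper (not part of VirtualiZarr or fsspec); it assumes each filename contains `oisst-avhrr-v02r01.` followed by an 8-digit date and ignores anything after the date.

```python
import re
from datetime import date, timedelta

# Assumed filename layout: "...oisst-avhrr-v02r01.YYYYMMDD<optional suffix>.nc".
_DATE_RE = re.compile(r"oisst-avhrr-v02r01\.(\d{8})")


def file_date(url: str) -> date:
    """Parse the YYYYMMDD stamp from an OISST file URL (hypothetical helper)."""
    match = _DATE_RE.search(url)
    if match is None:
        raise ValueError(f"no YYYYMMDD date found in {url!r}")
    stamp = match.group(1)
    return date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))


def check_daily_contiguous(urls: list[str]) -> None:
    """Raise ValueError unless the files cover consecutive days, in order."""
    days = [file_date(u) for u in urls]
    for prev, curr in zip(days, days[1:]):
        # A step of 0 days means a duplicate; more than 1 day means a gap.
        if curr - prev != timedelta(days=1):
            raise ValueError(f"gap, duplicate or disorder between {prev} and {curr}")
```

Calling something like `check_daily_contiguous(oisst_files[0:4])` before writing would flag a missing or duplicated day before it ends up in the store.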

## Create virtual datasets with VirtualiZarr's `open_virtual_dataset`

In [5]:
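# Anonymous S3 access with s3fs read caching disabled.
so = dict(anon=True, default_fill_cache=False, default_cache_type="none")

# `indexes={}` skips building in-memory pandas indexes, so no coordinate
# data is loaded. Start with the first two days of the month.
virtual_datasets = [
    open_virtual_dataset(url, indexes={}, reader_options={"storage_options": so})
    for url in oisst_files[0:2]
]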

In [6]:
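# `coords="minimal"` and `compat="override"` stop xarray from comparing
# non-concatenated variables across files (the first file's copy is kept);
# `combine_attrs="override"` likewise keeps the first file's attributes.
virtual_ds = xr.concat(
    virtual_datasets,
    dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)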
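No array data is read by `xr.concat` here: VirtualiZarr's virtual variables carry chunk manifests, which record, for each chunk-grid index, the file path, byte offset and byte length of that chunk in the original netCDF file, and concatenating them combines these references. The sketch below is a simplified, hypothetical model of that bookkeeping using plain dicts rather than VirtualiZarr's own classes; it assumes every input is non-empty and has a complete, regular chunk grid with matching chunk shapes.

```python
# Simplified, hypothetical model of a virtual chunk manifest: each chunk-grid
# index maps to a byte range inside an existing file. No array data is stored.
Manifest = dict[tuple[int, ...], dict[str, object]]


def concat_manifests(manifests: list[Manifest], axis: int = 0) -> Manifest:
    """Concatenate manifests along `axis` by shifting chunk indices.

    Assumes each manifest is non-empty, has a full regular chunk grid,
    and uses the same chunk shape as the others.
    """
    result: Manifest = {}
    offset = 0
    for manifest in manifests:
        # Number of chunks this manifest contributes along the axis.
        n_chunks = 1 + max(idx[axis] for idx in manifest)
        for idx, entry in manifest.items():
            new_idx = list(idx)
            new_idx[axis] += offset
            # Path, offset and length are copied unchanged: only the
            # position in the chunk grid moves.
            result[tuple(new_idx)] = dict(entry)
        offset += n_chunks
    return result
```

For two single-day files that each hold one chunk along `time`, the second file's chunk moves from index `(0, 0, 0, 0)` to `(1, 0, 0, 0)` while its path, offset and length are copied as-is.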

In [7]:
# Clean up the store if running this notebook multiple times.
#!rm -rf ./noaa-cdr-icechunk/

## Initialize the Icechunk Store

In [8]:
storage_config = StorageConfig.filesystem("./noaa-cdr-icechunk")
virtual_ref_store_config = StoreConfig(
    virtual_ref_config=VirtualRefConfig.s3_anonymous(region="us-east-1"),
)

In [9]:
store = IcechunkStore.create(
    storage=storage_config, config=virtual_ref_store_config, read_only=False
)

## Write the virtual datasets to the icechunk store and commit

In [10]:
virtual_ds.virtualize.to_icechunk(store)

In [11]:
store.commit("first 2 days of 202408 data")

'R1BP6057NW5A1ZANMBDG'

## Check your work!

In [12]:
ds = xr.open_zarr(store, consolidated=False, zarr_format=3)
ds

# Append

None of that was new: it repeats the workflow in the [icechunk docs](https://icechunk.io/icechunk-python/virtual/). To append, we follow the same steps to build a virtual dataset from the next two files, reopen the existing store with `IcechunkStore.open_existing`, and pass `append_dim="time"` to `virtualize.to_icechunk`.
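Appending along `time` grows the existing arrays and adds new chunk references after the existing ones. Because virtual references point at chunks that already exist in the source files, they cannot be re-chunked, so the appended data needs the same chunk shape; and on Zarr's regular chunk grid, the existing length along the append dimension must be a whole multiple of the chunk length. The hypothetical helper below models only that shape and index arithmetic; it is not how VirtualiZarr implements `append_dim`.

```python
def plan_append(
    existing_shape: tuple[int, ...],
    new_shape: tuple[int, ...],
    chunk_shape: tuple[int, ...],
    axis: int,
) -> tuple[tuple[int, ...], int]:
    """Return (combined shape, first chunk index of the appended data).

    Hypothetical helper modelling a regular Zarr chunk grid.
    """
    if not (len(existing_shape) == len(new_shape) == len(chunk_shape)):
        raise ValueError("shapes must have the same number of dimensions")
    for dim, (old, new) in enumerate(zip(existing_shape, new_shape)):
        # Every dimension except the append axis must match exactly.
        if dim != axis and old != new:
            raise ValueError(f"dimension {dim} differs: {old} != {new}")
    chunk_len = chunk_shape[axis]
    if chunk_len <= 0:
        raise ValueError("chunk length must be positive")
    if existing_shape[axis] % chunk_len != 0:
        # A partial final chunk cannot be followed by more chunks on a
        # regular grid, so the appended chunks would not line up.
        raise ValueError("existing length is not a multiple of the chunk length")
    combined = list(existing_shape)
    combined[axis] += new_shape[axis]
    return tuple(combined), existing_shape[axis] // chunk_len
```

For example, with two days already written, two more days to append, and an assumed chunk shape of `(1, 1, 720, 1440)` for a `(time, zlev, lat, lon)` variable, the combined shape is `(4, 1, 720, 1440)` and the new chunks start at time index 2.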

In [13]:
# Build virtual datasets for the next two days, using the same storage
# options as the first batch.
virtual_datasets_a = [
    open_virtual_dataset(url, indexes={}, reader_options={"storage_options": so})
    for url in oisst_files[2:4]
]

In [14]:
virtual_ds_a = xr.concat(
    virtual_datasets_a,
    dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)

In [15]:
append_store = IcechunkStore.open_existing(
    storage=storage_config, config=virtual_ref_store_config, read_only=False
)

In [16]:
virtual_ds_a.virtualize.to_icechunk(append_store, append_dim="time")

In [17]:
append_store.commit("wrote 2 more days of data")

'0HE5RZ869HTG8RZESHCG'
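The same create-then-append pattern extends to a longer file list: write the first batch to a new store, then append every later batch along `time`, committing after each. The generator below is a hypothetical helper that only plans the batches; it does not call VirtualiZarr or Icechunk.

```python
from typing import Iterator, Optional, Sequence


def append_plan(
    urls: Sequence[str], batch_size: int, append_dim: str = "time"
) -> Iterator[tuple[list[str], Optional[str]]]:
    """Yield (batch_of_urls, append_dim) pairs for incremental writes.

    The first batch initialises the store (append_dim is None); every
    later batch is appended along `append_dim`. Hypothetical helper.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(urls), batch_size):
        batch = list(urls[start : start + batch_size])
        yield batch, (None if start == 0 else append_dim)
```

In this notebook's terms, a batch whose `append_dim` is `None` would go through `IcechunkStore.create` and `virtualize.to_icechunk(store)`, and every other batch through `IcechunkStore.open_existing` and `virtualize.to_icechunk(append_store, append_dim="time")`, each followed by a `commit`. Committing per batch records one snapshot per append, so a failed batch should leave the earlier snapshots untouched.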

## Check that it worked!

In [18]:
read_store = IcechunkStore.open_existing(
    storage=storage_config, config=virtual_ref_store_config, read_only=True
)

In [19]:
ds = xr.open_zarr(read_store, consolidated=False, zarr_format=3)
ds
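Re-running the append cells would, as far as this notebook shows, append the same two days again, so a quick check for duplicated timestamps is worthwhile. The stdlib sketch below is a hypothetical helper; in the notebook you might pass it `list(ds.indexes["time"].to_pydatetime())`, assuming `time` decodes to a pandas `DatetimeIndex`.

```python
from collections import Counter
from datetime import datetime


def duplicated_times(times: list[datetime]) -> list[datetime]:
    """Return the timestamps that occur more than once, sorted."""
    counts = Counter(times)
    return sorted(t for t, n in counts.items() if n > 1)
```

An empty result means no day was written twice. With pandas directly, `ds.indexes["time"].duplicated().any()` performs the same check.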