# Appending to an Icechunk Store with Virtual References

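This notebook demonstrates how to append virtual references to an existing Icechunk store along the `time` dimension, using VirtualiZarr and daily NOAA OISST files hosted on S3.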

Please ensure the correct dependencies are installed before starting.

In [1]:
# !pip install 'virtualizarr[icechunk,hdf]' ipykernel

In [2]:
!pip show icechunk

Name: icechunk
Version: 0.1.2
Summary: Icechunk Python
Home-page: https://github.com/earth-mover/icechunk
Author: Earthmover PBC
Author-email: Earthmover <info@earthmover.io>
License: Apache-2.0
Location: /opt/homebrew/envs/virtualizarr-tests/lib/python3.12/site-packages
Requires: zarr
Required-by: 


In [3]:
import warnings

import fsspec
import icechunk
import xarray as xr

from virtualizarr import open_virtual_dataset

warnings.filterwarnings("ignore", category=UserWarning)

# Before you start

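Identify the dataset you will use and build the list of files from which the virtual Icechunk store will be generated. This example uses the daily NOAA OISST v2.1 (AVHRR) files for August 2024 from a public S3 bucket, accessed anonymously.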

In [4]:
fs = fsspec.filesystem("s3", anon=True)

oisst_files = fs.glob(
    "s3://noaa-cdr-sea-surface-temp-optimum-interpolation-pds/data/v2.1/avhrr/202408/oisst-avhrr-v02r01.*.nc"
)

oisst_files = sorted(["s3://" + f for f in oisst_files])
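Because the files are later concatenated along `time` in list order, it can be worth confirming that the sorted list really is in date order with no missing days. The sketch below assumes every path ends in `.YYYYMMDD.nc` (the `*` in the glob pattern above); the helper names `file_date` and `check_daily_sequence` and the `s3://bucket/` paths are hypothetical.

```python
import re
from datetime import date, timedelta

# Assumed naming convention: "...oisst-avhrr-v02r01.YYYYMMDD.nc"
_DATE_RE = re.compile(r"\.(\d{4})(\d{2})(\d{2})\.nc$")


def file_date(path: str) -> date:
    """Extract the calendar date encoded in an OISST file name (hypothetical helper)."""
    match = _DATE_RE.search(path)
    if match is None:
        raise ValueError(f"no YYYYMMDD date found in {path!r}")
    year, month, day = (int(g) for g in match.groups())
    return date(year, month, day)


def check_daily_sequence(paths: list[str]) -> list[date]:
    """Return the dates encoded in `paths`, raising if they are not consecutive days."""
    dates = [file_date(p) for p in paths]
    for prev, curr in zip(dates, dates[1:]):
        if curr - prev != timedelta(days=1):
            raise ValueError(f"expected {prev + timedelta(days=1)}, got {curr}")
    return dates


# Placeholder paths that follow the assumed convention.
example_paths = [
    "s3://bucket/oisst-avhrr-v02r01.20240801.nc",
    "s3://bucket/oisst-avhrr-v02r01.20240802.nc",
]
print(check_daily_sequence(example_paths))
```

In the notebook, `check_daily_sequence(oisst_files)` would raise if a day were missing from the listing.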

## Create virtual datasets with VirtualiZarr's `open_virtual_dataset`

In [5]:
so = dict(anon=True, default_fill_cache=False, default_cache_type="none")

virtual_datasets = [
    open_virtual_dataset(url, indexes={}, reader_options={"storage_options": so})
    for url in oisst_files[0:2]
]
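`open_virtual_dataset` does not load the array values; each variable is backed by references to byte ranges inside the original NetCDF file. Conceptually this is a chunk manifest mapping chunk keys to `(path, offset, length)` entries. The toy model below only illustrates that idea in plain Python; it is not VirtualiZarr's `ChunkManifest` API, and the URL, offset, and length are made-up placeholders.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ChunkRef:
    """Toy reference to one chunk stored inside another file."""

    path: str    # URL of the original file
    offset: int  # byte offset where the chunk starts in that file
    length: int  # number of bytes in the chunk


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-size // chunk) for size, chunk in zip(shape, chunks))


def chunk_keys(shape, chunks):
    """Dot-separated chunk keys such as '0.0.0.0' for every chunk in the grid."""
    grid = chunk_grid(shape, chunks)
    return [".".join(map(str, idx)) for idx in product(*(range(n) for n in grid))]


# One day of data, shape (1, 1, 720, 1440), stored as a single chunk.
# Placeholder URL; offset and length are invented for illustration.
day_manifest = {
    key: ChunkRef(
        "s3://bucket/oisst-avhrr-v02r01.20240801.nc",
        offset=50_000,
        length=400_000,
    )
    for key in chunk_keys((1, 1, 720, 1440), (1, 1, 720, 1440))
}
print(day_manifest)
```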

In [6]:
virtual_ds = xr.concat(
    virtual_datasets,
    dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)

In [7]:
virtual_ds

In [8]:
# Clean up the store if running this notebook multiple times.
#!rm -rf ./noaa-cdr-icechunk/

## Initialize the Icechunk Store

In [9]:
storage = icechunk.local_filesystem_storage("./noaa-cdr-icechunk")

config = icechunk.RepositoryConfig.default()

config.set_virtual_chunk_container(
    icechunk.VirtualChunkContainer("s3", "s3://", icechunk.s3_store(region="us-east-1"))
)

credentials = icechunk.containers_credentials(
    s3=icechunk.s3_credentials(anonymous=True)
)

repo = icechunk.Repository.open_or_create(storage, config, credentials)

session = repo.writable_session("main")
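The virtual chunk container tells Icechunk how to fetch chunks whose references point at `s3://` URLs, and `containers_credentials` supplies anonymous access for it. Conceptually, each reference URL is matched against the registered URL prefixes. The function below is a hypothetical illustration of that lookup, not Icechunk's implementation; in particular, the longest-prefix tie-breaking rule is an assumption of this sketch.

```python
def resolve_container(url: str, containers: dict[str, str]) -> str:
    """Return the name of the container whose URL prefix matches `url`.

    `containers` maps container name -> URL prefix. If several prefixes match,
    this sketch picks the longest one (an assumption, not documented behaviour).
    """
    matches = [
        (len(prefix), name)
        for name, prefix in containers.items()
        if url.startswith(prefix)
    ]
    if not matches:
        raise KeyError(f"no virtual chunk container configured for {url!r}")
    return max(matches)[1]


# Mirrors the notebook's single container named "s3" with prefix "s3://".
example_url = (
    "s3://noaa-cdr-sea-surface-temp-optimum-interpolation-pds/"
    "data/v2.1/avhrr/202408/oisst-avhrr-v02r01.20240801.nc"
)
print(resolve_container(example_url, {"s3": "s3://"}))
```

In this sketch, a reference whose URL matches no configured prefix cannot be resolved, which is why the container must be registered before the virtual chunks can be read.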

## Write the virtual datasets to the icechunk store and commit

In [10]:
virtual_ds.virtualize.to_icechunk(session.store)

In [11]:
session.commit("first 2 days of 202408 data")

'M9QAVC2ZG8MS9BVNGB80'
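`commit` returns the ID of the new snapshot, and the `main` branch now points at it; a later session opened on `main` starts from that snapshot, while earlier snapshots are kept. The toy class below is a hypothetical, in-memory illustration of that branch-and-snapshot idea. It does not reproduce Icechunk's storage format or ID scheme (IDs here are simple counters).

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """In-memory stand-in for a versioned store: snapshots plus branch pointers."""

    snapshots: dict = field(default_factory=lambda: {"s0": {}})
    branches: dict = field(default_factory=lambda: {"main": "s0"})
    log: list = field(default_factory=list)

    def read(self, branch: str) -> dict:
        """Return a copy of the contents at the tip of `branch`."""
        return dict(self.snapshots[self.branches[branch]])

    def commit(self, branch: str, contents: dict, message: str) -> str:
        """Save `contents` as a new snapshot, move `branch` to it, return its ID."""
        snapshot_id = f"s{len(self.snapshots)}"
        self.snapshots[snapshot_id] = dict(contents)
        self.branches[branch] = snapshot_id
        self.log.append((snapshot_id, message))
        return snapshot_id


toy_repo = ToyRepo()

# First session: write references for days 1-2 and commit.
work = toy_repo.read("main")
work.update({0: "day 1 refs", 1: "day 2 refs"})
first_id = toy_repo.commit("main", work, "first 2 days of 202408 data")

# Second session: start from the new tip of main and add days 3-4.
work = toy_repo.read("main")
work.update({2: "day 3 refs", 3: "day 4 refs"})
second_id = toy_repo.commit("main", work, "wrote 2 more days of data")

print(toy_repo.read("main"))         # all four days
print(toy_repo.snapshots[first_id])  # the earlier snapshot is unchanged
```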

## Check your work!

In [12]:
ds = xr.open_zarr(session.store, consolidated=False, zarr_format=3)
ds

Each of the four data variables has the same layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 15.82 MiB | 7.91 MiB |
| Shape | (2, 1, 720, 1440) | (1, 1, 720, 1440) |
| Dask graph | 2 chunks in 2 graph layers | 2 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |
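The sizes reported above follow directly from the shape and the 8-byte `float64` dtype. They describe the decoded in-memory arrays, not the bytes stored in the source files. A quick check of the arithmetic, with MiB taken as 2**20 bytes:

```python
from math import prod

MIB = 2**20        # bytes per mebibyte
FLOAT64_BYTES = 8  # itemsize of float64


def nbytes_mib(shape: tuple[int, ...], itemsize: int = FLOAT64_BYTES) -> float:
    """In-memory size of an array with `shape` and `itemsize`, in MiB."""
    return prod(shape) * itemsize / MIB


print(f"array: {nbytes_mib((2, 1, 720, 1440)):.2f} MiB")  # 2 days
print(f"chunk: {nbytes_mib((1, 1, 720, 1440)):.2f} MiB")  # 1 day per chunk
```

The same function gives the 4-day size after the append: `nbytes_mib((4, 1, 720, 1440))`.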


# Append

Nothing so far is new: it largely repeats the virtual-dataset workflow in the [Icechunk docs](https://icechunk.io/icechunk-python/virtual/). To append, we follow the same steps to build a virtual dataset for the next two days of files, then pass `append_dim="time"` to `to_icechunk` so that the new references are added after the existing ones along `time`.
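Conceptually, appending along `time` leaves the existing chunk references in place, grows the array along that axis, and shifts each incoming chunk's index along the axis by the number of chunks already stored. The sketch below models that bookkeeping with plain dicts keyed by chunk-index tuples. It is a hypothetical illustration, not VirtualiZarr's code. Because virtual references always map onto whole chunks, the sketch also requires the existing length along the append axis to be a whole number of chunks (trivially true here, where each chunk holds one day).

```python
def append_chunk_refs(existing, existing_shape, new, new_shape, chunks, axis=0):
    """Append chunk references along `axis` (toy model).

    `existing` and `new` map chunk-index tuples to references.
    Returns the combined mapping and the resulting array shape.
    """
    if existing_shape[axis] % chunks[axis] != 0:
        raise ValueError("existing length along the append axis is not a whole number of chunks")
    shift = existing_shape[axis] // chunks[axis]
    combined = dict(existing)  # existing references are left untouched
    for idx, ref in new.items():
        shifted = idx[:axis] + (idx[axis] + shift,) + idx[axis + 1:]
        combined[shifted] = ref
    shape = list(existing_shape)
    shape[axis] += new_shape[axis]
    return combined, tuple(shape)


day_chunks = (1, 1, 720, 1440)
first_two_days = {(0, 0, 0, 0): "day1.nc", (1, 0, 0, 0): "day2.nc"}
next_two_days = {(0, 0, 0, 0): "day3.nc", (1, 0, 0, 0): "day4.nc"}
refs, new_shape = append_chunk_refs(
    first_two_days, (2, 1, 720, 1440), next_two_days, (2, 1, 720, 1440), day_chunks
)
print(new_shape)
print(sorted(refs.items()))
```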

In [13]:
virtual_datasets_a = [
    open_virtual_dataset(
        url, indexes={}, reader_options={"storage_options": {"anon": True}}
    )
    for url in oisst_files[2:4]
]
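Here the second batch is hard-coded as `oisst_files[2:4]`. For a recurring update you would more likely select files dated after the last timestamp already in the store. A hypothetical helper, assuming the same `.YYYYMMDD.nc` naming convention and that the last stored date is known (for example, from the store's `time` coordinate); the `s3://bucket/` paths are placeholders.

```python
import re
from datetime import date, datetime

# Assumed naming convention: "...oisst-avhrr-v02r01.YYYYMMDD.nc"
_STAMP_RE = re.compile(r"\.(\d{8})\.nc$")


def files_after(paths: list[str], last_stored: date) -> list[str]:
    """Return paths whose encoded date is later than `last_stored`, oldest first."""
    selected = []
    for path in paths:
        match = _STAMP_RE.search(path)
        if match is None:
            continue  # ignore files that do not follow the naming convention
        file_day = datetime.strptime(match.group(1), "%Y%m%d").date()
        if file_day > last_stored:
            selected.append((file_day, path))
    return [path for _, path in sorted(selected)]


candidates = [
    "s3://bucket/oisst-avhrr-v02r01.20240803.nc",
    "s3://bucket/oisst-avhrr-v02r01.20240801.nc",
    "s3://bucket/oisst-avhrr-v02r01.20240804.nc",
]
print(files_after(candidates, date(2024, 8, 2)))
```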

In [14]:
virtual_ds_a = xr.concat(
    virtual_datasets_a,
    dim="time",
    coords="minimal",
    compat="override",
    combine_attrs="override",
)

In [15]:
append_session = repo.writable_session("main")

In [16]:
virtual_ds_a.virtualize.to_icechunk(append_session.store, append_dim="time")
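An append only makes sense if the new data has the same variables and, apart from `time`, the same dimensions and sizes as what is already stored. VirtualiZarr may perform its own checks; the hypothetical pre-flight check below just makes those expectations explicit over plain `{name: (dims, shape)}` descriptions. With xarray, such a description could be built as `{k: (v.dims, v.shape) for k, v in ds.data_vars.items()}`. The variable and dimension names in the example are illustrative.

```python
def check_appendable(existing: dict, new: dict, append_dim: str) -> None:
    """Raise ValueError if `new` cannot be appended to `existing` along `append_dim`.

    Both arguments map variable name -> (dims, shape).
    """
    if set(existing) != set(new):
        raise ValueError(f"variables differ: {sorted(set(existing) ^ set(new))}")
    for name, (dims, shape) in existing.items():
        new_dims, new_shape = new[name]
        if tuple(dims) != tuple(new_dims):
            raise ValueError(f"{name}: dims {tuple(new_dims)} != {tuple(dims)}")
        for dim, old_size, new_size in zip(dims, shape, new_shape):
            if dim != append_dim and old_size != new_size:
                raise ValueError(f"{name}: size of {dim!r} is {new_size}, expected {old_size}")


stored = {"sst": (("time", "zlev", "lat", "lon"), (2, 1, 720, 1440))}
incoming = {"sst": (("time", "zlev", "lat", "lon"), (2, 1, 720, 1440))}
check_appendable(stored, incoming, append_dim="time")  # passes silently
```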

In [17]:
append_session.commit("wrote 2 more days of data")

'3MEW3ECB74ZYANAZZHT0'

# Check that it worked!

In [18]:
read_session = repo.readonly_session(branch="main")

In [19]:
ds = xr.open_zarr(read_session.store, consolidated=False, zarr_format=3)
ds

Each of the four data variables now spans four days:

| | Array | Chunk |
|---|---|---|
| Bytes | 31.64 MiB | 7.91 MiB |
| Shape | (4, 1, 720, 1440) | (1, 1, 720, 1440) |
| Dask graph | 4 chunks in 2 graph layers | 4 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |
