# Image Data Storage for the Web

## Learning objectives

- Become familiar with the design of modern **cloud storage systems**
- Gain experience with the **zarr** and **n5 formats**
- Understand the relationship between **chunked, compressed** object storage and **parallel processing and multi-scale visualization**

*See also*: [I2K 2020 Tutorial: Zarr, N5, NGFF, Towards a community standard image file format for sharing big image data in the cloud](https://www.janelia.org/sites/default/files/You%20%2B%20Janelia/Conferences/19.pdf)

# Cloud storage

**Cloud storage services**, such as:

- Amazon Simple Storage Service (AWS S3)
- Google Cloud Storage
- Microsoft Azure Storage
- Minio Cloud Storage

**differ from traditional filesystem storage**.

In *File Storage*:

- Data is organized into files and folders.
- Storage generally comes from a pool, e.g. a volume, with limited capacity.
- Data can be overwritten.
- Limited metadata is associated with the file.

In cloud *Object Storage* systems:

- Objects, i.e. binary blobs, live in a flat structure.
- Each object has a unique identifier and associated metadata, typically JSON-compatible.
- Access is possible via simple HTTP requests.
- Objects cannot be modified in place; an update replaces the whole object.
- There are no structural limits to scaling.
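
The object model listed above can be sketched with a toy, in-memory store. This is only an illustration of the semantics (flat keys, whole-object writes, JSON-compatible metadata); the class `ToyObjectStore`, its method names, and the example keys are hypothetical and do not correspond to any cloud provider's API.

```python
import json


class ToyObjectStore:
    """In-memory stand-in for an object store: a flat mapping of key -> (bytes, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # A put always writes a complete object; there is no partial, in-place update.
        # Metadata is round-tripped through JSON to show it must be JSON-compatible.
        meta = json.loads(json.dumps(metadata or {}))
        self._objects[key] = (bytes(data), meta)

    def get(self, key):
        # Return the object's bytes.
        data, _ = self._objects[key]
        return data

    def head(self, key):
        # Return only the object's metadata.
        _, meta = self._objects[key]
        return meta

    def list(self, prefix=""):
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("image.zarr/0/0.0", b"\x00\x01", {"content-type": "application/octet-stream"})
store.put("image.zarr/0/0.1", b"\x02\x03")
store.put("notes.txt", b"hello")

# Only keys sharing the prefix are returned; no directory tree exists.
chunk_keys = store.list("image.zarr/0/")
```

In a real service, `put`, `get`, `head`, and `list` would each be an HTTP request against a bucket rather than a dictionary lookup.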

## Zarr and n5 formats

[Zarr](https://zarr-developers.github.io/about/) and [n5](https://github.com/saalfeldlab/n5/) are file formats with reference implementations that map well to cloud Object Storage services. They are also suitable for storage of large bioimages.

Together, zarr and n5 are implementations of the [Next-generation File Format (NGFF)](https://ngff.openmicroscopy.org/latest/), which is *a hierarchy of n-dimensional (dense) arrays with metadata*.

Zarr and n5 support:

- Group hierarchies
- Arbitrary JSON-compatible metadata
- Chunked, n-dimensional binary tensor datasets
- Binary component types: [u]int8, [u]int16, [u]int32, [u]int64, float32, float64
- Next-generation lossless compression of binary chunks with [blosc](https://blosc.org/pages/blosc-in-depth/)
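
To make the features above concrete, here is a minimal standard-library sketch of chunked, compressed array storage with JSON metadata. It is not the zarr or n5 API and does not follow their on-disk layouts: the key names (`meta.json`, integer chunk keys), the helper functions, and the `units` attribute are assumptions chosen for illustration, and `zlib` is swapped in for blosc so the example has no third-party dependencies.

```python
import json
import struct
import zlib


def write_chunked(store, name, rows, chunk_rows, attrs=None):
    """Store a 2-D list of uint16 values as row-block chunks under `name/`."""
    n_rows, n_cols = len(rows), len(rows[0])
    meta = {
        "shape": [n_rows, n_cols],
        "chunks": [chunk_rows, n_cols],
        "dtype": "uint16",
        "compressor": "zlib",
        "attrs": attrs or {},
    }
    # Metadata is a small JSON object stored next to the chunks.
    store[f"{name}/meta.json"] = json.dumps(meta).encode()
    for start in range(0, n_rows, chunk_rows):
        block = rows[start:start + chunk_rows]
        flat = [value for row in block for value in row]
        raw = struct.pack(f"<{len(flat)}H", *flat)  # little-endian uint16
        # Each chunk is compressed and stored under its own key.
        store[f"{name}/{start // chunk_rows}"] = zlib.compress(raw)


def read_row(store, name, row_index):
    """Read one row by fetching and decompressing only the chunk that holds it."""
    meta = json.loads(store[f"{name}/meta.json"])
    chunk_rows, n_cols = meta["chunks"]
    raw = zlib.decompress(store[f"{name}/{row_index // chunk_rows}"])
    values = struct.unpack(f"<{len(raw) // 2}H", raw)
    offset = (row_index % chunk_rows) * n_cols
    return list(values[offset:offset + n_cols])


store = {}  # a plain dict stands in for a directory or bucket
image = [[r * 10 + c for c in range(8)] for r in range(6)]
write_chunked(store, "group/image", image, chunk_rows=2, attrs={"units": "micrometer"})
row = read_row(store, "group/image", 3)
```

Reading row 3 touches only the metadata object and one chunk, which is the property that makes partial reads of large arrays cheap.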

When combined with a **multi-scale image model** such as [OME-Zarr](https://blog.openmicroscopy.org/file-formats/community/2020/11/04/zarr-data/), **large image visualization** is possible.
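
As a rough sketch of the multi-scale idea, the snippet below computes the array shapes of a resolution pyramid and picks the finest level that fits a voxel budget. The downsampling factor of 2, the `min_size` threshold, the voxel budget, the example shape, and both function names are assumptions for illustration; OME-Zarr itself does not prescribe them.

```python
import math


def pyramid_shapes(shape, min_size=64, factor=2):
    """Return the shape of each pyramid level, from full resolution downwards."""
    shapes = [tuple(shape)]
    # Keep downsampling until the largest dimension is at most min_size.
    while max(shapes[-1]) > min_size:
        shapes.append(tuple(max(1, math.ceil(s / factor)) for s in shapes[-1]))
    return shapes


def pick_level(shapes, max_voxels):
    """Return the index of the finest level whose voxel count fits the budget."""
    for level, shape in enumerate(shapes):
        if math.prod(shape) <= max_voxels:
            return level
    # Nothing fits: fall back to the coarsest level.
    return len(shapes) - 1


levels = pyramid_shapes((236, 275, 271))  # an example 3-D volume shape
level = pick_level(levels, max_voxels=1_000_000)
```

A viewer can load a coarse level first for an overview and fetch only the chunks of finer levels that intersect the current field of view.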

The object storage-compatible model facilitates **parallel processing** because each compressed chunk can be written independently of the others, even in a cloud storage environment.
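
The following sketch illustrates why chunk-per-object storage suits parallel writes: every task compresses and writes exactly one key, so workers need no coordination. A plain `dict` stands in for the object store, `zlib` stands in for blosc, and the chunk keys and the helper `compress_and_store` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_and_store(store, key, raw):
    # Each task touches exactly one key, so workers never share a chunk.
    store[key] = zlib.compress(raw)
    return key


store = {}  # stand-in for a bucket; each key would be one object
chunks = {f"0/{i}": bytes([i]) * 1024 for i in range(8)}  # eight 1 KiB chunks

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in input order, regardless of completion order.
    written = list(pool.map(lambda kv: compress_and_store(store, *kv), chunks.items()))
```

With a real object store, each call would be an independent HTTP upload, so the same pattern extends from threads to separate processes or machines.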

```python
# Load the OME-Zarr image array
import dask.array as da

arr = da.from_zarr('https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr', '0')
arr
```

|       | Array                 | Chunk               |
|-------|-----------------------|---------------------|
| Bytes | 67.09 MiB             | 145.56 kiB          |
| Shape | (1, 2, 236, 275, 271) | (1, 1, 1, 275, 271) |
| Count | 473 Tasks             | 472 Chunks          |
| Type  | uint16                | numpy.ndarray       |


```python
from itkwidgets import view

# Select the second channel of the first timepoint as a 3-D volume
vol = arr[0,1,:,:,:]
view(vol)
```


### Zarr Resources

- [Zarr documentation](https://zarr.readthedocs.io/en/stable/)
- [NGFF](https://ngff.openmicroscopy.org/latest/)
- [image.sc Discourse Discussion](https://forum.image.sc/)
- [I2K 2020 Tutorial: Zarr, N5, NGFF, Towards a community standard image file format for sharing big image data in the cloud](https://www.janelia.org/sites/default/files/You%20%2B%20Janelia/Conferences/19.pdf)