# Image Data Storage for the Web

## Learning objectives

- Become familiar with the design of modern **cloud storage systems**
- Gain experience with the **zarr** and **n5 formats**
- Understand the relationship between **chunked, compressed** object storage and **parallel processing and multi-scale visualization**

*See also*: [I2K 2020 Tutorial: Zarr, N5, NGFF, Towards a community standard image file format for sharing big image data in the cloud](https://www.janelia.org/sites/default/files/You%20%2B%20Janelia/Conferences/19.pdf)

# Cloud storage

**Cloud storage services**, such as:

- Amazon Simple Storage Service (AWS S3)
- Google Cloud Storage
- Microsoft Azure Storage
- MinIO Cloud Storage

**differ from traditional filesystem storage**.

In *File Storage*:

- Data is organized into files and folders.
- There is generally a pool of storage, e.g. a volume, with limited capacity that can be accessed.
- Data can be overwritten.
- Limited metadata is associated with the file.

In cloud *Object Storage* systems:

- Objects, which are binary blobs, live in a flat structure.
- Objects have a unique identifier and associated metadata, typically JSON-compatible.
- Access is possible via simple HTTP requests.
- Objects cannot be modified in place; an update replaces the whole object.
- There are no structural limits to scaling.
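
The object model above can be illustrated with a minimal sketch, assuming a plain Python dictionary as a stand-in for a bucket. The names `bucket`, `put_object`, and `list_objects` are hypothetical helpers for illustration and are not part of any cloud SDK. The sketch shows that "folders" are only a convention on key prefixes in a flat key space, and that an update replaces a whole object.

```python
import json

# Hypothetical in-memory stand-in for an object storage bucket:
# one flat mapping from key to (data bytes, metadata dict).
bucket = {}

def put_object(key, data, metadata=None):
    """Store a whole object; writing to an existing key replaces it entirely."""
    bucket[key] = (bytes(data), dict(metadata or {}))

def list_objects(prefix=""):
    """Emulate folders by filtering the flat key space on a key prefix."""
    return sorted(key for key in bucket if key.startswith(prefix))

put_object("image.zarr/.zgroup",
           json.dumps({"zarr_format": 2}).encode(),
           {"content-type": "application/json"})
put_object("image.zarr/0/0.0.0.0.0", b"\x00\x01")
put_object("image.zarr/1/0.0.0.0.0", b"\x02\x03")

print(list_objects("image.zarr/0/"))  # ['image.zarr/0/0.0.0.0.0']
```

Real services expose the same ideas through HTTP requests on keys rather than through function calls on a dictionary.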

## Zarr and n5 formats

[Zarr](https://zarr-developers.github.io/about/) and [n5](https://github.com/saalfeldlab/n5/) are file formats with reference implementations that map well to cloud Object Storage services. They are also suitable for storage of large bioimages.

Together, zarr and n5 are implementations of the [Next-generation File Format (NGFF)](https://ngff.openmicroscopy.org/latest/), which is *a hierarchy of n-dimensional (dense) arrays with metadata*.

Zarr and n5 support:

- Group hierarchies
- Arbitrary JSON-compatible metadata
- Chunked, n-dimensional binary tensor datasets
- Binary component types: [u]int8, [u]int16, [u]int32, [u]int64, float32, float64
- Fast, lossless compression of binary chunks with [blosc](https://blosc.org/pages/blosc-in-depth/)

When combined with a **multi-scale image model** such as [OME-Zarr](https://blog.openmicroscopy.org/file-formats/community/2020/11/04/zarr-data/), **large image visualization** is possible.
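
As a rough illustration of a multi-scale model, the sketch below computes the array shape at each pyramid level. It assumes each level halves the Y and X axes with floor division and leaves the leading axes unchanged. The function name `pyramid_shapes` is hypothetical, and the downsampling actually used to produce any given dataset may differ.

```python
def pyramid_shapes(shape, levels, factor=2):
    """Return the shape of each pyramid level, downsampling only the last two axes."""
    shapes = [tuple(shape)]
    for _ in range(levels - 1):
        *lead, y, x = shapes[-1]
        shapes.append((*lead, y // factor, x // factor))
    return shapes

# Full-resolution shape of the example image used in the exercises
print(pyramid_shapes((1, 2, 236, 275, 271), levels=2))
# [(1, 2, 236, 275, 271), (1, 2, 236, 137, 135)]
```

A viewer can then request only the coarsest level that fills the screen and fetch finer levels as the user zooms in.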

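The object storage-compatible model facilitates **parallel processing** because it is conducive to **compressed chunk writes**: each compressed chunk is stored as an independent object, so workers can write different chunks concurrently without coordinating, even in a cloud storage environment.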
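
The following sketch illustrates independent chunk writes under stated assumptions: a Python dictionary named `store` stands in for the object store, `zlib` from the standard library stands in for blosc, and the chunk payloads are synthetic. The helper `write_chunk` is hypothetical. Because every chunk has its own key, the workers never touch the same object.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

store = {}  # hypothetical stand-in for an object store: key -> compressed bytes

CHUNK_BYTES = 275 * 271 * 2  # one 275 x 271 plane of 16-bit pixels

def write_chunk(index):
    """Compress one synthetic chunk and store it under its own key."""
    t, c, z = index
    raw = bytes([z % 256]) * CHUNK_BYTES   # synthetic payload, not real image data
    key = f"0/{t}.{c}.{z}.0.0"
    store[key] = zlib.compress(raw)        # zlib used here in place of blosc
    return key

# Two channels, eight z-planes each: sixteen independent chunks
indices = [(0, c, z) for c in range(2) for z in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    keys = list(pool.map(write_chunk, indices))

print(len(store), "chunks written")  # 16 chunks written
```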

## Exercises

In [None]:
# Get metadata on an image
!ome_zarr info https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/

https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/ [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (1, 2, 236, 275, 271)
   - (1, 2, 236, 137, 135)


*Does the entire dataset need to be downloaded to examine its metadata?*

In [None]:
# Download an image dataset
!ome_zarr download https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/ --output image.zarr

downloading...
   .
to image.zarr
[########################################] | 100% Completed |  4.9s
[########################################] | 100% Completed |  7.6s


*Examine the contents of the filesystem representation of the OME-Zarr multi-scale image. What information is stored in each file?*

In [None]:
%ls -a image.zarr

[0m[01;34m.[0m/  [01;34m..[0m/  [01;34m0[0m/  [01;34m1[0m/  .zattrs  .zgroup


In [None]:
%cat image.zarr/.zattrs

{"multiscales": [{"datasets": [{"path": "0"}, {"path": "1"}], "version": "0.1"}], "omero": {"channels": [{"active": true, "coefficient": 1.0, "color": "0000FF", "family": "linear", "inverted": false, "label": "LaminB1", "window": {"end": 1500.0, "max": 65535.0, "min": 0.0, "start": 0.0}}, {"active": true, "coefficient": 1.0, "color": "FFFF00", "family": "linear", "inverted": false, "label": "Dapi", "window": {"end": 1500.0, "max": 65535.0, "min": 0.0, "start": 0.0}}], "id": 6001240, "name": "B1_C1.tif", "rdefs": {"defaultT": 0, "defaultZ": 118, "model": "color"}, "version": "0.1"}}

In [None]:
%cat image.zarr/.zgroup

{"zarr_format": 2}

In [None]:
%ls -a image.zarr/0

[0m[01;34m.[0m/           0.0.169.0.0  0.0.29.0.0  0.1.10.0.0   0.1.172.0.0  0.1.32.0.0
[01;34m..[0m/          0.0.17.0.0   0.0.3.0.0   0.1.100.0.0  0.1.173.0.0  0.1.33.0.0
0.0.0.0.0    0.0.170.0.0  0.0.30.0.0  0.1.101.0.0  0.1.174.0.0  0.1.34.0.0
0.0.1.0.0    0.0.171.0.0  0.0.31.0.0  0.1.102.0.0  0.1.175.0.0  0.1.35.0.0
0.0.10.0.0   0.0.172.0.0  0.0.32.0.0  0.1.103.0.0  0.1.176.0.0  0.1.36.0.0
0.0.100.0.0  0.0.173.0.0  0.0.33.0.0  0.1.104.0.0  0.1.177.0.0  0.1.37.0.0
0.0.101.0.0  0.0.174.0.0  0.0.34.0.0  0.1.105.0.0  0.1.178.0.0  0.1.38.0.0
0.0.102.0.0  0.0.175.0.0  0.0.35.0.0  0.1.106.0.0  0.1.179.0.0  0.1.39.0.0
0.0.103.0.0  0.0.176.0.0  0.0.36.0.0  0.1.107.0.0  0.1.18.0.0   0.1.4.0.0
0.0.104.0.0  0.0.177.0.0  0.0.37.0.0  0.1.108.0.0  0.1.180.0.0  0.1.40.0.0
0.0.105.0.0  0.0.178.0.0  0.0.38.0.0  0.1.109.0.0  0.1.181.0.0  0.1.41.0.0
0.0.106.0.0  0.0.179.0.0  0.0.39.0.0  0.1.11.0.0   0.1.182.0.0  0.1.42.0.0
0.0.107.0.0  0.0.18.0.0   0.0.4.0.0   0.1.110.0.0  0.1.183.0.

In [None]:
%cat image.zarr/0/.zarray

{
    "chunks": [
        1,
        1,
        1,
        275,
        271
    ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": ">u2",
    "fill_value": 0,
    "filters": null,
    "order": "C",
    "shape": [
        1,
        2,
        236,
        275,
        271
    ],
    "zarr_format": 2
}
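
The `.zarray` metadata is enough to work out how the array is laid out on disk. The sketch below, using only the Python standard library, derives the chunk grid, the number of chunks, and the uncompressed size of one chunk from the `shape`, `chunks`, and `dtype` fields shown above. The helper `chunk_key` is a hypothetical name; it assumes the `.`-separated key convention visible in the directory listing.

```python
import json
import math

# Subset of the .zarray fields shown above
zarray = json.loads("""
{"chunks": [1, 1, 1, 275, 271], "dtype": ">u2", "shape": [1, 2, 236, 275, 271]}
""")

# Number of chunks along each axis: ceil(axis length / chunk length)
grid = [math.ceil(s / c) for s, c in zip(zarray["shape"], zarray["chunks"])]
n_chunks = math.prod(grid)

# ">u2" means big-endian unsigned integer, 2 bytes per element
itemsize = int(zarray["dtype"][2:])
chunk_bytes = math.prod(zarray["chunks"]) * itemsize

def chunk_key(index):
    """Join a chunk grid index into a '.'-separated key."""
    return ".".join(str(i) for i in index)

print(grid)         # [1, 2, 236, 1, 1]
print(n_chunks)     # 472
print(chunk_bytes)  # 149050
print(chunk_key((0, 1, 10, 0, 0)))  # 0.1.10.0.0
```

With this chunking, every chunk holds one full 2D plane for a single timepoint, channel, and z-index.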

In [None]:
import zarr
# Open the downloaded OME-Zarr hierarchy; the root is a zarr Group
group = zarr.open('image.zarr')
group

<zarr.hierarchy.Group '/'>

In [None]:
group.attrs.keys()

dict_keys(['multiscales', 'omero'])

In [None]:
group.attrs['multiscales']

[{'datasets': [{'path': '0'}, {'path': '1'}], 'version': '0.1'}]

In [None]:
list(group.keys())

['0', '1']

In [None]:
scale0 = group['0']

In [None]:
scale0

<zarr.core.Array '/0' (1, 2, 236, 275, 271) >u2>
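
A zarr array is lazy: indexing it reads only the chunks that intersect the requested region. The sketch below does not call zarr itself; it only computes, under the chunking shown in `.zarray` and the `.`-separated key convention, which chunk keys a selection would touch. The function `keys_for_selection` is a hypothetical helper for illustration.

```python
from itertools import product

chunks = (1, 1, 1, 275, 271)  # chunk shape from image.zarr/0/.zarray

def keys_for_selection(selection, chunks):
    """Return chunk keys intersecting a selection.

    selection holds one (start, stop) pair per axis, with stop exclusive.
    """
    per_axis = [range(start // c, (stop - 1) // c + 1)
                for (start, stop), c in zip(selection, chunks)]
    return [".".join(map(str, index)) for index in product(*per_axis)]

# Region corresponding to scale0[0, 1, 100:103, :, :]
selection = [(0, 1), (1, 2), (100, 103), (0, 275), (0, 271)]
print(keys_for_selection(selection, chunks))
# ['0.1.100.0.0', '0.1.101.0.0', '0.1.102.0.0']
```

Only three of the 472 chunks are needed for this region, which is why sub-volumes can be fetched quickly over HTTP.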

In [None]:
import numpy as np
# Read and decompress every chunk into one in-memory NumPy array
np.asarray(scale0)

array([[[[[ 8,  9,  8, ...,  9,  9, 10],
          [ 9,  9,  9, ...,  8,  9,  9],
          [ 8,  8,  8, ..., 26, 40,  8],
          ...,
          [ 9,  9,  9, ...,  9, 10, 14],
          [ 8,  9, 10, ...,  9, 10,  9],
          [ 9,  8, 10, ..., 10,  8,  8]],

         [[ 9,  9,  9, ...,  8, 11, 11],
          [ 9,  8,  9, ..., 10,  9, 10],
          [ 9, 16,  9, ..., 39, 30,  9],
          ...,
          [10,  9, 10, ..., 10, 10,  9],
          [10,  8, 10, ..., 10, 10, 10],
          [10, 11,  9, ...,  9, 10, 10]],

         [[ 9,  9,  9, ..., 14,  7, 15],
          [ 9,  9,  9, ..., 10,  9,  9],
          [ 8,  9,  9, ...,  9, 67,  8],
          ...,
          [ 8,  9,  9, ...,  9, 19,  9],
          [ 8,  9,  8, ...,  7,  9, 10],
          [ 7,  9,  9, ...,  9,  9, 10]],

         ...,

         [[ 8,  9, 57, ...,  9,  9,  8],
          [ 8,  9,  8, ...,  7,  8,  9],
          [21,  9,  9, ...,  8,  9,  7],
          ...,
          [ 9,  9,  8, ...,  7,  8,  9],
          [14,  9