# Intro to zarr

---
## 0. What is zarr? 

**zarr is a storage format for large, chunked, N-dimensional arrays.
It divides your data into 'chunks' and stores each chunk in its own file inside your zarr archive (a simple directory).
The zarr archive also stores dimensions, coordinates, and attributes, i.e. all sorts of metadata.**

Long story short for netCDF users: zarr cuts your netCDF file into small pieces and stores them in a directory, which eases parallelism.
You can read and write a zarr archive in parallel without using MPI-IO.
With netCDF, by contrast, you cannot read or write a file in parallel unless you do so from an MPI job, even with the HDF5-based parallel format, because netCDF is not thread safe.
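To make the "one file per chunk" idea concrete, here is a toy sketch using only the Python standard library. It is not zarr's actual implementation: the `meta.json` file name, the helper functions, and the raw float32 encoding are all assumptions made for illustration. The point is that each worker writes a different file, so no locking or MPI-IO is involved.

```python
import json
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_chunk(store, key, values):
    """Write one chunk (a flat list of floats) to its own file named `key`."""
    payload = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
    (store / key).write_bytes(payload)
    return key


def write_chunked(store, data, chunk_len):
    """Split a 1-D list into chunks and write the chunks concurrently."""
    store.mkdir(parents=True, exist_ok=True)
    # Metadata sits next to the chunks (hypothetical file name, loosely
    # mimicking the role of zarr's `.zarray`).
    meta = {"shape": [len(data)], "chunks": [chunk_len], "dtype": "<f4"}
    (store / "meta.json").write_text(json.dumps(meta))
    chunks = {
        str(start // chunk_len): data[start:start + chunk_len]
        for start in range(0, len(data), chunk_len)
    }
    # Every worker touches a different file, so no coordination is needed.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(write_chunk, store, k, v) for k, v in chunks.items()]
        return sorted(f.result() for f in futures)


def read_chunk(store, key):
    """Read back a single chunk without touching the others."""
    raw = (store / key).read_bytes()
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "toy_store"
    data = [float(i) for i in range(10)]
    keys = write_chunked(store, data, chunk_len=4)
    last_chunk = read_chunk(store, "2")

print(keys)        # ['0', '1', '2']
print(last_chunk)  # [8.0, 9.0]
```

A real zarr archive additionally compresses each chunk and supports N-dimensional chunk keys, but the directory layout follows the same principle.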


For oceanographic numerical modelers: if your model writes its output online as zarr instead of netCDF, each MPI process can simply write the zarr chunk for its own MPI domain, which may reduce I/O time and speed up the overall computation.


Take-away tip for users on a Lustre filesystem:
As a zarr archive is composed of many small files, do not forget to set the stripe count to 1 for the directory you use, before you start storing your zarr archive in it.
 `mkdir dir_for_zarr`
 `lfs setstripe -c 1 dir_for_zarr`
Then save your zarr archive in 'dir_for_zarr'. You can check the striping of a directory with `lfs getstripe dir_for_zarr`.

Link to zarr documentation https://zarr.readthedocs.io

---

## 1.  Set up your python environment 
Load a Python environment that provides xarray and Dask, then create a Dask cluster (as explained in notebook [1 DASK, with HPC cluster](https://github.com/tinaok/Pangeo-for-beginners/blob/master/1%20DASK%2C%20with%20HPC%20cluster%20(PBS%20Pro).ipynb))

In [1]:
import dask
import xarray as xr

In [2]:
from dask_jobqueue import PBSCluster
cluster = PBSCluster(cores=6, memory='30 gb', walltime='1:00:00')
cluster.scale(10)  # request 10 workers

In [4]:
from dask.distributed import Client
client=Client(cluster)
client

- Client
  - Scheduler: tcp://10.120.43.58:59577
  - Dashboard: http://10.120.43.58:8787/status
- Cluster
  - Workers: 10
  - Cores: 60
  - Memory: 300.00 GB


---
## 2. Read a zarr archive as an xarray Dataset.

In [7]:
filename='/work/ALT/swot/swotpub/LLC4320/zarr/SST.zarr'
ds =xr.open_zarr(filename)

In [6]:
print(ds)
print('\n data size: %.1f GB' %(ds.nbytes / 1e9))

<xarray.Dataset>
Dimensions:  (face: 13, i: 4320, j: 4320, time: 8785)
Coordinates:
    dtime    (time) datetime64[ns] dask.array<shape=(8785,), chunksize=(8785,)>
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
    iters    (time) int64 dask.array<shape=(8785,), chunksize=(1,)>
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * time     (time) float64 5.702e+06 5.706e+06 5.71e+06 ... 3.732e+07 3.732e+07
Data variables:
    SST      (time, face, j, i) float32 dask.array<shape=(8785, 13, 4320, 4320), chunksize=(1, 1, 4320, 4320)>

 data size: 8525.4 GB


Above, you can see that the zarr archive contains a dataset with one data variable, 'SST' (sea surface temperature), which has four dimensions: time, face, j, and i.
- dtime is a time coordinate.
- the i and j dimensions correspond approximately to longitudes and latitudes (XC and YC).
- face identifies one of the 13 patches of the Earth's surface.

---

'chunksize=(1, 1, 4320, 4320)' indicates the size of the chunks that SST is divided into.
This is similar to an MPI domain decomposition.
When needed, Dask workers load some of these chunks into memory and perform computations on them.
The zarr archive directory contains as many files as there are chunks.
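The number of chunk files and the uncompressed size of each chunk follow directly from the array shape, the chunk shape, and the item size. The helper below (`chunk_stats` is a name chosen for this sketch, not part of zarr or xarray) works this out for the SST variable printed above, assuming 4 bytes per float32 value.

```python
import math


def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, uncompressed bytes per full chunk)."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


shape = (8785, 13, 4320, 4320)   # (time, face, j, i)
chunks = (1, 1, 4320, 4320)      # chunking reported by xarray
n_chunks, chunk_bytes = chunk_stats(shape, chunks, itemsize=4)  # float32

print(n_chunks)                       # 114205 chunk files
print(chunk_bytes / 1e6, "MB")        # about 74.6 MB per chunk before compression
print(n_chunks * chunk_bytes / 1e9)   # about 8525 GB in total

# A hypothetical coarser chunking along time: fewer but larger files.
coarse = chunk_stats(shape, (10, 1, 4320, 4320), itemsize=4)
print(coarse)                         # (11427, 746496000)
```

Grouping 10 time steps per chunk would cut the file count by roughly a factor of 10, at the price of chunks of about 746 MB each.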

If you use a parallel file system with a huge data size (more than 1 TB, for example), you should keep the number of metadata accesses low.
So put enough data in each chunk; otherwise the many small files will overload the metadata server.
On the other hand, if you want your data to sit in the disk cache for fast reads and writes, each chunk should be smaller than the cache size, so that the controller treats the chunks as 'cacheable' small files (as on GPFS, for example).
In anycase, before you 

**AP: last paragraph need rewriting and clarification**

---
## 3. Look into a zarr archive.

The zarr archive `SST.zarr` is a directory.
It contains one subdirectory for each coordinate (dtime, face, i, iters, j, time) and for each data variable (SST).

In [11]:
!ls -a /work/ALT/swot/swotpub/LLC4320/zarr/SST.zarr

.  ..  dtime  face  i  iters  j  SST  time  .zattrs  .zgroup


The data variable directory 'SST' contains 114205 chunk files, because 8785 (time) * 13 (face) = 114205, and each file holds the data of one chunk.

In [17]:
!ls -1 /work/ALT/swot/swotpub/LLC4320/zarr/SST.zarr/SST/ |wc -l

114205


In [24]:
!ls -1 /work/ALT/swot/swotpub/LLC4320/zarr/SST.zarr/SST/ |head

0.0.0.0
0.1.0.0
0.10.0.0
0.11.0.0
0.12.0.0
0.2.0.0
0.3.0.0
0.4.0.0
0.5.0.0
0.6.0.0
ls: write error: Broken pipe


For example, 0.1.0.0 contains the SST data for time = 0, face = 1, j = 0-4319, and i = 0-4319.
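The mapping from a chunk file name to the index ranges it covers can be sketched as follows. The function name `chunk_key_to_slices` is made up for this example, and it assumes the `.`-separated keys seen in the listing above, with one integer per dimension in (time, face, j, i) order.

```python
def chunk_key_to_slices(key, chunks, shape):
    """Translate a chunk key such as '0.1.0.0' into one slice per dimension."""
    indices = [int(part) for part in key.split(".")]
    return tuple(
        # The last chunk along a dimension may be shorter than the chunk size.
        slice(i * c, min((i + 1) * c, s))
        for i, c, s in zip(indices, chunks, shape)
    )


shape = (8785, 13, 4320, 4320)   # (time, face, j, i)
chunks = (1, 1, 4320, 4320)

slices = chunk_key_to_slices("0.1.0.0", chunks, shape)
print(slices)
# (slice(0, 1, None), slice(1, 2, None), slice(0, 4320, None), slice(0, 4320, None))
```

Python slices exclude the upper bound, so `slice(0, 4320)` covers indices 0 to 4319.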

In [22]:
!ls -a /work/ALT/swot/swotpub/LLC4320/zarr/SST.zarr/dtime

.  ..  0  .zarray  .zattrs


In [23]:
!cat /work/ALT/swot/swotpub/LLC4320/zarr/SST.zarr/dtime/.zarray

{
    "chunks": [
        8785
    ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [
        8785
    ],
    "zarr_format": 2
}

You can look into the `.zarray` file to see how a zarr array is encoded.
You can also access the encoding from xarray with the following commands:

In [6]:
ds.SST.encoding

{'chunks': (1, 1, 4320, 4320),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 '_FillValue': nan,
 'dtype': dtype('float32'),
 'coordinates': 'dtime iters'}

In [7]:
ds.dtime.encoding

{'chunks': (8785,),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'units': 'hours since 2011-11-15 00:00:00',
 'calendar': 'proleptic_gregorian',
 'dtype': dtype('int64')}

In [8]:
ds.iters.encoding

{'chunks': (1,),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'dtype': dtype('int64')}
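Because `.zarray` is plain JSON, it can also be inspected without zarr or xarray. The sketch below parses the `dtime/.zarray` content shown earlier (pasted in as a string so the example is self-contained) and derives the chunk count and the uncompressed chunk size. Reading the item size from the trailing digit of the dtype string is an assumption that holds for simple types such as `<i8` or `<f4`.

```python
import json
import math

# Content of dtime/.zarray as printed above.
zarray_text = """
{
    "chunks": [8785],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<i8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [8785],
    "zarr_format": 2
}
"""

meta = json.loads(zarray_text)

# "<i8" reads as: little-endian, integer, 8 bytes per item.
itemsize = int(meta["dtype"][2:])
n_chunks = math.prod(
    math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])
)
chunk_bytes = math.prod(meta["chunks"]) * itemsize

print(meta["compressor"]["cname"])  # lz4
print(n_chunks)                     # 1
print(chunk_bytes)                  # 70280
```

A single chunk is consistent with the `dtime` directory listing, which contains only one chunk file named `0`.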

---
## 4. Let's create a subset of the data we just read and write it to another zarr archive.

In [9]:
dsmille=ds.isel(time=slice(0,1000))
print(dsmille)
print('\n data size: %.1f GB' %(dsmille.nbytes / 1e9))

<xarray.Dataset>
Dimensions:  (face: 13, i: 4320, j: 4320, time: 1000)
Coordinates:
    dtime    (time) datetime64[ns] dask.array<shape=(1000,), chunksize=(1000,)>
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
    iters    (time) int64 dask.array<shape=(1000,), chunksize=(1,)>
  * j        (j) int64 0 1 2 3 4 5 6 7 ... 4313 4314 4315 4316 4317 4318 4319
  * time     (time) float64 5.702e+06 5.706e+06 5.71e+06 ... 9.295e+06 9.299e+06
Data variables:
    SST      (time, face, j, i) float32 dask.array<shape=(1000, 13, 4320, 4320), chunksize=(1, 1, 4320, 4320)>

 data size: 970.4 GB


In [10]:
%time dsmille.to_zarr('/work/scratch/odakat/test.zarr', mode='w')

CPU times: user 1min 40s, sys: 5.43 s, total: 1min 45s
Wall time: 10min 54s


<xarray.backends.zarr.ZarrStore at 0x2b77370e6da0>

In [11]:
!du -hs /work/scratch/odakat/test.zarr

338G	/work/scratch/odakat/test.zarr


As you can see above, the selected dataset occupies 970 GB in memory.
The zarr archive is compressed. The 'compressor' encoding is carried over from the original dataset 'ds':
`'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)`
so the archive itself takes only 338G on disk.
These settings also appear to match the default compressor of zarr format 2, so the result would likely be the same without the inherited encoding.
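A rough compression ratio can be worked out from the two numbers above. One caveat: `nbytes / 1e9` is in decimal gigabytes, whereas `du -h` reports `G` in powers of 1024 and measures allocated disk space, so the result below is only an estimate.

```python
in_memory_bytes = 970.4e9        # dsmille.nbytes, as printed above (decimal GB)
on_disk_bytes = 338 * 1024**3    # "338G" from `du -hs`, in units of 1024**3 bytes

ratio = in_memory_bytes / on_disk_bytes
print(f"on disk: {on_disk_bytes / 1e9:.1f} GB (decimal)")
print(f"compression ratio: about {ratio:.1f}")
```

With these inputs the archive is about 363 decimal GB on disk, a ratio of roughly 2.7.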

---

## 5. Try another compression method.

In [12]:
from numcodecs import blosc
blosc.list_compressors()

['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']

In [13]:
import zarr
compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)

In [14]:
%time dsmille.to_zarr('/work/scratch/odakat/testzarr',  encoding={'SST': {'compressor': compressor}} , mode='w')

CPU times: user 3min 22s, sys: 8.16 s, total: 3min 30s
Wall time: 11min 14s


<xarray.backends.zarr.ZarrStore at 0x2b7741d79630>

In [15]:
!du -hs /work/scratch/odakat/testzarr

319G	/work/scratch/odakat/testzarr


Well, it looks like the new compressor saved about 19 G of space (338G down to 319G), at the cost of about 20 s more wall time and roughly twice the CPU time.
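The comparison between the two runs can be spelled out from the `%time` and `du` outputs above. The dictionary layout and labels are only a convenience for this sketch. Each configuration was run once on a shared file system, so small wall-time differences may be noise.

```python
from datetime import timedelta

# Numbers copied from the %time and `du -hs` outputs above.
runs = {
    "lz4, clevel=5": {
        "size_g": 338,
        "wall": timedelta(minutes=10, seconds=54),
        "cpu": timedelta(minutes=1, seconds=45),
    },
    "zstd, clevel=3, shuffle=2": {
        "size_g": 319,
        "wall": timedelta(minutes=11, seconds=14),
        "cpu": timedelta(minutes=3, seconds=30),
    },
}

base = runs["lz4, clevel=5"]
new = runs["zstd, clevel=3, shuffle=2"]

saved_g = base["size_g"] - new["size_g"]
saved_pct = 100 * saved_g / base["size_g"]
extra_wall = new["wall"] - base["wall"]
cpu_ratio = new["cpu"] / base["cpu"]

print(f"space saved: {saved_g} G ({saved_pct:.1f} %)")
print(f"extra wall time: {extra_wall.total_seconds():.0f} s")
print(f"CPU time ratio: {cpu_ratio:.1f}x")
```

Whether a saving of under 6 % is worth doubling the CPU time depends on how often the data is written compared with how long it is kept.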

 ---
## 6. Clean up

In [16]:
!rm -rf /work/scratch/odakat/testzarr
!rm -rf /work/scratch/odakat/test.zarr

In [17]:
cluster.close()