# Distributed Data Frontier

## Learning objectives

- Become familiar with the InterPlanetary File System (IPFS) and the **verifiable, distributed Web 3**
- Understand IPFS's relationship with **zarr**
- Identify benefits of **reliability, performance, and sustainability**
- Identify **next steps for Dask** to leverage IPFS

In [1]:
# Get metadata on an image
!ome_zarr info https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/

https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/ [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (1, 2, 236, 275, 271)
   - (1, 2, 236, 137, 135)


*Does the entire dataset need to be downloaded to examine its metadata?*

In [15]:
# Download an image dataset
!ome_zarr download https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/6001240.zarr/ --output images

downloading...
   6001240.zarr
   6001240.zarr/labels
   6001240.zarr/masks
to images
[########################################] | 100% Completed |  2.5s
[########################################] | 100% Completed |  2.1s
[########################################] | 100% Completed |  4.3s


*Examine the contents of the filesystem representation of the OME-Zarr multi-scale image. What information is stored in each file?*

In [27]:
%ls -a images/6001240.zarr

[0m[01;34m.[0m/  [01;34m..[0m/  [01;34m0[0m/  [01;34m1[0m/  [01;34m2[0m/  [01;34mlabels[0m/  [01;34mmasks[0m/  .zattrs  .zgroup


In [28]:
%pycat images/6001240.zarr/.zattrs

In [29]:
%pycat images/6001240.zarr/.zgroup

In [1]:
%ls -a images/6001240.zarr/0

[0m[01;34m.[0m/           0.0.169.0.0  0.0.29.0.0  0.1.10.0.0   0.1.172.0.0  0.1.32.0.0
[01;34m..[0m/          0.0.17.0.0   0.0.3.0.0   0.1.100.0.0  0.1.173.0.0  0.1.33.0.0
0.0.0.0.0    0.0.170.0.0  0.0.30.0.0  0.1.101.0.0  0.1.174.0.0  0.1.34.0.0
0.0.1.0.0    0.0.171.0.0  0.0.31.0.0  0.1.102.0.0  0.1.175.0.0  0.1.35.0.0
0.0.10.0.0   0.0.172.0.0  0.0.32.0.0  0.1.103.0.0  0.1.176.0.0  0.1.36.0.0
0.0.100.0.0  0.0.173.0.0  0.0.33.0.0  0.1.104.0.0  0.1.177.0.0  0.1.37.0.0
0.0.101.0.0  0.0.174.0.0  0.0.34.0.0  0.1.105.0.0  0.1.178.0.0  0.1.38.0.0
0.0.102.0.0  0.0.175.0.0  0.0.35.0.0  0.1.106.0.0  0.1.179.0.0  0.1.39.0.0
0.0.103.0.0  0.0.176.0.0  0.0.36.0.0  0.1.107.0.0  0.1.18.0.0   0.1.4.0.0
0.0.104.0.0  0.0.177.0.0  0.0.37.0.0  0.1.108.0.0  0.1.180.0.0  0.1.40.0.0
0.0.105.0.0  0.0.178.0.0  0.0.38.0.0  0.1.109.0.0  0.1.181.0.0  0.1.41.0.0
0.0.106.0.0  0.0.179.0.0  0.0.39.0.0  0.1.11.0.0   0.1.182.0.0  0.1.42.0.0
0.0.107.0.0  0.0.18.0.0   0.0.4.0.0   0.1.110.0.0  0.1.183.0.

In [2]:
%pycat images/6001240.zarr/0/.zarray
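The file names in the listing above are chunk keys: one file per chunk, named by the chunk's index along each axis. The following is a minimal sketch of how those names can be enumerated. It assumes the Zarr v2 layout with `.` as the dimension separator, and takes the chunk shape `(1, 1, 1, 275, 271)` from the Dask summary shown later in this notebook rather than from the `.zarray` file itself.

```python
from itertools import product
from math import ceil

# Values reported for the full-resolution array '0' in this notebook.
shape = (1, 2, 236, 275, 271)
chunks = (1, 1, 1, 275, 271)  # chunk shape, as reported by Dask

# Number of chunks along each axis (the last chunk on an axis may be partial).
grid = tuple(ceil(s / c) for s, c in zip(shape, chunks))

# Assumption: Zarr v2 layout, chunk key = chunk indices joined by '.'.
keys = [".".join(map(str, idx)) for idx in product(*(range(n) for n in grid))]

print(grid)               # chunks per axis
print(len(keys))          # total number of chunk files
print(keys[0], keys[-1])  # first and last chunk key
```

Under these assumptions, each key corresponds to one file in `images/6001240.zarr/0`, next to the `.zarray` metadata file.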

In [32]:
import zarr
group = zarr.open('images/6001240.zarr/')
group

<zarr.hierarchy.Group '/'>

In [33]:
group.attrs.keys()

dict_keys(['_creator', 'multiscales', 'omero'])

In [34]:
group.attrs['multiscales']

[{'datasets': [{'path': '0'}, {'path': '1'}, {'path': '2'}], 'version': '0.1'}]
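The `multiscales` attribute is plain JSON stored in `.zattrs`, so the resolution levels can be listed without touching any pixel data. The sketch below reproduces only the `multiscales` value printed above (the other keys, `_creator` and `omero`, are omitted) and extracts the dataset paths with the standard library.

```python
import json

# Only the 'multiscales' key is reproduced here, serialized the way it
# would appear inside a .zattrs file.
zattrs_text = json.dumps({
    "multiscales": [
        {"datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}],
         "version": "0.1"}
    ]
})

attrs = json.loads(zattrs_text)
multiscale = attrs["multiscales"][0]

# Each dataset path names a sub-array of the group, one per resolution level.
level_paths = [d["path"] for d in multiscale["datasets"]]
print(multiscale["version"], level_paths)
```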

In [35]:
list(group.keys())

['0', '1', '2', 'labels', 'masks']

In [36]:
scale0 = group['0']

In [37]:
scale0

<zarr.core.Array '/0' (1, 2, 236, 275, 271) uint16>

In [38]:
import numpy as np
np.asarray(scale0)

array([[[[[ 8,  9,  8, ...,  9,  9, 10],
          [ 9,  9,  9, ...,  8,  9,  9],
          [ 8,  8,  8, ..., 26, 40,  8],
          ...,
          [ 9,  9,  9, ...,  9, 10, 14],
          [ 8,  9, 10, ...,  9, 10,  9],
          [ 9,  8, 10, ..., 10,  8,  8]],

         [[ 9,  9,  9, ...,  8, 11, 11],
          [ 9,  8,  9, ..., 10,  9, 10],
          [ 9, 16,  9, ..., 39, 30,  9],
          ...,
          [10,  9, 10, ..., 10, 10,  9],
          [10,  8, 10, ..., 10, 10, 10],
          [10, 11,  9, ...,  9, 10, 10]],

         [[ 9,  9,  9, ..., 14,  7, 15],
          [ 9,  9,  9, ..., 10,  9,  9],
          [ 8,  9,  9, ...,  9, 67,  8],
          ...,
          [ 8,  9,  9, ...,  9, 19,  9],
          [ 8,  9,  8, ...,  7,  9, 10],
          [ 7,  9,  9, ...,  9,  9, 10]],

         ...,

         [[ 8,  9, 57, ...,  9,  9,  8],
          [ 8,  9,  8, ...,  7,  8,  9],
          [21,  9,  9, ...,  8,  9,  7],
          ...,
          [ 9,  9,  8, ...,  7,  8,  9],
          [14,  9

In [None]:
# Note: `ipfs init` is required as a first-time setup

In [11]:
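import subprocess
# Add the Zarr directory to IPFS and capture the root content identifier (CID).
# --hidden includes the .zattrs/.zgroup/.zarray metadata files, -w wraps the
# directory so its name is preserved, and -Q prints only the final CID.
cid = subprocess.check_output(['ipfs', 'add', '-r', '--hidden', '-s', 'size-1000000',
                               '--raw-leaves', '--cid-version', '1', '-w', '-Q',
                               './images/6001240.zarr']).decode().strip()
print(cid)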

bafybeigxtkiy6y6vlcyqjspxqkpfr2pignkzepbyhai3zfomut6mtbyu5u
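The CID is what makes the data verifiable: it is self-describing and embeds a cryptographic hash of the content it names. As a minimal sketch using only the standard library, the CID printed above can be unpacked into its parts. It assumes the CIDv1 layout (multibase prefix, then version, content codec, and multihash as varints); `read_varint` is a hypothetical helper written for this example, not part of any IPFS library.

```python
import base64

cid = "bafybeigxtkiy6y6vlcyqjspxqkpfr2pignkzepbyhai3zfomut6mtbyu5u"


def read_varint(buf, pos):
    """Read one unsigned LEB128 varint from buf, starting at pos."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


# The multibase prefix 'b' indicates lowercase base32 without padding.
assert cid[0] == "b"
body = cid[1:].upper()
raw = base64.b32decode(body + "=" * (-len(body) % 8))

pos = 0
version, pos = read_varint(raw, pos)     # CID version
codec, pos = read_varint(raw, pos)       # 0x70 is dag-pb
hash_code, pos = read_varint(raw, pos)   # 0x12 is sha2-256
digest_len, pos = read_varint(raw, pos)  # digest length in bytes
digest = raw[pos:]

print(version, hex(codec), hex(hash_code), digest_len, digest.hex())
```

Note that the digest is the hash of the root node of the directory's Merkle DAG, not of any single file. Changing any chunk changes that root hash, and therefore the CID.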


In [12]:
from ipfsspec import IPFSFileSystem

fs = IPFSFileSystem()
store = fs.get_mapper(f'ipfs://{cid}/6001240.zarr')
store

<fsspec.mapping.FSMap at 0x7f26ca46c220>

In [13]:
import zarr

group = zarr.open(store)

In [14]:
scale0 = group['0']

In [15]:
scale0

<zarr.core.Array '/0' (1, 2, 236, 275, 271) uint16>

In [16]:
import dask.array as da

scale0_da = da.from_zarr(scale0)
scale0_da

|       | Array                 | Chunk               |
|-------|-----------------------|---------------------|
| Bytes | 67.09 MiB             | 145.56 kiB          |
| Shape | (1, 2, 236, 275, 271) | (1, 1, 1, 275, 271) |
| Count | 473 Tasks             | 472 Chunks          |
| Type  | uint16                | numpy.ndarray       |
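The sizes in the Dask summary follow directly from the array shape, the chunk shape, and the 2-byte `uint16` element size. A short worked check:

```python
from math import prod

shape = (1, 2, 236, 275, 271)
chunk_shape = (1, 1, 1, 275, 271)
itemsize = 2  # bytes per uint16 element

array_bytes = prod(shape) * itemsize
chunk_bytes = prod(chunk_shape) * itemsize
# Ceiling division per axis gives the number of chunks along that axis.
n_chunks = prod(-(-s // c) for s, c in zip(shape, chunk_shape))

print(f"{array_bytes / 2**20:.2f} MiB")  # 67.09 MiB
print(f"{chunk_bytes / 2**10:.2f} kiB")  # 145.56 kiB
print(n_chunks)                          # 472
```

Each chunk is a single 275 x 271 plane for one channel and one z-index, so a slice such as `scale0_da[0, 1, :, :, :]` only needs the 236 chunks belonging to channel 1.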


In [17]:
vol_c1 = scale0_da[0,1,:,:,:]

In [18]:
from itkwidgets import view
view(vol_c1)

Viewer(geometries=[], gradient_opacity=0.22, point_sets=[], rendered_image=<itk.itkImagePython.itkImageUS3; pr…

## Next steps

- Dask workers start an IPFS daemon, if not already started
- Dask workers make direct swarm connections
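As a rough sketch of the first step, a worker could probe for a running daemon and start one only when the probe fails. `ensure_ipfs_daemon` is a hypothetical helper, not an existing Dask or IPFS API. It assumes the `ipfs` binary is on the `PATH` and that `ipfs swarm peers` exits with a non-zero status when no daemon is running.

```python
import subprocess


def ensure_ipfs_daemon(run=subprocess.run, popen=subprocess.Popen):
    """Start `ipfs daemon` unless one already answers; return True if started.

    Hypothetical helper: `run` and `popen` are injectable only so the logic
    can be exercised without an IPFS installation.
    """
    # `ipfs swarm peers` needs a running daemon, so a non-zero exit code is
    # treated as the "not running" signal (an assumption of this sketch).
    probe = run(["ipfs", "swarm", "peers"],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if probe.returncode == 0:
        return False
    # Launch the daemon in the background and leave it running.
    popen(["ipfs", "daemon"],
          stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return True
```

One possible design is to call such a helper from the `setup` method of a `distributed.WorkerPlugin`, so that it runs once on every worker. For the second step, workers could then exchange multiaddresses and call `ipfs swarm connect` to peer with each other directly.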


### IPFS Resources

- IPFS Introduction
- IPFS Docs
- Awesome IPFS
- Juan