In [1]:
!pip install xarray zarr s5cmd





# Bulk Data Download

This notebook shows how to perform bulk downloads with an S3 command-line tool. This is useful if you want local access to a large subset of the data, or even to download the whole archive.

We can download data in bulk using any command-line tool that supports the S3 protocol. We recommend [s5cmd](https://github.com/peak/s5cmd), which can be installed by running:

`pip install s5cmd`
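
Before starting a long transfer it can help to confirm that the `s5cmd` executable is actually on the `PATH` of the environment running the notebook. The following is a minimal sketch using only the Python standard library; the helper name `find_s5cmd` is hypothetical and not part of s5cmd.

```python
import shutil


def find_s5cmd(executable="s5cmd"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_s5cmd()
if path is None:
    print("s5cmd not found; try `pip install s5cmd`")
else:
    print("s5cmd found at", path)
```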


Now we can download data using the `cp` command.

In this example, we transfer the Thomson scattering data for shot `30420` to the local machine.

We need to set the endpoint where the bucket is hosted (for now: `https://s3.echo.stfc.ac.uk`) and pass `--no-sign-request` for anonymous access.

In [2]:
%%bash
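# Quote the wildcard so the shell passes it to s5cmd unexpanded; the trailing slash marks the destination as a directory
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk cp 's3://mast/level2/shots/30420.zarr/thomson_scattering/*' ./30420.zarr/thomson_scattering/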

cp s3://mast/level2/shots/30420.zarr/thomson_scattering/t_e_core/0 30420.zarr/thomson_scattering/t_e_core/0


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/n_e_core/0 30420.zarr/thomson_scattering/n_e_core/0


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/t_e_core/.zattrs 30420.zarr/thomson_scattering/t_e_core/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/n_e/.zattrs 30420.zarr/thomson_scattering/n_e/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/t_e/.zattrs 30420.zarr/thomson_scattering/t_e/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/time/0 30420.zarr/thomson_scattering/time/0


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/t_e/.zarray 30420.zarr/thomson_scattering/t_e/.zarray


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/p_e/.zattrs 30420.zarr/thomson_scattering/p_e/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/.zattrs 30420.zarr/thomson_scattering/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/major_radius/0 30420.zarr/thomson_scattering/major_radius/0


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/time/.zattrs 30420.zarr/thomson_scattering/time/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/t_e_core/.zarray 30420.zarr/thomson_scattering/t_e_core/.zarray


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/p_e/0.0 30420.zarr/thomson_scattering/p_e/0.0


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/major_radius/.zattrs 30420.zarr/thomson_scattering/major_radius/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/n_e/.zarray 30420.zarr/thomson_scattering/n_e/.zarray


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/p_e/.zarray 30420.zarr/thomson_scattering/p_e/.zarray


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/n_e_core/.zattrs 30420.zarr/thomson_scattering/n_e_core/.zattrs


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/.zgroup 30420.zarr/thomson_scattering/.zgroup


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/time/.zarray 30420.zarr/thomson_scattering/time/.zarray


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/major_radius/.zarray 30420.zarr/thomson_scattering/major_radius/.zarray


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/t_e/0.0 30420.zarr/thomson_scattering/t_e/0.0


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/n_e/0.0 30420.zarr/thomson_scattering/n_e/0.0


cp s3://mast/level2/shots/30420.zarr/thomson_scattering/n_e_core/.zarray 30420.zarr/thomson_scattering/n_e_core/.zarray
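
For more than one shot, the same command can be generated programmatically. The sketch below only builds and prints the argument lists and does not contact the bucket. It assumes every shot follows the `s3://mast/level2/shots/<shot>.zarr/<group>/` layout seen above; the helper name `build_cp_command` and the `dest_root` parameter are hypothetical.

```python
import shlex

ENDPOINT = "https://s3.echo.stfc.ac.uk"  # endpoint quoted in the text; may change
BUCKET_PREFIX = "s3://mast/level2/shots"  # assumed common prefix for all shots


def build_cp_command(shot, group, dest_root="."):
    """Build the s5cmd argument list that copies one group of one shot."""
    source = f"{BUCKET_PREFIX}/{shot}.zarr/{group}/*"
    dest = f"{dest_root}/{shot}.zarr/{group}/"
    return [
        "s5cmd",
        "--no-sign-request",
        "--endpoint-url", ENDPOINT,
        "cp", source, dest,
    ]


shots = [30420]  # extend with further shot numbers as needed
for shot in shots:
    # shlex.join quotes the wildcard so the line can be pasted into a shell
    print(shlex.join(build_cp_command(shot, "thomson_scattering")))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to perform the transfer. Because no shell is involved in that case, the wildcard reaches s5cmd unexpanded.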


Finally, we can open the file locally:

In [3]:
import xarray as xr
xr.open_zarr('30420.zarr', group='thomson_scattering')

Opening the store triggers a `RuntimeWarning` from xarray because no consolidated metadata is found in the local copy. The warning suggests one of the following:

1. Consolidating metadata in this existing store with `zarr.consolidate_metadata()`.
2. Explicitly setting `consolidated=False`, to avoid trying to read consolidated metadata, or
3. Explicitly setting `consolidated=True`, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
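
One way to avoid the warning is to pass the `consolidated` flag explicitly, depending on whether the local store has consolidated metadata. The sketch below assumes a Zarr v2 layout (as suggested by the `.zarray` and `.zgroup` files in the download), where consolidated metadata is stored in a `.zmetadata` file at the root of the store. The helper names `has_consolidated_metadata` and `open_local_group` are hypothetical.

```python
from pathlib import Path


def has_consolidated_metadata(store):
    """Return True if the store root contains a Zarr v2 `.zmetadata` file."""
    return (Path(store) / ".zmetadata").is_file()


def open_local_group(store, group):
    """Open one group of a local Zarr store, setting `consolidated` explicitly."""
    # Imported lazily so the check above works without xarray installed
    import xarray as xr

    return xr.open_zarr(
        store,
        group=group,
        consolidated=has_consolidated_metadata(store),
    )
```

For the download above, `open_local_group('30420.zarr', 'thomson_scattering')` would open the group with `consolidated=False`, since only the `thomson_scattering` prefix was copied and no `.zmetadata` file is present at the store root.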


Dataset summary (Dask-backed variables, listed in output order; variable names are not shown in the exported output):

| Shape | Chunk shape | Bytes | Chunk bytes | Dask graph | Data type |
|---|---|---|---|---|---|
| (88, 120) | (88, 120) | 82.50 kiB | 82.50 kiB | 1 chunks in 2 graph layers | float64 numpy.ndarray |
| (88,) | (88,) | 704 B | 704 B | 1 chunks in 2 graph layers | float64 numpy.ndarray |
| (88, 120) | (88, 120) | 82.50 kiB | 82.50 kiB | 1 chunks in 2 graph layers | float64 numpy.ndarray |
| (88, 120) | (88, 120) | 82.50 kiB | 82.50 kiB | 1 chunks in 2 graph layers | float64 numpy.ndarray |
| (88,) | (88,) | 704 B | 704 B | 1 chunks in 2 graph layers | float64 numpy.ndarray |
