Requirements to run this notebook:

1. [Install `perfcapture` and its dependencies in a new virtual environment](https://github.com/zarr-developers/perfcapture).
2. Activate that venv.
3. Install additional Python dependencies within the venv: `pip install ipykernel zarr matplotlib`
4. Optionally, install `fio` to benchmark your local hard disk. (On Ubuntu: `sudo apt install fio`)
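
Before running the cells below, a quick check that the venv has everything can save a confusing `ModuleNotFoundError` later. This is a minimal standard-library sketch using `importlib.util.find_spec` and `shutil.which`. The helper name `check_requirements` is hypothetical, and its default module list simply mirrors the steps above. `fio` is optional, so a `MISSING` result for it only means the disk benchmark cells will not run.

```python
import importlib.util
import shutil


def check_requirements(modules=("perfcapture", "zarr", "matplotlib", "ipykernel"),
                       executables=("fio",)):
    """Return a dict mapping each requirement to True if it is available.

    Hypothetical helper: the default lists mirror the steps above and may
    need adjusting for your setup.
    """
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        status[exe] = shutil.which(exe) is not None
    return status


for requirement, ok in check_requirements().items():
    print(f"{requirement:12s} {'found' if ok else 'MISSING'}")
```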

# Benchmark local hard disk using `fio`

[`fio`](https://fio.readthedocs.io/) is a disk benchmarking tool written by [Jens Axboe](https://en.wikipedia.org/wiki/Jens_Axboe), who also maintains the Linux kernel's block layer.

In [89]:
!mkdir -p ~/temp/fio

In [108]:
%%capture fio_json
!fio --name=read --size=1g --direct=1 --bs=64k --rw=read \
     --ioengine=libaio --iodepth=32 \
     --directory="$HOME/temp/fio/" --output-format=json
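
The `fio` flags above define a single sequential-read job. `--size=1g` sets the size of the test file. `--direct=1` requests non-buffered I/O (usually `O_DIRECT`), so reads bypass the page cache and measure the device rather than RAM. `--bs=64k` issues 64 KiB requests. `--rw=read` reads sequentially. `--ioengine=libaio` uses Linux-native asynchronous I/O. `--iodepth=32` keeps up to 32 requests in flight. `--output-format=json` makes the report machine-readable. The sketch below builds the same command as an argument list and runs it with `subprocess` instead of `%%capture`. The helper names `build_fio_read_cmd` and `run_fio_read` are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def build_fio_read_cmd(directory, size="1g", block_size="64k", iodepth=32):
    """Build the same sequential-read fio job as the cell above, as an argv list."""
    return [
        "fio",
        "--name=read",
        f"--size={size}",          # size of the test file fio creates and reads
        "--direct=1",              # non-buffered I/O: bypass the page cache
        f"--bs={block_size}",      # bytes per I/O request
        "--rw=read",               # sequential reads
        "--ioengine=libaio",       # Linux-native asynchronous I/O
        f"--iodepth={iodepth}",    # maximum number of I/O requests in flight
        f"--directory={directory}",
        "--output-format=json",
    ]


def run_fio_read(directory):
    """Run fio and return the measured read bandwidth in bytes per second."""
    Path(directory).mkdir(parents=True, exist_ok=True)
    completed = subprocess.run(
        build_fio_read_cmd(directory), capture_output=True, text=True, check=True
    )
    report = json.loads(completed.stdout)
    return report["jobs"][0]["read"]["bw_bytes"]
```

For example, `run_fio_read(Path.home() / "temp" / "fio") / 1e9` should give a figure comparable to the one printed below. Passing an argument list avoids shell quoting issues with paths.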

In [109]:
import json
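# Parse into a new variable so the captured output in `fio_json` is not overwritten (re-running this cell would otherwise fail)
fio_results = json.loads(fio_json.stdout)
# `bw_bytes` is the mean read bandwidth in bytes per second; divide by 1e9 for decimal GB/s
max_gbytes_per_sec = fio_results['jobs'][0]['read']['bw_bytes'] / 1E9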
print(f"This hard drive is capable of {max_gbytes_per_sec:.3f} gigabytes per second.")

This hard drive is capable of 1.632 gigabytes per second.


# Create datasets and run workloads

In [1]:
from perfcapture.workload import load_workloads_from_filename, run_workloads
from perfcapture.dataset import create_datasets_if_necessary
import pathlib
import os

In [2]:
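# resolve() already returns an absolute path
path = pathlib.Path('../recipes/simple_zarr_python_workloads.py').resolve()
# Work from the recipe's directory (presumably so that modules next to it, such as `simple_datasets`, can be imported)
os.chdir(path.parent)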
workloads = load_workloads_from_filename(path)

Instantiating ZarrPythonLoadEntireArray
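
The cell above passes a recipe file path to `load_workloads_from_filename` after first changing into the recipe's directory. For anyone curious how a Python file can be imported from an arbitrary path, the standard-library approach uses `importlib.util.spec_from_file_location`. This is a generic sketch, not necessarily how `perfcapture` does it, and the helper name `load_module_from_path` is hypothetical. Instead of changing the working directory, it puts the file's directory on `sys.path`, which is one alternative to the `os.chdir` call.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(path):
    """Import a Python file as a module, making its sibling modules importable."""
    path = Path(path).resolve()
    # Put the file's directory on sys.path so that `import sibling_module`
    # inside it works, as an alternative to os.chdir(path.parent).
    if str(path.parent) not in sys.path:
        sys.path.insert(0, str(path.parent))
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as the importlib docs recommend, so code
    # that looks itself up in sys.modules (e.g. dataclasses) works.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module
```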


In [3]:
DATA_PATH = pathlib.Path("~/temp/perfcapture_data_path").expanduser()
create_datasets_if_necessary(workloads, DATA_PATH)

Found 5 Dataset object(s).
<simple_datasets.Uncompressed_10000_Chunks object at 0x7f9347fa6210> already exists.
<simple_datasets.LZ4_100_Chunks object at 0x7f936c116250> already exists.
<simple_datasets.Uncompressed_100_Chunks object at 0x7f9347fa5490> already exists.
<simple_datasets.Uncompressed_1_Chunk object at 0x7f93641ad2d0> already exists.
<simple_datasets.LZ4_10000_Chunks object at 0x7f9347fa6110> already exists.
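
To interpret the throughput numbers below, it can help to compare how much space each dataset occupies on disk, for example to see how much LZ4 shrinks the data. This is a minimal standard-library sketch. It assumes each dataset is stored as a directory of files directly under `DATA_PATH`, as the paths printed by `run_workloads` suggest. Both helper names are hypothetical.

```python
from pathlib import Path


def dir_size_bytes(directory):
    """Total size in bytes of all regular files below `directory`."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file())


def report_dataset_sizes(data_path):
    """Return {dataset directory name: size in GB} for each subdirectory of data_path."""
    return {
        child.name: dir_size_bytes(child) / 1e9
        for child in sorted(Path(data_path).iterdir())
        if child.is_dir()
    }
```

In this notebook, `for name, gb in report_dataset_sizes(DATA_PATH).items(): print(f"{name}: {gb:.3f} GB")` would list one line per dataset.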


In [4]:
all_timers = run_workloads(workloads, keep_cache=False)

Running ZarrPythonLoadEntireArray 1 times on /home/jack/temp/perfcapture_data_path/Uncompressed_1_Chunk!
  Finished!
  Runtime: mean = 11.753 seconds; std = 0.000
  Gigabytes per second: mean = 0.340 GB/s; std = 0.000

Running ZarrPythonLoadEntireArray 1 times on /home/jack/temp/perfcapture_data_path/LZ4_100_Chunks!
  Finished!
  Runtime: mean = 12.615 seconds; std = 0.000
  Gigabytes per second: mean = 0.317 GB/s; std = 0.000

Running ZarrPythonLoadEntireArray 1 times on /home/jack/temp/perfcapture_data_path/Uncompressed_100_Chunks!
  Finished!
  Runtime: mean = 10.855 seconds; std = 0.000
  Gigabytes per second: mean = 0.369 GB/s; std = 0.000

Running ZarrPythonLoadEntireArray 1 times on /home/jack/temp/perfcapture_data_path/LZ4_10000_Chunks!
  Finished!
  Runtime: mean = 10.845 seconds; std = 0.000
  Gigabytes per second: mean = 0.369 GB/s; std = 0.000

Running ZarrPythonLoadEntireArray 1 times on /home/jack/temp/perfcapture_data_path/Uncompressed_10000_Chunks!
  Finished!
  Runtime
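
Judging by its name, `keep_cache=False` presumably stops the OS page cache from serving the reads, so each run measures storage rather than RAM. How `perfcapture` implements this is not shown here. On Linux, one way to evict a specific set of files from the page cache without root is `os.posix_fadvise(..., os.POSIX_FADV_DONTNEED)`. The hypothetical helper below applies it to every file in a dataset directory. The kernel treats this as advice and only drops clean pages, and `os.posix_fadvise` is not available on every platform (notably not on Windows).

```python
import os
from pathlib import Path


def evict_from_page_cache(directory):
    """Advise the kernel to drop cached pages for every file under `directory`.

    Requires os.posix_fadvise (Linux). Dirty pages that have not yet been
    written back are not dropped. Returns the number of files advised.
    """
    n_files = 0
    for path in Path(directory).rglob("*"):
        if not path.is_file():
            continue
        fd = os.open(path, os.O_RDONLY)
        try:
            # offset=0 and len=0 together mean "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)
        n_files += 1
    return n_files
```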

In [5]:
all_timers

{'ZarrPythonLoadEntireArray /home/jack/temp/perfcapture_data_path/Uncompressed_1_Chunk': <perfcapture.timer.Timer at 0x7f9364708cd0>,
 'ZarrPythonLoadEntireArray /home/jack/temp/perfcapture_data_path/LZ4_100_Chunks': <perfcapture.timer.Timer at 0x7f936da3b9d0>,
 'ZarrPythonLoadEntireArray /home/jack/temp/perfcapture_data_path/Uncompressed_100_Chunks': <perfcapture.timer.Timer at 0x7f936c1538d0>,
 'ZarrPythonLoadEntireArray /home/jack/temp/perfcapture_data_path/LZ4_10000_Chunks': <perfcapture.timer.Timer at 0x7f936473ee10>,
 'ZarrPythonLoadEntireArray /home/jack/temp/perfcapture_data_path/Uncompressed_10000_Chunks': <perfcapture.timer.Timer at 0x7f9347fecdd0>}
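
To put these results in context, compare each workload's throughput with the 1.632 GB/s that `fio` measured for the raw disk. The figures below are copied by hand from the printed output above. The `Uncompressed_10000_Chunks` result is cut off there, so it is left out. Multiplying throughput by runtime also gives a rough idea of how much data each run handled. This is only arithmetic on the reported numbers; the `results` dict and `summarize` helper are hypothetical, and the `Timer` objects in `all_timers` are not used.

```python
FIO_GB_PER_SEC = 1.632  # measured by the fio cell above

# (runtime in seconds, throughput in GB/s), copied from the run_workloads output above
results = {
    "Uncompressed_1_Chunk": (11.753, 0.340),
    "LZ4_100_Chunks": (12.615, 0.317),
    "Uncompressed_100_Chunks": (10.855, 0.369),
    "LZ4_10000_Chunks": (10.845, 0.369),
}


def summarize(results, disk_gb_per_sec):
    """Return {name: (fraction of disk bandwidth, GB/s x seconds)}."""
    return {
        name: (gb_per_sec / disk_gb_per_sec, runtime * gb_per_sec)
        for name, (runtime, gb_per_sec) in results.items()
    }


for name, (fraction, gb) in summarize(results, FIO_GB_PER_SEC).items():
    print(f"{name:26s} {fraction:6.1%} of fio bandwidth, ~{gb:.2f} GB")
```

On these single-run numbers, every workload reaches roughly 19% to 23% of the disk's sequential-read bandwidth, and each run corresponds to about 4 GB.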