Requirements to run this notebook:

1. [Install `perfcapture` and its dependencies in a new virtual environment](https://github.com/zarr-developers/perfcapture).
2. Activate that venv.
3. Install additional Python dependencies within the venv: `pip install ipykernel zarr matplotlib`
4. Optionally, install `fio` to benchmark your local hard disk. (On Ubuntu: `sudo apt install fio`)
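Step 4 is optional, so it can help to check whether `fio` is on the `PATH` before running the benchmark cells. Below is a minimal standard-library sketch using `shutil.which`. The helper name `fio_available` is hypothetical and is not part of `perfcapture`.

```python
import shutil


def fio_available() -> bool:
    """Return True if an `fio` executable can be found on the PATH."""
    return shutil.which("fio") is not None


if fio_available():
    print("fio found; the disk benchmark can run.")
else:
    print("fio not found; skip the disk benchmark section.")
```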

# Benchmark the local hard disk using `fio`

[`fio`](https://fio.readthedocs.io/) is a disk benchmarking tool written by [Jens Axboe](https://en.wikipedia.org/wiki/Jens_Axboe), who is also the current maintainer of the Linux kernel's block layer.

In [1]:
!mkdir -p ~/temp/fio

In [2]:
%%capture fio_json
!fio --name=read --size=1g --direct=1 --bs=64k --rw=read \
     --ioengine=libaio --iodepth=32 \
     --directory="$HOME/temp/fio/" --output-format=json

In [3]:
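import json
# `fio_json` holds the captured cell output; parse its stdout as JSON.
fio_results = json.loads(fio_json.stdout)
# fio reports `bw_bytes` in bytes per second; divide by 1e9 for decimal GB/s.
max_gbytes_per_sec = fio_results['jobs'][0]['read']['bw_bytes'] / 1E9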
print(f"This hard drive is capable of {max_gbytes_per_sec:.3f} gigabytes per second.")

This hard drive is capable of 2.049 gigabytes per second.
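Outside Jupyter, where `%%capture` and `!` are unavailable, the same benchmark can be run from plain Python. This is a sketch under the assumption that you are on Linux (the `libaio` engine is Linux-only). It reuses exactly the `fio` flags and the `jobs[0]['read']['bw_bytes']` field from the cells above. The helper names `build_fio_command`, `read_gbytes_per_sec` and `run_fio` are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def build_fio_command(directory: Path) -> list[str]:
    """Mirror the fio flags used in the notebook cell above."""
    return [
        "fio", "--name=read", "--size=1g", "--direct=1", "--bs=64k", "--rw=read",
        "--ioengine=libaio", "--iodepth=32",
        f"--directory={directory}", "--output-format=json",
    ]


def read_gbytes_per_sec(fio_output: str) -> float:
    """Extract sequential-read bandwidth in decimal GB/s from fio's JSON output."""
    results = json.loads(fio_output)
    # `bw_bytes` is bytes per second; 1 GB = 1e9 bytes here.
    return results["jobs"][0]["read"]["bw_bytes"] / 1e9


def run_fio(directory: Path) -> float:
    """Run the benchmark (equivalent of `mkdir -p` plus the fio cell)."""
    directory.mkdir(parents=True, exist_ok=True)
    completed = subprocess.run(
        build_fio_command(directory), capture_output=True, text=True, check=True
    )
    return read_gbytes_per_sec(completed.stdout)
```

Usage: `run_fio(Path.home() / "temp" / "fio")`. By default `fio` does not delete its 1 GB test file, so you may want to remove it from that directory afterwards.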


# Create datasets and run workloads

In [1]:
import os
import pathlib
import time

from perfcapture.dataset import create_datasets_if_necessary
from perfcapture.workload import load_workloads_from_filename, run_workloads

In [2]:
# Resolve to an absolute path *before* changing directory (`resolve()` already returns an absolute path).
path = pathlib.Path('../recipes/simple_zarr_python_workloads.py').resolve()
os.chdir(path.parent)
workloads = load_workloads_from_filename(path)

Instantiating ZarrPythonLoadEntireArray


In [3]:
DATA_PATH = pathlib.Path("~/temp/perfcapture_data_path").expanduser()
created_at_least_1_dataset: bool = create_datasets_if_necessary(workloads, DATA_PATH)
if created_at_least_1_dataset:
    print("Waiting for data to be flushed to disk...")
    time.sleep(10)

Found 5 Dataset object(s).
LZ4_100_Chunks already exists.
Uncompressed_1_Chunk already exists.
Uncompressed_10000_Chunks already exists.
Uncompressed_100_Chunks already exists.
LZ4_10000_Chunks already exists.
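The fixed `time.sleep(10)` above is a heuristic. Assuming a POSIX system, one alternative is to ask the kernel to write dirty pages out explicitly with `os.sync()`. On Linux, `sync(2)` waits for the writeback to complete. The helper `flush_to_disk` is hypothetical and not part of `perfcapture`. It falls back to sleeping where `os.sync` is unavailable.

```python
import os
import time


def flush_to_disk(fallback_secs: float = 10.0) -> None:
    """Flush dirty pages to disk, or sleep if os.sync() is unavailable."""
    if hasattr(os, "sync"):
        # Wraps sync(2); on Linux this returns once the writes have completed.
        os.sync()
    else:
        time.sleep(fallback_secs)
```

Usage: `if created_at_least_1_dataset: flush_to_disk()`.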


In [4]:
all_timers = run_workloads(workloads, keep_cache=False)

Running ZarrPythonLoadEntireArray 3 times on Uncompressed_1_Chunk!
  Finished!
   Runtime in secs: mean =    13.592; std =     0.647
   GB/sec to numpy: mean =     0.295; std =     0.014

Disk IO:
         read_iops: mean =  2344.145; std =   152.532
        write_iops: mean =   189.996; std =   213.327
   read_GB_per_sec: mean =     0.298; std =     0.016
  write_GB_per_sec: mean =     0.013; std =     0.013


Running ZarrPythonLoadEntireArray 3 times on LZ4_100_Chunks!
  Finished!
   Runtime in secs: mean =     3.239; std =     0.679
   GB/sec to numpy: mean =     1.268; std =     0.238

Disk IO:
         read_iops: mean =  1921.483; std =   200.297
        write_iops: mean =    11.008; std =     5.790
   read_GB_per_sec: mean =     0.214; std =     0.037
  write_GB_per_sec: mean =     0.000; std =     0.000


Running ZarrPythonLoadEntireArray 3 times on Uncompressed_100_Chunks!
  Finished!
   Runtime in secs: mean =    10.933; std =     0.262
   GB/sec to numpy: mean =     0.366; st
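Passing `keep_cache=False` appears intended to make every run read from disk rather than from the OS page cache. How `perfcapture` implements this is not shown here. One way to do it on Linux without root privileges, sketched purely as an assumption, is to call `posix_fadvise` with `POSIX_FADV_DONTNEED` on every dataset file. The helper `evict_from_page_cache` is hypothetical. The kernel treats the advice as a hint, and pages that are still dirty may stay cached, so flush first.

```python
import os
from pathlib import Path


def evict_from_page_cache(root: Path) -> int:
    """Advise the kernel to drop cached pages for every file under `root`.

    Returns the number of files advised.
    """
    if not hasattr(os, "posix_fadvise"):
        raise OSError("os.posix_fadvise is not available on this platform")
    n_files = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        fd = os.open(path, os.O_RDONLY)
        try:
            # offset=0 and len=0 mean "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)
        n_files += 1
    return n_files
```

Usage: `evict_from_page_cache(DATA_PATH)` before each timed run.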

In [5]:
all_timers

{('ZarrPythonLoadEntireArray',
  'Uncompressed_1_Chunk'): CounterManager(counters=[<perfcapture.performance_counters.Runtime object at 0x7f1506b3e5d0>, <perfcapture.performance_counters.BandwidthToNumpy object at 0x7f1542f35410>, <perfcapture.performance_counters.DiskIO object at 0x7f150710c610>]),
 ('ZarrPythonLoadEntireArray',
  'LZ4_100_Chunks'): CounterManager(counters=[<perfcapture.performance_counters.Runtime object at 0x7f1506b4bad0>, <perfcapture.performance_counters.BandwidthToNumpy object at 0x7f1506b4ba90>, <perfcapture.performance_counters.DiskIO object at 0x7f1506b53050>]),
 ('ZarrPythonLoadEntireArray',
  'Uncompressed_100_Chunks'): CounterManager(counters=[<perfcapture.performance_counters.Runtime object at 0x7f1506b64ad0>, <perfcapture.performance_counters.BandwidthToNumpy object at 0x7f1506b64dd0>, <perfcapture.performance_counters.DiskIO object at 0x7f1506b51590>]),
 ('ZarrPythonLoadEntireArray',
  'LZ4_10000_Chunks'): CounterManager(counters=[<perfcapture.performance
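As a worked comparison, the `GB/sec to numpy` means printed above can be set against the 2.049 GB/s that `fio` measured. The numbers below are copied by hand from this notebook's output rather than read from `all_timers`. The dict and the `fraction_of_disk_bandwidth` helper are illustrative names.

```python
# Sequential-read bandwidth measured by fio earlier in this notebook.
FIO_READ_GB_PER_SEC = 2.049

# "GB/sec to numpy" means copied from the run_workloads output above.
gb_per_sec_to_numpy = {
    "Uncompressed_1_Chunk": 0.295,
    "LZ4_100_Chunks": 1.268,
    "Uncompressed_100_Chunks": 0.366,
}


def fraction_of_disk_bandwidth(
    gb_per_sec: float, disk_gb_per_sec: float = FIO_READ_GB_PER_SEC
) -> float:
    """Return the achieved throughput as a fraction of the fio ceiling."""
    return gb_per_sec / disk_gb_per_sec


for dataset, gbps in gb_per_sec_to_numpy.items():
    print(f"{dataset:>25}: {fraction_of_disk_bandwidth(gbps):6.1%} of fio read bandwidth")
```

With these values the loop reports about 14.4%, 61.9% and 17.9%. For `LZ4_100_Chunks`, the to-numpy rate is well above its measured disk read rate (0.214 GB/s). This is consistent with the counter measuring decompressed bytes, though that interpretation is an assumption.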