Requirements to run this notebook:

1. [Install `perfcapture` and its dependencies in a new virtual environment](https://github.com/zarr-developers/perfcapture).
2. Activate that venv.
3. Install additional Python dependencies within the venv: `pip install ipykernel zarr matplotlib`
4. Optionally, install `fio` to benchmark your local hard disk. (On Ubuntu: `sudo apt install fio`)

# Benchmark local hard disk using `fio`

[`fio`](https://fio.readthedocs.io/) is a disk benchmarking tool written by [Jens Axboe](https://en.wikipedia.org/wiki/Jens_Axboe), the current maintainer of the Linux kernel's block layer.

In [1]:
!mkdir -p ~/temp/fio

In [2]:
%%capture fio_json
!fio --name=read --size=1g --direct=1 --bs=64k --rw=read \
     --ioengine=libaio --iodepth=32 \
     --directory="$HOME/temp/fio/" --output-format=json

In [3]:
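import json

# Parse into a new name so that re-running this cell does not overwrite the captured output.
fio_results = json.loads(fio_json.stdout)
# `bw_bytes` is the read bandwidth in bytes per second for the first (and only) job.
max_gbytes_per_sec = fio_results['jobs'][0]['read']['bw_bytes'] / 1E9
print(f"This hard drive is capable of {max_gbytes_per_sec:.3f} gigabytes per second.")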

This hard drive is capable of 2.049 gigabytes per second.
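
The cell above reads a single field from `fio`'s JSON report. As a minimal sketch, the same lookup can be wrapped in a small helper that also returns the IOPS figure. The helper name `summarize_fio_read` is hypothetical, and the input below is a hand-written stand-in rather than a real measurement; it assumes the report has the `jobs[0]['read']` layout used above, with `bw_bytes` and `iops` keys.

```python
import json


def summarize_fio_read(fio_output: dict) -> dict:
    """Return read bandwidth (GB/s) and IOPS for the first job in a fio JSON report.

    Hypothetical helper: assumes the `jobs[0]['read']` layout used in the cell above.
    """
    read_stats = fio_output["jobs"][0]["read"]
    return {
        "gbytes_per_sec": read_stats["bw_bytes"] / 1e9,  # bytes/sec -> gigabytes/sec
        "iops": read_stats["iops"],
    }


# Synthetic stand-in for fio's JSON output (illustrative values, not a measurement).
example_report = json.loads(
    '{"jobs": [{"read": {"bw_bytes": 2000000000, "iops": 30000.0}}]}'
)
summary = summarize_fio_read(example_report)
print(f"{summary['gbytes_per_sec']:.3f} GB/s at {summary['iops']:.0f} IOPS")
```

In the notebook itself, the parsed `fio` report would be passed in place of `example_report`.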


# Create datasets and run workloads

In [1]:
from perfcapture.workload import load_workloads_from_filename, run_workloads
from perfcapture.dataset import create_datasets_if_necessary
import pathlib
import os
import time

In [2]:
path = pathlib.Path('../recipes/simple_zarr_python_workloads.py').resolve()  # resolve() already returns an absolute path
os.chdir(path.parent)
workloads = load_workloads_from_filename(path)

Instantiating ZarrPythonLoadEntireArray


In [3]:
DATA_PATH = pathlib.Path("~/temp/perfcapture_data_path").expanduser()
created_at_least_1_dataset: bool = create_datasets_if_necessary(workloads, DATA_PATH)
if created_at_least_1_dataset:
    print("Waiting for data to be flushed to disk...")
    time.sleep(10)

Found 5 Dataset object(s).
LZ4_10000_Chunks already exists.
Uncompressed_1_Chunk already exists.
Uncompressed_100_Chunks already exists.
Uncompressed_10000_Chunks already exists.
LZ4_100_Chunks already exists.
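
The cell above waits a fixed 10 seconds for newly written data to reach the disk. One possible alternative, sketched here under the assumption of a Unix-like system, is to ask the operating system to flush its write buffers explicitly with `os.sync()`. The helper name `flush_writes_to_disk` is hypothetical and is not part of `perfcapture`.

```python
import os


def flush_writes_to_disk() -> bool:
    """Ask the OS to flush buffered writes to disk.

    Returns True if a sync was issued, False if `os.sync` is unavailable
    (it only exists on Unix-like systems).
    """
    if hasattr(os, "sync"):
        os.sync()
        return True
    return False


synced = flush_writes_to_disk()
print("Sync issued." if synced else "os.sync() is not available on this platform.")
```

Whether this fully replaces the sleep depends on the storage stack, so treat it as a starting point rather than a guaranteed equivalent.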


In [4]:
all_results = run_workloads(workloads, keep_cache=False)

Running ZarrPythonLoadEntireArray 3 times on Uncompressed_1_Chunk!
Run 0 of 3...
Run 1 of 3...
Run 2 of 3...
  Finished!
                         mean         std
Runtime in secs     17.073791    5.995052
GB/sec to numpy      0.251496    0.073595
read_IOPS         2271.500126  590.957598
write_IOPS         607.954355  638.033437
read_GB_per_sec      0.262233    0.073946
write_GB_per_sec     0.025769    0.022747

Running ZarrPythonLoadEntireArray 3 times on LZ4_100_Chunks!
Run 0 of 3...
Run 1 of 3...
Run 2 of 3...
  Finished!
                         mean        std
Runtime in secs      3.694430   0.534608
GB/sec to numpy      1.099429   0.173316
read_IOPS         1767.177702  77.267883
write_IOPS           7.676411   6.497545
read_GB_per_sec      0.188700   0.025538
write_GB_per_sec     0.000069   0.000040

Running ZarrPythonLoadEntireArray 3 times on Uncompressed_100_Chunks!
Run 0 of 3...
Run 1 of 3...
Run 2 of 3...
  Finished!
                         mean         std
Runtime in secs

In [8]:
all_results

| workload | dataset | run_ID | Runtime in secs | GB/sec to numpy | read_IOPS | write_IOPS | read_GB_per_sec | write_GB_per_sec |
|---|---|---|---|---|---|---|---|---|
| ZarrPythonLoadEntireArray | Uncompressed_1_Chunk | 1 | 23.990605 | 0.166732 | 1612.881376 | 1344.02613 | 0.177127 | 0.051499 |
| ZarrPythonLoadEntireArray | Uncompressed_1_Chunk | 2 | 13.858097 | 0.28864 | 2446.223316 | 212.799781 | 0.29881 | 0.008333 |
| ZarrPythonLoadEntireArray | Uncompressed_1_Chunk | 3 | 13.372671 | 0.299118 | 2755.395687 | 267.037154 | 0.310764 | 0.017474 |
| ZarrPythonLoadEntireArray | LZ4_100_Chunks | 1 | 4.040398 | 0.990001 | 1799.32769 | 0.990001 | 0.175257 | 2.8e-05 |
| ZarrPythonLoadEntireArray | LZ4_100_Chunks | 2 | 3.964204 | 1.00903 | 1679.0256 | 8.072238 | 0.172692 | 6.9e-05 |
| ZarrPythonLoadEntireArray | LZ4_100_Chunks | 3 | 3.078687 | 1.299255 | 1823.179817 | 13.966993 | 0.218151 | 0.000109 |
| ZarrPythonLoadEntireArray | Uncompressed_100_Chunks | 1 | 11.566039 | 0.34584 | 2698.071483 | 17.464925 | 0.346185 | 0.001161 |
| ZarrPythonLoadEntireArray | Uncompressed_100_Chunks | 2 | 12.210625 | 0.327584 | 2737.370118 | 81.568306 | 0.333564 | 0.00705 |
| ZarrPythonLoadEntireArray | Uncompressed_100_Chunks | 3 | 10.723113 | 0.373026 | 3140.226164 | 8.113316 | 0.375452 | 9.2e-05 |
| ZarrPythonLoadEntireArray | LZ4_10000_Chunks | 1 | 3.603084 | 1.11016 | 2818.696428 | 3.33048 | 0.070459 | 3.9e-05 |
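
To relate the per-run rows to the summary statistics printed during the run, the following minimal sketch recomputes the mean and standard deviation for `Uncompressed_1_Chunk` using only the standard library. The values are copied from the results above. It assumes the printed `std` is the sample standard deviation (dividing by n - 1), which is consistent with the printed numbers. It also expresses the mean read throughput as a fraction of the `fio` figure reported earlier (2.049 GB/s); that comparison is an illustration, not something `perfcapture` computes.

```python
import statistics

# Per-run values for Uncompressed_1_Chunk, copied from the results table.
runtimes_secs = [23.990605, 13.858097, 13.372671]
read_gb_per_sec = [0.177127, 0.29881, 0.310764]

mean_runtime = statistics.mean(runtimes_secs)
std_runtime = statistics.stdev(runtimes_secs)  # sample standard deviation (n - 1)

# Sequential-read bandwidth reported by fio earlier in the notebook.
FIO_MAX_GB_PER_SEC = 2.049
fraction_of_fio_max = statistics.mean(read_gb_per_sec) / FIO_MAX_GB_PER_SEC

print(f"Runtime: mean={mean_runtime:.6f} s, std={std_runtime:.6f} s")
print(f"Mean read throughput is {fraction_of_fio_max:.1%} of the fio figure")
```

On these numbers the workload reads at roughly one eighth of the bandwidth that `fio` measured for the same disk.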
