Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changes/3526.feature.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Increased the default value of `async.concurrency` from 10 to 64 to improve parallelism and throughput for concurrent I/O operations. This change enables better performance out-of-the-box for most workloads. Users with specific resource constraints or when using many Dask threads may want to lower this value via the `ZARR_ASYNC_CONCURRENCY` environment variable or by setting `zarr.config.set({'async.concurrency': N})`.
79 changes: 78 additions & 1 deletion docs/user-guide/performance.md
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,84 @@ Coming soon.

## Parallel computing and synchronization

Coming soon.
Zarr is designed to support parallel computing and enables concurrent reads and writes to arrays. This section covers how to optimize Zarr's concurrency settings for different parallel computing scenarios.

### Concurrent I/O operations

Zarr uses asynchronous I/O internally to enable concurrent reads and writes across multiple chunks. The level of concurrency is controlled by the `async.concurrency` configuration setting, which determines the maximum number of concurrent I/O operations.

The default value is 64, which provides good performance for most workloads. You can adjust this value based on your specific needs:

```python
import zarr

# Set concurrency for the current session
zarr.config.set({'async.concurrency': 128})

# Or use environment variable
# export ZARR_ASYNC_CONCURRENCY=128
```

Higher concurrency values can improve throughput when:
- Working with remote storage (e.g., S3, GCS) where network latency is high
- Reading/writing many small chunks in parallel
- The storage backend can handle many concurrent requests

Lower concurrency values may be beneficial when:
- Working with local storage with limited I/O bandwidth
- Memory is constrained (each concurrent operation requires buffer space)
- Using Zarr within a parallel computing framework (see below)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly this use case I ran into 😄 - thanks a lot for putting this together, it's a really nice summary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstansby Could you share a bit of specifics as to what you saw that might have indicated this setting as the culprit?

I like having examples, especially :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to run multiple parallel jobs, each one which writes to a shard, and manually run as many jobs as I had processors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, interesting, and this was causing consistency issues? I have some sharded dask writing but haven't noticed any issues. 2 jobs would try writing to the same shard?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it wasn't causing any data issues, just performance issues. I was manually writing to one shard per process, and because my data was local on a fast SSD async concurrency wouldn't have bought me anything, and the internal multithreading that zarr-python does was causing more threads than I had processors to spin up without me wanting/needing them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm interesting, I think zarrs just has a global pool (at least I think that's rayon's default) so that might explain why I haven't hit this issue. Good to know!


### Using Zarr with Dask

[Dask](https://www.dask.org/) is a popular parallel computing library that works well with Zarr for processing large arrays. When using Zarr with Dask, it's important to consider the interaction between Dask's thread pool and Zarr's concurrency settings.

**Important**: When using many Dask threads, you may need to reduce both Zarr's `async.concurrency` and `threading.max_workers` settings to avoid creating too many concurrent operations. The total number of concurrent I/O operations can be roughly estimated as:

```
total_concurrency ≈ dask_threads × zarr_async_concurrency
```

For example, if you're running Dask with 10 threads and Zarr's default concurrency of 64, you could potentially have up to 640 concurrent operations, which may overwhelm your storage system or cause memory issues.

**Recommendation**: When using Dask with many threads, configure Zarr's concurrency settings:

```python
import zarr
import dask.array as da

# If using Dask with many threads (e.g., 8-16), reduce Zarr's concurrency settings
zarr.config.set({
'async.concurrency': 4, # Limit concurrent async operations
'threading.max_workers': 4, # Limit Zarr's internal thread pool
})

# Open Zarr array
z = zarr.open_array('data/large_array.zarr', mode='r')

# Create Dask array from Zarr array
arr = da.from_array(z, chunks=z.chunks)

# Process with Dask
result = arr.mean(axis=0).compute()
```

**Configuration guidelines for Dask workloads**:

- `async.concurrency`: Controls the maximum number of concurrent async I/O operations. Start with a lower value (e.g., 4-8) when using many Dask threads.
- `threading.max_workers`: Controls Zarr's internal thread pool size for blocking operations (defaults to CPU count). Reduce this to avoid thread contention with Dask's scheduler.

You may need to experiment with different values to find the optimal balance for your workload. Monitor your system's resource usage and adjust these settings based on whether your storage system or CPU is the bottleneck.

### Thread safety and process safety

Zarr arrays are designed to be thread-safe for concurrent reads and writes from multiple threads within the same process. However, proper synchronization is required when writing to overlapping regions from multiple threads.

For multi-process parallelism, Zarr provides safe concurrent writes as long as:
- Different processes write to different chunks
- The storage backend supports atomic writes (most do)

When writing to the same chunks from multiple processes, you should use external synchronization mechanisms or ensure that writes are coordinated to avoid race conditions.

## Pickle support

Expand Down