@jhamman jhamman commented Oct 16, 2025

As requested in #3526, this PR documents the change to the async.concurrency config setting and its performance considerations.

TODO:

  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@jhamman jhamman requested review from d-v-b and dcherian October 16, 2025 14:07
Lower concurrency values may be beneficial when:
- Working with local storage with limited I/O bandwidth
- Memory is constrained (each concurrent operation requires buffer space)
- Using Zarr within a parallel computing framework (see below)
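
To illustrate why a lower limit reduces memory pressure, here is a minimal sketch of a bounded-concurrency pattern using only the standard library. It is not zarr-python's implementation; `run_with_limit`, `fetch_chunk`, and the limit values are hypothetical names and numbers chosen for the example.

```python
import asyncio


async def run_with_limit(n_tasks: int, limit: int) -> int:
    """Run n_tasks simulated I/O operations, at most `limit` at a time.

    Returns the peak number of operations that were in flight together.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fetch_chunk(i: int) -> None:
        # Hypothetical stand-in for one chunk read or write.
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # simulated I/O wait
            in_flight -= 1

    await asyncio.gather(*(fetch_chunk(i) for i in range(n_tasks)))
    return peak


# The same 50 operations under a low and a high limit.
peak_low = asyncio.run(run_with_limit(n_tasks=50, limit=4))
peak_high = asyncio.run(run_with_limit(n_tasks=50, limit=32))
```

If each in-flight operation holds its own buffer, peak buffer memory scales roughly with the limit rather than with the total number of operations.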

This is exactly the use case I ran into 😄 - thanks a lot for putting this together, it's a really nice summary.

@dstansby Could you share a bit of specifics as to what you saw that might have indicated this setting as the culprit?

I like having examples, especially :)

I wanted to run multiple parallel jobs, each of which writes to a shard, and to manually run as many jobs as I had processors.

Mmm, interesting, and this was causing consistency issues? I have some sharded dask writing but haven't noticed any issues. 2 jobs would try writing to the same shard?

Ah, it wasn't causing any data issues, just performance issues. I was manually writing to one shard per process, and because my data was local on a fast SSD, async concurrency wouldn't have bought me anything. The internal multithreading that zarr-python does was also spinning up more threads than I had processors, without me wanting or needing them.
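
For a one-job-per-processor setup like this, one option is to split a total concurrency budget across the worker processes and apply the per-process value at the start of each job. A minimal sketch, where `per_process_concurrency` and `total_budget` are hypothetical names; the commented-out `zarr.config.set` call shows where the value would go, assuming zarr-python 3 is installed in the worker:

```python
import os


def per_process_concurrency(total_budget: int, n_processes: int) -> int:
    """Split a total concurrency budget evenly, never going below 1."""
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    return max(1, total_budget // n_processes)


# One job per processor; the budget equals the processor count here,
# so each job gets a limit of 1.
n_processes = os.cpu_count() or 1
limit = per_process_concurrency(total_budget=n_processes, n_processes=n_processes)

# In each worker, before any reads or writes (assumes zarr-python 3):
#   import zarr
#   zarr.config.set({"async.concurrency": limit})
```

The right budget depends on the storage backend; for fast local disks a small per-process value is a reasonable starting point to measure from.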

Mmmm interesting, I think zarrs just has a global pool (at least I think that's rayon's default) so that might explain why I haven't hit this issue. Good to know!

@d-v-b d-v-b enabled auto-merge (squash) October 20, 2025 10:29
@d-v-b d-v-b merged commit 2adacb8 into zarr-developers:main Oct 20, 2025
31 checks passed
@jhamman jhamman deleted the update-concurrency-default branch October 20, 2025 11:17