Conversation

@d-v-b
Contributor

@d-v-b d-v-b commented Feb 12, 2026

removes an expensive isinstance check inside BytesCodec._decode_single. isinstance against a runtime_checkable protocol is expensive, and this particular check sits in a hotspot. Without the check we are slightly less type-safe, but users who somehow get a non-ndarray into this part of the code will get an immediate runtime error.
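
A minimal micro-benchmark sketch (not from this PR; NDArrayLikeSketch is an invented stand-in for zarr's NDArrayLike protocol) showing why isinstance against a runtime_checkable protocol is expensive relative to a concrete-class check:

```python
# Sketch only: NDArrayLikeSketch is a made-up protocol, not zarr's NDArrayLike.
from timeit import timeit
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class NDArrayLikeSketch(Protocol):
    @property
    def shape(self) -> tuple[int, ...]: ...

    @property
    def dtype(self) -> np.dtype: ...

    def reshape(self, shape: tuple[int, ...]) -> "NDArrayLikeSketch": ...


arr = np.zeros(1000)

# A protocol isinstance() probes every declared member at call time,
# whereas a concrete-class check is a single type lookup.
protocol_time = timeit(lambda: isinstance(arr, NDArrayLikeSketch), number=100_000)
concrete_time = timeit(lambda: isinstance(arr, np.ndarray), number=100_000)
print(f"protocol check: {protocol_time:.3f}s, ndarray check: {concrete_time:.3f}s")
```

The per-call gap is roughly what the removed check was paying on every chunk decode.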

@d-v-b d-v-b added the performance Potential issues with Zarr performance (I/O, memory, etc.) label Feb 12, 2026
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Feb 12, 2026
@d-v-b d-v-b added the benchmark Code will be benchmarked in a CI job. label Feb 12, 2026
@codspeed-hq

codspeed-hq bot commented Feb 12, 2026

Merging this PR will improve performance by 35.48%

⚡ 32 improved benchmarks
✅ 16 untouched benchmarks
⏩ 6 skipped benchmarks¹

Performance Changes

Mode Benchmark BASE HEAD Efficiency
WallTime test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-memory] 545.4 ms 412.8 ms +32.11%
WallTime test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-memory_get_latency] 562.7 ms 430.5 ms +30.71%
WallTime test_slice_indexing[None-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-memory] 506.9 ms 375.8 ms +34.89%
WallTime test_slice_indexing[None-(slice(None, 10, None), slice(None, 10, None), slice(None, 10, None))-memory] 984.7 µs 866.4 µs +13.65%
WallTime test_slice_indexing[None-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-memory_get_latency] 557.7 ms 424.2 ms +31.49%
WallTime test_slice_indexing[None-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-memory_get_latency] 560.1 ms 426.9 ms +31.22%
WallTime test_slice_indexing[(50, 50, 50)-(slice(None, 10, None), slice(None, 10, None), slice(None, 10, None))-memory] 2 ms 1.8 ms +11.86%
WallTime test_slice_indexing[None-(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory] 4.7 ms 3.6 ms +30.18%
WallTime test_slice_indexing[(50, 50, 50)-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory] 285 ms 215.4 ms +32.32%
WallTime test_slice_indexing[(50, 50, 50)-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-memory_get_latency] 557.1 ms 425.7 ms +30.89%
WallTime test_slice_indexing[None-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-memory] 503.6 ms 373.6 ms +34.79%
WallTime test_slice_indexing[(50, 50, 50)-(0, 0, 0)-memory] 1.9 ms 1.7 ms +11.8%
WallTime test_slice_indexing[(50, 50, 50)-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-memory] 540 ms 409.4 ms +31.91%
WallTime test_slice_indexing[(50, 50, 50)-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory_get_latency] 312.6 ms 240.6 ms +29.93%
WallTime test_slice_indexing[None-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory] 275.6 ms 203.9 ms +35.16%
WallTime test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory_get_latency] 8.4 ms 7 ms +20.8%
WallTime test_slice_indexing[None-(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory_get_latency] 5.2 ms 4.1 ms +27.79%
WallTime test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory] 7.5 ms 6.1 ms +23.04%
WallTime test_read_array[memory-Layout(shape=(1000000,), chunks=(1000,), shards=(1000,))-None] 1,173.7 ms 971.4 ms +20.83%
WallTime test_slice_indexing[None-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory_get_latency] 303.4 ms 232 ms +30.78%
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing d-v-b:perf/remove-isinstance-check (ca35193) with main (e03cfc8)

Open in CodSpeed

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@d-v-b
Contributor Author

d-v-b commented Feb 12, 2026

35% perf improvement seems good

@d-v-b d-v-b requested a review from jhamman February 12, 2026 19:02
@d-v-b d-v-b changed the title from perf/remove isinstance check to perf:remove isinstance check Feb 12, 2026
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Feb 12, 2026
@d-v-b d-v-b requested a review from dcherian February 12, 2026 21:01
    dtype = replace(chunk_spec.dtype, endianness=endian_str).to_native_dtype()  # type: ignore[call-arg]
else:
    dtype = chunk_spec.dtype.to_native_dtype()
as_array_like = chunk_bytes.as_array_like()
Contributor

can we just have as_array_like become as_ndarray_like?

Contributor Author

you mean just rename the variable? We have to choose where the surprise is: at the as_array_like call (surprising if it returns an ndarraylike) or at the from_ndarray_like call (surprising if it accepts an arraylike). I don't see much of a difference here, but happy to rename if you feel strongly

Contributor
@dcherian dcherian Feb 13, 2026

(this is why I shouldn't reply when half-asleep at night)

Apologies for the confusion. Can we not have a Buffer.as_ndarray_like() that makes the bytes ready for the codec pipeline in the form the codec pipeline needs, i.e. NDArrayLike? That way it is type-safe and performant.

Contributor Author

ah sorry, so a new method that ensures the contents of the buffer are an ndarray? yeah, I think that should be easy to add!

Contributor Author

is it OK if we spin that out into a separate issue?

Contributor
@dcherian dcherian Feb 13, 2026

Sure, but isn't the needed change just modifying as_array_like to call np.asanyarray in the CPU buffer?
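
A minimal sketch of that suggestion, assuming a hypothetical CpuBufferSketch class (zarr's real Buffer/CpuBuffer API is not reproduced here; the code is illustrative only):

```python
# Hypothetical sketch, not zarr's actual buffer classes.
import numpy as np
import numpy.typing as npt


class CpuBufferSketch:
    """Stand-in for a CPU-backed buffer; only the methods under discussion are shown."""

    def __init__(self, data: npt.ArrayLike) -> None:
        self._data = data

    def as_array_like(self) -> npt.ArrayLike:
        # Existing-style behaviour: return whatever array-like the buffer wraps.
        return self._data

    def as_ndarray_like(self) -> np.ndarray:
        # The suggested addition: np.asanyarray is a no-op for ndarrays and
        # converts other array-likes, so callers always get an ndarray without
        # an isinstance check in the hot path.
        return np.asanyarray(self._data)
```

As the reply below points out, the GPU buffer and the abstract base class would need a corresponding method for this to hold across implementations.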

Contributor Author

doesn't this require changing the as_array_like method for the GPU buffer too, and/or the method on the abc?

def as_array_like(self) -> ArrayLike:

Contributor

ya, up to you as to when you want to fix it...

Contributor Author

Buffer is public API, so I don't think we want to change that as part of a performance fix.

@dcherian
Contributor

Also, this will close #3703 ?

@d-v-b
Contributor Author

d-v-b commented Feb 13, 2026

Also, this will close #3703 ?

it will close part of it, but not the underlying issue about our basic buffer / codec design.

Contributor
@dcherian dcherian left a comment

LGTM but I have a preference for a new as_ndarray_like method that removes this if from the hotpath.

@d-v-b d-v-b merged commit 23596c1 into zarr-developers:main Feb 13, 2026
26 checks passed
@d-v-b d-v-b deleted the perf/remove-isinstance-check branch February 13, 2026 14:19