feat: dask chunking on frame level implemented #242
base: main

Conversation
CodSpeed Performance Report: merging #242 will not alter performance.

Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##             main     #242      +/-   ##
==========================================
+ Coverage   94.97%   95.03%   +0.05%
==========================================
  Files          18       18
  Lines        2429     2458      +29
==========================================
+ Hits         2307     2336      +29
  Misses        122      122
```
Great! Tests are starting to pass now. I'll take a closer look soon. Were you able to qualitatively confirm that this indeed speeds things up if you read a single subchunk of a massive XY image?
Are you actually calling `compute()`, or just building the array? If the former (i.e. if you're computing the full plane), then I wouldn't be surprised at all that the time increases as chunk size decreases, since there are more and more operations to complete. What would be important to verify, though, is that it takes less time to read a single chunk as the chunk size decreases (i.e. to confirm that it can indeed efficiently access a subset of the data, rather than reading all the data and just cropping it down after the fact).
This is purely executing `to_dask` (building the graph, not computing it):

```python
import timeit

import matplotlib.pyplot as plt
import nd2

chunks = [
    (1024,),
    (512,) * 2,
    (256,) * 4,
    (128,) * 8,
    (64,) * 16,
    (32,) * 32,
    (16,) * 64,
]
chunkstr = [
    "(1024,)",
    "(512,)*2",
    "(256,)*4",
    "(128,)*8",
    "(64,)*16",
    "(32,)*32",
    "(16,)*64",
]

# `path` points at the ND2 file used throughout this thread.
file = nd2.ND2File(path)
file.sizes  # {'P': 26, 'Z': 1263, 'C': 3, 'Y': 1024, 'X': 1024}

# Time graph construction for each Y-chunking of the (C, Y, X) frame.
times = []
for c in chunks:
    start = timeit.default_timer()
    file.to_dask(frame_chunks=((3,), c, (1024,)))
    times.append(timeit.default_timer() - start)

fig, ax = plt.subplots(1, 1, figsize=[10, 6])
x = [len(c) for c in chunks]
ax.plot(x, times)
ax.set_ylabel("time of `to_dask` in s")
ax.set_xlabel("Chunks in y dimension")
ax.set_xticks(x, chunkstr, rotation=90)
```
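As a quick sanity check on those chunkings (nothing nd2-specific): each tuple must tile the full 1024-pixel Y axis exactly, and the number of sub-chunks, and hence the number of tasks `to_dask` has to put into the graph per frame, doubles at every step. That alone is consistent with graph-construction time growing as the chunks shrink:

```python
chunks = [
    (1024,),
    (512,) * 2,
    (256,) * 4,
    (128,) * 8,
    (64,) * 16,
    (32,) * 32,
    (16,) * 64,
]

# Every chunking must tile the 1024-pixel Y axis exactly.
assert all(sum(c) == 1024 for c in chunks)

# The task count per frame doubles at each step.
print([len(c) for c in chunks])  # [1, 2, 4, 8, 16, 32, 64]
```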
I made some more simple tests, still not going through `to_dask`; these call `read_frame` directly:
```python
%%timeit
with nd2.ND2File(path) as file:
    for i in range(500):
        new = file.read_frame(i).copy()
```

```python
%%timeit
with nd2.ND2File(path) as file:
    for i in range(500):
        new = file.read_frame(i)[:128, :128].copy()
```

```python
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[0]):
        j = np.ravel_multi_index((i, 0), file._coord_shape)
        new = file.read_frame(j).copy()
```

```python
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[0]):
        j = np.ravel_multi_index((i, 0), file._coord_shape)
        new = file.read_frame(j)[:128, :128].copy()
```

```python
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[1]):
        j = np.ravel_multi_index((0, i), file._coord_shape)
        new = file.read_frame(j).copy()
```

```python
%%timeit
with nd2.ND2File(path) as file:
    for i in range(file.shape[1]):
        j = np.ravel_multi_index((0, i), file._coord_shape)
        new = file.read_frame(j)[:128, :128].copy()
```
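For reference, the `ravel_multi_index` calls above map an (outer, inner) coordinate to the flat frame index that `read_frame` takes. Assuming `_coord_shape` is `(26, 1263)` here (the P and Z sizes of this file, with Z varying fastest), stepping the first axis jumps the index by a whole Z-stack while stepping the second moves it by one frame:

```python
import numpy as np

# (P, Z) coordinate shape, matching file.sizes above.
coord_shape = (26, 1263)

# Stepping P advances the flat index by a whole Z-stack ...
assert np.ravel_multi_index((1, 0), coord_shape) == 1263
# ... while stepping Z advances it by one frame.
assert np.ravel_multi_index((0, 5), coord_shape) == 5
print(np.ravel_multi_index((2, 10), coord_shape))  # 2 * 1263 + 10 = 2536
```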
The cropping on the … I am also going to compare the subindexing of a region with …
Hi,

I tried to implement the sub-frame chunking in the `to_dask` and `_dask_block` functions, as mentioned in #85. I also added some new tests. It might need some more comments.

Best,
Niklas
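Conceptually, the sub-frame chunking can be pictured like this (a standalone sketch, not the PR's actual implementation; `FRAME`, `dask_block`, and the 8x8 shape are invented for the demo). Each dask task knows its own location in the output array and returns only that sub-region of the frame:

```python
import numpy as np
import dask.array as da

# Stand-in for a single (Y, X) frame, as ND2File.read_frame would return it.
FRAME = np.arange(64, dtype=np.uint16).reshape(8, 8)

def dask_block(block_info=None):
    # Each task looks up its own (start, stop) extents in the output
    # array and returns just that sub-region of the frame.
    (y0, y1), (x0, x1) = block_info[None]["array-location"]
    return FRAME[y0:y1, x0:x1]

# One frame, chunked 4 rows at a time along Y.
arr = da.map_blocks(dask_block, chunks=((4,) * 2, (8,)), dtype=np.uint16)

# Reassembling all chunks reproduces the frame.
assert np.array_equal(arr.compute(), FRAME)
```

Whether the real reader can fetch just those bytes from disk, rather than reading the whole frame and cropping, is exactly the efficiency question raised earlier in the thread.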