
issues with dask arrays returned by nd2: depend on ND2File being open, can't be pickled #19

Closed
VolkerH opened this issue Nov 10, 2021 · 17 comments


VolkerH commented Nov 10, 2021

Hi Talley,

a couple of observations regarding the dask arrays returned by nd2. None of this is terribly surprising (so you are probably aware of it), and I am not expecting it can really be fixed, but a few lines in the README could save users some headaches.

keep ND2File object in open() state when working with dask array

For my use case I created a little helper function (pseudo-code):

import nd2

def provide_tiles_as_daskarray_from_nd2(filepath):
    with nd2.ND2File(filepath) as f:
        arr = f.to_dask()
        # do a few more things on the array: cropping, sorting dimensions
        return arr  # note: f is closed as soon as the with-block is left

I have similar functions for other file types, e.g. something like provide_tiles_as_daskarray_from_folder_of_imgs(filepath).

At first sight, this works. I used this function from a Jupyter notebook and it seemed to return a working dask array, until you do anything that forces a compute(), at which point the Python kernel crashes without a traceback.
The issue is that f is closed when returning out of the context manager. The same thing happens if you don't use a context manager and call f.close() before returning. Again, not terribly surprising. When running under the VS Code debugger I actually got a traceback and saw that there is a segmentation fault when the nd2 library tries to read data.

My workaround is to not close f, which is a bit dirty as open file handles will be left hanging around, possibly causing memory leaks.
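
To illustrate the failure mode (a sketch only; the file name is a placeholder):

arr = provide_tiles_as_daskarray_from_nd2("tiles.nd2")
arr.shape         # metadata still looks fine
arr[0].compute()  # forces a read through the already-closed ND2File -> segfault / dead kernel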

cloudpickle

The dask arrays returned by nd2 can't be serialized using cloudpickle, because the underlying ND2File object is not pickleable:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-8a76360da592> in <module>
----> 1 cloudpickle.dump(array,f)

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dump(obj, file, protocol, buffer_callback)
     53         compatibility with older versions of Python.
     54         """
---> 55         CloudPickler(
     56             file, protocol=protocol, buffer_callback=buffer_callback
     57         ).dump(obj)

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    561     def dump(self, obj):
    562         try:
--> 563             return Pickler.dump(self, obj)
    564         except RuntimeError as e:
    565             if "recursion" in e.args[0]:

~/miniconda3/envs/napari_latest/lib/python3.9/site-packages/nd2/_sdk/latest.cpython-39-x86_64-linux-gnu.so in nd2._sdk.latest.ND2Reader.__reduce_cython__()

TypeError: no default __reduce__ due to non-trivial __cinit__

Again, not terribly surprising. However, the implication is (I haven't tried it, but am fairly certain this is the case) that the dask arrays returned by nd2 cannot be used in a dask.distributed context, where the pickled array needs to be sent to the various worker processes.

A workaround for both would be to create a temporary .zarr file (pseudo-code):

import dask.array
import nd2

def provide_tiles_as_daskarray_from_nd2(filepath):
    with nd2.ND2File(filepath) as f:
        arr = f.to_dask()
        # do a few more things on the array: cropping, sorting dimensions
        arr.to_zarr("_tmpfile.zarr")  # materialize while the file is still open
    return dask.array.from_zarr("_tmpfile.zarr")

with the obvious drawback of duplicating the required storage space.

I don't know whether there is much that can be done about this, but it may be useful to add a note of caution to the README.


VolkerH commented Nov 10, 2021

Actually, I just looked a bit into un/pickling and it seems that adding these two methods to the ND2File class solves the pickling problem:

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_rdr"]  # the underlying reader object cannot be pickled
        return state

    def __setstate__(self, d):
        self.__dict__ = d
        self._rdr = get_reader(self._path)  # recreate the reader on unpickle
        if not self._closed:
            self.open()

I can now pickle the dask arrays and when I unpickle them I can run compute().
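
A quick round-trip check (a sketch, assuming the two methods above are added to ND2File; the file name is a placeholder):

import pickle
import nd2

f = nd2.ND2File("tiles.nd2")
arr = f.to_dask()

restored = pickle.loads(pickle.dumps(arr))  # used to raise the TypeError shown above
restored[0].compute()                       # works after unpickling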

I will submit a PR.

tlambert03 (Owner) commented:

keep ND2File object in open() state when working with dask array

yeah, this is a tricky one... I actually wrote a wrapper over at aicsimageio for just this purpose:
https://github.com/AllenCellModeling/aicsimageio/blob/main/aicsimageio/utils/dask_proxy.py

could use it here as well?

Actually, I just looked a bit into un/pickling and it seems that adding these two methods to the ND2File class solves the pickling problem:

awesome, thank you!


VolkerH commented Nov 10, 2021

yeah, this is a tricky one... I actually wrote a wrapper over at aicsimageio for just this purpose: https://github.com/AllenCellModeling/aicsimageio/blob/main/aicsimageio/utils/dask_proxy.py

could use it here as well?

Ugh. I would have to do a bit more reading to fully understand how that proxy works, so I'm not sure whether this would work here. However, as aicsimageio also incorporates nd2, maybe I should just use aicsimageio instead of nd2 (I'd prefer not to have that dependency, though).

tlambert03 (Owner) commented:

the gist of that proxy is essentially these two methods:

    def compute(self, **kwargs: Any) -> np.ndarray:
        with self.__wrapped__._ctx_:
            return self.__wrapped__.compute(**kwargs)

    def __array__(self, dtype: str = None, **kwargs: Any) -> np.ndarray:
        with self.__wrapped__._ctx_:
            return self.__wrapped__.__array__(dtype, **kwargs)

that is, it's a dask array that, whenever you try to call compute or np.asarray on it, will re-open the underlying file (with self.__wrapped__._ctx_ is essentially just with ND2File(...):).
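
For reference, a minimal sketch of such a proxy (assuming a wrapt.ObjectProxy-style wrapper; the names here are illustrative, not the actual aicsimageio API):

from typing import Any

import numpy as np
from wrapt import ObjectProxy


class DaskArrayProxy(ObjectProxy):
    """A dask array wrapper that re-opens the backing file for any compute."""

    def __init__(self, wrapped, ctx) -> None:
        super().__init__(wrapped)
        # attributes stored on the proxy itself must be prefixed with _self_
        self._self_ctx = ctx  # a re-enterable context manager, e.g. the ND2File

    def compute(self, **kwargs: Any) -> np.ndarray:
        with self._self_ctx:
            return self.__wrapped__.compute(**kwargs)

    def __array__(self, dtype=None, **kwargs: Any) -> np.ndarray:
        with self._self_ctx:
            return self.__wrapped__.__array__(dtype, **kwargs)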

It looks a tad bit risky at first, but I haven't run into any issues with it yet. In any case, I suspect the issue of trying to use a dask array after closing the file is far more common than whatever hidden issues there are with this proxy. I'm inclined to try it.


VolkerH commented Nov 10, 2021

It looks a tad bit risky at first, but I haven't run into any issues with it yet. In any case, I suspect the issue of trying to use a dask array after closing the file is far more common than whatever hidden issues there are with this proxy.

I agree.
Also, if you do try it, you currently just get a crash, most likely without any usable error message.

So if I understand you correctly, the idea would be that ND2File.to_dask() returns such an ObjectProxy instead of the dask.array?

tlambert03 (Owner) commented:

So if I understand you correctly, the idea would be that ND2File.to_dask() returns such an ObjectProxy instead of the dask.array?

yeah, exactly.

tlambert03 (Owner) commented:

I think #27 and #26 have improved the situation here. Closing until we get another reproducible bug.

I tried on my Ubuntu box to reproduce the segfault you showed, but am still failing (even running it a few hundred times in case it's a race condition)... but let's keep an eye out for it.


VolkerH commented Nov 16, 2021

Hi @tlambert03,

I didn't get around to it yesterday, but I installed the latest version, including #27, from the main branch just now.
Unfortunately, on my machine this has not been an improvement: the segfault when returning the dask array from the context manager still persists.

In addition, I now also get dying kernels (probably a segfault, but I am running this in a notebook) for some more complex computations on the dask array that previously worked, as long as I avoided returning from inside a context manager.

I will try to provide a reproducible example, but probably won't get around to it today.
For the time being I'm holding on to an older state of the repo (VolkerH@294e07a), where everything except returning from the context manager works for me.

tlambert03 reopened this Nov 16, 2021
tlambert03 (Owner) commented:

Ugh 😔 I'm sorry, this is a really frustrating one.

I can't get it to segfault on any of my computers (tried mac, win, ubuntu20)... maybe it's a race condition that depends on dataset size?

It's certainly an issue though, since it looks like aicsimageio is also having CI issues with the new release. Will keep digging... but let me know if you hit on anything (and send files if necessary)



VolkerH commented Nov 26, 2021

OK, I started debugging this. The problem appears to be that pickling doesn't work correctly if there are additional operations in the compute graph. I don't have a stand-alone example yet, but maybe the following already gives you enough information:

I have the following function that loads an nd2 file and re-sorts the dimensions (using np.einsum) before returning it:

from os import PathLike
from pathlib import Path

import dask.array as da
from nd2 import ND2File


def load_tiles_nd2(file_path: PathLike) -> da.Array:
    file_path = Path(file_path)
    if file_path.suffix.lower() != ".nd2":
        raise ValueError("Can only read nd2 files.")
    f = ND2File(file_path)
    tile_array = f.to_dask()
    dim_order_tuple = f.sizes.keys()
    dim_order = "".join(dim_order_tuple).lower()
    # to_tczyx (defined elsewhere) is basically np.einsum(f"{dim_order}->pcyx")
    ordered_tile_array = to_tczyx(tile_array, in_order=dim_order, out_order="pcyx")
    return ordered_tile_array

So when I use this as in

arr = load_tiles_nd2("samplefile.nd2")
arr[..., :10].compute()

there is no problem.

However, when I do

tmp = load_tiles_nd2("samplefile.nd2")
arr = pickle.loads(pickle.dumps(tmp))
arr[..., :10].compute()

I get a segmentation fault.

When I look at the dask graph of the ResourceBackedArray that hasn't been pickled, it looks like this:

[screenshot: dask task graph of the array before pickling]

In contrast, the dask task graph for the ResourceBackedArray that has been pickled and regenerated by unpickling looks like this:

[screenshot: dask task graph of the array after a pickle/unpickle round trip]

So the layers 0 to 2 lose important attributes such as .shape and .chunksize when pickling/unpickling.
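
One way to poke at this without the notebook graph view is to compare the two arrays directly (a sketch; samplefile.nd2 is a placeholder and load_tiles_nd2 is the function above):

import pickle

tmp = load_tiles_nd2("samplefile.nd2")
roundtripped = pickle.loads(pickle.dumps(tmp))

# what survives the round trip on the array itself
print(tmp.shape, tmp.chunksize)
print(roundtripped.shape, roundtripped.chunksize)

# and the names of the graph layers on both sides
print(list(tmp.dask.layers))
print(list(roundtripped.dask.layers))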


VolkerH commented Nov 26, 2021

Argh. Now I can't reliably reproduce the segfault with a simple [...,:10].compute() anymore; I can only reproduce it with a more complicated map_blocks call. I will try to narrow it down further, but maybe the differences in the compute graph layers are relevant.


VolkerH commented Nov 26, 2021

So the segfault seems to depend on the slicing (a minimal sketch follows the list):

  • tile_arr[..., :10].compute() works. This slice covers multiple source chunks, if I am not mistaken.
  • tile_arr[0, 0, ...].compute() causes a segfault; this slice only covers a single source chunk.
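
For reference, the two cases side by side (a sketch following the pickle round trip from the comments above; samplefile.nd2 is a placeholder):

import pickle

tile_arr = load_tiles_nd2("samplefile.nd2")
tile_arr = pickle.loads(pickle.dumps(tile_arr))

tile_arr[..., :10].compute()   # spans multiple source chunks: works
tile_arr[0, 0, ...].compute()  # touches a single source chunk: segfaults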

tlambert03 (Owner) commented:

Thanks for digging!
By the way, I got a good response on SO about the general problem of adding tasks to a dask array graph, if you're curious; I might try it here:

https://stackoverflow.com/a/70012999


VolkerH commented Nov 26, 2021

Thanks, interesting; I didn't even think about graph optimization steps.
I sent you a screen recording of a debug session in action through the napari zulip chat. Unfortunately I can't attach .webm files to GitHub.


tlambert03 commented Jun 6, 2022

I think this has been solved over the iterations. Currently:

In [1]: import nd2

In [2]: f = nd2.ND2File('tests/data/dims_p1z5t3c2y32x32.nd2')

In [3]: d = f.to_dask()

In [4]: f.close()

In [5]: import cloudpickle

In [6]: p = cloudpickle.dumps(d)  # no problem
