Hidden issues with DaskArrayProxy no longer hidden: fails to work with NEP18 dispatch mechanism, np forces compute #25

Closed
VolkerH opened this issue Nov 11, 2021 · 4 comments


@VolkerH
Contributor

VolkerH commented Nov 11, 2021

  • nd2 version: main
  • Python version: 3.9
  • Operating System: Linux

Description

In this #19 (comment), @tlambert03 wrote:

that is, it's a dask array that, whenever you try to call compute or np.asarray, will re-open the underlying file (with self.wrapped.ctx is essentially just with ND2File()....)

It looks a tad bit risky at first, but I haven't run into any issues with it yet. In any case, I suspect the issue of trying to use a dask array after closing the file is far more common than whatever hidden issues there are with this proxy. I'm inclined to try it

The hidden issues are coming out of hiding.
Normally, when a dask array is passed to a numpy function, the NEP-18 mechanism (`__array_function__`) dispatches the call to the corresponding dask implementation and the result stays lazy. This no longer works with the DaskArrayProxy. Instead, the numpy call triggers a compute() on the array underlying the proxy, where no compute() would have happened on a non-proxied array. In my case (large array) the process gets killed, presumably by the Linux kernel once memory runs out.
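
The mechanism can be sketched without nd2 or dask. The following is a minimal toy that assumes only numpy. `LazyArray`, `NaiveProxy`, and `ForwardingProxy` are hypothetical names made up for illustration, not classes from nd2 or dask, and the forwarding approach is just one possible remedy, not necessarily what the library does.

```python
import numpy as np


class LazyArray:
    """Toy stand-in for a dask array (hypothetical, for illustration only)."""

    def __init__(self, data, log):
        self._data = np.asarray(data)
        self.log = log

    def __array_function__(self, func, types, args, kwargs):
        # NEP-18 hook: numpy hands the call over, so nothing is materialized.
        self.log.append(f"dispatched {func.__name__}")
        return self

    def __array__(self, dtype=None, copy=None):
        # Materializing the data is the analogue of dask's compute().
        self.log.append("compute")
        return self._data if dtype is None else self._data.astype(dtype)


class NaiveProxy:
    """Wraps an array but only exposes __array__ (hypothetical)."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._wrapped, dtype=dtype)


class ForwardingProxy(NaiveProxy):
    """Same proxy, but it forwards NEP-18 calls to the wrapped array."""

    def __array_function__(self, func, types, args, kwargs):
        def unwrap(x):
            return x._wrapped if isinstance(x, ForwardingProxy) else x

        # Simplification: only top-level arguments are unwrapped.
        args = tuple(unwrap(a) for a in args)
        kwargs = {k: unwrap(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)


log = []
lazy = LazyArray([[1, 2], [3, 4]], log)

direct = np.einsum("ab->ab", lazy)  # dispatched, stays lazy
log_direct = list(log)

log.clear()
naive = np.einsum("ab->ab", NaiveProxy(lazy))  # numpy coerces via __array__
log_naive = list(log)

log.clear()
forwarded = np.einsum("ab->ab", ForwardingProxy(lazy))  # dispatched again
log_forwarded = list(log)
```

numpy looks for `__array_function__` on the type of each argument, so a proxy that does not define it itself is treated as a generic array-like and converted with `__array__`, which is where the unwanted compute comes from.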

To reproduce (here I use a 4d nd2-file):

test_nd2.py

from nd2 import ND2File

import numpy as np
import dask.array as da
dataset_nd2 = "/home/hilsenst/Documents/Luisa_Reference_HT/PreMaldi/Seq0000.nd2"


def test_nd2_dask_einsum():
    f = ND2File(dataset_nd2)
    arr = f.to_dask()
    print(f"Array shape {arr.shape}")
    reordered_dask = da.einsum('abcd->abcd', arr)
    print(reordered_dask[:1,:1,:1,:1].compute())


def test_synthetic_dask_einsum_via_nep18():
    arr = da.zeros([1000,1000,100,100])
    print(f"Array shape {arr.shape}")
    reordered_nep18 = np.einsum('abcd->abcd', arr)
    print(type(reordered_nep18))
    print(reordered_nep18[:1,:1,:1,:1].compute())


def test_nd2_dask_einsum_via_nep18_small():
    f = ND2File(dataset_nd2)
    arr = f.to_dask()
    arr = arr[:10,:10,:10,:10]
    print(f"Array shape {arr.shape}")
    print(f"arr has type {type(arr)}")
    reordered_nep18 = np.einsum('abcd->abcd', arr)
    print(type(reordered_nep18))
    print(reordered_nep18[:1,:1,:1,:1].compute())


def test_nd2_dask_einsum_via_nep18():
    f = ND2File(dataset_nd2)
    arr = f.to_dask()
    print(f"Array shape {arr.shape}")
    reordered_nep18 = np.einsum('abcd->abcd', arr)
    print(type(reordered_nep18))
    print(reordered_nep18[:1,:1,:1,:1].compute())

Running these tests shows the problem:

(napari_latest) hilsenst@itservices-XPS-15-9500:~/GitlabEMBL/spacem-ht/src/spacem-mosaic$ pytest tests/test_nd2.py  --capture=no
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.9.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
PyQt5 5.15.4 -- Qt runtime 5.15.2 -- Qt compiled 5.15.2
rootdir: /home/hilsenst/GitlabEMBL/spacem-ht/src/spacem-mosaic
plugins: order-1.0.0, napari-0.4.11, timeout-1.4.2, anyio-3.3.0, napari-plugin-engine-0.1.9, qt-4.0.2, hypothesis-6.14.4
collected 4 items                                                                                                                                                                                         

tests/test_nd2.py Array shape (734, 2, 2060, 2044)
[[[[96]]]]
.Array shape (1000, 1000, 100, 100)
<class 'dask.array.core.Array'>
[[[[0.]]]]
.Array shape (10, 2, 10, 10)
arr has type <class 'nd2._dask_proxy.DaskArrayProxy'>
<class 'numpy.ndarray'>
FArray shape (734, 2, 2060, 2044)
Killed

For me, the convenience of using NEP-18 dispatch almost outweighs the problem of having a few open file handles when the array proxy is not used.
I guess the chances of getting numpy to support ObjectProxies with NEP-18 as well are fairly slim.

@tlambert03
Owner

tlambert03 commented Nov 11, 2021

Challenge accepted! :) Thanks, this is exactly what I was looking to find. Your tests are very helpful. Worst-case scenario, I can make returning the proxy an optional parameter.

@VolkerH
Contributor Author

VolkerH commented Nov 11, 2021

Worst-case scenario, I can make returning the proxy an optional parameter

I think that may be useful in any scenario.

@tlambert03
Owner

I think I've got a good solution; will push soon.

@tlambert03
Owner

I think #26 fixes this. But of course, feel free to reopen if you find more numpy incompatibilities!
