
Usage: confused by np.where behaviour and fill_value #871

Open
dcherian opened this issue May 15, 2025 · 4 comments

@dcherian

dcherian commented May 15, 2025

Please provide a description of what you'd like to do.

More dependable fill_value preservation in where

Example Code

import sparse
import numpy as np

array = sparse.COO.from_numpy(np.eye(3), fill_value=0)
assert np.where(array == 1, np.nan, array).fill_value == 0  # passes
assert np.where(array < 1, np.nan, array).fill_value == 0  # fails, result fill_value is np.nan

For mask=array==1, mask.fill_value=False and the array's fill_value is preserved in the np.where call.
For mask=array<1, mask.fill_value=True and the array's fill_value is NOT preserved in the np.where call.

This kind of behaviour is hard to rely on in a library. Is it a bug?

@dcherian dcherian added the usage Usage question label May 15, 2025
@dcherian
Author

OK, I understand what's happening now. np.where is effectively applied to the fill values of its inputs (mask.fill_value, np.nan, and array.fill_value) to get the result's fill_value, and the .data are filled accordingly.

It would be good to call this behaviour out in the docstring.
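
For anyone landing here later, the rule can be checked on scalars alone. This is a minimal sketch using plain NumPy, not the library's internals; `predicted_fill_value` is a hypothetical helper name, and it assumes the result's fill_value is simply np.where applied to the three input fill values, as described in this thread.

```python
import numpy as np


def predicted_fill_value(mask_fill, x_fill, y_fill):
    """Hypothetical helper: apply np.where to the fill values alone."""
    # np.where on scalars returns a 0-d array; .item() unwraps it.
    return np.where(mask_fill, x_fill, y_fill).item()


array_fill = 0  # fill_value of the array in the example above

# mask = array == 1 has fill value (0 == 1) -> False, so the array's fill wins.
fv_eq = predicted_fill_value(array_fill == 1, np.nan, array_fill)

# mask = array < 1 has fill value (0 < 1) -> True, so np.nan becomes the fill.
fv_lt = predicted_fill_value(array_fill < 1, np.nan, array_fill)

print(fv_eq, fv_lt)
```

Under that assumption, the array's fill_value survives only when the mask's own fill value is False.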

@hameerabbasi
Collaborator

hameerabbasi commented May 15, 2025

In general, out_fv = elemwise_fn(*in_fvs), where a scalar acts as its own fill value. So, with array.fill_value = 0:

(array.fill_value == 1) = False = (array == 1).fill_value
np.where(False, np.nan, array.fill_value) = 0 = np.where((array == 1), np.nan, array).fill_value


(array.fill_value < 1) = True = (array < 1).fill_value
np.where(True, np.nan, array.fill_value) = np.nan = np.where((array < 1), np.nan, array).fill_value

The reasoning is the following: fill_values propagate through elemwise because we expect elemwise fns to produce mostly fill_values. If it was otherwise, we'd need to densify.
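
To make the reasoning above concrete, here is a toy model in pure Python. It is only a sketch of the idea: `ToySparse`, `elemwise`, and `where` are hypothetical names invented for illustration and are not how sparse.COO is implemented. The function is applied once to the input fill values and then only at the explicitly stored positions, so nothing is densified.

```python
import math


class ToySparse:
    """Toy 1-D sparse vector (hypothetical, not sparse.COO)."""

    def __init__(self, size, data, fill_value):
        self.size = size
        self.data = dict(data)  # index -> explicitly stored value
        self.fill_value = fill_value

    def __getitem__(self, i):
        return self.data.get(i, self.fill_value)


def elemwise(fn, *args):
    """Apply fn element-wise; a scalar acts as its own fill value."""
    sparse_args = [a for a in args if isinstance(a, ToySparse)]
    size = sparse_args[0].size
    # Output fill value: fn applied to the input fill values, exactly once.
    out_fv = fn(*(a.fill_value if isinstance(a, ToySparse) else a for a in args))
    # Only positions stored explicitly in some input need to be evaluated.
    stored = set()
    for a in sparse_args:
        stored.update(a.data)
    out = {
        i: fn(*(a[i] if isinstance(a, ToySparse) else a for a in args))
        for i in sorted(stored)
    }
    return ToySparse(size, out, out_fv)


def where(cond, x, y):
    """Scalar version of np.where."""
    return x if cond else y


row = ToySparse(3, {0: 1.0}, fill_value=0.0)  # like the first row of np.eye(3)

mask_eq = elemwise(lambda v: v == 1, row)  # fill value: False
mask_lt = elemwise(lambda v: v < 1, row)   # fill value: True

res_eq = elemwise(where, mask_eq, math.nan, row)
res_lt = elemwise(where, mask_lt, math.nan, row)
print(res_eq.fill_value, res_lt.fill_value)
```

In this toy, each result still stores a single entry; only the fill value differs between the two masks, which matches the behaviour reported at the top of the thread.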

@dcherian
Author

fill_values propagate through elemwise because we expect elemwise fns to produce mostly fill_values. If it was otherwise, we'd need to densify.

Got it, thanks! Feel free to close if you don't think the docstring needs to be updated.

@hameerabbasi hameerabbasi added documentation and removed usage Usage question labels May 15, 2025
@prady0t
Contributor

prady0t commented May 27, 2025

If we still want to add this to the docstring, we can do so somewhere around here. We could also use a much simpler example to explain how fill_value is handled for element-wise operations. Here's an example with the + operation:

import numpy as np
from sparse import COO

a = COO.from_numpy(np.eye(3), fill_value=1)
b = COO.from_numpy(np.eye(3), fill_value=2)

c = a + b
print(c.fill_value)  # 1.0 + 2.0 = 3.0

We could mention something like: "Since out_fv = elemwise_fn(*in_fvs), the fill_value only has to be computed once, which is how element-wise operations on sparse arrays save memory."
