explicitly cast the dtype of where's condition parameter to bool #10087

Open
wants to merge 8 commits into
base: main


@keewis (Collaborator) commented Mar 2, 2025

(this is necessary because array-api-strict raises an error otherwise)

@@ -373,7 +373,7 @@ def sum_where(data, axis=None, dtype=None, where=None):
 def where(condition, x, y):
     """Three argument where() with better dtype promotion rules."""
     xp = get_array_namespace(condition, x, y)
-    return xp.where(condition, *as_shared_dtype([x, y], xp=xp))
+    return xp.where(xp.astype(condition, xp.bool), *as_shared_dtype([x, y], xp=xp))
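As a rough illustration of what the change above does, here is a minimal sketch using plain NumPy only. The helper name `where_bool` is hypothetical and not part of xarray. NumPy happens to accept a non-boolean condition, while a strict Array API implementation is expected to reject it, so the explicit cast makes the intent unambiguous in both cases.

```python
import numpy as np


def where_bool(condition, x, y):
    """Hypothetical helper: cast the condition to bool before selecting."""
    cond = np.asarray(condition, dtype=np.bool_)
    return np.where(cond, x, y)


condition = np.array([0, 1, 2])  # integer dtype, not bool
x = np.array([10, 20, 30])
y = np.array([-1, -1, -1])

# Nonzero entries become True, zero becomes False.
result = where_bool(condition, x, y)
print(result)
```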
@keewis (Collaborator Author) commented Mar 2, 2025

looks like it is not that simple... at some point we're trying to call

duck_array_ops.where(True, np.arange(5), np.float64(np.nan))

where numpy implements astype by forwarding to the ndarray method (and our own wrapper does the same), so a Python scalar such as True, which has no astype method, raises an AttributeError.

To resolve this, we probably need to make cond a 0d array:

Suggested change
-    return xp.where(xp.astype(condition, xp.bool), *as_shared_dtype([x, y], xp=xp))
+    try:
+        cond_ = astype(condition, xp.bool)
+    except AttributeError:
+        cond_ = xp.asarray(condition, dtype=xp.bool)
+    return xp.where(cond_, *as_shared_dtype([x, y], xp=xp))

However, I'm not sure I like the way this looks. Another option would be to make astype work for this case.
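The fallback idea from the suggestion can be tried in isolation with plain NumPy. This is only a sketch under assumptions: `to_bool_condition` is a hypothetical name, it calls the `astype` method directly rather than xarray's own `astype` wrapper, and it uses `np.bool_` so that it also runs on older NumPy releases.

```python
import numpy as np


def to_bool_condition(condition, xp=np):
    """Hypothetical helper: return `condition` as a boolean array."""
    try:
        # Arrays and NumPy scalars expose an astype method.
        return condition.astype(xp.bool_)
    except AttributeError:
        # Python scalars such as True do not, so build a 0d array instead.
        return xp.asarray(condition, dtype=xp.bool_)


# The call from the comment above, with a Python bool as the condition.
cond = to_bool_condition(True)
result = np.where(cond, np.arange(5), np.float64(np.nan))
print(cond.ndim, result)
```

The 0d boolean array broadcasts against the other two arguments, so the selection behaves the same as it would with the bare Python scalar.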

Collaborator


Hi @keewis! I got stuck on the same thing. What I did:

    # Check array type
    if isinstance(condition, np.ndarray):
        condition = condition.astype(dtype=np.bool)

    elif isinstance(condition, xp_strict._array_object.Array):
        condition = xp_strict.asarray(condition, dtype=xp_strict.bool)
    else:
        raise TypeError("Unsupported array type")

    xp = get_array_namespace(condition, x, y)
    return xp.where(condition, *as_shared_dtype([x, y], xp=xp))

This passes test_array_api, but test_computation fails with a bunch of TypeErrors.

Collaborator Author


thanks, that gave me some ideas on how to possibly fix this in a cleaner way (and I also found a potential bug in the groupby code)

@@ -524,7 +524,7 @@ def factorize(self) -> EncodedGroups:
     # Restore these after the raveling
     broadcasted_masks = broadcast(*masks)
     mask = functools.reduce(np.logical_or, broadcasted_masks)  # type: ignore[arg-type]
-    _flatcodes = where(mask, -1, _flatcodes)
+    _flatcodes = where(mask.data, -1, _flatcodes)
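For readers unfamiliar with this step, the masking pattern in the hunk above can be sketched with plain NumPy arrays. This is an assumption-laden stand-in: in the PR the masks appear to be xarray objects (hence `.data` to get the underlying array), whereas here `masks` and `flatcodes` are made-up NumPy arrays and `np.where` replaces xarray's own `where`.

```python
import functools

import numpy as np

# Made-up per-grouper masks marking entries that should be excluded.
masks = [
    np.array([True, False, False, False]),
    np.array([False, False, True, False]),
]
flatcodes = np.array([0, 1, 2, 3])

# Combine all masks with a logical OR, as in the hunk above.
mask = functools.reduce(np.logical_or, masks)

# Masked positions get the sentinel code -1; the rest keep their code.
flatcodes = np.where(mask, -1, flatcodes)
print(flatcodes)
```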
Contributor


thanks!

dcherian added a commit to dcherian/xarray that referenced this pull request Mar 7, 2025
dcherian added a commit that referenced this pull request Mar 7, 2025
3 participants