TYP: Type annotations overhaul, part 2 #291

crusaderky · 2025-03-26T16:42:38Z

Follow-up to #257, reverting leftover controversial changes re. python scalar data types (see discussion on the other PR).
@ev-br

ev-br · 2025-03-26T16:52:26Z

The next Consortium meeting is currently scheduled for the 3rd of April. In view of #257 (comment), it'd be great if someone brings up the topic of this and gh-288. Ideally, someone with an actual understanding and a horse in the race (so not me).

jorenham · 2025-03-26T19:09:46Z

As I've stated in #257 (comment), this is at odds with the typing spec. This has nothing to do with the array-api itself, it's just a matter of following the rules of Python itself.

It's ok if you disagree with those, and I even understand why you would. But then again, we're not the ones that are making these rules. The appropriate platform to discuss this is at https://discuss.python.org/c/typing/32. If you decide to open a thread there, you'd find me by your side @ev-br .

crusaderky · 2025-03-27T13:30:41Z

I fully agree that this is at odds with the typing spec, but unless you can convince @ev-br to follow it, this PR makes the whole array-api-compat and array-api-strict (data-apis/array-api-strict#135) internally coherent.
Both can be easily reverted.

NeilGirdhar · 2025-03-27T13:53:17Z

Do you guys think it would help to make list of points at least so that this controversy can be understood by everyone? From what I understand:

I think everyone agrees that there should be consistency between the type annotations in:

the Array API documentation,
array_api_compat,
array_api_extras, and
array_api_typing,

and that type annotations should be correct, and as clear as possible.

Using int | float | complex:

reflects uses of Array[int64] | Array[float64] | etc. in documentation, and
Produces documentation that is less misleading for people who aren't used to mentally making the substitutions that type checkers do.

Using complex:

Obeys the typing spec in Python,
Produces less complex type checking errors,
Is faster to type check, and
Causes inconsistent behaviour between type-checkers: TYP: Type annotations overhaul, part 1 #257 (comment)

There is also the question of how to annotate something like maximum. Either as

def maximum(x: int | Array, y: int | Array):

with an understanding for human readers that x and y cannot be Boolean since it was unspecified (but type checkers will let Boolean values through) or

def maximum(x: optype.Just[int] | Array, y: optype.Just[int] | Array):

which is clear to type checkers and human readers that x and y cannot be Boolean.

Did I miss anything?

Once we have everything down, it may be possible to work around some of the concerns. For example, if documentation is the issue, we may be able to generate documentation with the types exploded, or to add a note to every affected function.

My personal suggestion is to post this to the Array API forum, and ping some of the Python typing experts (like Shantanu or Jelle).

jorenham · 2025-03-27T14:09:51Z

Thanks for the write-up!

Did I miss anything?

It also causes inconsistent behaviour between type-checkers: #257 (comment)

ev-br · 2025-03-27T14:29:33Z

Using int | float | complex:

reflects uses of Array[int64] | Array[float64] | etc. in documentation, and
Produces documentation that may be easier to read for some novices.

At least one of some novices :-) has no idea what Array[int64] | Array[float64] is, but cannot help noticing that the issue is not about arrays, it's about python scalars. Which follow different promotion rules per https://data-apis.org/array-api/latest/API_specification/type_promotion.html

NeilGirdhar · 2025-03-27T14:42:42Z

At least one of some novices :-) has no idea what Array[int64] | Array[float64] is, but cannot help noticing that the issue is not about arrays, it's about python scalars. Which follow different promotion rules per https://data-apis.org/array-api/latest/API_specification/type_promotion.html

Didn't mean that pejoratively, and hope it didn't come across that way. I'll reword just in case. Please let me know if my edit is more acceptable.

(All I meant was that if you've read/written a lot of type annotations your mind gets used to seeing float | int where only float is written. In that sense, float | int may only be clearer for people that aren't used to doing this substitution.)

Is your point that:

since Python scalars appear in the promotion rules, that they should also be reflected in other parts of the documentation?
and/or since Python scalars appear in the promotion rules, they should also be reflected in the annotations (since the annotations are a kind of documentation)?

jorenham · 2025-03-27T14:59:21Z

it's about python scalars. Which follow different promotion rules per data-apis.org/array-api/latest/API_specification/type_promotion.html

np.float64 is also a python scalar, because it subclasses float. Same thing with np.complex128 <: complex. So @ev-br, are you suggesting that we should write each complex as

complex | np.complex128 | float | np.float64 | int | bool

?

Also note that the array-api promotion rules are separate from nominal subtyping relations: An int8 is promotable to int16, yet int8 is not a subtype or subclass of int16, and therefore is int8 not assignable to int16, because doing so would be type-unsafe.

ev-br · 2025-03-27T15:33:08Z

Didn't mean that pejoratively, and hope it didn't come across that way. I'll reword just in case.

Don't worry, my comment was tongue-in-cheek. We get all too serious discussing a too serious matter.

More seriously though, I sense there's a POV difference. You seem to expect that everybody is either a typing practitioner or a novice which needs to get to speed with types.
I beg to differ.

Is your point that:

since Python scalars appear in the promotion rules, that they should also be reflected in other parts of the documentation?

and/or since Python scalars appear in the promotion rules, they should also be reflected in the annotations (since the annotations are a kind of documentation)?

Yes. As long as annotations are visible to users and developers, I believe they should be consistent and no actively misleading. Which, I believe, using int for bool | int is.

np.float64 is also a python scalar, because it subclasses float. Same thing with np.complex128 <: complex. So @ev-br, are you suggesting that we should write each complex as
complex | np.complex128 | float | np.float64 | int | bool

Note I'm only discussing array-api-* and there is no np.float64 scalar here :-). Also no int8, int16 etc. There are only arrays and python scalars.

Your point is maybe about the full-fledged numpy types. That I have no opinion other than "please tuck it somewhere where human users opt in to see them".
Then I totally trust y'all to do the perfectly right job with all the type theory, python typing spec and so on and so forth.

So maybe the answer here is similar: if indeed,

a readable bool | int | Array is so wrong from the type perspective it needs fixing, and
the fix is unreadable,

Then let's just tuck all annotations somewhere ~~where the sun never shines~~ for typing experts and typing machinery to read, perfect and consume.

crusaderky · 2025-03-27T17:28:38Z

we may be able to generate documentation with the types exploded

np.float64 is also a python scalar, because it subclasses float. Same thing with np.complex128 <: complex. So @ev-br, are you suggesting that we should write each complex as
complex | np.complex128 | float | np.float64 | int | bool

np.generic's are Array objects. The above is unneded.
numpy defines two different classes that implement the Array protocol within the same namespace, np.ndarray and np.generic, and it is something that the Array API doesn't forbid.

NeilGirdhar · 2025-03-27T21:54:57Z

No can do; once the typing annotations just say complex a robot won't know if that should be exploded to float | complex, int | float | complex, or bool | int | float | complex.

It's Python: you can do anything you want 😄 . You can have a decorator or a method list that describes how you want it to be exploded. Anyway, in the case you list, you should completely explode it for consistency if that's what you want the docs to look like.

NeilGirdhar · 2025-03-28T06:17:24Z

More seriously though, I sense there's a POV difference. You seem to expect that everybody is either a typing practitioner or a novice which needs to get to speed with types. I beg to differ.

I don't think anyone "needs to get to speed". I do think that a consequence of working with the Python typing system is that your intuition changes.

Maybe it would help, to just look at what it means for complex to be equivalent to complex | float | int? The benefit is that any function that accepts, say,

def pow(base: float, exp: complex) -> Any: ...

must support being called with integers and floats, and
can be more easily annotated than if we had to do def pow(base: int | float, ....

This encodes how weird it is in the Python world for such a function to only work with floats. The type system itself is nudging you to either make it work for integers and floats, or go through extreme measures to annotate this function to only work with floats.

I think this is a good policy because it means that as a caller, you shouldn't have to think too much about whether you have to call the function with pow(x, -1) or pow(x, -1.0). In the Python world, where floats work, integers usually work as well.

I understand that you feel like the above annotation for pow (simplified from the typeshed) is "actively misleading". I agree that for some users it may mislead them. But I think once you get used to type annotations in Python, you won't find these things misleading.

Then let's just tuck all annotations somewhere where the sun never shines for typing experts and typing machinery to read, perfect and consume.

Sure, but that "somewhere" is the code, right? I agree that all of the code in all of the Array API projects should be consistent. But it should also follow the way typing works in Python too, right?

I think if people on the Array API team feel that this makes the documentation too misleading for users, then I think they should explore ways of making the documentation annotations broader.

crusaderky · 2025-03-28T08:40:10Z

This encodes how weird it is in the Python world for such a function to only work with floats. The type system itself is nudging you to either make it work for integers and floats, or go through extreme measures to annotate this function to only work with floats.

FWIW, the Array API doesn't have any use case where a function accepts float but not int, or complex but not float.
There are however cases where it accepts int but not bool, which are impossible to express to a type checker.
I am growing the opinion that it's a bad design and it should be revisited for the next iteration. Namely, sum(list[bool]) -> int and xp.sum(Array[bool]) -> Array[default int] are extremely common patterns that should be allowed. (mixed int & bool boolean logic needs to remain disallowed as it is ambiguous).

NeilGirdhar · 2025-03-28T08:49:34Z

There are however cases where it accepts int but not bool, which are impossible to express to a type checker.

Would you mind clarifying for me? Which cases are these? I'm not sure I understood the summation examples.

crusaderky · 2025-03-28T09:49:49Z

There are however cases where it accepts int but not bool, which are impossible to express to a type checker.

Would you mind clarifying for me? Which cases are these? I'm not sure I understood the summation examples.

For example, xp.maximum is one of several Array API functions that requires its inputs to be numeric - it accepts Array[int | float] | int | float but not Array[bool] nor bool.
Because Python typing has no such thing as "and" and "not" operators, you can't exclude a subclass from the annotation: (Array[int | float] | int | float) & ~(Array[bool] |bool), which is what the Array API requires, is unrecognized syntax.

An (in my head) straightforward solution would be to just standardize bool->int type promotion:

in numeric binary or ternary ops (e.g. __sum__), bool is promoted to whatever int dtype is the other argument, or to the default int if the other argument is a Python scalar;
in numeric single-argument ops (e.g. xp.sum), bool is promoted to the default int ahead of the operation;
in boolean binops (e.g. __and__), it remains forbidden to mix bools and ints. This is easy achievable in typing with @overload.

This would make a lot of sense at runtime, namely by enabling the commonplace pattern xp.sum(Array[bool]) to count the number of True elements in a bool array.

The downside I see is that you would need to think through what some operations do: is Array[bool] - Array[bool] a set subtraction with output Array[bool], or does it result in an Array[int] with values (-1, 0, 1)?

rgommers · 2025-03-28T09:54:05Z

The downside I see is that you would need to think through what some operations do, e.g. Array[bool] - Array[bool] is not a set subtraction, as it will result in an Array[int] with values (-1, 0, 1).

Exactly. Even np.sum(bool_array) is very questionable, because booleans are not numbers. tl;dr this is on purpose, Python's numerical tower - which is widely understood to have been a design mistake - doesn't make sense here. Typing also doesn't make sense here, for the same reason.

I'm fairly sure that we do not want to change this decision.

NeilGirdhar · 2025-03-28T09:54:26Z

Thanks for explaining!

I agree that Python scalar bool should be promoted to int wherever int is allowed. I don't like the promotion of Array[bool] to Array[int8] or Array[uint8] though. In other words, I think they should add a solid line between bool and int here. (Incidentally, the line between float and complex should probably be solid, since I don't think it can overflow.)

Yes, it would have been better had Python not been designed with this parent class, but this is unlikely to change.

ev-br · 2025-03-28T10:44:10Z

To add to what already said:

it would have been better had Python not been designed with this parent class, but this is unlikely to change.

The problem is not in Python itself, it's with static type checkers.

Sure, but that "somewhere" is the code, right?

Absolutely. Just not in the function signature which is the most visible place of all.

NeilGirdhar · 2025-03-28T10:52:11Z

The problem is not in Python itself, it's with static type checkers.

In the last few comments, we're talking about bool being a subclass of int. That has nothing to do with static type checkers. bool is a subclass of int in Python, and that is unlikely to change. (I've tried!) Type checkers are merely obeying the inheritance relationship.

Absolutely. Just not in the function signature which is the most visible place of all.

Why not? Who's going to look at the function signature except developers who understand type annotations?

crusaderky · 2025-03-28T11:08:05Z

In the last few comments, we're talking about bool being a subclass of int. That has nothing to do with static type checkers. bool is a subclass of int in Python, and that is unlikely to change. (I've tried!) Type checkers are merely obeying the inheritance relationship.

Type checkers don't have to obey the runtime's inheritance relationship. They are already disregarding it for int < float < complex.

NeilGirdhar · 2025-03-28T11:16:05Z

Type checkers don't have to obey the runtime's inheritance relationship. They are already disregarding it for int < float < complex.

I think there's a big difference between pretending that there's relationship, and forgetting about one.

There has been a prevailing opinion among type checker authors that legal code should be accepted. This informed a lot of debates. For example, when MyPy started accepting things that might implement the sequence protocol as iterable (despite having no __iter__ method).

Since bool can always be used as an int in the runtime, type checkers are unlikely to block their usage as such (see this giant debate about simply blocking ~False). That would create false positives.

On the other hand, pretending that float is a superclass of int doesn't create (many?) false positives. It may create some false negatives.

If you really want type checkers to change and start blocking the usage of bool as int, then please feel free to post a thread. I'm on your side on this issue, but from my observations, the core developers are unlikely to go for it.

ev-br · 2025-03-28T11:51:23Z

Who's going to look at the function signature except developers who understand type annotations?

Anybody. As already stated above, I do not buy the idea that "understand type annotations" is a prerequisite.

we're not the ones that are making these rules. The appropriate platform to discuss this is at https://discuss.python.org/c/typing/32. If you decide to open a thread there, you'd find me by your side @ev-br .

Thanks for the suggestion Joren! I believe the best chance for an upstream discussion to be productive is if it starts based on a deliberation by the Array API Consortium.

This thread starts getting long and repetitive. I'm going to recap my position (the majority of points were also made by multiple people here), and take a step back from the thread:

the fact that internally python bool is a subclass of int is an implementation detail;
as long as the annotation x: int implies that a bool scalar is a valid input, the following is wrong (Guido's example):

def maximum(x: int | Array, y: int | Array):
   ....

This is because True is not a valid input to xp.maximum according to the Array API promotion rules;

we should not place wrong/misleading information into function signatures;
long term, all type annotations should be shared across all array api projects, in a form of common collection of type stubs or something (not in individual function signatures);
until then, we keep using explicit bool | int | float | complex | Array,
unless there's an explicit decision to the contrary by the Array API Consortium.

In case it is not obvious, Consortium meetings are open to all interested parties. Please bring the topic in one of the Consortium meetings.

crusaderky · 2025-03-28T13:07:04Z

until then, we keep using explicit bool | int | float | complex | Array,

unless there's an explicit decision to the contrary by the Array API Consortium.

Given this, can we merge this PR as well as data-apis/array-api-strict#135? It's much easier to revert the relevant bits at a later date if the consortium decides that the short form is the way to go, than to keep these PRs open indefinitely.

ev-br · 2025-03-28T13:21:29Z

can we merge this PR as well as data-apis/array-api-strict#135?

The main thing that keeps me from landing these is #257 (comment)
Could you please coordinate with @jorenham between this PR and #288 ?

NeilGirdhar · 2025-03-28T13:33:30Z

@ev-br Thanks for clarifying your position. I've added your point about maximum to the summary I made earlier in the thread.

Edit: Just FYI:

the fact that internally python bool is a subclass of int is an implementation detail;

It is not an implementation detail; bool is a subclass of int by design. All Python implementations must implement this inheritance hierarchy.

jorenham · 2025-03-28T15:03:04Z

There are however cases where it accepts int but not bool, which are impossible to express to a type checker.

Unless you use optype.JustInt, or equivalently, optype.Just[int] :)

import optype as op

def assert_int(x: op.Just[int]) -> int:
    assert type(x) is int
    return x

assert_int(42)     # accepted
assert_int(False)  # rejected

lucascolley

merging data-apis/array-api-strict#135 (comment), thanks Guido. CI failures observed elsewhere

jorenham · 2025-03-31T23:55:38Z

I tried doing these replacements in scipy-stubs, and the results were somewhat unexpected (not even clickbait):
scipy/scipy-stubs#492

It required me to globally disable the flake8-pyi rule PYI041 in ruff, where it's also explained why this is not a good idea: https://docs.astral.sh/ruff/rules/redundant-numeric-union/

The type-checker slowdown isn't as bad as I had expected; running basedpyright takes ~5.75s before the change, and ~5.95s after. For mypy the slowdown was also approx. 200ms.

Before this change, mypy and (based)pyright did not report any errors. For (based)pyright, this was still the case after this change. But mypy now reports 17 errors. Two of those are failing type-tests, showing that this has (negatively) affected mypy's inference capabilities.

I also quickly tried doing this in the numpy stubs of https://github.com/numpy/numtype, but I gave up because the vscode mypy plugin kept crashing, and at the same time the basedpyright LSP was turning my laptop into a jet engine.

Static type-checkers have a lot more trouble to type-check the numpy-stubs than scipy-stubs. Running (based)pyright costs 20-40s, and mypy can take up to one minute (depending on the cache). I'm not sure why exactly this is, but it probably has to do with the complexity of the generic overloaded methods that are used.

Anyway, this suggests that the type-checker slowdown isn't a constant one. Given the constraint solving algorithms that type-checkers use, I wouldn't be surprised if the slowdown is exponential.

TYP: Type annotations overhaul, part 2

df8eb9c

lucascolley approved these changes Mar 31, 2025

View reviewed changes

lucascolley merged commit b2af137 into data-apis:main Mar 31, 2025
36 of 48 checks passed

crusaderky deleted the typing2 branch April 1, 2025 08:26

NeilGirdhar pushed a commit to NeilGirdhar/array-api-compat that referenced this pull request Apr 2, 2025

TYP: Type annotations overhaul, part 2 (data-apis#291)

2119a99

jorenham mentioned this pull request Apr 17, 2025

TYP: Type annotations overhaul, episode 2 #288

Merged

11 tasks

jorenham mentioned this pull request Apr 24, 2025

TYP: simplify redundant unions of builtin scalar types numpy/numpy#28819

Merged

TYP: Type annotations overhaul, part 2 #291

TYP: Type annotations overhaul, part 2 #291

Uh oh!

Conversation

crusaderky commented Mar 26, 2025

Uh oh!

ev-br commented Mar 26, 2025

Uh oh!

jorenham commented Mar 26, 2025

Uh oh!

crusaderky commented Mar 27, 2025

Uh oh!

NeilGirdhar commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorenham commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ev-br commented Mar 27, 2025

Uh oh!

NeilGirdhar commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorenham commented Mar 27, 2025

Uh oh!

ev-br commented Mar 27, 2025

Uh oh!

crusaderky commented Mar 27, 2025

Uh oh!

NeilGirdhar commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NeilGirdhar commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

crusaderky commented Mar 28, 2025

Uh oh!

NeilGirdhar commented Mar 28, 2025

Uh oh!

crusaderky commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rgommers commented Mar 28, 2025

Uh oh!

NeilGirdhar commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ev-br commented Mar 28, 2025

Uh oh!

NeilGirdhar commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

crusaderky commented Mar 28, 2025

Uh oh!

NeilGirdhar commented Mar 28, 2025

Uh oh!

ev-br commented Mar 28, 2025

Uh oh!

crusaderky commented Mar 28, 2025

Uh oh!

ev-br commented Mar 28, 2025

Uh oh!

NeilGirdhar commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorenham commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lucascolley left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jorenham commented Mar 31, 2025

Uh oh!

Uh oh!

NeilGirdhar commented Mar 27, 2025 •

edited

Loading

jorenham commented Mar 27, 2025 •

edited

Loading

NeilGirdhar commented Mar 27, 2025 •

edited

Loading

NeilGirdhar commented Mar 27, 2025 •

edited

Loading

NeilGirdhar commented Mar 28, 2025 •

edited

Loading

crusaderky commented Mar 28, 2025 •

edited

Loading

NeilGirdhar commented Mar 28, 2025 •

edited

Loading

NeilGirdhar commented Mar 28, 2025 •

edited

Loading

NeilGirdhar commented Mar 28, 2025 •

edited

Loading

jorenham commented Mar 28, 2025 •

edited

Loading