Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: Type annotations overhaul, part 2 #291

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

crusaderky
Copy link
Contributor

Follow-up to #257, reverting leftover controversial changes re. python scalar data types (see discussion on the other PR).
@ev-br

@ev-br
Copy link
Member

ev-br commented Mar 26, 2025

The next Consortium meeting is currently scheduled for the 3rd of April. In view of #257 (comment), it'd be great if someone brings up the topic of this and gh-288. Ideally, someone with an actual understanding and a horse in the race (so not me).

@jorenham
Copy link

As I've stated in #257 (comment), this is at odds with the typing spec. This has nothing to do with the array-api itself, it's just a matter of following the rules of Python itself.

It's ok if you disagree with those, and I even understand why you would. But then again, we're not the ones that are making these rules. The appropriate platform to discuss this is at https://discuss.python.org/c/typing/32. If you decide to open a thread there, you'd find me by your side @ev-br .

@crusaderky
Copy link
Contributor Author

I fully agree that this is at odds with the typing spec, but unless you can convince @ev-br to follow it, this PR makes the whole array-api-compat and array-api-strict (data-apis/array-api-strict#135) internally coherent.
Both can be easily reverted.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 27, 2025

Do you guys think it would help to make list of points at least so that this controversy can be understood by everyone? From what I understand:

I think everyone agrees that there should be consistency between the type annotations in:

  • the Array API documentation,
  • array_api_compat,
  • array_api_extras, and
  • array_api_typing,

and that type annotations should be correct, and as clear as possible.

Using int | float | complex:

  • reflects uses of Array[int64] | Array[float64] | etc. in documentation, and
  • Produces documentation that is less misleading for people who aren't used to mentally making the substitutions that type checkers do.

Using complex:

There is also the question of how to annotate something like maximum. Either as

def maximum(x: int | Array, y: int | Array):

with an understanding for human readers that x and y cannot be Boolean since it was unspecified (but type checkers will let Boolean values through) or

def maximum(x: optype.Just[int] | Array, y: optype.Just[int] | Array):

which is clear to type checkers and human readers that x and y cannot be Boolean.

Did I miss anything?

Once we have everything down, it may be possible to work around some of the concerns. For example, if documentation is the issue, we may be able to generate documentation with the types exploded, or to add a note to every affected function.

My personal suggestion is to post this to the Array API forum, and ping some of the Python typing experts (like Shantanu or Jelle).

@jorenham
Copy link

jorenham commented Mar 27, 2025

Thanks for the write-up!

Did I miss anything?

It also causes inconsistent behaviour between type-checkers: #257 (comment)

@ev-br
Copy link
Member

ev-br commented Mar 27, 2025

Using int | float | complex:

reflects uses of Array[int64] | Array[float64] | etc. in documentation, and
Produces documentation that may be easier to read for some novices.

At least one of some novices :-) has no idea what Array[int64] | Array[float64] is, but cannot help noticing that the issue is not about arrays, it's about python scalars. Which follow different promotion rules per https://data-apis.org/array-api/latest/API_specification/type_promotion.html

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 27, 2025

At least one of some novices :-) has no idea what Array[int64] | Array[float64] is, but cannot help noticing that the issue is not about arrays, it's about python scalars. Which follow different promotion rules per https://data-apis.org/array-api/latest/API_specification/type_promotion.html

Didn't mean that pejoratively, and hope it didn't come across that way. I'll reword just in case. Please let me know if my edit is more acceptable.

(All I meant was that if you've read/written a lot of type annotations your mind gets used to seeing float | int where only float is written. In that sense, float | int may only be clearer for people that aren't used to doing this substitution.)

Is your point that:

  • since Python scalars appear in the promotion rules, that they should also be reflected in other parts of the documentation?
  • and/or since Python scalars appear in the promotion rules, they should also be reflected in the annotations (since the annotations are a kind of documentation)?

@jorenham
Copy link

it's about python scalars. Which follow different promotion rules per data-apis.org/array-api/latest/API_specification/type_promotion.html

np.float64 is also a python scalar, because it subclasses float. Same thing with np.complex128 <: complex. So @ev-br, are you suggesting that we should write each complex as

complex | np.complex128 | float | np.float64 | int | bool

?


Also note that the array-api promotion rules are separate from nominal subtyping relations: An int8 is promotable to int16, yet int8 is not a subtype or subclass of int16, and therefore is int8 not assignable to int16, because doing so would be type-unsafe.

@ev-br
Copy link
Member

ev-br commented Mar 27, 2025

Didn't mean that pejoratively, and hope it didn't come across that way. I'll reword just in case.

Don't worry, my comment was tongue-in-cheek. We get all too serious discussing a too serious matter.

More seriously though, I sense there's a POV difference. You seem to expect that everybody is either a typing practitioner or a novice which needs to get to speed with types.
I beg to differ.

Is your point that:

  • since Python scalars appear in the promotion rules, that they should also be reflected in other parts of the documentation?
  • and/or since Python scalars appear in the promotion rules, they should also be reflected in the annotations (since the annotations are a kind of documentation)?

Yes. As long as annotations are visible to users and developers, I believe they should be consistent and no actively misleading. Which, I believe, using int for bool | int is.

np.float64 is also a python scalar, because it subclasses float. Same thing with np.complex128 <: complex. So @ev-br, are you suggesting that we should write each complex as
complex | np.complex128 | float | np.float64 | int | bool

Note I'm only discussing array-api-* and there is no np.float64 scalar here :-). Also no int8, int16 etc. There are only arrays and python scalars.

Your point is maybe about the full-fledged numpy types. That I have no opinion other than "please tuck it somewhere where human users opt in to see them".
Then I totally trust y'all to do the perfectly right job with all the type theory, python typing spec and so on and so forth.

So maybe the answer here is similar: if indeed,

  • a readable bool | int | Array is so wrong from the type perspective it needs fixing, and
  • the fix is unreadable,

Then let's just tuck all annotations somewhere where the sun never shines for typing experts and typing machinery to read, perfect and consume.

@crusaderky
Copy link
Contributor Author

we may be able to generate documentation with the types exploded

No can do; once the typing annotations just say complex a robot won't know if that should be exploded to float | complex, int | float | complex, or bool | int | float | complex.

np.float64 is also a python scalar, because it subclasses float. Same thing with np.complex128 <: complex. So @ev-br, are you suggesting that we should write each complex as

complex | np.complex128 | float | np.float64 | int | bool

np.generic's are Array objects. The above is unneded.
numpy defines two different classes that implement the Array protocol within the same namespace, np.ndarray and np.generic, and it is something that the Array API doesn't forbid.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 27, 2025

No can do; once the typing annotations just say complex a robot won't know if that should be exploded to float | complex, int | float | complex, or bool | int | float | complex.

It's Python: you can do anything you want 😄 . You can have a decorator or a method list that describes how you want it to be exploded. Anyway, in the case you list, you should completely explode it for consistency if that's what you want the docs to look like.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 28, 2025

More seriously though, I sense there's a POV difference. You seem to expect that everybody is either a typing practitioner or a novice which needs to get to speed with types. I beg to differ.

I don't think anyone "needs to get to speed". I do think that a consequence of working with the Python typing system is that your intuition changes.

Maybe it would help, to just look at what it means for complex to be equivalent to complex | float | int? The benefit is that any function that accepts, say,

def pow(base: float, exp: complex) -> Any: ...
  • must support being called with integers and floats, and
  • can be more easily annotated than if we had to do def pow(base: int | float, ....

This encodes how weird it is in the Python world for such a function to only work with floats. The type system itself is nudging you to either make it work for integers and floats, or go through extreme measures to annotate this function to only work with floats.

I think this is a good policy because it means that as a caller, you shouldn't have to think too much about whether you have to call the function with pow(x, -1) or pow(x, -1.0). In the Python world, where floats work, integers usually work as well.

I understand that you feel like the above annotation for pow (simplified from the typeshed) is "actively misleading". I agree that for some users it may mislead them. But I think once you get used to type annotations in Python, you won't find these things misleading.

Then let's just tuck all annotations somewhere where the sun never shines for typing experts and typing machinery to read, perfect and consume.

Sure, but that "somewhere" is the code, right? I agree that all of the code in all of the Array API projects should be consistent. But it should also follow the way typing works in Python too, right?

I think if people on the Array API team feel that this makes the documentation too misleading for users, then I think they should explore ways of making the documentation annotations broader.

@crusaderky
Copy link
Contributor Author

This encodes how weird it is in the Python world for such a function to only work with floats. The type system itself is nudging you to either make it work for integers and floats, or go through extreme measures to annotate this function to only work with floats.

FWIW, the Array API doesn't have any use case where a function accepts float but not int, or complex but not float.
There are however cases where it accepts int but not bool, which are impossible to express to a type checker.
I am growing the opinion that it's a bad design and it should be revisited for the next iteration. Namely, sum(list[bool]) -> int and xp.sum(Array[bool]) -> Array[default int] are extremely common patterns that should be allowed. (mixed int & bool boolean logic needs to remain disallowed as it is ambiguous).

@NeilGirdhar
Copy link
Contributor

There are however cases where it accepts int but not bool, which are impossible to express to a type checker.

Would you mind clarifying for me? Which cases are these? I'm not sure I understood the summation examples.

@crusaderky
Copy link
Contributor Author

crusaderky commented Mar 28, 2025

There are however cases where it accepts int but not bool, which are impossible to express to a type checker.

Would you mind clarifying for me? Which cases are these? I'm not sure I understood the summation examples.

For example, xp.maximum is one of several Array API functions that requires its inputs to be numeric - it accepts Array[int | float] | int | float but not Array[bool] nor bool.
Because Python typing has no such thing as "and" and "not" operators, you can't exclude a subclass from the annotation: (Array[int | float] | int | float) & ~(Array[bool] |bool), which is what the Array API requires, is unrecognized syntax.

An (in my head) straightforward solution would be to just standardize bool->int type promotion:

  • in numeric binary or ternary ops (e.g. __sum__), bool is promoted to whatever int dtype is the other argument, or to the default int if the other argument is a Python scalar;
  • in numeric single-argument ops (e.g. xp.sum), bool is promoted to the default int ahead of the operation;
  • in boolean binops (e.g. __and__), it remains forbidden to mix bools and ints. This is easy achievable in typing with @overload.

This would make a lot of sense at runtime, namely by enabling the commonplace pattern xp.sum(Array[bool]) to count the number of True elements in a bool array.

The downside I see is that you would need to think through what some operations do: is Array[bool] - Array[bool] a set subtraction with output Array[bool], or does it result in an Array[int] with values (-1, 0, 1)?

@rgommers
Copy link
Member

The downside I see is that you would need to think through what some operations do, e.g. Array[bool] - Array[bool] is not a set subtraction, as it will result in an Array[int] with values (-1, 0, 1).

Exactly. Even np.sum(bool_array) is very questionable, because booleans are not numbers. tl;dr this is on purpose, Python's numerical tower - which is widely understood to have been a design mistake - doesn't make sense here. Typing also doesn't make sense here, for the same reason.

I'm fairly sure that we do not want to change this decision.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 28, 2025

Thanks for explaining!

I agree that Python scalar bool should be promoted to int wherever int is allowed. I don't like the promotion of Array[bool] to Array[int8] or Array[uint8] though. In other words, I think they should add a solid line between bool and int here. (Incidentally, the line between float and complex should probably be solid, since I don't think it can overflow.)

Yes, it would have been better had Python not been designed with this parent class, but this is unlikely to change.

@ev-br
Copy link
Member

ev-br commented Mar 28, 2025

To add to what already said:

it would have been better had Python not been designed with this parent class, but this is unlikely to change.

The problem is not in Python itself, it's with static type checkers.

Sure, but that "somewhere" is the code, right?

Absolutely. Just not in the function signature which is the most visible place of all.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 28, 2025

The problem is not in Python itself, it's with static type checkers.

In the last few comments, we're talking about bool being a subclass of int. That has nothing to do with static type checkers. bool is a subclass of int in Python, and that is unlikely to change. (I've tried!) Type checkers are merely obeying the inheritance relationship.

Absolutely. Just not in the function signature which is the most visible place of all.

Why not? Who's going to look at the function signature except developers who understand type annotations?

@crusaderky
Copy link
Contributor Author

In the last few comments, we're talking about bool being a subclass of int. That has nothing to do with static type checkers. bool is a subclass of int in Python, and that is unlikely to change. (I've tried!) Type checkers are merely obeying the inheritance relationship.

Type checkers don't have to obey the runtime's inheritance relationship. They are already disregarding it for int < float < complex.

@NeilGirdhar
Copy link
Contributor

Type checkers don't have to obey the runtime's inheritance relationship. They are already disregarding it for int < float < complex.

I think there's a big difference between pretending that there's relationship, and forgetting about one.

There has been a prevailing opinion among type checker authors that legal code should be accepted. This informed a lot of debates. For example, when MyPy started accepting things that might implement the sequence protocol as iterable (despite having no __iter__ method).

Since bool can always be used as an int in the runtime, type checkers are unlikely to block their usage as such (see this giant debate about simply blocking ~False). That would create false positives.

On the other hand, pretending that float is a superclass of int doesn't create (many?) false positives. It may create some false negatives.

If you really want type checkers to change and start blocking the usage of bool as int, then please feel free to post a thread. I'm on your side on this issue, but from my observations, the core developers are unlikely to go for it.

@ev-br
Copy link
Member

ev-br commented Mar 28, 2025

Who's going to look at the function signature except developers who understand type annotations?

Anybody. As already stated above, I do not buy the idea that "understand type annotations" is a prerequisite.

we're not the ones that are making these rules. The appropriate platform to discuss this is at https://discuss.python.org/c/typing/32. If you decide to open a thread there, you'd find me by your side @ev-br .

Thanks for the suggestion Joren! I believe the best chance for an upstream discussion to be productive is if it starts based on a deliberation by the Array API Consortium.


This thread starts getting long and repetitive. I'm going to recap my position (the majority of points were also made by multiple people here), and take a step back from the thread:

  • the fact that internally python bool is a subclass of int is an implementation detail;
  • as long as the annotation x: int implies that a bool scalar is a valid input, the following is wrong (Guido's example):
def maximum(x: int | Array, y: int | Array):
   ....

This is because True is not a valid input to xp.maximum according to the Array API promotion rules;

  • we should not place wrong/misleading information into function signatures;
  • long term, all type annotations should be shared across all array api projects, in a form of common collection of type stubs or something (not in individual function signatures);
  • until then, we keep using explicit bool | int | float | complex | Array,
  • unless there's an explicit decision to the contrary by the Array API Consortium.

In case it is not obvious, Consortium meetings are open to all interested parties. Please bring the topic in one of the Consortium meetings.

@crusaderky
Copy link
Contributor Author

  • until then, we keep using explicit bool | int | float | complex | Array,
  • unless there's an explicit decision to the contrary by the Array API Consortium.

Given this, can we merge this PR as well as data-apis/array-api-strict#135? It's much easier to revert the relevant bits at a later date if the consortium decides that the short form is the way to go, than to keep these PRs open indefinitely.

@ev-br
Copy link
Member

ev-br commented Mar 28, 2025

can we merge this PR as well as data-apis/array-api-strict#135?

The main thing that keeps me from landing these is #257 (comment)
Could you please coordinate with @jorenham between this PR and #288 ?

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 28, 2025

@ev-br Thanks for clarifying your position. I've added your point about maximum to the summary I made earlier in the thread.

Edit: Just FYI:

the fact that internally python bool is a subclass of int is an implementation detail;

It is not an implementation detail; bool is a subclass of int by design. All Python implementations must implement this inheritance hierarchy.

@jorenham
Copy link

jorenham commented Mar 28, 2025

There are however cases where it accepts int but not bool, which are impossible to express to a type checker.

Unless you use optype.JustInt, or equivalently, optype.Just[int] :)

import optype as op

def assert_int(x: op.Just[int]) -> int:
    assert type(x) is int
    return x

assert_int(42)     # accepted
assert_int(False)  # rejected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants