Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag to allow more flexible variable redefinition #18727

Open
wants to merge 66 commits into
base: master
Choose a base branch
from

Conversation

JukkaL
Copy link
Collaborator

@JukkaL JukkaL commented Feb 24, 2025

Infer union types for simple variables from multiple assignments, if the variable isn't annotated. The feature is enabled via --allow-redefinition-new. --local-partial-types must also be enabled.

This is still experimental and has known issues, so it's not documented anywhere. It works well enough that it can be used for non-trivial experimentation, however.

Closes #6233. Closes #6232. Closes #18568. Fixes #18619.

In this example, the type of x is inferred as int | str when using the new behavior:

def f(i: int, s : str) -> int | str:
    if i > 5:
        x = i
    else:
        x = s  # No longer an error
    reveal_type(x)  # int | str
    return s

Here is a summary of how it works:

  • Assignment widens the inferred type of a variable and always narrows (when there is no annotation).
  • Simple variable lvalues are put into the binder on initial assignment when using the new feature. We need to be able to track whether a variable is defined or not to infer correct types (see Binder loses narrowed type of a variable if variable may be uninitialized #18619).
  • Assignment of None values are no longer special, and we don't use partial None if the feature is enabled for simple variables.
  • Lvalues other than simple variables (e.g. self.x) continue to work as in the past. Attribute types can't be widened, since they are externally visible and widening could cause confusion, but this is something we might relax in the future. Globals can be widened, however. This seems necessary for consistency.
  • If a loop body widens a variable type, we have to analyze the body again. However, we only do one extra pass, since the inferred type could be expanded without bound (consider x = 0 outside loop and x = [x] within the loop body).
  • We first infer the type of an rvalue without using the lvalue type as context, as otherwise the type context would often prevent redefinition. If the rvalue type isn't valid for inference (e.g. list item type can't be inferred), we fall back to the lvalue type context.

There are some other known bugs and limitations:

  • Annotated variables can't be freely redefined (but they can still be narrowed, of course). I may want to relax this in the future, but I'm not sure yet.
  • If there is a function definition between assignments to a variable, the inferred types may be incorrect.
  • There are few tests for nonlocal and some other features. We don't have good test coverage for deferrals, mypy daemon, and disabling strict optional.
  • Imported names can't be redefined in a consistent way. This needs further analysis.

In self check the feature generates 6 additional errors, which all seem correct -- we infer more precise types, which will generate additional errors due to invariant containers and fixing false negatives.

When type checking the largest internal codebase at Dropbox, this generated about 700 new errors, the vast majority of which seemed legitimate. Mostly they were due to inferring more precise types for variables that used to have Any types. I used a recent but not the latest version of the feature to type check the internal codebase.

@JukkaL JukkaL requested a review from ilevkivskyi February 24, 2025 14:17

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@cdce8p
Copy link
Collaborator

cdce8p commented Feb 26, 2025

Got an error when testing the branch with Home Assistant.

Error: /.../python3.13/site-packages/azure/servicebus/management/_generated/_serialization.py:655: error: INTERNAL ERROR
version: 1.16.0+dev.5bd6defcc6e2eb0f47e59f4bee35a51db5a27725
/.../python3.13/site-packages/azure/servicebus/management/_generated/_serialization.py:655: : note: use --pdb to drop into pdb
Traceback (most recent call last):
  File "mypy/checker.py", line 580, in accept
  File "mypy/nodes.py", line 1462, in accept
  File "mypy/checker.py", line 5263, in visit_for_stmt
  File "mypy/checker.py", line 629, in accept_loop
RuntimeError: Too many iterations when checking a loop

It seems to fail when checking azure-servicebus and in particular this function with the Home Assistant mypy config: https://github.com/Azure/azure-sdk-for-python/blob/azure-servicebus_7.10.0/sdk/servicebus/azure-servicebus/azure/servicebus/management/_generated/_serialization.py#L568

Was able to narrow it down to this

# mypy: local-partial-types, allow-redefinition-new, show-traceback
from typing import Any
def func() -> Any: ...

def serialize() -> Any:
    for _ in range(10):
        keys = func()
        new_attr = func()

        for k in reversed(keys):
            new_attr = {k: new_attr}

reveal_type(1)
Full traceback
$ mypy test.py 
test.py:10: error: INTERNAL ERROR
version: 1.16.0+dev.5bd6defcc6e2eb0f47e59f4bee35a51db5a27725
Traceback (most recent call last):
  File ".../venv/bin/mypy", line 8, in <module>
    sys.exit(console_entry())
  File ".../mypy/__main__.py", line 15, in console_entry
    main()
  File ".../mypy/main.py", line 126, in main
    res, messages, blockers = run_build(sources, options, fscache, t0, stdout, stderr)
  File ".../mypy/main.py", line 210, in run_build
    res = build.build(sources, options, None, flush_errors, fscache, stdout, stderr)
  File ".../mypy/build.py", line 191, in build
    result = _build(
  File ".../mypy/build.py", line 267, in _build
    graph = dispatch(sources, manager, stdout)
  File ".../mypy/build.py", line 2939, in dispatch
    process_graph(graph, manager)
  File ".../mypy/build.py", line 3337, in process_graph
    process_stale_scc(graph, scc, manager)
  File ".../mypy/build.py", line 3438, in process_stale_scc
    graph[id].type_check_first_pass()
  File ".../mypy/build.py", line 2311, in type_check_first_pass
    self.type_checker().check_first_pass()
  File ".../mypy/checker.py", line 473, in check_first_pass
    self.accept(d)
  File ".../mypy/checker.py", line 580, in accept
    stmt.accept(self)
  File ".../mypy/nodes.py", line 813, in accept
    return visitor.visit_func_def(self)
  File ".../mypy/checker.py", line 1097, in visit_func_def
    self._visit_func_def(defn)
  File ".../mypy/checker.py", line 1101, in _visit_func_def
    self.check_func_item(defn, name=defn.name)
  File ".../mypy/checker.py", line 1135, in check_func_item
    self.check_func_def(defn, typ, name, allow_empty)
  File ".../mypy/checker.py", line 1422, in check_func_def
    self.accept(item.body)
  File ".../mypy/checker.py", line 580, in accept
    stmt.accept(self)
  File ".../mypy/nodes.py", line 1281, in accept
    return visitor.visit_block(self)
  File ".../mypy/checker.py", line 3091, in visit_block
    self.accept(s)
  File ".../mypy/checker.py", line 580, in accept
    stmt.accept(self)
  File ".../mypy/nodes.py", line 1462, in accept
    return visitor.visit_for_stmt(self)
  File ".../mypy/checker.py", line 5263, in visit_for_stmt
    self.accept_loop(
  File ".../mypy/checker.py", line 616, in accept_loop
    self.accept(body)
  File ".../mypy/checker.py", line 580, in accept
    stmt.accept(self)
  File ".../mypy/nodes.py", line 1281, in accept
    return visitor.visit_block(self)
  File ".../mypy/checker.py", line 3091, in visit_block
    self.accept(s)
  File ".../mypy/checker.py", line 580, in accept
    stmt.accept(self)
    ~~~~~~~~~~~^^^^^^
  File ".../mypy/nodes.py", line 1462, in accept
    return visitor.visit_for_stmt(self)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File ".../mypy/checker.py", line 5263, in visit_for_stmt
    self.accept_loop(
    ~~~~~~~~~~~~~~~~^
        s.body,
        ^^^^^^^
    ...<3 lines>...
        ),
        ^^
    )
    ^
  File ".../mypy/checker.py", line 629, in accept_loop
    raise RuntimeError("Too many iterations when checking a loop")
RuntimeError: Too many iterations when checking a loop

@JukkaL
Copy link
Collaborator Author

JukkaL commented Feb 26, 2025

Yeah, looks like something is wrong with nested loops. Simplified the reproducer to this:

# mypy: local-partial-types, allow-redefinition-new

def a(): ...

def f() -> None:
    while int():
        x = a()

        while int():
            x = [x]

Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@cdce8p
Copy link
Collaborator

cdce8p commented Feb 27, 2025

Check Home Assistant again. The run now completes with 59 new errors. Most are either true-positives or can be resolved quite easily by adding an explicit type annotation. Noticed two cases which might require more attention

# mypy: local-partial-types, allow-redefinition-new, allow-empty-bodies

def func() -> None:
    l = None

    if int():
        l = []
        l.append(1)  # fine -> error 'Argument 1 to "append" of "list" has incompatible type "int"; expected "Never"'

This one came up quite a lot actually. Yes, it would be possible to resolve it by annotating l with list[Any] | None. However, if possible, I think it would be better if mypy was able to resolve it to list[Any] itself.

For the second example, I'm not really sure what's going on. After the if, x get's narrowed even though there wasn't any assignment. This only seems to happen with tuple unpacking.

# mypy: local-partial-types, allow-redefinition-new, allow-empty-bodies
from typing import Any
def get_details() -> tuple[Any, Any] | tuple[None, None]: ...

def func2() -> None:
    x, _ = get_details()
    reveal_type(x)
    if x and int():
        ...
    reveal_type(x)   # Any | None -> None  # ??
    if x and int():  # fine -> error 'Right operand of "and" is never evaluated'
        ...

@A5rocks
Copy link
Collaborator

A5rocks commented Feb 28, 2025

When running over trio, I found another issue:

def blah() -> tuple[int]:
    return (42,)

def main() -> None:
    x, *_ = blah()  # E: Need type annotation for "_" (hint: "_: list[<type>] = ...")

(note that if the main function isn't there then the error happens without this PR. Not sure what's happening)

@hauntsaninja
Copy link
Collaborator

Could be worth enabling it by default temporarily in this PR to see if mypy_primer surfaces stuff?

@ilevkivskyi
Copy link
Member

Could be worth enabling it by default temporarily in this PR to see if mypy_primer surfaces stuff?

I would strongly recommend this as well. It was really helpful e.g. when I worked on new inference. I will try to look at this PR soon (even before others in my queue).

In the meantime, the second example posted by @cdce8p is because binder has special logic for multi-assign from union. It doesn't put immediately, but instead accumulates assignments, and later puts them together, see e.g. accumulate_type_assignments().

@jorenham

This comment was marked as off-topic.

@ilevkivskyi
Copy link
Member

Most likely this is because this PR somehow breaks logic for overrides with custom properties (i.e. those where setter has different type).

@jorenham
Copy link

jorenham commented Mar 5, 2025

Most likely this is because this PR somehow breaks logic for overrides with custom properties (i.e. those where setter has different type).

That makes sense. And to be fair, it's a rather sneaky thing that I'm doing there, so I suppose you could consider it as a stress-test.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally had time to look at this. I think the approach looks very promising. I left a bunch of comments but they are minor. As it was proposed before I suggest again to run against mypy_primer with the flag set to True to get better idea of impact.

@@ -226,12 +233,15 @@ def update_from_options(self, frames: list[Frame]) -> bool:
for key in keys:
current_value = self._get(key)
resulting_values = [f.types.get(key, current_value) for f in frames]
if any(x is None for x in resulting_values):
old_semantics = not self.bind_all or extract_var_from_literal_hash(key) is None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite tricky and with time we may forget what old_semantics mean. I think it is worth adding a comment with motivating code example here.

@@ -316,6 +316,8 @@ class TypeChecker(NodeVisitor[None], CheckerPluginInterface):
# Vars for which partial type errors are already reported
# (to avoid logically duplicate errors with different error context).
partial_reported: set[Var]
# Short names of Var nodes whose previous inferred type has been widened via assignment
widened_vars: list[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is only used to track updates in a loop. If so, can this be replaced with a counter, or even a boolean flag? Otherwise people may accidentally start using this for something else, and short names are fragile w.r.t. name clashes.

if (
(partials_new == partials_old)
and (not self.binder.last_pop_changed or iter > 3)
and (widened_new == widened_old or iter > 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FMI, why these numbers (1 vs 3) are different? They look quite arbitrary.

if v.type is not None:
n = NameExpr(v.name)
n.node = v
self.binder.assign_type(n, v.type, v.type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use .put() here? Or is the TODO above about this? (If yes, it is not obvious there)

@@ -3286,7 +3317,10 @@ def check_assignment(
lvalue_type = make_optional_type(lvalue_type)
self.set_inferred_type(lvalue.node, lvalue, lvalue_type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is old special-casing for inferring optional types from assignments in if statements. Shouldn't we make this conditional on new redefinition flag being False?

new_type = DeletedType(source=source)
var.node.type = new_type
if self.options.allow_redefinition_new:
self.binder.assign_type(var, new_type, new_type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I fully understand the logic here. But again, it seems .put() may be used here.

reveal_type(C4().x) # N: Revealed type is "builtins.list[builtins.str]"
[builtins fixtures/list.pyi]

[case testZZZ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this test is doing here (also it doesn't set new flag).

except Exception:
# TODO: Too wide
reveal_type(x) # N: Revealed type is "Union[builtins.int, builtins.str]"
# TODO: Too wide
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this problem specific to try/except? If yes, it is fine to leave it for a follow-up PR.

reveal_type(x) # N: Revealed type is "builtins.int"
else:
x = ''
reveal_type(x) # N: Revealed type is "builtins.str"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe my missed them, but I think it is important to add tests with deeper (like 2-level and maybe even 3-level) nesting of if statements, checking the revealed type on exiting each nesting level.

self.widened_vars.append(inferred.name)
self.set_inferred_type(inferred, lvalue, lvalue_type)
self.binder.put(lvalue, rvalue_type)
# TODO: hack, maybe integrate into put?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I am not sure why integrating the update of declaration into .put() would be better.

@jorenham
Copy link

jorenham commented Mar 7, 2025

I just tried running this on the (new) numpy stubs in numpy/numtype on the current main branch (3e9fc75). But hen I run it with uv run --no-editable mypy --tb src, I see the following:

      Built numtype @ file:///projects/numtype
Uninstalled 1 package in 0.40ms
Installed 1 package in 2ms
src/numpy-stubs/ma/core.pyi:308: error: INTERNAL ERROR -- Please try using mypy master on GitHub:
https://mypy.readthedocs.io/en/stable/common_issues.html#using-a-development-mypy-build
Please report a bug at https://github.com/python/mypy/issues
version: 1.16.0+dev.3e38e7d24ec06675d5b8b8df0b18c2e25d62f3c6
Traceback (most recent call last):
  File "/projects/numtype/.venv/bin/mypy", line 10, in <module>
    sys.exit(console_entry())
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/__main__.py", line 15, in console_entry
    main()
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/main.py", line 126, in main
    res, messages, blockers = run_build(sources, options, fscache, t0, stdout, stderr)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/main.py", line 210, in run_build
    res = build.build(sources, options, None, flush_errors, fscache, stdout, stderr)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/build.py", line 191, in build
    result = _build(
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/build.py", line 267, in _build
    graph = dispatch(sources, manager, stdout)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/build.py", line 2939, in dispatch
    process_graph(graph, manager)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/build.py", line 3337, in process_graph
    process_stale_scc(graph, scc, manager)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/build.py", line 3438, in process_stale_scc
    graph[id].type_check_first_pass()
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/build.py", line 2311, in type_check_first_pass
    self.type_checker().check_first_pass()
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 473, in check_first_pass
    self.accept(d)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 580, in accept
    stmt.accept(self)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/nodes.py", line 1200, in accept
    return visitor.visit_class_def(self)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 2597, in visit_class_def
    self.accept(defn.defs)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 580, in accept
    stmt.accept(self)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/nodes.py", line 1281, in accept
    return visitor.visit_block(self)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 3091, in visit_block
    self.accept(s)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 580, in accept
    stmt.accept(self)
    ~~~~~~~~~~~^^^^^^
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/nodes.py", line 580, in accept
    return visitor.visit_overloaded_func_def(self)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 659, in visit_overloaded_func_def
    self._visit_overloaded_func_def(defn)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 715, in _visit_overloaded_func_def
    found_method_base_classes = self.check_method_override(defn)
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 2050, in check_method_override
    result = self.check_method_or_accessor_override_for_base(
        defn, base, check_override_compatibility
    )
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 2088, in check_method_or_accessor_override_for_base
    if self.check_method_override_for_base_with_name(defn, name, base):
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 2222, in check_method_override_for_base_with_name
    self.check_setter_type_override(defn, base_attr, base)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 2112, in check_setter_type_override
    original_type, is_original_setter = get_raw_setter_type(base_node)
                                        ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/projects/numtype/.venv/lib/python3.13/site-packages/mypy/checker.py", line 9035, in get_raw_setter_type
    assert var.type is not None
           ^^^^^^^^^^^^^^^^^^^^
AssertionError: 
src/numpy-stubs/ma/core.pyi:308: : note: use --pdb to drop into pdb

Here's the relevant numpy-stubs/ma/core.pyi code: numpy/numtype@3e9fc75/src/numpy-stubs/ma/core.pyi#L304-L311

I removed the .mypy_cache before running it, and installed it by switching "mypy>=1.15.0" to "mypy @ git+https://github.com/python/mypy@new-allow-redefine" in the [dependency-groups] of the pyproject.toml (numpy/numtype@3e9fc75/pyproject.toml#L50) and then running uv sync. Is there something I'm doing wrong?

edit: I just checked again with the --allow-redefinition-new flag, and I'm getting the same traceback.

This crash seems to be unrelated to this PR: #18764

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants