I don't have a minimal reproducer right now, but when tests terminate improperly (e.g. by calling sys.exit(1)), we find that the replacement worker is looked up in registered_collections even though it is not in it. For example, when running with pytest -n 2 --dist loadgroup we see the following:
....................................................................................... [ 25%]
....................................................................................... [ 25%]
....................................................................................... [ 26%]
...............................................................................[gw0] node down: Not properly terminated
F
replacing crashed worker gw0
collecting: 2/3 workers[gw1] node down: Not properly terminated
attempted to index with <WorkerController gw2>
replacing crashed worker gw1
collecting: 3/4 workersattempted to index with <WorkerController gw3>
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 283, in wrap_session
INTERNALERROR> session.exitstatus = doit(config, session) or 0
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 337, in _main
INTERNALERROR> config.hook.pytest_runtestloop(session=session)
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513, in __call__
INTERNALERROR> return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 139, in _multicall
INTERNALERROR> raise exception.with_traceback(exception.__traceback__)
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 122, in _multicall
INTERNALERROR> teardown.throw(exception) # type: ignore[union-attr]
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/_pytest/logging.py", line 805, in pytest_runtestloop
INTERNALERROR> return (yield) # Run all the tests.
INTERNALERROR> ^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 122, in _multicall
INTERNALERROR> teardown.throw(exception) # type: ignore[union-attr]
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/_pytest/terminal.py", line 673, in pytest_runtestloop
INTERNALERROR> result = yield
INTERNALERROR> ^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR> res = hook_impl.function(*args)
INTERNALERROR> ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/xdist/dsession.py", line 138, in pytest_runtestloop
INTERNALERROR> self.loop_once()
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/xdist/dsession.py", line 163, in loop_once
INTERNALERROR> call(**kwargs)
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/xdist/dsession.py", line 306, in worker_collectionfinish
INTERNALERROR> self.sched.schedule()
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 359, in schedule
INTERNALERROR> self._reschedule(node)
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 341, in _reschedule
INTERNALERROR> self._assign_work_unit(node)
INTERNALERROR> File "/Users/orlp/programming/rust/polars/.venv/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 276, in _assign_work_unit
INTERNALERROR> worker_collection = self.registered_collections[node]
INTERNALERROR> ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
INTERNALERROR> KeyError: <WorkerController gw3>
Note the following lines:
attempted to index with <WorkerController gw2>
attempted to index with <WorkerController gw3>
These are the replacement workers for the crashed workers gw0 and gw1. These lines were printed by the following debug statement I added in loadscope.py:
    # Ask the node to execute the workload
    try:
        worker_collection = self.registered_collections[node]
    except:
        print("attempted to index with", node)
        raise
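To illustrate the failure mode I suspect, here is a toy model, not xdist's actual code. ToyScheduler, its method names, and make_scheduler are all hypothetical; only the registered_collections name comes from the traceback. The assumption is that a replacement worker becomes known to the scheduler before its collection has been registered, so indexing the dict with it raises KeyError, while a guarded lookup would just skip it for now.

```python
# Toy model only: ToyScheduler and its methods are hypothetical and do not
# reproduce xdist's scheduler implementation.
class ToyScheduler:
    def __init__(self):
        self.assigned_work = {}           # node -> list of assigned test ids
        self.registered_collections = {}  # node -> collected test ids

    def add_node(self, node):
        # Assumption: a worker is known before it reports its collection.
        self.assigned_work[node] = []

    def add_node_collection(self, node, collection):
        self.registered_collections[node] = list(collection)

    def schedule_unguarded(self):
        for node in self.assigned_work:
            # Raises KeyError for a node whose collection is not registered.
            collection = self.registered_collections[node]
            self.assigned_work[node].append(collection[0])

    def schedule_guarded(self):
        # Skip nodes that have not registered a collection yet.
        skipped = []
        for node in self.assigned_work:
            collection = self.registered_collections.get(node)
            if collection is None:
                skipped.append(node)
                continue
            self.assigned_work[node].append(collection[0])
        return skipped


def make_scheduler():
    sched = ToyScheduler()
    sched.add_node("gw1")
    sched.add_node_collection("gw1", ["test_a", "test_b"])
    sched.add_node("gw2")  # replacement worker, collection not reported yet
    return sched


try:
    make_scheduler().schedule_unguarded()
    unguarded_error = None
except KeyError as exc:
    unguarded_error = exc

skipped = make_scheduler().schedule_guarded()
print("unguarded:", repr(unguarded_error))
print("guarded skipped:", skipped)
```

Whether skipping is the right fix for the real scheduler is a separate question; the sketch only shows why the lookup can fail for a worker that was added after a crash.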
Turning off --dist loadgroup fixes the issue. I've reproduced with pytest==8.3.5.