Skip to content

Add message with debugging info to Cancelled #3256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

jakkdl
Copy link
Member

@jakkdl jakkdl commented Apr 22, 2025

fixes #3232 by adding a CancelReason that's tracked by both CancelScope and CancelStatus and then used when constructing a Cancelled to be raised.

Remaining stuff:

  • I'm not 100% sure that both CancelScope and CancelStatus needs to have a CancelReason member - initially only CancelStatus had it but that wasn't enough, but I haven't tried to see if I can get away with only CancelScope having it.
    • It's now only tracked in CancelScope
  • I also found a bug as I was reviewing and adding a final test, where manually setting cs.deadline=-inf and then checkpointing somehow ends up as an explicit cancelled.
  • from_thread_check_cancelled needs some special handling
    • I managed to fix this.. but in a way that created cyclic garbage. So need a better fix

Copy link

codecov bot commented Apr 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00000%. Comparing base (9c909fa) to head (87a8e07).

Additional details and impacted files
@@               Coverage Diff                @@
##                 main        #3256    +/-   ##
================================================
  Coverage   100.00000%   100.00000%            
================================================
  Files             124          125     +1     
  Lines           19047        19182   +135     
  Branches         1287         1291     +4     
================================================
+ Hits            19047        19182   +135     
Files with missing lines Coverage Δ
src/trio/_channel.py 100.00000% <100.00000%> (ø)
src/trio/_core/_asyncgens.py 100.00000% <100.00000%> (ø)
src/trio/_core/_exceptions.py 100.00000% <100.00000%> (ø)
src/trio/_core/_run.py 100.00000% <100.00000%> (ø)
src/trio/_core/_tests/test_cancel.py 100.00000% <100.00000%> (ø)
src/trio/_core/_tests/test_run.py 100.00000% <100.00000%> (ø)
src/trio/_highlevel_generic.py 100.00000% <100.00000%> (ø)
src/trio/_highlevel_open_tcp_stream.py 100.00000% <100.00000%> (ø)
src/trio/_subprocess.py 100.00000% <100.00000%> (ø)
src/trio/_tests/test_threads.py 100.00000% <100.00000%> (ø)
... and 3 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@CoolCat467
Copy link
Member

So far looking good, I didn't see anything immediately that I would suggest changing.

Curious to hear about your immediate ideas about usecases for this

@jakkdl
Copy link
Member Author

jakkdl commented Apr 24, 2025

Curious to hear about your immediate ideas about usecases for this

see #3232 - if you're encountering a bunch of Cancelled and don't know why, having more info attached to them should make debugging easier

@jakkdl jakkdl marked this pull request as ready for review April 25, 2025 16:05
@jakkdl
Copy link
Member Author

jakkdl commented Apr 25, 2025

Fixed most of the problems, only remaining part is getting the right incantation for _attempt_delivery_of_any_pending_cancel that manages to keep the CancelReason alive in other threads, but not causing any cyclic garbage. Slapping another attribute onto trio._threads.PARENT_TASK_DATA might work, but it feels like it should be possible without it.
I've tried different permutations of finally: del [raise_cancel/cancelled], but need to dig deeper into understanding the actual problem unless somebody else want to chime in. @graingert?

Also the error message can probably be improved

@jakkdl jakkdl requested a review from Zac-HD April 25, 2025 16:11
@graingert
Copy link
Member

graingert commented Apr 25, 2025

I'll have a look, does fixing Outcome to delete its values fix it?

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really looking forward to using this, and to my colleagues having less to complain about when large complicated things get cancelled 😅

Many small comments below, but they're basically all tactical/local things, and I agree that once we get the gc stuff sorted out this will be basically ready to go. Thanks!

@jakkdl
Copy link
Member Author

jakkdl commented Apr 28, 2025

I'll have a look, does fixing Outcome to delete its values fix it?

nope

@jakkdl
Copy link
Member Author

jakkdl commented Apr 29, 2025

Okay I found a solution to gc+threads in 69c398b, not the prettiest but it seems to work at least? Very open to there being better ways of doing it

  1. I did not find a good way of binding a CancelReason or Cancelled object to raise_cancel as a function ... but by making it a callable class it was possible without _attempt_delivery_of_any_pending_cancel holding any references after passing it to _attempt_abort.
  2. I needed try: raise self.cancelled; finally: del self.cancelled to pass gc checks, but that led to crashes in thread tests as the same cancel func appears to get reused across multiple threads. The only way I figured out how to resolve this was to copy.copy the RaiseCancel object in _threads.py's abort function. I feel like there should be a cleaner way of doing this, but the object is small so copying it should be fairly cheap.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor tweaks only; merge when ready!

Copy link
Contributor

@A5rocks A5rocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's some comments, mostly trivial/nitpicks. I haven't looked to the end yet since I don't immediately like some stuff I saw and I want to think about whether that's justified or not!

"KeyboardInterrupt", "deadline", "explicit", "nursery", "shutdown", "unknown"
]
# repr(Task), so as to avoid gc troubles from holding a reference
source_task: str | None = None
Copy link
Contributor

@A5rocks A5rocks Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure repr(Task) is actually that useful? Like yes, it says what function this was started in and maybe you can locate the task through some gc machinery through the id, but...

Maybe a weakref would be more useful so people can access attributes? I feel like "where was I spawned" is already answered by the stack trace. (nevermind, just remembered this is the canceller not the cancellee)


Nevermind, I didn't think this suggestion through. A weakref wouldn't work for the common case (a task cancelling a sibling task).

I'm not convinced a strong ref here would be bad -- a Task doesn't store the exception or the cancellation reason so there's no reference cycle I think? But a string here is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a Task contains a CancelScope and the scope gets cancelled within the same task, the scope will then have a strong ref to a CancelReason which will then point back to the Task. I think?

the repr(Task) on its own is perhaps not super useful, but in case you have multiple cancellations going on at the same time that are only distinguished by the source task then you can visually distinguish them even without other sources of the task id.
Though it does also contain the name of the function itself:
<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>
which could be very helpful if you have different functions spawned in a nursery.

Copy link
Contributor

@A5rocks A5rocks May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess task -> parent/child nursery -> cancel scope -> cancel reason -> task is a loop, yeah. That's annoying. (Or maybe CoroutineType stores frames as a strongref? That too.)

@@ -1192,8 +1240,10 @@ def parent_task(self) -> Task:
"(`~trio.lowlevel.Task`): The Task that opened this nursery."
return self._parent_task

def _add_exc(self, exc: BaseException) -> None:
def _add_exc(self, exc: BaseException, reason: CancelReason | None) -> None:
Copy link
Contributor

@A5rocks A5rocks Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only callers of this are internal, IMO it would be cleaner to have them set the reason inline. Also, to avoid multiple comments for similar things, why doesn't this unconditionally set _cancel_reason = reason if it isn't None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why doesn't this unconditionally set _cancel_reason = reason if it isn't None?

in case we get multiple sources of cancellation I don't want to override the first one. In other places it's more critical, but here I could see a scenario where:

  1. Something causes a cancellation, be it a deadline or a crashing task or whatever
  2. a different task B gets cancelled, but they have an except Cancelled, and inside that handler they raise a different exception
  3. without if self.cancel_scope._cancel_reason is None: the cause would now get set to task B raising an exception

so I'm pretty sure we need the if, which means we'd need to write the if three times if we did it in-line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not relevant anymore

Copy link
Contributor

@A5rocks A5rocks May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the code to see why this isn't relevant anymore, but I already typed up a response comment to this:

I'm not entirely convinced avoiding this is a good thing:

async def crasher():
  await trio.sleep(2)
  raise ValueError("...")

with trio.open_nursery() as nursery:
  try:
    nursery.start_soon(crasher)
    try:
      await trio.sleep(10)  # app code
    finally:
      with trio.move_on_after(2, shield=True):
        await trio.sleep(3)  # cleanup code
  except trio.Cancelled as c:
    # what should c's cancel reason be
    raise

This might matter for instance in code for shutting down stuff on exceptions, moving on after 2 seconds. The cancel reason if the clean up code ran over its 2 seconds would presumably (?) be that the exceptions happened, not that the timeout happened. I think it would make more sense if the reason was instead about the timeout.

(I haven't played with this PR yet so I'm not sure that's actually what will happen)


Would it make sense to establish some sort of causal mechanism? I.e. a field on CancelReason that points to the old CancelReason. (I guess Cancelled could store another Cancelled? But that might be bad for cycles.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah current behavior is that the crasher becomes the reason.

Storing a chain of Cancelled sounds tricky and very likely to induce gc problems. I'm pretty sure we'd have to store any raised Cancelled in the scope itself in order to be able to refer back to them.

... although the crash cancellation should be accessible somehow in the finally scope to be set as __context__. I wonder where that is getting lost

But storing a chain of reasons would be fairly straightforward and sounds like it might have some good use

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I found a repro where sys.exc_info() got cleared, but I might have mistaken myself and idr the repro anymore.

But going back to your example:
I have a pretty strong intuition that the reason the nursery scope is canceled is because a child crashed. The deadline is the reason the inner scope inside the finally is canceled, but that cancellation will be swallowed by move_on_after and even in a world where we stored a chain of reasons the nursery scope would never see the deadline cancellation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point: what's the cancel reason visible inside the move_on_after then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it was deadline, with the child-crashing cancelled in its __context__, because of the shielding. I can add a test case for it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, yeah that behavior sounds nice. Returning to the earliest response you have, is that (nursery cancel -> raise a different exception in one of the tasks) the only case where things try to overwrite the cancellation reason? If so, I think it would be nicer to make nurseries not try to cancel if they are already cancelled (which would prevent the cancellation reason from being overwritten).


I also see that changing the deadline can potentially overwrite. I don't see why that would try to cancel anything if things are already cancelled... I guess just code simplicity.

I guess it makes sense to try to handle it in one place with a check on the cancellation reason, then. I just don't like it!

@jakkdl
Copy link
Member Author

jakkdl commented May 1, 2025

pypy fail is unrelated

+ python -m pip install -U pip uv -c test-requirements.txt
Install dependencies
  Requirement already satisfied: pip in c:\hostedtoolcache\windows\pypy\3.10.16\x86\lib\site-packages (25.1)
  ERROR: Cannot install uv because these package versions have conflicting dependencies.
  
  The conflict is caused by:
      The user requested uv
      The user requested (constraint) uv==0.7.2
  
  To fix this you could try to:
  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip to attempt to solve the dependency conflict
  
  ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
  Error: Process completed with exit code 1.

@jakkdl jakkdl requested a review from A5rocks May 1, 2025 11:06
@CoolCat467
Copy link
Member

I've seen that pypy failure before, and it usually means that it's using an outdated package cache. Clearing package cache will fix it (though I don't know how to do that myself)

@Zac-HD
Copy link
Member

Zac-HD commented May 1, 2025

usually clicking the 🗑️ icons in https://github.com/python-trio/trio/actions/caches

Copy link
Contributor

@A5rocks A5rocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine actually. Just a bunch of trivial comments.

@@ -1192,8 +1240,10 @@ def parent_task(self) -> Task:
"(`~trio.lowlevel.Task`): The Task that opened this nursery."
return self._parent_task

def _add_exc(self, exc: BaseException) -> None:
def _add_exc(self, exc: BaseException, reason: CancelReason | None) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, yeah that behavior sounds nice. Returning to the earliest response you have, is that (nursery cancel -> raise a different exception in one of the tasks) the only case where things try to overwrite the cancellation reason? If so, I think it would be nicer to make nurseries not try to cancel if they are already cancelled (which would prevent the cancellation reason from being overwritten).


I also see that changing the deadline can potentially overwrite. I don't see why that would try to cancel anything if things are already cancelled... I guess just code simplicity.

I guess it makes sense to try to handle it in one place with a check on the cancellation reason, then. I just don't like it!

raise Cancelled._create()

self._attempt_abort(raise_cancel)
self._attempt_abort(RaiseCancel(self._cancel_status._scope._cancel_reason))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this class, though I guess it's temporary if #3233 is merged....

I am confused about the exact justification, which probably makes sense but...:

I did not find a good way of binding a CancelReason or Cancelled object to raise_cancel as a function ...

Can't you make a cancel reason / cancelled be a nonlocal? I'm not sure what more binding you need.


I can see why it's not possible to have Cancelled created outside, but could self._cancel_status._scope._cancel_reason be used? Why not?

run_sync_soon_nursery.cancel_scope._cancel(
CancelReason(
source="shutdown",
reason="main task done, shut down run_sync_soon callbacks",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
reason="main task done, shut down run_sync_soon callbacks",
reason="main task done, shutting down run_sync_soon callbacks",

to match with above

and task._runner.ki_pending
):
task._cancel_status._scope._cancel_reason = CancelReason(
source="KeyboardInterrupt"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of strange to me to set this here. Is there no way for the thing raising KeyboardInterrupt to set this cancel reason?

@@ -0,0 +1,151 @@
# there's tons of cancellation testing in test_run, but that file is 3k lines long already...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well if you want to split the test files, could you move some of the cancellation tests here? (It doesn't have to be every one of them, just a quick search + copy paste)

cs.cancel(reason="hello")
with pytest.raises(
Cancelled,
match=rf"^cancelled due to explicit with reason 'hello' from task {current_task()!r}$",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this wording. IMO even leaving out the cancellation reason if it's explicit would be preferable. Like I would rather read:

cancelled with reason 'hello' from task {current_task()!r}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also could this test also check the introspection attributes?

await nursery.start(cancelled_task, fail_task)


async def test_cancel_reason_nursery2() -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining the difference? It took me a bit to see the difference is whether the task is cancelled before trio.sleep_forever or after.

with RaisesGroup(ValueError):
async with trio.open_nursery() as nursery:
nursery.start_soon(child)
await ev.wait()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way I think this and others could use trio.testing.wait_all_tasks_blocked instead (and avoid having an event).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One missing test: how about a task that is stated via nursery.start and raises before task_status.started()?

assert str(cancelled) == "cancelled due to explicit"
assert re.fullmatch(
r"cancelled due to deadline from task "
r"<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r"<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>",
rf"{trio.lowlevel.current_task()!r}",

? Same elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add exception message for easier debugging with Cancelled
6 participants