-
-
Notifications
You must be signed in to change notification settings - Fork 353
Add message with debugging info to Cancelled #3256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3256 +/- ##
================================================
Coverage 100.00000% 100.00000%
================================================
Files 124 125 +1
Lines 19047 19182 +135
Branches 1287 1291 +4
================================================
+ Hits 19047 19182 +135
🚀 New features to boost your workflow:
|
So far looking good, I didn't see anything immediately that I would suggest changing. Curious to hear about your immediate ideas about usecases for this |
see #3232 - if you're encountering a bunch of |
Fixed most of the problems, only remaining part is getting the right incantation for Also the error message can probably be improved |
I'll have a look, does fixing Outcome to delete its values fix it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really looking forward to using this, and to my colleagues having less to complain about when large complicated things get cancelled 😅
Many small comments below, but they're basically all tactical/local things, and I agree that once we get the gc stuff sorted out this will be basically ready to go. Thanks!
nope |
Okay I found a solution to gc+threads in 69c398b, not the prettiest but it seems to work at least? Very open to there being better ways of doing it
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor tweaks only; merge when ready!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's some comments, mostly trivial/nitpicks. I haven't looked to the end yet since I don't immediately like some stuff I saw and I want to think about whether that's justified or not!
"KeyboardInterrupt", "deadline", "explicit", "nursery", "shutdown", "unknown" | ||
] | ||
# repr(Task), so as to avoid gc troubles from holding a reference | ||
source_task: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure repr(Task)
is actually that useful? Like yes, it says what function this was started in and maybe you can locate the task through some gc machinery through the id
, but...
Maybe a weakref
would be more useful so people can access attributes? I feel like "where was I spawned" is already answered by the stack trace. (nevermind, just remembered this is the canceller not the cancellee)
Nevermind, I didn't think this suggestion through. A weakref wouldn't work for the common case (a task cancelling a sibling task).
I'm not convinced a strong ref here would be bad -- a Task doesn't store the exception or the cancellation reason so there's no reference cycle I think? But a string here is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a Task
contains a CancelScope
and the scope gets cancelled within the same task, the scope will then have a strong ref to a CancelReason
which will then point back to the Task
. I think?
the repr(Task)
on its own is perhaps not super useful, but in case you have multiple cancellations going on at the same time that are only distinguished by the source task then you can visually distinguish them even without other sources of the task id.
Though it does also contain the name of the function itself:
<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>
which could be very helpful if you have different functions spawned in a nursery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess task -> parent/child nursery -> cancel scope -> cancel reason -> task is a loop, yeah. That's annoying. (Or maybe CoroutineType stores frames as a strongref? That too.)
@@ -1192,8 +1240,10 @@ def parent_task(self) -> Task: | |||
"(`~trio.lowlevel.Task`): The Task that opened this nursery." | |||
return self._parent_task | |||
|
|||
def _add_exc(self, exc: BaseException) -> None: | |||
def _add_exc(self, exc: BaseException, reason: CancelReason | None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only callers of this are internal, IMO it would be cleaner to have them set the reason inline. Also, to avoid multiple comments for similar things, why doesn't this unconditionally set _cancel_reason = reason
if it isn't None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why doesn't this unconditionally set _cancel_reason = reason if it isn't None?
in case we get multiple sources of cancellation I don't want to override the first one. In other places it's more critical, but here I could see a scenario where:
- Something causes a cancellation, be it a deadline or a crashing task or whatever
- a different task B gets cancelled, but they have an
except Cancelled
, and inside that handler they raise a different exception - without
if self.cancel_scope._cancel_reason is None:
the cause would now get set to task B raising an exception
so I'm pretty sure we need the if
, which means we'd need to write the if
three times if we did it in-line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not relevant anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the code to see why this isn't relevant anymore, but I already typed up a response comment to this:
I'm not entirely convinced avoiding this is a good thing:
async def crasher():
await trio.sleep(2)
raise ValueError("...")
with trio.open_nursery() as nursery:
try:
nursery.start_soon(crasher)
try:
await trio.sleep(10) # app code
finally:
with trio.move_on_after(2, shield=True):
await trio.sleep(3) # cleanup code
except trio.Cancelled as c:
# what should c's cancel reason be
raise
This might matter for instance in code for shutting down stuff on exceptions, moving on after 2 seconds. The cancel reason if the clean up code ran over its 2 seconds would presumably (?) be that the exceptions happened, not that the timeout happened. I think it would make more sense if the reason was instead about the timeout.
(I haven't played with this PR yet so I'm not sure that's actually what will happen)
Would it make sense to establish some sort of causal mechanism? I.e. a field on CancelReason that points to the old CancelReason. (I guess Cancelled could store another Cancelled? But that might be bad for cycles.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah current behavior is that the crasher becomes the reason.
Storing a chain of Cancelled
sounds tricky and very likely to induce gc problems. I'm pretty sure we'd have to store any raised Cancelled
in the scope itself in order to be able to refer back to them.
... although the crash cancellation should be accessible somehow in the finally
scope to be set as __context__
. I wonder where that is getting lost
But storing a chain of reasons would be fairly straightforward and sounds like it might have some good use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I found a repro where sys.exc_info()
got cleared, but I might have mistaken myself and idr the repro anymore.
But going back to your example:
I have a pretty strong intuition that the reason the nursery scope is canceled is because a child crashed. The deadline is the reason the inner scope inside the finally
is canceled, but that cancellation will be swallowed by move_on_after
and even in a world where we stored a chain of reasons the nursery scope would never see the deadline cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: what's the cancel reason visible inside the move_on_after then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it was deadline
, with the child-crashing cancelled in its __context__
, because of the shielding. I can add a test case for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, yeah that behavior sounds nice. Returning to the earliest response you have, is that (nursery cancel -> raise a different exception in one of the tasks) the only case where things try to overwrite the cancellation reason? If so, I think it would be nicer to make nurseries not try to cancel if they are already cancelled (which would prevent the cancellation reason from being overwritten).
I also see that changing the deadline can potentially overwrite. I don't see why that would try to cancel anything if things are already cancelled... I guess just code simplicity.
I guess it makes sense to try to handle it in one place with a check on the cancellation reason, then. I just don't like it!
pypy fail is unrelated
|
I've seen that pypy failure before, and it usually means that it's using an outdated package cache. Clearing package cache will fix it (though I don't know how to do that myself) |
usually clicking the 🗑️ icons in https://github.com/python-trio/trio/actions/caches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine actually. Just a bunch of trivial comments.
@@ -1192,8 +1240,10 @@ def parent_task(self) -> Task: | |||
"(`~trio.lowlevel.Task`): The Task that opened this nursery." | |||
return self._parent_task | |||
|
|||
def _add_exc(self, exc: BaseException) -> None: | |||
def _add_exc(self, exc: BaseException, reason: CancelReason | None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, yeah that behavior sounds nice. Returning to the earliest response you have, is that (nursery cancel -> raise a different exception in one of the tasks) the only case where things try to overwrite the cancellation reason? If so, I think it would be nicer to make nurseries not try to cancel if they are already cancelled (which would prevent the cancellation reason from being overwritten).
I also see that changing the deadline can potentially overwrite. I don't see why that would try to cancel anything if things are already cancelled... I guess just code simplicity.
I guess it makes sense to try to handle it in one place with a check on the cancellation reason, then. I just don't like it!
raise Cancelled._create() | ||
|
||
self._attempt_abort(raise_cancel) | ||
self._attempt_abort(RaiseCancel(self._cancel_status._scope._cancel_reason)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this class, though I guess it's temporary if #3233 is merged....
I am confused about the exact justification, which probably makes sense but...:
I did not find a good way of binding a CancelReason or Cancelled object to raise_cancel as a function ...
Can't you make a cancel reason / cancelled be a nonlocal? I'm not sure what more binding you need.
I can see why it's not possible to have Cancelled created outside, but could self._cancel_status._scope._cancel_reason
be used? Why not?
run_sync_soon_nursery.cancel_scope._cancel( | ||
CancelReason( | ||
source="shutdown", | ||
reason="main task done, shut down run_sync_soon callbacks", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reason="main task done, shut down run_sync_soon callbacks", | |
reason="main task done, shutting down run_sync_soon callbacks", |
to match with above
and task._runner.ki_pending | ||
): | ||
task._cancel_status._scope._cancel_reason = CancelReason( | ||
source="KeyboardInterrupt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of strange to me to set this here. Is there no way for the thing raising KeyboardInterrupt
to set this cancel reason?
@@ -0,0 +1,151 @@ | |||
# there's tons of cancellation testing in test_run, but that file is 3k lines long already... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if you want to split the test files, could you move some of the cancellation tests here? (It doesn't have to be every one of them, just a quick search + copy paste)
cs.cancel(reason="hello") | ||
with pytest.raises( | ||
Cancelled, | ||
match=rf"^cancelled due to explicit with reason 'hello' from task {current_task()!r}$", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this wording. IMO even leaving out the cancellation reason if it's explicit would be preferable. Like I would rather read:
cancelled with reason 'hello' from task {current_task()!r}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also could this test also check the introspection attributes?
await nursery.start(cancelled_task, fail_task) | ||
|
||
|
||
async def test_cancel_reason_nursery2() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment explaining the difference? It took me a bit to see the difference is whether the task is cancelled before trio.sleep_forever
or after.
with RaisesGroup(ValueError): | ||
async with trio.open_nursery() as nursery: | ||
nursery.start_soon(child) | ||
await ev.wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way I think this and others could use trio.testing.wait_all_tasks_blocked
instead (and avoid having an event).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One missing test: how about a task that is stated via nursery.start
and raises before task_status.started()
?
assert str(cancelled) == "cancelled due to explicit" | ||
assert re.fullmatch( | ||
r"cancelled due to deadline from task " | ||
r"<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r"<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>", | |
rf"{trio.lowlevel.current_task()!r}", |
? Same elsewhere.
fixes #3232 by adding a
CancelReason
that's tracked by bothCancelScope
andCancelStatus
and then used when constructing aCancelled
to be raised.Remaining stuff:
CancelScope
andCancelStatus
needs to have aCancelReason
member - initially onlyCancelStatus
had it but that wasn't enough, but I haven't tried to see if I can get away with onlyCancelScope
having it.CancelScope
cs.deadline=-inf
and then checkpointing somehow ends up as anexplicit
cancelled.from_thread_check_cancelled
needs some special handling