[Bug] spurious "exception in shielded future" warnings on Python 3.11+ #1504

@Buggytheclown

Description

run_child() does not call task.uncancel() after catching CancelledError. On Python 3.11+ this causes duplicate RequestCancelExternalWorkflow commands and spurious "exception in shielded future" log messages.

Expected Behavior

Calling handle.cancel() on a ChildWorkflowHandle should send exactly one
RequestCancelExternalWorkflow command to the Temporal server. The awaiter of the handle
should subsequently receive a ChildWorkflowError (caused by CancelledError) with no
additional log output or side effects.

Actual Behavior

On Python 3.11+, calling handle.cancel() on a running ChildWorkflowHandle causes
multiple RequestCancelExternalWorkflow commands to be sent to the Temporal server
instead of one. The worker also emits one or more ERROR-level log lines per
cancellation from temporalio.worker._workflow_instance:

ChildWorkflowError exception in shielded future
future: <Future finished exception=ChildWorkflowError('Child Workflow execution cancelled')>
temporalio.exceptions.CancelledError: Cancelled

The above exception was the direct cause of the following exception:

temporalio.exceptions.ChildWorkflowError: Child Workflow execution cancelled

Workflow execution and state are otherwise unaffected: the awaiter still receives
ChildWorkflowError as expected. However, the duplicate commands and log noise indicate
incorrect internal behavior.

Root Cause

run_child() in temporalio/worker/_workflow_instance.py catches CancelledError
without calling task.uncancel():

async def run_child() -> Any:
    while True:
        try:
            return await asyncio.shield(handle._result_fut)
        except asyncio.CancelledError:
            apply_child_cancel_error()
            # missing: task.uncancel()

In Python 3.11, asyncio.Task introduced a cancellation counter (Task.cancelling() /
Task.uncancel()). When task.cancel() is called and the coroutine catches
CancelledError without calling task.uncancel(), the cancelling counter remains at 1.
Python re-throws CancelledError at every subsequent await, so run_child() loops:

  1. await asyncio.shield(handle._result_fut) raises CancelledError immediately
  2. the except clause catches it and calls apply_child_cancel_error() again
  3. the loop returns to step 1

Each asyncio.shield() call creates a new outer future and registers a done-callback
on handle._result_fut. When handle._result_fut eventually resolves, every registered
callback fires and logs the "exception in shielded future" error, one per loop iteration.
This is observable across Python 3.11–3.14; the exact internal mechanism varies by CPython version.
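As a plain-asyncio illustration of the shield() behavior this paragraph relies on (a sketch, not SDK code; `inner` is a hypothetical stand-in for handle._result_fut): every shield() call on a still-pending future returns a fresh outer future, and cancelling one outer future leaves the inner future and the other outer futures untouched. The sketch does not model the SDK's logging.

```python
import asyncio


async def demo() -> tuple[bool, bool, bool, int]:
    loop = asyncio.get_running_loop()
    inner = loop.create_future()  # hypothetical stand-in for handle._result_fut

    outer1 = asyncio.shield(inner)  # each call on a pending future...
    outer2 = asyncio.shield(inner)  # ...returns a new, distinct outer future
    distinct = outer1 is not outer2 and outer1 is not inner

    outer1.cancel()                      # cancelling one outer future...
    await asyncio.sleep(0)               # (let the scheduled done-callbacks run)
    inner_cancelled = inner.cancelled()  # ...does not cancel the inner one

    inner.set_result(42)
    result = await outer2  # the remaining outer future still receives the result
    return distinct, outer1.cancelled(), inner_cancelled, result


print(asyncio.run(demo()))  # (True, True, False, 42)
```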

The loop terminates only because asyncio.shield() has an early-exit path: once
handle._result_fut is done, shield() returns it directly and awaiting it does not
suspend the task, so _must_cancel is never re-checked and the loop exits normally:

def shield(arg):
    inner = ensure_future(arg)
    if inner.done():
        return inner  # no suspension → _must_cancel not re-checked
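The early-exit path can be demonstrated with plain asyncio. In this sketch, `done` is a hypothetical stand-in for an already-resolved handle._result_fut. The task requests its own cancellation, then awaits shield(done): because shield() hands back the finished future and awaiting it does not suspend, no CancelledError is raised there. The pending cancellation only surfaces at the next real suspension point, the asyncio.sleep(0).

```python
import asyncio


async def body(done: asyncio.Future) -> tuple[bool, str, bool]:
    task = asyncio.current_task()
    assert task is not None
    # The task is running (not suspended on a future), so asyncio records the
    # cancellation as pending and delivers it at the next suspension point.
    task.cancel()

    shortcut = asyncio.shield(done) is done  # shield() returns a done future as-is
    value = await asyncio.shield(done)       # awaiting a done future never suspends

    try:
        await asyncio.sleep(0)  # first real suspension: the pending cancel lands here
    except asyncio.CancelledError:
        return shortcut, value, True
    return shortcut, value, False


async def main() -> tuple[bool, str, bool]:
    done = asyncio.get_running_loop().create_future()
    done.set_result("child result")  # stands in for an already-resolved future
    return await asyncio.create_task(body(done))


print(asyncio.run(main()))  # (True, 'child result', True)
```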

Proposed fix — call task.uncancel() after catching CancelledError:

async def run_child() -> Any:
    while True:
        try:
            return await asyncio.shield(handle._result_fut)
        except asyncio.CancelledError:
            apply_child_cancel_error()
            if (t := asyncio.current_task()) is not None and hasattr(t, "uncancel"):
                t.uncancel()  # clear cancelling counter on Python 3.11+

After uncancel() the cancellation counter is 0, so the next await asyncio.shield(...)
blocks normally until handle._result_fut resolves rather than re-raising CancelledError
immediately. The while True loop is then the correct structure: it waits for the result
after the cancel has been handled.

Steps to Reproduce the Problem

  1. Create a parent workflow that starts a child workflow, awaits the child handle in
     a task created with asyncio.create_task, and cancels the handle from a concurrent
     update handler.
  2. Run on Python 3.11+.
  3. Send the update that calls handle.cancel().
  4. Observe ERROR log lines from temporalio.worker._workflow_instance.

Minimal reproduction:

import asyncio
import logging
import uuid

from temporalio import workflow
from temporalio.client import Client
from temporalio.worker import Worker


@workflow.defn
class ChildWorkflow:
    @workflow.run
    async def run(self) -> None:
        await asyncio.sleep(9999)


@workflow.defn
class ParentWorkflow:
    def __init__(self) -> None:
        self._handle: workflow.ChildWorkflowHandle | None = None
        self._cancelled = False

    @workflow.run
    async def run(self) -> None:
        self._handle = await workflow.start_child_workflow(
            ChildWorkflow.run,
            id=f"child-{workflow.info().workflow_id}",
            cancellation_type=workflow.ChildWorkflowCancellationType.WAIT_CANCELLATION_COMPLETED,
        )
        asyncio.create_task(self._run_child())
        await workflow.wait_condition(lambda: self._cancelled)

    async def _run_child(self) -> None:
        try:
            await self._handle
        except Exception:
            pass
        finally:
            self._cancelled = True

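    @workflow.update
    async def cancel_child(self) -> None:
        # The update can arrive before run() has stored the handle, so wait for it.
        await workflow.wait_condition(lambda: self._handle is not None)
        assert self._handle is not None
        self._handle.cancel()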


async def main() -> None:
    logging.basicConfig(level=logging.DEBUG)

    client = await Client.connect("localhost:7233")

    async with Worker(
        client,
        task_queue="test",
        workflows=[ParentWorkflow, ChildWorkflow],
    ):
        handle = await client.start_workflow(
            ParentWorkflow.run,
            id=f"parent-{uuid.uuid4()}",
            task_queue="test",
        )
        await handle.execute_update(ParentWorkflow.cancel_child)
        await handle.result()
        # observe ERROR lines from temporalio.worker._workflow_instance in the log


if __name__ == "__main__":
    asyncio.run(main())

Specifications

  • SDK version: temporalio 1.18.0
  • Python version: 3.14.4
  • Platform: Darwin 24.6.0 (macOS)
  • Affected: Python ≥ 3.11 (the versions that provide Task.uncancel())
  • Not affected: Python ≤ 3.10
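To check whether a given interpreter falls in the affected range, one option (a small sketch, relying only on the fact that Task.uncancel() was added in Python 3.11) is to test for the method directly:

```python
import asyncio
import sys

# Task.uncancel() (and Task.cancelling()) were added in Python 3.11, so their
# presence marks the interpreters this issue reports as affected.
has_uncancel = hasattr(asyncio.Task, "uncancel")
print(
    f"Python {sys.version_info.major}.{sys.version_info.minor}: "
    f"Task.uncancel available = {has_uncancel}"
)
```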

Labels: bug