tconley1428
Contributor

What was changed

Instead of tracking the startActivity span across the duration of the activity, leave that up to the workflow. The startActivity span is now finished as soon as the activity is scheduled.

Why?

Checklist

  1. Closes [Bug] Langfuse Tracing Not Working with Temporal OpenAI Agents Plugin #1136

  2. How was this tested:

  3. Any docs updates needed?

@tconley1428 tconley1428 requested a review from a team as a code owner October 20, 2025 20:01
Comment on lines 374 to 379
set_header_from_context(input, temporalio.workflow.payload_converter())
if trace:
    with custom_span(
        name="temporal:signalChildWorkflow",
        data={"workflowId": input.child_workflow_id},
    ):
        set_header_from_context(input, temporalio.workflow.payload_converter())
Member

@cretz cretz Oct 21, 2025

I see this code changed here and in a few other places. IIRC, it was intentional that we called set_header_from_context inside the with custom_span block, so that the header has the current span serialized (and therefore spans that occur inside the workflow will have it as their parent).
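
To illustrate the point about where the header is captured, here is a minimal sketch using only the standard library. `Span`, `toy_span`, and `capture_header` are hypothetical stand-ins, not the real `custom_span` or `set_header_from_context`; the sketch only assumes that the header serializes whichever span is current at the moment it is built.

```python
import contextvars
import itertools
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Span:
    span_id: int
    name: str
    parent_id: Optional[int]


# Hypothetical stand-in for the tracing library's "current span" state.
_current_span: contextvars.ContextVar[Optional[Span]] = contextvars.ContextVar(
    "current_span", default=None
)
_ids = itertools.count(1)


@contextmanager
def toy_span(name: str) -> Iterator[Span]:
    """Open a span whose parent is the span that is current on entry."""
    parent = _current_span.get()
    span = Span(next(_ids), name, parent.span_id if parent else None)
    token = _current_span.set(span)
    try:
        yield span
    finally:
        _current_span.reset(token)


def capture_header() -> dict:
    """Serialize the id of whichever span is current right now."""
    span = _current_span.get()
    return {"parent_span_id": span.span_id if span else None}


with toy_span("workflow") as workflow_span:
    # Captured before the outbound span opens: the receiver would be
    # parented to the workflow span.
    header_outside = capture_header()
    with toy_span("temporal:signalChildWorkflow") as signal_span:
        # Captured inside the outbound span: the receiver would be
        # parented to the signal span.
        header_inside = capture_header()
```

In this toy model, moving the capture call across the `with` boundary is the only thing that changes which span the receiving side sees as its parent.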

Contributor Author

It was, but I don't think it was actually correct. Since the signal span ends more or less immediately, I think it is misleading to parent the child workflow execution to the signal span. Rather, the child workflow is parented to the parent workflow's span (or whatever user-defined span is active at that point in the workflow).

Contributor Author

It's essentially the same change I am making for startActivity: the activity execution isn't parented to the startActivity span; it's parented to the workflow, because the start has already completed.

Member

@cretz cretz Oct 21, 2025

> I think it is misleading to parent the child workflow execution to the signal span

It's not parenting a child workflow execution; in this case it's parenting the child's signal handler (though in some cases it does parent an execution). I'm not sure I necessarily agree here. At least in OTel, at TracingWorkflowInboundInterceptor.handle_signal, we chose to model the signal span as "linked" rather than as the parent, but in the absence of a concept of linking, I think the hierarchy should point to the most targeted span available, regardless of span duration.

It doesn't make sense to me to have a bunch of orphan outbound spans hanging off a common workflow span, and orphan inbound spans hanging off a common workflow span, with nothing relating them to each other when they actually are related. It is totally reasonable in tracing for a starting span to be the parent of the thing it starts, even if the start was quick.
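
The "linked versus parent" distinction can be sketched with a toy data model. This is not the OTel or OpenAI tracing API: `SpanRecord` and `model_handler_span` are hypothetical names, and the assumption that a link-capable backend keeps the receiving workflow's span as parent is made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpanRecord:
    name: str
    parent_id: Optional[str] = None
    link_ids: List[str] = field(default_factory=list)


def model_handler_span(
    name: str,
    outbound_span_id: str,
    workflow_span_id: str,
    supports_links: bool,
) -> SpanRecord:
    """Relate an inbound handler span to the outbound span that triggered it."""
    if supports_links:
        # Assumed behavior for a link-capable backend: the handler stays under
        # the receiving workflow's span and only links to the sender's span.
        return SpanRecord(name, parent_id=workflow_span_id, link_ids=[outbound_span_id])
    # No concept of links: parent the handler to the most targeted span,
    # which here is the outbound signal span.
    return SpanRecord(name, parent_id=outbound_span_id)


linked = model_handler_span("handleSignal", "signal-1", "child-wf", supports_links=True)
parented = model_handler_span("handleSignal", "signal-1", "child-wf", supports_links=False)
```

Either way the inbound span keeps a reference to the outbound span, which is the relationship the comment above argues should not be lost.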

Contributor Author

I would argue that execution just isn't a child of the start in the normal sense. If we wanted, we could make a new span that covers the entire lifetime of the activity and is the parent of both start and execute, though I don't really see what benefit that would provide. Doing so would also get us back into the detach-handling game.

Member

@cretz cretz Oct 21, 2025

It is a child in the normal sense: a child when you start a workflow (top-level from a client, or a child workflow from a workflow), a child when you start an activity (top-level for standalone activities, or a regular activity from a workflow), a child when you start a Nexus operation, and so on. This is how OTel tracing is done.

I think in this case, the consistency of matching SDK behavior elsewhere is valuable.

Contributor Author

Okay. Would you agree with the other portion of the change: that even if startActivity is the parent of executeActivity, it should terminate when the scheduling is completed?

Member

In OTel, since we can't have an in-workflow duration, the span is completed even before scheduling has completed. There is no concept of "start completed", so IMO it makes sense for startActivity to span the entire run of the activity, including all attempts, while executeActivity is per attempt (you can have zero or many executeActivity spans within a startActivity).

So it's hard to say, since OpenAI tracing is unique in that it lets us represent durations accurately. It may be worth checking what TypeScript does here in OTel, since that's the only other SDK I know of that lets us track span durations accurately. IMO, spans for starting an activity, child workflow, or Nexus operation, signaling, etc. should last the entire duration of the operation if they can, even though "start" is in the name to match our OTel span naming conventions. But it's not a strong opinion.
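
The span shape described above (one startActivity span covering the whole run, with zero or many per-attempt executeActivity spans inside it) can be sketched as plain intervals. `Interval` and `model_activity_spans` are hypothetical names, and the timestamps are made-up illustrative values, not measurements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    name: str
    start: float
    end: float


def model_activity_spans(
    scheduled_at: float,
    resolved_at: float,
    attempts: Sequence[Tuple[float, float]],
) -> Tuple[Interval, List[Interval]]:
    """Build one start span for the whole run plus one execute span per attempt."""
    # The start span lasts from scheduling until the workflow sees the outcome.
    start_span = Interval("startActivity", scheduled_at, resolved_at)
    # Each attempt gets its own execute span; there may be none at all.
    execute_spans = [Interval("executeActivity", begin, end) for begin, end in attempts]
    return start_span, execute_spans


# Two attempts: the first fails and the activity is retried.
start_span, execute_spans = model_activity_spans(0.0, 9.0, [(1.0, 3.0), (5.0, 8.0)])

# No attempts: e.g. the activity timed out before any worker picked it up.
idle_start, idle_executes = model_activity_spans(0.0, 2.0, [])
```

Under the alternative proposed in this pull request, the start span would instead end at scheduling time, so the execute intervals would fall outside it.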
