-
Notifications
You must be signed in to change notification settings - Fork 136
Agents SDK: startActivity spans should finish when the activity has been sent to the server #1172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
set_header_from_context(input, temporalio.workflow.payload_converter()) | ||
if trace: | ||
with custom_span( | ||
name="temporal:signalChildWorkflow", | ||
data={"workflowId": input.child_workflow_id}, | ||
): | ||
set_header_from_context(input, temporalio.workflow.payload_converter()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this code changed here and in a few places. IIRC, it was intentional that we did set_header_from_context
inside the with custom_span
so the header has that current span serialized (and therefore spans that occur inside workflow will have this as a parent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was, but I think it wasn't actually correct to do. Since the signal span would more or less immediately end, I think it is misleading to parent the child workflow execution to the signal span. Rather, the child workflow is parented to the parent workflow's span (or whatever user defined span is active at that point in the workflow).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same change I am making for startActivity essentially, the activity execution isn't parented to the startActivity, it's parented to the workflow, because really the start has already completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is misleading to parent the child workflow execution to the signal span
It's not parenting a child workflow execution, it's parenting the child signal handler in this case (though in some cases it does). Not sure I agree here necessarily. At least in OTel at TracingWorkflowInboundInterceptor.handle_signal
, we have chosen to model the signal span as "linked", not the parent, but in the absence of a concept of linking, I think the hierarchy should represent the most targeted span out there regardless of span duration.
It doesn't make sense to me that you have a bunch of orphan outbound spans just on a common workflow span and orphan inbound spans just on a common workflow span that don't relate to each other when they actually do relate to each other. It is totally reasonable in tracing contexts to have a starting span be the parent of the thing it starts even if the start was quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that execution just isn't a child of the start in the normal sense. If we wanted to make a new span which covered the entire lifetime of the activity and was the parent of both start and execute, we could, though I don't really see the benefit that would provide. Doing so will also get us back in the detach handling game.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a child in the normal sense, a child when you start workflow (top-level from client or child workflow from workflow), a child when you start activity (top-level on standalone activities or regular activity from workflow), a child when you start Nexus operation, etc, etc. This is how OTel tracing is done.
I think in this case, the consistency of matching SDK behavior elsewhere is valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Would you agree with the other portion of the change that even if startActivity
is the parent of executeActivity
, it should terminate when the scheduling is completed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In OTel, since we can't have an in-workflow duration, it's completed even before scheduling completed. There is no concept of "start completed", so IMO it makes sense for startActivity
to span the entire run of the activity including all attempts, and executeActivity
is actually per attempt (you can have 0 or many executeActivity
spans in a startActivity
).
So it's hard to say since OpenAI tracing is unique in that it lets us represent durations accurately. May be worth checking what TypeScript does here in OTel since that's the only other SDK I know of that lets us track span durations accurately. IMO, spans for starting activity, child, nexus op, signaling, etc should last their entire duration if they can, even though it is called "start" in the name to match our OTel span naming conventions. But it's not a strong opinion.
What was changed
Instead of tracking the startActivity span across the duration of the activity, leave that up to the workflow. StartActivity is finished upon scheduling
Why?
Checklist
Closes [Bug] Langfuse Tracing Not Working with Temporal OpenAI Agents Plugin #1136
How was this tested: