@VegetarianOrc
Contributor

What was changed

This PR adds code generation of the RPC call wrapper in the bridge client, to ease the maintenance burden of manually updating the methods whenever core has a proto update. The generation step has been added to the end of the gen_protos_docker script.

Some changes were included to make the generation possible:

  1. To eliminate a code generation path, the rpc_call and rpc_call_on_trait macros have been refactored to only include an exported macro named rpc_call that uses the fully qualified call syntax that was previously in the body of rpc_call_on_trait.
  2. The multiple-pymethods feature of PyO3 was enabled to allow the ClientRef impl to be split into two modules.
  3. The visibility of some elements of the client module was elevated to pub(crate) so they can be referenced from the client_rpc_generated module.
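
As a rough illustration of the generation step described above, here is a minimal Python sketch that turns a list of proto method names into Rust match arms that invoke an rpc_call macro. The helper names (to_snake_case, render_match_arms), the macro arguments, and the fallback arm are assumptions made up for this example; the real generator works from file descriptors and its output may look different.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase proto method name to snake_case.

    Simple rule: insert "_" before every uppercase letter except the first.
    Names containing acronyms would need extra handling.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def render_match_arms(method_names: list[str]) -> str:
    """Render one hypothetical Rust match arm per RPC method name."""
    arms = []
    for name in sorted(method_names):
        snake = to_snake_case(name)
        # The macro arguments are illustrative, not the real macro signature.
        arms.append(f'    "{snake}" => rpc_call!(retry_client, call, {snake}),')
    # Hypothetical fallback arm standing in for the "Unknown RPC call" error.
    arms.append("    _ => return Err(unknown_rpc_call(call.rpc)),")
    return "match call.rpc.as_str() {\n" + "\n".join(arms) + "\n}"


generated = render_match_arms(["StartWorkflowExecution", "GetSystemInfo"])
print(generated)
```

Because the arms are derived from the method list, a new RPC in core only requires re-running the generator rather than hand-editing the match.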

Why?

This allows us to easily keep the bridge client up to date with the RPC methods exposed via core.

Checklist

  1. Closes [Bug] Batch operation feature in the Temporal Python SDK doesn't work #927

  2. How was this tested:

The generated client was tested against the existing test suite and used to ensure the repro described in #927 correctly functions against a Temporal dev server.

  3. Any docs updates needed?

@VegetarianOrc VegetarianOrc requested a review from a team as a code owner September 25, 2025 21:27
@CLAassistant

CLAassistant commented Sep 25, 2025

CLA assistant check
All committers have signed the CLA.

Member

I think we might also want to generate all of that mess in temporalio/service.py. Off the top of my head, what I would suggest is to generate some file at temporalio/bridge/client_generated.py.

The simple way would be an init_workflow_service_client_calls(client: ServiceClient) that does what the code does today, which is make a _new_call for each known service name. Alternatively, a cleaner approach may be to move the WorkflowService (and the operator and cloud equivalents) to this generated file, with explicit methods created for each call that do as expected, and then alias the service class publicly (or have the public WorkflowService class extend the internal/bridge form).
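
To make the two options concrete, here is a small self-contained sketch. StubServiceClient, DynamicWorkflowService, and GeneratedWorkflowService are hypothetical stand-ins invented for this example, and the shape of _new_call is assumed rather than taken from temporalio/service.py.

```python
import asyncio
from typing import Any, Awaitable, Callable


class StubServiceClient:
    """Hypothetical stand-in for ServiceClient; echoes the RPC it was asked to call."""

    async def _rpc_call(self, rpc: str, request: Any) -> str:
        return f"{rpc}({request!r})"


def _new_call(client: StubServiceClient, rpc: str) -> Callable[[Any], Awaitable[str]]:
    """Build a callable bound to one RPC name (assumed shape of the dynamic approach)."""

    async def call(request: Any) -> str:
        return await client._rpc_call(rpc, request)

    return call


class DynamicWorkflowService:
    """Option 1: attach one dynamically created call per known RPC name."""

    def __init__(self, client: StubServiceClient, rpc_names: list[str]) -> None:
        for rpc in rpc_names:
            setattr(self, rpc, _new_call(client, rpc))


class GeneratedWorkflowService:
    """Option 2: what a generated file might contain, one explicit method per RPC."""

    def __init__(self, client: StubServiceClient) -> None:
        self._client = client

    async def start_workflow_execution(self, request: Any) -> str:
        return await self._client._rpc_call("start_workflow_execution", request)


class WorkflowService(GeneratedWorkflowService):
    """Public class extending the generated form."""


async def demo() -> tuple[str, str]:
    client = StubServiceClient()
    dynamic = DynamicWorkflowService(client, ["start_workflow_execution"])
    explicit = WorkflowService(client)
    return (
        await dynamic.start_workflow_execution("req"),
        await explicit.start_workflow_execution("req"),
    )


results = asyncio.run(demo())
print(results)
```

Both options dispatch the same call; the explicit form is visible to type checkers and documentation tools, which setattr-based methods are not.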

Member

@cretz cretz left a comment

LGTM overall, just little things

import temporalio.api.workflowservice.v1
import temporalio.bridge.client
import temporalio.bridge.proto.health.v1
import temporalio.bridge.services_generated as svc_gen
Member

Suggested change
- import temporalio.bridge.services_generated as svc_gen
+ import temporalio.bridge.services_generated

Pedantic and not technically required, but usually we fully qualify each use when referring to ourselves in these files

Contributor Author

Updated these to be fully qualified. I also moved the generated python to be in bridge/generated so I could add a pydocstyle exclusion for the file. If that doesn't seem reasonable, I can see about spitting out some autogen'd doc strings.

Member

@cretz cretz Sep 30, 2025

I do think that'd be nice. Yeah, one issue with moving the base class into the bridge area is that it won't be as clear on https://python.temporal.io (we consider the bridge private code and don't show it in API docs). This will be the first time user-facing code references the bridge module (in this case for a class extension). But I don't think it's that big of a deal.

Contributor

Not huge, but I don't see any good reason to move the code to bridge rather than another generated file. Some autogenerated doc strings seem fine.

Member

@cretz cretz left a comment

Nothing blocking, though we may want to consider not putting user-facing components in the internal bridge module. @tconley1428 - thoughts?

@VegetarianOrc VegetarianOrc force-pushed the bridge/generate_rpc_calls branch from a971dcf to 2812a80 Compare September 30, 2025 17:59
Contributor

@tconley1428 tconley1428 left a comment

I think we should consider

file_descriptors: list[FileDescriptor],
output_file: str = "temporalio/bridge/src/client_rpc_generated.rs",
):
print("generating bridge rpc calls")
Contributor

I think maybe one print for the whole thing, but I'm not sure we need all the steps. That said, marginal

futures = "0.3"
prost = "0.13"
pyo3 = { version = "0.25", features = [
"extension-module",
Contributor

No good place to put this comment, but I think we should consider adding a test for the actual user issue, just to have coverage confirming we fixed it.

Contributor Author

@VegetarianOrc VegetarianOrc Oct 2, 2025

I've added a test that executes all the possible RPC calls and ensures that none result in the "Unknown RPC call" exception. The calls are made with empty request objects, so exceptions are expected, but the absence of the "Unknown RPC call" exception should indicate that all the RPC calls are wired up from Python -> Bridge -> Core. The user issue was caused by an RPC call missing from the Rust source, and I confirmed that removing one of the RPC calls from the generated Rust source produces a test failure.

https://github.com/temporalio/sdk-python/pull/1123/files#diff-d7fb70d71c01e6c6a932249a22ad45e9e8e5dd800f8d9260922b7726ada72402R170-R238

Also open to other approaches on getting the coverage!

"httpx>=0.28.1",
"pytest-pretty>=1.3.0",
"openai-agents[litellm]>=0.3,<0.4",
"googleapis-common-protos==1.70.0",
Contributor Author

@tconley1428, I want to confirm with you that adding googleapis-common-protos to this dev dependency list won't cause any adverse effects. The new test that imports the proto descriptors relies on it, but if it's not reasonable to add, I'll modify the test to take a different approach.

Contributor

I don't think there's any adverse impact. It would only affect folks working on the repo, not library users. If it is needed to improve the test coverage, worth it I'd say.

rpc_call = getattr(target_service, method_name)
try:
await rpc_call(request, timeout=timedelta(milliseconds=1))
except Exception as err:
Contributor

Can we take a quick look that the exceptions which do occur are all after the call is made? If any of them fail with like, parameter checking or something before that, then they wouldn't be covered.

Contributor Author

The potential failures happen in _BridgeServiceClient._rpc_call, where there are 3 outcomes leading to exceptions that I'm aware of.

  1. The call to connect the rpc client fails. The connect method calls into the bridge source, where it potentially raises a RuntimeError.
  2. The rpc call gets rejected by bridge before actually being dispatched due to the unknown RPC call error which is raised as a ValueError in the generated service clients on the Rust side.
  3. The rpc call is proxied through bridge and results in a Bridge RPCError which gets caught and re-raised as a python RPCError by the _BridgeServiceClient.

With those listed out, I've updated the test to explicitly fail on the ValueError and explicitly pass on the RPCError which is what is returned when the call is proxied successfully but is rejected by the server.

I've left the RuntimeError unhandled as the test should have a client that's able to connect based on the existing test structure.
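
The classification described above can be sketched roughly as follows. The RPCError class and StubService here are stand-ins defined only for this example, not the SDK's real types, and check_rpc_wired is a hypothetical helper name; the actual test iterates the methods found in the proto descriptors.

```python
import asyncio
from datetime import timedelta
from typing import Any


class RPCError(Exception):
    """Stand-in for the Python RPCError re-raised by the bridge service client."""


class StubService:
    """Hypothetical service with one wired-up RPC and one the bridge does not know."""

    async def wired_up(self, request: Any, timeout: timedelta) -> None:
        # Proxied through successfully, then rejected by the server.
        raise RPCError("rejected by server")

    async def missing_in_bridge(self, request: Any, timeout: timedelta) -> None:
        # Rejected before dispatch because the bridge has no matching arm.
        raise ValueError("Unknown RPC call missing_in_bridge")


async def check_rpc_wired(service: Any, method_name: str) -> bool:
    """Return True if the call reached the server, False if the bridge rejected it."""
    rpc_call = getattr(service, method_name)
    try:
        await rpc_call(object(), timeout=timedelta(milliseconds=1))
    except ValueError:
        return False  # outcome 2: unknown RPC call, the test should fail
    except RPCError:
        return True  # outcome 3: proxied and rejected by the server, the test passes
    # Outcome 1 (RuntimeError on connect) is deliberately left unhandled.
    return True


async def demo() -> dict[str, bool]:
    service = StubService()
    return {
        name: await check_rpc_wired(service, name)
        for name in ("wired_up", "missing_in_bridge")
    }


outcomes = asyncio.run(demo())
print(outcomes)
```

Catching the two exception types separately, rather than a bare Exception, is what distinguishes a call that was dispatched from one that never left the bridge.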

Comment on lines 66 to 67
gen-protos = "uv run scripts/gen_protos.py"
gen-protos-docker = "uv run scripts/gen_protos_docker.py"
Member

@VegetarianOrc - to confirm, we decided to move the simple uv-run type of steps out of gen_protos_docker and make them poe sub-commands of these two commands, right?

@VegetarianOrc VegetarianOrc merged commit 26e2e61 into main Oct 6, 2025
26 of 27 checks passed
@VegetarianOrc VegetarianOrc deleted the bridge/generate_rpc_calls branch October 6, 2025 19:45
Development

Successfully merging this pull request may close these issues.

[Bug] Batch operation feature in the Temporal Python SDK doesn't work

5 participants