Summary
I'd like to contribute a Google Cloud Storage external-storage driver to `temporalio/contrib`, mirroring the structure and conventions of `temporalio.contrib.aws.s3driver`.
The s3driver landed via #1388 (#1390 was the proposal issue) and has been the obvious template for any cloud-provider object store. Filing this issue first per the contributor guidance; happy to defer the implementation until the design questions below have a maintainer-accepted answer.
What
A new contrib module at `temporalio/contrib/gcp/gcsdriver/` that:
- Stores and retrieves Temporal payloads in GCS, gated by `ExternalStorage.payload_size_threshold`.
- Splits the same way s3driver does: a `_client.py` ABC (no GCS dependency), a `_driver.py` driver, plus a built-in concrete client implementation (see design Q2 below for which library).
- Ships under a `temporalio[google-cloud-storage]` (or similar) extra so the GCS dependency is opt-in.
- Adds a parallel setup snippet to `temporalio/features`.
- Is marked Experimental on first release, same as s3driver.
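To make the proposed `_client.py` split concrete, here is a minimal sketch of what the ABC plus in-process fake could look like. All names (`GCSClient`, `InMemoryGCSClient`, the method signatures) are illustrative assumptions, not the final API; the point is that the ABC carries no GCS import, mirroring s3driver's layering.

```python
import abc


class GCSClient(abc.ABC):
    """Abstract client interface (illustrative names, not the final API).

    Lives in _client.py with no google-cloud-storage import, so tests and
    alternative clients can implement it without installing the extra.
    """

    @abc.abstractmethod
    async def put_object(self, bucket: str, key: str, body: bytes) -> None:
        """Write body to bucket/key."""

    @abc.abstractmethod
    async def get_object(self, bucket: str, key: str) -> bytes:
        """Return the full contents of bucket/key."""


class InMemoryGCSClient(GCSClient):
    """In-process fake for unit tests, mirroring s3driver's test split."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    async def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self._objects[(bucket, key)] = body

    async def get_object(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]
```

The built-in concrete client (design Q2) would subclass the same ABC, so the driver code never touches the GCS library directly.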
Why
We're running on Google Cloud and need GCS for external storage. With no first-party driver, we built one; the prototype linked below started as our own implementation. Rather than keep maintaining it as a one-off, we'd like to get it into contrib alongside s3driver so the rest of the GCP community doesn't have to repeat the work. Today, Temporal users on GCP must either roll their own driver or run an S3-compatible proxy in front of GCS. A first-party GCS driver would match what s3driver already provides for AWS: content-addressed dedup, integrity verification, atomic writes, and async safety under concurrent workflows.
Existing prototype
I've built a working version that I plan to fully rework against your conventions: https://github.com/gamepop/adk-temporal-demo/blob/main/src/storage_gcs.py
The structural choices that match s3driver: ABC + adapter split, content-addressed keys, SHA-256 integrity verification on retrieve, custom user-agent and blob metadata for forensic tooling. The structural choices that don't match s3driver are the basis of the design questions below — I'd rather align with your patterns than ship divergence.
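The SHA-256 integrity verification on retrieve amounts to recomputing the digest and failing loudly on mismatch. A minimal sketch (the function name and error type are assumptions for illustration):

```python
import hashlib


def verify_payload(payload: bytes, expected_sha256: str) -> bytes:
    """Recompute the digest of a retrieved payload and reject mismatches.

    expected_sha256 is the hex digest recorded at store time (in the
    content-addressed key and/or blob metadata).
    """
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_sha256:
        raise ValueError(
            f"integrity check failed: got {actual}, want {expected_sha256}"
        )
    return payload
```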
Design questions for maintainers
I'd appreciate guidance on these before I open a PR:
1. Key structure: should we adopt s3driver's namespace/workflow scoping?
s3driver writes under `v0/ns/{namespace}/wfi/{workflow_id}/d/sha256/{hash}`. My prototype uses pure content addressing (`v0/d/sha256/{hash}`), which dedups across workflows and namespaces. The s3driver pattern preserves namespace isolation, which is important in multi-tenant deployments. I'd default to matching s3driver unless there's a reason GCS should diverge.
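The two layouts side by side, as key-builder sketches (function names are hypothetical; the path templates are the ones quoted above):

```python
import hashlib


def scoped_key(namespace: str, workflow_id: str, payload: bytes) -> str:
    """s3driver-style layout: namespace/workflow scoping plus content hash."""
    digest = hashlib.sha256(payload).hexdigest()
    return f"v0/ns/{namespace}/wfi/{workflow_id}/d/sha256/{digest}"


def content_key(payload: bytes) -> str:
    """Prototype layout: pure content addressing, dedups across tenants."""
    return f"v0/d/sha256/{hashlib.sha256(payload).hexdigest()}"
```

The trade-off is visible in the signatures: `scoped_key` needs workflow context at write time and never shares objects across tenants; `content_key` needs only the payload, so identical payloads collapse to one object everywhere.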
2. Built-in client: sync `google-cloud-storage` + `asyncio.to_thread`, or async `gcloud-aio-storage`?
s3driver uses `aioboto3` (natively async). The official Google library `google-cloud-storage` is sync-only; the community library `gcloud-aio-storage` is fully async and well-maintained but isn't a Google project. My prototype wraps the sync library in `asyncio.to_thread`, which works but adds thread-pool pressure under heavy concurrency. Does the team have a strong preference here?
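For concreteness, this is the wrapping pattern the prototype uses, with a stand-in for the blocking client call (the real call would be a `google-cloud-storage` download; `_sync_download` here is a placeholder so the sketch is self-contained):

```python
import asyncio


def _sync_download(bucket: str, key: str) -> bytes:
    """Placeholder for the blocking google-cloud-storage download call."""
    return f"{bucket}/{key}".encode()


async def download(bucket: str, key: str) -> bytes:
    """Run the blocking call on the default executor's thread pool.

    Each in-flight call occupies a worker thread for its full duration,
    which is the thread-pool pressure concern under heavy concurrency.
    """
    return await asyncio.to_thread(_sync_download, bucket, key)
```

A natively async client (`gcloud-aio-storage`) would avoid the worker-thread-per-call cost entirely, at the price of depending on a non-Google project.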
3. Test infrastructure: `storage-testbench` for integration tests?
Following s3driver's pattern (moto for integration, in-process fake for unit tests), I'd plan to use `storage-testbench` as the integration layer. It's maintained by the `googleapis` org, used by Google's own client libraries for their CI, and supports `STORAGE_EMULATOR_HOST` for the official Python library. It also offers failure injection via `x-goog-emulator-instructions` headers (e.g. `return-503-after-256K/retry-1`), the parallel of moto's retry-test API, which we'd want for testing `DEFAULT_RETRY_IF_GENERATION_SPECIFIED` behavior under transient failures. `fake-gcs-server` is a more widely known community alternative; I'd treat it as a fallback if `storage-testbench` has CI friction, but it doesn't explicitly document the `if_generation_match` semantics the driver relies on. For unit tests we'd keep an in-process fake client, mirroring s3driver's split.
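A rough sketch of the wiring, assuming a testbench is already running locally (the port and request path are assumptions for illustration; the env var and header name/value are the ones quoted above). The request is only constructed, not sent:

```python
import os
import urllib.request

# Point the official client library at the locally running testbench.
# Port 9000 is an assumption; match however the testbench is launched in CI.
os.environ["STORAGE_EMULATOR_HOST"] = "http://localhost:9000"

# Failure injection: attach the instruction header so the testbench fails
# the response partway through, exercising the driver's retry path.
req = urllib.request.Request(
    os.environ["STORAGE_EMULATOR_HOST"]
    + "/storage/v1/b/test-bucket/o/obj?alt=media",
    headers={"x-goog-emulator-instructions": "return-503-after-256K/retry-1"},
)
```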
What I'd commit to
- Sign the CLA.
- Open the SDK PR with the file structure, key layout, and built-in client choice that you've signed off on here.
- Open the parallel snippet PR against `temporalio/features`.
- Maintain the driver — happy to be on the hook for follow-up issues post-merge.
Thanks!