From bf477a72078aad673194a0d5be7b1c80c4084842 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Tue, 24 Mar 2026 19:14:49 -0700 Subject: [PATCH 01/14] Add Data handling section for Python and TypeScript SDKs New section with overview, data conversion, data encryption, and large payload storage pages for both SDKs. Keeps existing converters-and-encryption pages in place. Co-Authored-By: Claude Opus 4.6 --- .../python/data-handling/data-conversion.mdx | 160 ++++++++++ .../python/data-handling/data-encryption.mdx | 77 +++++ docs/develop/python/data-handling/index.mdx | 30 ++ .../data-handling/large-payload-storage.mdx | 232 ++++++++++++++ .../data-handling/data-conversion.mdx | 290 ++++++++++++++++++ .../data-handling/data-encryption.mdx | 176 +++++++++++ .../typescript/data-handling/index.mdx | 30 ++ .../data-handling/large-payload-storage.mdx | 55 ++++ sidebars.js | 28 ++ 9 files changed, 1078 insertions(+) create mode 100644 docs/develop/python/data-handling/data-conversion.mdx create mode 100644 docs/develop/python/data-handling/data-encryption.mdx create mode 100644 docs/develop/python/data-handling/index.mdx create mode 100644 docs/develop/python/data-handling/large-payload-storage.mdx create mode 100644 docs/develop/typescript/data-handling/data-conversion.mdx create mode 100644 docs/develop/typescript/data-handling/data-encryption.mdx create mode 100644 docs/develop/typescript/data-handling/index.mdx create mode 100644 docs/develop/typescript/data-handling/large-payload-storage.mdx diff --git a/docs/develop/python/data-handling/data-conversion.mdx b/docs/develop/python/data-handling/data-conversion.mdx new file mode 100644 index 0000000000..398f348317 --- /dev/null +++ b/docs/develop/python/data-handling/data-conversion.mdx @@ -0,0 +1,160 @@ +--- +id: data-conversion +title: Data conversion - Python SDK +sidebar_label: Data conversion +slug: /develop/python/data-handling/data-conversion +toc_max_heading_level: 2 +tags: + - Data Converters + - Python SDK + - 
Temporal SDKs
description: Customize how Temporal serializes application objects using Payload Converters in the Python SDK, including Pydantic and custom type examples.
---

Payload Converters serialize your application objects into a `Payload` and deserialize them back.
A `Payload` is a binary form with metadata that Temporal uses to transport data.

By default, Temporal uses a `DefaultPayloadConverter` that handles `None`, `bytes`, protobuf messages, and anything JSON-serializable.
You only need a custom Payload Converter when your application uses types that aren't natively supported.

## Default supported types

The default Data Converter supports converting multiple types, including:

- `None`
- `bytes`
- `google.protobuf.message.Message` (encoded as JSON, but binary proto from other languages can be decoded)
- Anything that can be converted to JSON, including:
  - Anything that [`json.dump`](https://docs.python.org/3/library/json.html#json.dump) supports natively
  - [dataclasses](https://docs.python.org/3/library/dataclasses.html)
  - Iterables, including ones `json.dump` may not support by default, such as `set`
  - [IntEnum and StrEnum](https://docs.python.org/3/library/enum.html) based enumerations
  - [UUID](https://docs.python.org/3/library/uuid.html)

Although Workflows, Updates, Signals, and Queries can all be defined with multiple input parameters, we strongly encourage using a single `dataclass` or Pydantic model parameter so that fields with defaults can be added without breaking compatibility.
The same advice applies to return values.

Generic type parameters are not resolved by the current implementation, so use concrete types rather than classes with generics.
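As an illustration of the single-parameter advice, adding a field with a default keeps older serialized inputs deserializable (the `GreetInput` type here is hypothetical, not part of the Temporal API):

```python
from dataclasses import dataclass, field

@dataclass
class GreetInput:
    # Original field
    name: str
    # Fields added later carry defaults, so payloads serialized before these
    # fields existed still deserialize without errors.
    greeting: str = "Hello"
    tags: list = field(default_factory=list)

# A payload produced by an older caller that only knew about `name`:
old_payload = {"name": "Ada"}
restored = GreetInput(**old_payload)
assert restored.greeting == "Hello"
assert restored.tags == []
```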
## Use Pydantic models

To use Pydantic model instances, install Pydantic and set the Pydantic Data Converter when creating Client instances:

```python
from temporalio.client import Client
from temporalio.contrib.pydantic import pydantic_data_converter

client = await Client.connect(
    "localhost:7233",
    data_converter=pydantic_data_converter,
)
```

This Data Converter supports conversion of all [types supported by Pydantic](https://docs.pydantic.dev/latest/api/standard_library_types/) to and from JSON.
In addition to Pydantic models, supported types include:

- Everything that [`json.dumps()`](https://docs.python.org/3/library/json.html#py-to-json-table) supports by default.
- Several standard library types that `json.dumps()` does not support, including dataclasses, types from the datetime module, sets, and UUID.
- Custom types composed of any of these, with any degree of nesting.
  For example, a list of Pydantic models with `datetime` fields.

See the [Pydantic documentation](https://docs.pydantic.dev/latest/api/standard_library_types/) for full details.

:::note

Pydantic v1 isn't supported by this Data Converter.
If you aren't yet able to upgrade from Pydantic v1, see https://github.com/temporalio/samples-python/tree/main/pydantic_converter/v1 for limited v1 support.

:::

`datetime.date`, `datetime.time`, and `datetime.datetime` can only be used with the Pydantic Data Converter.

## How the default converter works

The default converter is a `CompositePayloadConverter`: during serialization, each `EncodingPayloadConverter` is tried in order until one succeeds.

Payload Converters can be customized independently of a Payload Codec.

## Custom Payload Converters

To handle custom data types, create a new `EncodingPayloadConverter`.
For example, to support `IPv4Address` types:

```python
import dataclasses
import ipaddress
from typing import Any, Optional, Type

from temporalio.api.common.v1 import Payload
from temporalio.converter import (
    CompositePayloadConverter,
    DataConverter,
    DefaultPayloadConverter,
    EncodingPayloadConverter,
)

class IPv4AddressEncodingPayloadConverter(EncodingPayloadConverter):
    @property
    def encoding(self) -> str:
        return "text/ipv4-address"

    def to_payload(self, value: Any) -> Optional[Payload]:
        if isinstance(value, ipaddress.IPv4Address):
            return Payload(
                metadata={"encoding": self.encoding.encode()},
                data=str(value).encode(),
            )
        else:
            return None

    def from_payload(self, payload: Payload, type_hint: Optional[Type] = None) -> Any:
        assert not type_hint or type_hint is ipaddress.IPv4Address
        return ipaddress.IPv4Address(payload.data.decode())

class IPv4AddressPayloadConverter(CompositePayloadConverter):
    def __init__(self) -> None:
        # Add ours first, before the defaults
        super().__init__(
            IPv4AddressEncodingPayloadConverter(),
            *DefaultPayloadConverter.default_encoding_payload_converters,
        )

my_data_converter = dataclasses.replace(
    DataConverter.default,
    payload_converter_class=IPv4AddressPayloadConverter,
)
```

### Customize the JSON converter for custom types

If you need your custom type to work in lists, unions, and other collections, customize the existing JSON converter instead of adding a new encoding converter.
The JSON converter is the last in the list, so it handles any otherwise unknown type.
Customize serialization with a custom `json.JSONEncoder` and deserialization with a custom `JSONTypeConverter`:

```python
import dataclasses
import ipaddress
from typing import Any, Optional, Type, Union

from temporalio.converter import (
    AdvancedJSONEncoder,
    CompositePayloadConverter,
    DataConverter,
    DefaultPayloadConverter,
    JSONPlainPayloadConverter,
    JSONTypeConverter,
    _JSONTypeConverterUnhandled,
)

class IPv4AddressJSONEncoder(AdvancedJSONEncoder):
    def default(self, o: Any) -> Any:
        if isinstance(o, ipaddress.IPv4Address):
            return str(o)
        return super().default(o)

class IPv4AddressJSONTypeConverter(JSONTypeConverter):
    def to_typed_value(
        self, hint: Type, value: Any
    ) -> Union[Optional[Any], _JSONTypeConverterUnhandled]:
        if issubclass(hint, ipaddress.IPv4Address):
            return ipaddress.IPv4Address(value)
        return JSONTypeConverter.Unhandled

class IPv4AddressPayloadConverter(CompositePayloadConverter):
    def __init__(self) -> None:
        # Replace default JSON plain with our own that has our encoder and type
        # converter
        json_converter = JSONPlainPayloadConverter(
            encoder=IPv4AddressJSONEncoder,
            custom_type_converters=[IPv4AddressJSONTypeConverter()],
        )
        super().__init__(
            *[
                c if not isinstance(c, JSONPlainPayloadConverter) else json_converter
                for c in DefaultPayloadConverter.default_encoding_payload_converters
            ]
        )

my_data_converter = dataclasses.replace(
    DataConverter.default,
    payload_converter_class=IPv4AddressPayloadConverter,
)
```

Now `IPv4Address` can be used in type hints, including collections, optionals, and similar constructs.
diff --git a/docs/develop/python/data-handling/data-encryption.mdx b/docs/develop/python/data-handling/data-encryption.mdx
new file mode 100644
index 0000000000..7d359a8238
--- /dev/null
+++ b/docs/develop/python/data-handling/data-encryption.mdx
@@ -0,0 +1,77 @@
+---
+id: data-encryption
+title: Data encryption - Python SDK
+sidebar_label: Data encryption
+slug: /develop/python/data-handling/data-encryption
+toc_max_heading_level: 2
+tags:
+  - Security
+  - Encryption
+  - Codec Server
+  - Python SDK
+  - Temporal SDKs
+description: Encrypt data sent to and from the Temporal Service using a custom Payload Codec in the Python SDK.
+--- + +Payload Codecs transform `Payload` bytes after serialization (by the Payload Converter) and before the data is sent to the Temporal Service. +Unlike Payload Converters, codecs run outside the Workflow sandbox, so they can use non-deterministic operations and call external services. + +The most common use case is encryption: encrypting payloads before they reach the Temporal Service so that sensitive data is never stored in plaintext. + +## PayloadCodec interface + +Implement a `PayloadCodec` with `encode()` and `decode()` methods. +These should loop through all of a Workflow's payloads, perform your marshaling, compression, or encryption steps in order, and set an `"encoding"` metadata field. + +In this example, the `encode` method compresses a payload using Python's [cramjam](https://github.com/milesgranger/cramjam) library to provide `snappy` compression. +The `decode()` function implements the `encode()` logic in reverse: + +```python +import cramjam +from temporalio.api.common.v1 import Payload +from temporalio.converter import PayloadCodec + +class EncryptionCodec(PayloadCodec): + async def encode(self, payloads: Iterable[Payload]) -> List[Payload]: + return [ + Payload( + metadata={ + "encoding": b"binary/snappy", + }, + data=(bytes(cramjam.snappy.compress(p.SerializeToString()))), + ) + for p in payloads + ] + + async def decode(self, payloads: Iterable[Payload]) -> List[Payload]: + ret: List[Payload] = [] + for p in payloads: + if p.metadata.get("encoding", b"").decode() != "binary/snappy": + ret.append(p) + continue + ret.append(Payload.FromString(bytes(cramjam.snappy.decompress(p.data)))) + return ret +``` + +## Configure the codec on the Data Converter + +Add a `data_converter` parameter to your `Client.connect()` options that overrides the default converter with your Payload Codec: + +```python +from codec import EncryptionCodec + +client = await Client.connect( + "localhost:7233", + data_converter=dataclasses.replace( + 
temporalio.converter.default(), payload_codec=EncryptionCodec() + ), +) +``` + +For reference, see the [Encryption](https://github.com/temporalio/samples-python/tree/main/encryption) sample. + +## Codec Server + +A Codec Server is an HTTP server that runs your `PayloadCodec` remotely, so that the Temporal Web UI and CLI can decode encrypted payloads for display. + +For more information, see [Codec Server](/codec-server). diff --git a/docs/develop/python/data-handling/index.mdx b/docs/develop/python/data-handling/index.mdx new file mode 100644 index 0000000000..2e03cfcab4 --- /dev/null +++ b/docs/develop/python/data-handling/index.mdx @@ -0,0 +1,30 @@ +--- +id: data-handling +title: Data handling - Python SDK +sidebar_label: Data handling +slug: /develop/python/data-handling +description: Learn how Temporal handles data through the Data Converter pipeline, including payload conversion, encryption, and large payload storage. +toc_max_heading_level: 2 +tags: + - Python SDK + - Temporal SDKs + - Data Converters +--- + +All data sent to and from the Temporal Service passes through the **Data Converter** pipeline. +The pipeline has three layers that handle different concerns: + +``` +User code → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service +``` + +| | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) | +| --- | --- | --- | --- | +| **Purpose** | Serialize types to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store | +| **Must be deterministic** | Yes | No | No | +| **Default** | JSON serialization | None (passthrough) | None (passthrough) | + +By default, Temporal uses JSON serialization with no codec and no external storage. +You only need to customize these layers when your application requires non-JSON types, encryption, or payload offloading. 
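The layering can be sketched in plain Python, using `json` as a stand-in for the converter stage and `zlib` as a stand-in for the codec stage (an illustration of the flow only, not the Temporal APIs):

```python
import json
import zlib

def to_payload(value) -> bytes:
    # PayloadConverter stage: application types -> bytes (must be deterministic)
    return json.dumps(value).encode()

def codec_encode(data: bytes) -> bytes:
    # PayloadCodec stage: bytes -> transformed bytes (compress, encrypt, ...)
    return zlib.compress(data)

def codec_decode(data: bytes) -> bytes:
    return zlib.decompress(data)

def from_payload(data: bytes):
    return json.loads(data.decode())

# Outbound, the converter runs before the codec; inbound reverses the order.
wire_bytes = codec_encode(to_payload({"user": "ada", "count": 3}))
assert from_payload(codec_decode(wire_bytes)) == {"user": "ada", "count": 3}
```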
+ +For a deeper conceptual explanation, see the [Data Conversion encyclopedia](/dataconversion). diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx new file mode 100644 index 0000000000..5e82e3b982 --- /dev/null +++ b/docs/develop/python/data-handling/large-payload-storage.mdx @@ -0,0 +1,232 @@ +--- +id: large-payload-storage +title: Large payload storage - Python SDK +sidebar_label: Large payload storage +slug: /develop/python/data-handling/large-payload-storage +toc_max_heading_level: 2 +tags: + - Python SDK + - Temporal SDKs + - Data Converters +description: Offload large payloads to external storage using the claim check pattern in the Python SDK. +--- + +:::caution Experimental + +External payload storage is an experimental feature. +The API may change in future releases. + +::: + +The Temporal Service enforces a ~2 MB per payload limit. +When your Workflows or Activities handle data larger than this, you can offload payloads to external storage (such as S3) and pass a small reference token through the event history instead. +This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). + +External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec: + +``` +User code → PayloadConverter → PayloadCodec → External Storage → Wire → Temporal Service +``` + +When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external store and replaces it with a lightweight reference. +Payloads below the threshold stay inline in the event history. +On the way back, reference payloads are retrieved from external storage before the codec decodes them. + +Because external storage runs after the codec, payloads are already encrypted (if you use an encryption codec) before they're uploaded to your store. 
## Store and retrieve large payloads using external storage

To offload large payloads, implement a `StorageDriver` and configure it on your `DataConverter`.
The driver needs a `store()` method to upload payloads and a `retrieve()` method to fetch them back.

### Implement a storage driver

Extend the `StorageDriver` abstract class:

```python
import uuid
from typing import Sequence

import boto3
from temporalio.api.common.v1 import Payload
from temporalio.converter import (
    StorageDriver,
    StorageDriverClaim,
    StorageDriverStoreContext,
    StorageDriverRetrieveContext,
)


class S3StorageDriver(StorageDriver):
    def __init__(self, bucket: str) -> None:
        self._bucket = bucket
        self._s3 = boto3.client("s3")

    def name(self) -> str:
        return "s3"

    async def store(self, context, payloads):
        # Upload payloads, return claims (see below)
        ...

    async def retrieve(self, context, claims):
        # Download payloads using claims (see below)
        ...
```

### Store payloads

The `store()` method receives a sequence of payloads and must return exactly one `StorageDriverClaim` per payload.
A claim is a set of string key-value pairs that the driver uses to locate the payload later, typically a storage key or URL.
+ +```python +async def store( + self, + context: StorageDriverStoreContext, + payloads: Sequence[Payload], +) -> list[StorageDriverClaim]: + claims = [] + for payload in payloads: + key = f"payloads/{uuid.uuid4()}" + self._s3.put_object( + Bucket=self._bucket, + Key=key, + Body=payload.SerializeToString(), + ) + claims.append(StorageDriverClaim(claim_data={"key": key})) + return claims +``` + +### Retrieve payloads + +The `retrieve()` method receives the claims that `store()` produced and must return the original payloads: + +```python +async def retrieve( + self, + context: StorageDriverRetrieveContext, + claims: Sequence[StorageDriverClaim], +) -> list[Payload]: + payloads = [] + for claim in claims: + response = self._s3.get_object( + Bucket=self._bucket, + Key=claim.claim_data["key"], + ) + payload = Payload() + payload.ParseFromString(response["Body"].read()) + payloads.append(payload) + return payloads +``` + +### Configure external storage on the Data Converter + +Pass an `ExternalStorage` instance to your `DataConverter`: + +```python +from temporalio.converter import DataConverter, ExternalStorage + +converter = DataConverter( + external_storage=ExternalStorage( + drivers=[S3StorageDriver("my-bucket")], + payload_size_threshold=256 * 1024, # 256 KiB (default) + ), +) +``` + +Use this converter when creating your Client and Worker: + +```python +from temporalio.client import Client + +client = await Client.connect( + "localhost:7233", + data_converter=converter, +) +``` + +### Adjust the size threshold + +The `payload_size_threshold` controls which payloads get offloaded. +Payloads smaller than this value stay inline in the event history. + +```python +ExternalStorage( + drivers=[driver], + payload_size_threshold=100 * 1024, # 100 KiB +) +``` + +Set it to `None` to externalize all payloads regardless of size. 
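Conceptually, the threshold check reduces to a simple size comparison (an illustrative sketch, not the SDK's internal code; treating the boundary as inclusive is an assumption here):

```python
DEFAULT_THRESHOLD = 256 * 1024  # bytes, mirroring the default described above

def should_offload(payload_size: int, threshold=DEFAULT_THRESHOLD) -> bool:
    # None mirrors the documented behavior: externalize every payload.
    if threshold is None:
        return True
    return payload_size >= threshold

assert should_offload(300 * 1024) is True   # large payload: offloaded
assert should_offload(10 * 1024) is False   # small payload: stays inline
assert should_offload(10, threshold=None) is True
```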
+ +### Use multiple storage drivers + +When you have multiple drivers (for example, hot and cold storage tiers), provide a `driver_selector` function that chooses which driver handles each payload: + +```python +hot_driver = S3StorageDriver("hot-bucket") +cold_driver = S3StorageDriver("cold-bucket") + +ExternalStorage( + drivers=[hot_driver, cold_driver], + driver_selector=lambda context, payload: ( + cold_driver if payload.ByteSize() > 1_000_000 else hot_driver + ), + payload_size_threshold=100 * 1024, +) +``` + +Return `None` from the selector to keep a specific payload inline. + +## Organize stored payloads by Workflow + +Without additional context, stored payloads end up keyed by random identifiers, making cleanup and debugging difficult. +The `StorageDriverStoreContext` includes a `serialization_context` field that provides the identity of the Workflow or Activity that owns the data: + +```python +async def store( + self, + context: StorageDriverStoreContext, + payloads: Sequence[Payload], +) -> list[StorageDriverClaim]: + claims = [] + prefix = "payloads" + + # Use workflow identity for the storage key when available + sc = context.serialization_context + if sc is not None: + prefix = f"payloads/{sc.namespace}/{sc.workflow_id}" + + for payload in payloads: + key = f"{prefix}/{uuid.uuid4()}" + self._s3.put_object( + Bucket=self._bucket, + Key=key, + Body=payload.SerializeToString(), + ) + claims.append(StorageDriverClaim(claim_data={"key": key})) + return claims +``` + +This groups stored objects by namespace and Workflow ID, so you can find, inspect, or clean up payloads for a specific Workflow execution. + +## View externally stored payloads in the UI + +When payloads are offloaded, the Temporal Web UI shows reference tokens instead of actual data. +To make the UI display the real payload content, run a [Codec Server](/codec-server) that knows how to retrieve payloads from your external store. 
+ +## Use a gRPC proxy instead of a codec + +Instead of implementing storage in each SDK client, you can run a centralized gRPC proxy between your workers and the Temporal Service that intercepts and offloads large payloads transparently. +The [`go.temporal.io/api/proxy`](https://pkg.go.dev/go.temporal.io/api/proxy) package provides the building blocks for this approach. + +If you use a gRPC proxy that alters payload sizes, disable the worker's eager payload size validation so it doesn't reject payloads that the proxy will shrink before they reach the server: + +```python +from temporalio.worker import Worker + +worker = Worker( + client, + task_queue="my-task-queue", + workflows=[MyWorkflow], + activities=[my_activity], + disable_payload_error_limit=True, +) +``` diff --git a/docs/develop/typescript/data-handling/data-conversion.mdx b/docs/develop/typescript/data-handling/data-conversion.mdx new file mode 100644 index 0000000000..c524d3d265 --- /dev/null +++ b/docs/develop/typescript/data-handling/data-conversion.mdx @@ -0,0 +1,290 @@ +--- +id: data-conversion +title: Data conversion - TypeScript SDK +sidebar_label: Data conversion +slug: /develop/typescript/data-handling/data-conversion +toc_max_heading_level: 2 +tags: + - Data Converters + - TypeScript SDK + - Temporal SDKs +description: Customize how Temporal serializes application objects using Payload Converters in the TypeScript SDK, including EJSON and protobuf examples. +--- + +Payload Converters serialize your application objects into a `Payload` and deserialize them back. +A `Payload` is a binary form with metadata that Temporal uses to transport data. + +By default, Temporal uses a `DefaultPayloadConverter` that handles `undefined`, binary data, and JSON. +You only need a custom Payload Converter when your application uses types that aren't natively JSON-serializable, such as `BigInt`, `Date`, or protobufs. 
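To see why these types need help, plain `JSON` round-trips either fail outright or lose type information (plain Node.js, no Temporal APIs involved):

```typescript
// BigInt has no JSON representation: stringify throws a TypeError.
let bigintThrew = false;
try {
  JSON.stringify({ total: BigInt(10) });
} catch {
  bigintThrew = true;
}

// Date survives stringify, but parses back as a plain ISO string, not a Date.
const roundTripped = JSON.parse(JSON.stringify({ at: new Date(0) }));

console.log(bigintThrew, typeof roundTripped.at); // true string
```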
## How the default converter works

The default converter is a `CompositePayloadConverter` that tries each converter in order until one handles the value:

```typescript
export class DefaultPayloadConverter extends CompositePayloadConverter {
  constructor() {
    super(
      new UndefinedPayloadConverter(),
      new BinaryPayloadConverter(),
      new JsonPayloadConverter(),
    );
  }
}
```

During serialization, the Data Converter tries each Payload Converter in sequence until one returns a non-null `Payload`.

## Custom Payload Converters

To handle custom data types, implement the [`PayloadConverter`](https://typescript.temporal.io/api/interfaces/common.PayloadConverter) interface:

```typescript
interface PayloadConverter {
  toPayload<T>(value: T): Payload;
  fromPayload<T>(payload: Payload): T;
}
```

Then provide it to the Worker and Client through a `CompositePayloadConverter`:

```typescript
export const payloadConverter = new CompositePayloadConverter(
  new UndefinedPayloadConverter(),
  new EjsonPayloadConverter(),
);
```

```typescript
// Worker
const worker = await Worker.create({
  workflowsPath: require.resolve('./workflows'),
  taskQueue: 'ejson',
  dataConverter: {
    payloadConverterPath: require.resolve('./payload-converter'),
  },
});
```

```typescript
// Client
const client = new Client({
  dataConverter: {
    payloadConverterPath: require.resolve('./payload-converter'),
  },
});
```

### EJSON example

The [samples-typescript/ejson](https://github.com/temporalio/samples-typescript/tree/main/ejson) sample creates a custom `PayloadConverter` using EJSON, which supports types like `Date`, `RegExp`, `Infinity`, and `Uint8Array`.
[ejson/src/ejson-payload-converter.ts](https://github.com/temporalio/samples-typescript/blob/main/ejson/src/ejson-payload-converter.ts)
```ts
import {
  EncodingType,
  METADATA_ENCODING_KEY,
  Payload,
  PayloadConverterWithEncoding,
  PayloadConverterError,
} from '@temporalio/common';
import EJSON from 'ejson';
import { decode, encode } from '@temporalio/common/lib/encoding';

/**
 * Converts between values and [EJSON](https://docs.meteor.com/api/ejson.html) Payloads.
 */
export class EjsonPayloadConverter implements PayloadConverterWithEncoding {
  // Use 'json/plain' so that Payloads are displayed in the UI
  public encodingType = 'json/plain' as EncodingType;

  public toPayload(value: unknown): Payload | undefined {
    if (value === undefined) return undefined;
    let ejson;
    try {
      ejson = EJSON.stringify(value);
    } catch (e) {
      throw new UnsupportedEjsonTypeError(
        `Can't run EJSON.stringify on this value: ${value}. Either convert it (or its properties) to EJSON-serializable values (see https://docs.meteor.com/api/ejson.html ), or create a custom data converter. EJSON.stringify error message: ${errorMessage(
          e,
        )}`,
        e as Error,
      );
    }

    return {
      metadata: {
        [METADATA_ENCODING_KEY]: encode('json/plain'),
        // Include an additional metadata field to indicate that this is an EJSON payload
        format: encode('extended'),
      },
      data: encode(ejson),
    };
  }

  public fromPayload<T>(content: Payload): T {
    return content.data ? EJSON.parse(decode(content.data)) : content.data;
  }
}

export class UnsupportedEjsonTypeError extends PayloadConverterError {
  public readonly name: string = 'UnsupportedEjsonTypeError';

  constructor(
    message: string | undefined,
    public readonly cause?: Error,
  ) {
    super(message ?? undefined);
  }
}
```

### Protobufs

To serialize values as [Protocol Buffers](https://protobuf.dev/):

- Use [protobufjs](https://protobufjs.github.io/protobuf.js/).
- Use runtime-loaded messages (not generated classes) and `MessageClass.create` (not `new MessageClass()`).
- Generate `json-module.js`:

  ```sh
  pbjs -t json-module -w commonjs -o protos/json-module.js protos/*.proto
  ```

- Patch `json-module.js`:

[protobufs/protos/root.js](https://github.com/temporalio/samples-typescript/blob/main/protobufs/protos/root.js)
```js
const { patchProtobufRoot } = require('@temporalio/common/lib/protobufs');
const unpatchedRoot = require('./json-module');
module.exports = patchProtobufRoot(unpatchedRoot);
```

- Generate `root.d.ts`:

  ```sh
  pbjs -t static-module protos/*.proto | pbts -o protos/root.d.ts -
  ```

- Create a [`DefaultPayloadConverterWithProtobufs`](https://typescript.temporal.io/api/classes/protobufs.DefaultPayloadConverterWithProtobufs/):

[protobufs/src/payload-converter.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/payload-converter.ts)
```ts
import { DefaultPayloadConverterWithProtobufs } from '@temporalio/common/lib/protobufs';
import root from '../protos/root';

export const payloadConverter = new DefaultPayloadConverterWithProtobufs({ protobufRoot: root });
```

For binary-encoded protobufs (saves space but can't be viewed in the Web UI):

```ts
import { ProtobufBinaryPayloadConverter } from '@temporalio/common/lib/protobufs';
import root from '../protos/root';

export const payloadConverter = new ProtobufBinaryPayloadConverter(root);
```

For binary-encoded protobufs alongside other default types:

```ts
import {
  BinaryPayloadConverter,
  CompositePayloadConverter,
  JsonPayloadConverter,
  UndefinedPayloadConverter,
} from '@temporalio/common';
import { ProtobufBinaryPayloadConverter } from '@temporalio/common/lib/protobufs';
import root from '../protos/root';

export const payloadConverter = new CompositePayloadConverter(
  new UndefinedPayloadConverter(),
  new BinaryPayloadConverter(),
  new ProtobufBinaryPayloadConverter(root),
  new JsonPayloadConverter(),
);
```

Provide the converter to the Worker and Client:

[protobufs/src/worker.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/worker.ts)
```ts
const worker = await Worker.create({
  workflowsPath: require.resolve('./workflows'),
  activities,
  taskQueue: 'protobufs',
  dataConverter: { payloadConverterPath: require.resolve('./payload-converter') },
});
```

[protobufs/src/client.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/client.ts)
```ts
import { Client, Connection } from '@temporalio/client';
import { loadClientConnectConfig } from '@temporalio/envconfig';
import { v4 as uuid } from 'uuid';
import { foo, ProtoResult } from '../protos/root';
import { example } from './workflows';

async function run() {
  const config = loadClientConnectConfig();
  const connection = await Connection.connect(config.connectionOptions);
  const client = new Client({
    connection,
    dataConverter: { payloadConverterPath: require.resolve('./payload-converter') },
  });

  const handle = await client.workflow.start(example, {
    args: [foo.bar.ProtoInput.create({ name: 'Proto', age: 2 })],
    // can't do:
    // args: [new foo.bar.ProtoInput({ name: 'Proto', age: 2 })],
    taskQueue: 'protobufs',
    workflowId: 'my-business-id-' + uuid(),
  });

  console.log(`Started workflow ${handle.workflowId}`);

  const result: ProtoResult = await handle.result();
  console.log(result.toJSON());
}
```

Use protobufs in your Workflows and Activities:

[protobufs/src/workflows.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/workflows.ts)
```ts
import { proxyActivities } from '@temporalio/workflow';
import { foo, ProtoResult } from '../protos/root';
import type * as activities from './activities';

const { protoActivity } = proxyActivities<typeof activities>({
  startToCloseTimeout: '1 minute',
});

export async function example(input: foo.bar.ProtoInput): Promise<ProtoResult> {
  const result = await protoActivity(input);
  return result;
}
```

[protobufs/src/activities.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/activities.ts)
```ts
import { foo, ProtoResult } from '../protos/root';

export async function protoActivity(input: foo.bar.ProtoInput): Promise<ProtoResult> {
  return ProtoResult.create({ sentence: `${input.name} is ${input.age} years old.` });
}
```

diff --git a/docs/develop/typescript/data-handling/data-encryption.mdx b/docs/develop/typescript/data-handling/data-encryption.mdx
new file mode 100644
index 0000000000..932b5cc7d1
--- /dev/null
+++ b/docs/develop/typescript/data-handling/data-encryption.mdx
@@ -0,0 +1,176 @@
+---
+id: data-encryption
+title: Data encryption - TypeScript SDK
+sidebar_label: Data encryption
+slug: /develop/typescript/data-handling/data-encryption
+toc_max_heading_level: 2
+tags:
+  - Security
+  - Encryption
+  - Codec Server
+  - TypeScript SDK
+  - Temporal SDKs
+description: Encrypt data sent to and from the Temporal Service using a custom Payload Codec in the TypeScript SDK.
---

Payload Codecs transform `Payload` bytes after serialization (by the Payload Converter) and before the data is sent to the Temporal Service.
Unlike Payload Converters, codecs run outside the Workflow sandbox, so they can call external services and use non-deterministic modules.

The most common use case is encryption: encrypting payloads before they reach the Temporal Service so that sensitive data is never stored in plaintext.

## PayloadCodec interface

Implement the [`PayloadCodec`](https://typescript.temporal.io/api/interfaces/common.PayloadCodec) interface:

```ts
interface PayloadCodec {
  encode(payloads: Payload[]): Promise<Payload[]>;
  decode(payloads: Payload[]): Promise<Payload[]>;
}
```

`encode` is called before data is sent to the Temporal Service.
`decode` is called when data is received from the Temporal Service.
## Encryption example

The following example implements AES-GCM encryption as a `PayloadCodec`:

[encryption/src/encryption-codec.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/encryption-codec.ts)
```ts
import { webcrypto as crypto } from 'node:crypto';
import { METADATA_ENCODING_KEY, Payload, PayloadCodec, ValueError } from '@temporalio/common';
import { temporal } from '@temporalio/proto';
import { decode, encode } from '@temporalio/common/lib/encoding';
import { decrypt, encrypt } from './crypto';

const ENCODING = 'binary/encrypted';
const METADATA_ENCRYPTION_KEY_ID = 'encryption-key-id';

export class EncryptionCodec implements PayloadCodec {
  constructor(
    protected readonly keys: Map<string, crypto.CryptoKey>,
    protected readonly defaultKeyId: string,
  ) {}

  static async create(keyId: string): Promise<EncryptionCodec> {
    const keys = new Map<string, crypto.CryptoKey>();
    keys.set(keyId, await fetchKey(keyId));
    return new this(keys, keyId);
  }

  async encode(payloads: Payload[]): Promise<Payload[]> {
    return Promise.all(
      payloads.map(async (payload) => ({
        metadata: {
          [METADATA_ENCODING_KEY]: encode(ENCODING),
          [METADATA_ENCRYPTION_KEY_ID]: encode(this.defaultKeyId),
        },
        // Encrypt entire payload, preserving metadata
        data: await encrypt(
          temporal.api.common.v1.Payload.encode(payload).finish(),
          this.keys.get(this.defaultKeyId)!, // eslint-disable-line @typescript-eslint/no-non-null-assertion
        ),
      })),
    );
  }

  async decode(payloads: Payload[]): Promise<Payload[]> {
    return Promise.all(
      payloads.map(async (payload) => {
        if (!payload.metadata || decode(payload.metadata[METADATA_ENCODING_KEY]) !== ENCODING) {
          return payload;
        }
        if (!payload.data) {
          throw new ValueError('Payload data is missing');
        }

        const keyIdBytes = payload.metadata[METADATA_ENCRYPTION_KEY_ID];
        if (!keyIdBytes) {
          throw new ValueError('Unable to decrypt Payload without encryption key id');
        }

        const keyId = decode(keyIdBytes);
        let key = this.keys.get(keyId);
        if (!key) {
+          key = await fetchKey(keyId);
+          this.keys.set(keyId, key);
+        }
+        const decryptedPayloadBytes = await decrypt(payload.data, key);
+        return temporal.api.common.v1.Payload.decode(decryptedPayloadBytes);
+      }),
+    );
+  }
+}
+
+async function fetchKey(_keyId: string): Promise<crypto.CryptoKey> {
+  // In production, fetch key from a key management system (KMS). You may want to memoize requests if you'll be decoding
+  // Payloads that were encrypted using keys other than defaultKeyId.
+  const key = Buffer.from('test-key-test-key-test-key-test!');
+  const cryptoKey = await crypto.subtle.importKey(
+    'raw',
+    key,
+    {
+      name: 'AES-GCM',
+    },
+    true,
+    ['encrypt', 'decrypt'],
+  );
+
+  return cryptoKey;
+}
+```
+
+
+Provide the codec to the Client and Worker through the Data Converter:
+
+
+[encryption/src/client.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/client.ts)
+```ts
+const client = new Client({
+  connection,
+  dataConverter: await getDataConverter(),
+});
+
+const handle = await client.workflow.start(example, {
+  args: ['Alice: Private message for Bob.'],
+  taskQueue: 'encryption',
+  workflowId: `my-business-id-${uuid()}`,
+});
+
+console.log(`Started workflow ${handle.workflowId}`);
+console.log(await handle.result());
+```
+
+
+[encryption/src/worker.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/worker.ts)
+```ts
+const worker = await Worker.create({
+  workflowsPath: require.resolve('./workflows'),
+  taskQueue: 'encryption',
+  dataConverter: await getDataConverter(),
+});
+```
+
+
+When the Client sends `'Alice: Private message for Bob.'` to the Workflow, it gets encrypted on the Client and decrypted in the Worker.
+The Workflow receives the decrypted message and appends another message.
+When it returns that longer string, the string gets encrypted by the Worker and decrypted by the Client.
+
+
+[encryption/src/workflows.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/workflows.ts)
+```ts
+export async function example(message: string): Promise<string> {
+  return `${message}\nBob: Hi Alice, I'm Workflow Bob.`;
+}
+```
+
+
+## Codec Server
+
+A Codec Server is an HTTP server that runs your `PayloadCodec` remotely, so that the Temporal Web UI and CLI can decode encrypted payloads for display.
+
+For more information, see [Codec Server](/codec-server).
diff --git a/docs/develop/typescript/data-handling/index.mdx b/docs/develop/typescript/data-handling/index.mdx
new file mode 100644
index 0000000000..c6edf46510
--- /dev/null
+++ b/docs/develop/typescript/data-handling/index.mdx
@@ -0,0 +1,30 @@
+---
+id: data-handling
+title: Data handling - TypeScript SDK
+sidebar_label: Data handling
+slug: /develop/typescript/data-handling
+description: Learn how Temporal handles data through the Data Converter pipeline, including payload conversion, encryption, and large payload storage.
+toc_max_heading_level: 2
+tags:
+  - TypeScript SDK
+  - Temporal SDKs
+  - Data Converters
+---
+
+All data sent to and from the Temporal Service passes through the **Data Converter** pipeline.
+The pipeline has three layers that handle different concerns:
+
+```
+User code → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service
+```
+
+| | [PayloadConverter](/develop/typescript/data-handling/data-conversion) | [PayloadCodec](/develop/typescript/data-handling/data-encryption) | [ExternalStorage](/develop/typescript/data-handling/large-payload-storage) |
+| --- | --- | --- | --- |
+| **Purpose** | Serialize types to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store |
+| **Must be deterministic** | Yes | No | No |
+| **Default** | JSON serialization | None (passthrough) | None (passthrough) |
+
+By default, Temporal uses JSON serialization with no codec and no external storage.
+You only need to customize these layers when your application requires non-JSON types, encryption, or payload offloading. + +For a deeper conceptual explanation, see the [Data Conversion encyclopedia](/dataconversion). diff --git a/docs/develop/typescript/data-handling/large-payload-storage.mdx b/docs/develop/typescript/data-handling/large-payload-storage.mdx new file mode 100644 index 0000000000..e3229687ae --- /dev/null +++ b/docs/develop/typescript/data-handling/large-payload-storage.mdx @@ -0,0 +1,55 @@ +--- +id: large-payload-storage +title: Large payload storage - TypeScript SDK +sidebar_label: Large payload storage +slug: /develop/typescript/data-handling/large-payload-storage +toc_max_heading_level: 2 +tags: + - TypeScript SDK + - Temporal SDKs + - Data Converters +description: Offload large payloads to external storage using the claim check pattern in the TypeScript SDK. +--- + +:::info This feature is in development + +Large payload storage support is not yet available in the TypeScript SDK. +This page will be updated when the feature is released. +See the [Python SDK](/develop/python/data-handling/large-payload-storage) for a working implementation. + +::: + +The Temporal Service enforces a ~2 MB per payload limit. +When your Workflows or Activities handle data larger than this, you can offload payloads to external storage (such as S3) and pass a small reference token through the event history instead. +This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). + +External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec: + +``` +User code → PayloadConverter → PayloadCodec → External Storage → Wire → Temporal Service +``` + +When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external store and replaces it with a lightweight reference. +Payloads below the threshold stay inline in the event history. 
+On the way back, reference payloads are retrieved from external storage before the codec decodes them. + +Because external storage runs after the codec, payloads are already encrypted (if you use an encryption codec) before they're uploaded to your store. + +## Store large payloads in external storage + +_This section will document how to configure external storage on the TypeScript `DataConverter` once the feature ships._ + +## Organize stored payloads by Workflow + +_This section will document how to use serialization context to key stored objects by Workflow identity instead of random identifiers._ + +## View externally stored payloads in the UI + +_This section will document Codec Server considerations for reference payloads._ + +## Use a gRPC proxy instead of a codec + +Instead of implementing storage in each SDK client, you can run a centralized gRPC proxy between your workers and the Temporal Service that intercepts and offloads large payloads transparently. +The [`go.temporal.io/api/proxy`](https://pkg.go.dev/go.temporal.io/api/proxy) package provides the building blocks for this approach. + +If you use a gRPC proxy that alters payload sizes, you may need to disable the worker's eager payload size validation so it doesn't reject payloads that the proxy will shrink before they reach the server. 
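The layer ordering described on this page, codec first and external storage second, can be illustrated without any SDK. The following stdlib-only Python sketch is purely conceptual: the names `codec_encode`, `send`, and `STORE` are invented for this example (base64 stands in for a real encryption codec, and a dict stands in for S3), so it is not the eventual TypeScript API.

```python
import base64
import uuid

# Toy stand-ins, not Temporal SDK APIs.
STORE: dict[str, bytes] = {}  # pretend external object store


def codec_encode(data: bytes) -> bytes:
    # Stand-in for a PayloadCodec transform. A real codec would encrypt or
    # compress; base64 just makes the transformation visible.
    return base64.b64encode(data)


def send(data: bytes, threshold: int = 64) -> dict:
    """Run the codec first, then offload oversized results to the store."""
    encoded = codec_encode(data)
    if len(encoded) < threshold:
        return {"inline": encoded}  # small payloads stay in the event history
    key = f"payloads/{uuid.uuid4()}"
    STORE[key] = encoded  # the store only ever sees codec output
    return {"ref": key}
```

Because `send()` encodes before it stores, inspecting `STORE` shows only transformed bytes, which mirrors why externally stored payloads are already encrypted when an encryption codec is configured.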
diff --git a/sidebars.js b/sidebars.js index 4e601b9b3b..786154ca06 100644 --- a/sidebars.js +++ b/sidebars.js @@ -212,6 +212,20 @@ module.exports = { 'develop/python/debugging', 'develop/python/schedules', 'develop/python/converters-and-encryption', + { + type: 'category', + label: 'Data handling', + collapsed: true, + link: { + type: 'doc', + id: 'develop/python/data-handling/data-handling', + }, + items: [ + 'develop/python/data-handling/data-conversion', + 'develop/python/data-handling/data-encryption', + 'develop/python/data-handling/large-payload-storage', + ], + }, 'develop/python/timers', 'develop/python/nexus', 'develop/python/child-workflows', @@ -254,6 +268,20 @@ module.exports = { 'develop/typescript/debugging', 'develop/typescript/schedules', 'develop/typescript/converters-and-encryption', + { + type: 'category', + label: 'Data handling', + collapsed: true, + link: { + type: 'doc', + id: 'develop/typescript/data-handling/data-handling', + }, + items: [ + 'develop/typescript/data-handling/data-conversion', + 'develop/typescript/data-handling/data-encryption', + 'develop/typescript/data-handling/large-payload-storage', + ], + }, 'develop/typescript/timers', 'develop/typescript/nexus', 'develop/typescript/child-workflows', From d83a28f4f619fdb5dc8eacc50263cfbf74077a17 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Tue, 24 Mar 2026 19:31:27 -0700 Subject: [PATCH 02/14] docs: page structure edit --- .../data-handling/large-payload-storage.mdx | 107 ++---------------- .../data-handling/large-payload-storage.mdx | 31 +++-- 2 files changed, 28 insertions(+), 110 deletions(-) diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx index 5e82e3b982..a3eff6250a 100644 --- a/docs/develop/python/data-handling/large-payload-storage.mdx +++ b/docs/develop/python/data-handling/large-payload-storage.mdx @@ -11,13 +11,6 @@ tags: description: Offload large payloads to external storage using 
the claim check pattern in the Python SDK. --- -:::caution Experimental - -External payload storage is an experimental feature. -The API may change in future releases. - -::: - The Temporal Service enforces a ~2 MB per payload limit. When your Workflows or Activities handle data larger than this, you can offload payloads to external storage (such as S3) and pass a small reference token through the event history instead. This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). @@ -25,7 +18,7 @@ This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec: ``` -User code → PayloadConverter → PayloadCodec → External Storage → Wire → Temporal Service +User code → PayloadConverter → PayloadCodec → External Storage → Temporal Service ``` When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external store and replaces it with a lightweight reference. @@ -77,21 +70,7 @@ The `store()` method receives a sequence of payloads and must return exactly one A claim is a set of string key-value pairs that the driver uses to locate the payload later — typically a storage key or URL. 
 ```python
 async def store(
     self,
     context: StorageDriverStoreContext,
     payloads: Sequence[Payload],
 ) -> list[StorageDriverClaim]:
     claims = []
     for payload in payloads:
         key = f"payloads/{uuid.uuid4()}"
         self._s3.put_object(
             Bucket=self._bucket,
             Key=key,
             Body=payload.SerializeToString(),
         )
         claims.append(StorageDriverClaim(claim_data={"key": key}))
     return claims
 ```
 
 ### Retrieve payloads
@@ -99,21 +78,7 @@
 The `retrieve()` method receives the claims that `store()` produced and must return the original payloads:
 
 ```python
 async def retrieve(
     self,
     context: StorageDriverRetrieveContext,
     claims: Sequence[StorageDriverClaim],
 ) -> list[Payload]:
     payloads = []
     for claim in claims:
         response = self._s3.get_object(
             Bucket=self._bucket,
             Key=claim.claim_data["key"],
         )
         payload = Payload()
         payload.ParseFromString(response["Body"].read())
         payloads.append(payload)
     return payloads
 ```
 
 ### Configure external storage on the Data Converter
@@ -134,11 +99,11 @@
 Use this converter when creating your Client and Worker:
 
 ```python
 from temporalio.client import Client
 
 client = await Client.connect(
     "localhost:7233",
     data_converter=converter,
 )
 ```
 
@@ -174,59 +139,3 @@
 Return `None` from the selector to keep a specific payload inline.
-
-## Organize stored payloads by Workflow
-
-Without additional context, stored payloads end up keyed by random identifiers, making cleanup and debugging difficult.
-The `StorageDriverStoreContext` includes a `serialization_context` field that provides the identity of the Workflow or Activity that owns the data: - -```python -async def store( - self, - context: StorageDriverStoreContext, - payloads: Sequence[Payload], -) -> list[StorageDriverClaim]: - claims = [] - prefix = "payloads" - - # Use workflow identity for the storage key when available - sc = context.serialization_context - if sc is not None: - prefix = f"payloads/{sc.namespace}/{sc.workflow_id}" - - for payload in payloads: - key = f"{prefix}/{uuid.uuid4()}" - self._s3.put_object( - Bucket=self._bucket, - Key=key, - Body=payload.SerializeToString(), - ) - claims.append(StorageDriverClaim(claim_data={"key": key})) - return claims -``` - -This groups stored objects by namespace and Workflow ID, so you can find, inspect, or clean up payloads for a specific Workflow execution. - -## View externally stored payloads in the UI - -When payloads are offloaded, the Temporal Web UI shows reference tokens instead of actual data. -To make the UI display the real payload content, run a [Codec Server](/codec-server) that knows how to retrieve payloads from your external store. - -## Use a gRPC proxy instead of a codec - -Instead of implementing storage in each SDK client, you can run a centralized gRPC proxy between your workers and the Temporal Service that intercepts and offloads large payloads transparently. -The [`go.temporal.io/api/proxy`](https://pkg.go.dev/go.temporal.io/api/proxy) package provides the building blocks for this approach. 
- -If you use a gRPC proxy that alters payload sizes, disable the worker's eager payload size validation so it doesn't reject payloads that the proxy will shrink before they reach the server: - -```python -from temporalio.worker import Worker - -worker = Worker( - client, - task_queue="my-task-queue", - workflows=[MyWorkflow], - activities=[my_activity], - disable_payload_error_limit=True, -) -``` diff --git a/docs/develop/typescript/data-handling/large-payload-storage.mdx b/docs/develop/typescript/data-handling/large-payload-storage.mdx index e3229687ae..950b998453 100644 --- a/docs/develop/typescript/data-handling/large-payload-storage.mdx +++ b/docs/develop/typescript/data-handling/large-payload-storage.mdx @@ -26,7 +26,7 @@ This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec: ``` -User code → PayloadConverter → PayloadCodec → External Storage → Wire → Temporal Service +User code → PayloadConverter → PayloadCodec → External Storage → Temporal Service ``` When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external store and replaces it with a lightweight reference. @@ -35,21 +35,30 @@ On the way back, reference payloads are retrieved from external storage before t Because external storage runs after the codec, payloads are already encrypted (if you use an encryption codec) before they're uploaded to your store. 
-## Store large payloads in external storage +## Store and retrieve large payloads using external storage -_This section will document how to configure external storage on the TypeScript `DataConverter` once the feature ships._ +_This section will document how to implement a storage driver and configure external storage on the TypeScript `DataConverter` once the feature ships._ -## Organize stored payloads by Workflow +### Implement a storage driver -_This section will document how to use serialization context to key stored objects by Workflow identity instead of random identifiers._ +_Stub: storage driver interface and example._ -## View externally stored payloads in the UI +### Store payloads -_This section will document Codec Server considerations for reference payloads._ +_Stub: store method implementation._ -## Use a gRPC proxy instead of a codec +### Retrieve payloads -Instead of implementing storage in each SDK client, you can run a centralized gRPC proxy between your workers and the Temporal Service that intercepts and offloads large payloads transparently. -The [`go.temporal.io/api/proxy`](https://pkg.go.dev/go.temporal.io/api/proxy) package provides the building blocks for this approach. +_Stub: retrieve method implementation._ -If you use a gRPC proxy that alters payload sizes, you may need to disable the worker's eager payload size validation so it doesn't reject payloads that the proxy will shrink before they reach the server. 
+### Configure external storage on the Data Converter + +_Stub: DataConverter configuration._ + +### Adjust the size threshold + +_Stub: payload_size_threshold configuration._ + +### Use multiple storage drivers + +_Stub: driver_selector configuration._ From 60f40194002336ecab49217596cab4e38dd68a3f Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Tue, 24 Mar 2026 19:36:51 -0700 Subject: [PATCH 03/14] docs: change page names --- docs/develop/python/data-handling/data-conversion.mdx | 4 ++-- docs/develop/python/data-handling/data-encryption.mdx | 4 ++-- docs/develop/python/data-handling/index.mdx | 6 +++--- docs/develop/typescript/data-handling/data-conversion.mdx | 4 ++-- docs/develop/typescript/data-handling/data-encryption.mdx | 4 ++-- docs/develop/typescript/data-handling/index.mdx | 6 +++--- 6 files changed, 14 insertions(+), 14 deletions(-) diff --git a/docs/develop/python/data-handling/data-conversion.mdx b/docs/develop/python/data-handling/data-conversion.mdx index 398f348317..9d0f665a29 100644 --- a/docs/develop/python/data-handling/data-conversion.mdx +++ b/docs/develop/python/data-handling/data-conversion.mdx @@ -1,7 +1,7 @@ --- id: data-conversion -title: Data conversion - Python SDK -sidebar_label: Data conversion +title: Payload conversion - Python SDK +sidebar_label: Payload conversion slug: /develop/python/data-handling/data-conversion toc_max_heading_level: 2 tags: diff --git a/docs/develop/python/data-handling/data-encryption.mdx b/docs/develop/python/data-handling/data-encryption.mdx index 7d359a8238..fffcb3c035 100644 --- a/docs/develop/python/data-handling/data-encryption.mdx +++ b/docs/develop/python/data-handling/data-encryption.mdx @@ -1,7 +1,7 @@ --- id: data-encryption -title: Data encryption - Python SDK -sidebar_label: Data encryption +title: Payload encryption - Python SDK +sidebar_label: Payload encryption slug: /develop/python/data-handling/data-encryption toc_max_heading_level: 2 tags: diff --git 
a/docs/develop/python/data-handling/index.mdx b/docs/develop/python/data-handling/index.mdx index 2e03cfcab4..5f9f880867 100644 --- a/docs/develop/python/data-handling/index.mdx +++ b/docs/develop/python/data-handling/index.mdx @@ -3,7 +3,7 @@ id: data-handling title: Data handling - Python SDK sidebar_label: Data handling slug: /develop/python/data-handling -description: Learn how Temporal handles data through the Data Converter pipeline, including payload conversion, encryption, and large payload storage. +description: Learn how Temporal handles data through the Data Converter, including payload conversion, encryption, and large payload storage. toc_max_heading_level: 2 tags: - Python SDK @@ -11,8 +11,8 @@ tags: - Data Converters --- -All data sent to and from the Temporal Service passes through the **Data Converter** pipeline. -The pipeline has three layers that handle different concerns: +All data sent to and from the Temporal Service passes through the **Data Converter**. +The Data Converter has three layers that handle different concerns: ``` User code → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service diff --git a/docs/develop/typescript/data-handling/data-conversion.mdx b/docs/develop/typescript/data-handling/data-conversion.mdx index c524d3d265..478292685d 100644 --- a/docs/develop/typescript/data-handling/data-conversion.mdx +++ b/docs/develop/typescript/data-handling/data-conversion.mdx @@ -1,7 +1,7 @@ --- id: data-conversion -title: Data conversion - TypeScript SDK -sidebar_label: Data conversion +title: Payload conversion - TypeScript SDK +sidebar_label: Payload conversion slug: /develop/typescript/data-handling/data-conversion toc_max_heading_level: 2 tags: diff --git a/docs/develop/typescript/data-handling/data-encryption.mdx b/docs/develop/typescript/data-handling/data-encryption.mdx index 932b5cc7d1..0d05293356 100644 --- a/docs/develop/typescript/data-handling/data-encryption.mdx +++ 
b/docs/develop/typescript/data-handling/data-encryption.mdx @@ -1,7 +1,7 @@ --- id: data-encryption -title: Data encryption - TypeScript SDK -sidebar_label: Data encryption +title: Payload encryption - TypeScript SDK +sidebar_label: Payload encryption slug: /develop/typescript/data-handling/data-encryption toc_max_heading_level: 2 tags: diff --git a/docs/develop/typescript/data-handling/index.mdx b/docs/develop/typescript/data-handling/index.mdx index c6edf46510..ec56b5337e 100644 --- a/docs/develop/typescript/data-handling/index.mdx +++ b/docs/develop/typescript/data-handling/index.mdx @@ -3,7 +3,7 @@ id: data-handling title: Data handling - TypeScript SDK sidebar_label: Data handling slug: /develop/typescript/data-handling -description: Learn how Temporal handles data through the Data Converter pipeline, including payload conversion, encryption, and large payload storage. +description: Learn how Temporal handles data through the Data Converter, including payload conversion, encryption, and large payload storage. toc_max_heading_level: 2 tags: - TypeScript SDK @@ -11,8 +11,8 @@ tags: - Data Converters --- -All data sent to and from the Temporal Service passes through the **Data Converter** pipeline. -The pipeline has three layers that handle different concerns: +All data sent to and from the Temporal Service passes through the **Data Converter**. 
+The Data Converter has three layers that handle different concerns: ``` User code → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service From 62903abe43cb7c5279981593a1d2fc1ce4a48467 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Tue, 24 Mar 2026 19:39:51 -0700 Subject: [PATCH 04/14] docs: trim opening paragraph --- docs/develop/python/data-handling/index.mdx | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/develop/python/data-handling/index.mdx b/docs/develop/python/data-handling/index.mdx index 5f9f880867..bb65d9bc9c 100644 --- a/docs/develop/python/data-handling/index.mdx +++ b/docs/develop/python/data-handling/index.mdx @@ -15,9 +15,11 @@ All data sent to and from the Temporal Service passes through the **Data Convert The Data Converter has three layers that handle different concerns: ``` -User code → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service +Application data → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service ``` +Of these three layers, only the PayloadConverter is required. Temporal uses a default PayloadConverter that handles JSON serialization. The PayloadCodec and ExternalStorage layers are optional. 
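To illustrate why only the converter layer is required, here is a conceptual stdlib-only sketch (the function names are invented for this example and this is not the SDK's actual `DataConverter` implementation) in which the codec and storage layers default to passthrough no-ops:

```python
import json


def passthrough(data: bytes) -> bytes:
    # Default codec / storage behavior: leave the bytes untouched.
    return data


def encode(value, codec=passthrough, storage=passthrough) -> bytes:
    payload = json.dumps(value).encode()  # layer 1: converter (JSON by default)
    payload = codec(payload)              # layer 2: optional codec transform
    return storage(payload)               # layer 3: optional external offload
```

With the defaults in place, `encode({"x": 1})` is just the JSON bytes; supplying a `codec` or `storage` callable layers extra behavior on without touching the converter.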
+ | | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) | | --- | --- | --- | --- | | **Purpose** | Serialize types to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store | From 7a45b8aa3d04feaacb21443cb44538455effbab6 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Wed, 25 Mar 2026 18:01:16 -0700 Subject: [PATCH 05/14] docs: add encyclopedia page --- docs/develop/python/data-handling/index.mdx | 27 ++++---- .../data-handling/large-payload-storage.mdx | 37 +++++----- .../data-conversion/dataconversion.mdx | 52 ++++++++------ .../data-conversion/external-storage.mdx | 67 +++++++++++++++++++ .../troubleshooting/blob-size-limit-error.mdx | 18 ++--- sidebars.js | 1 + 6 files changed, 146 insertions(+), 56 deletions(-) create mode 100644 docs/encyclopedia/data-conversion/external-storage.mdx diff --git a/docs/develop/python/data-handling/index.mdx b/docs/develop/python/data-handling/index.mdx index bb65d9bc9c..01380204d1 100644 --- a/docs/develop/python/data-handling/index.mdx +++ b/docs/develop/python/data-handling/index.mdx @@ -3,30 +3,31 @@ id: data-handling title: Data handling - Python SDK sidebar_label: Data handling slug: /develop/python/data-handling -description: Learn how Temporal handles data through the Data Converter, including payload conversion, encryption, and large payload storage. -toc_max_heading_level: 2 +description: + Learn how Temporal handles data through the Data Converter, including payload conversion, encryption, and large + payload storage. +toc_max_heading_level: 3 tags: - Python SDK - Temporal SDKs - Data Converters --- -All data sent to and from the Temporal Service passes through the **Data Converter**. 
-The Data Converter has three layers that handle different concerns: +All data sent to and from the Temporal Service passes through the **Data Converter**. The Data Converter has three +layers that handle different concerns: ``` Application data → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service ``` -Of these three layers, only the PayloadConverter is required. Temporal uses a default PayloadConverter that handles JSON serialization. The PayloadCodec and ExternalStorage layers are optional. +Of these three layers, only the PayloadConverter is required. Temporal uses a default PayloadConverter that handles JSON +serialization. The PayloadCodec and ExternalStorage layers are optional. You only need to customize these layers when +your application requires non-JSON types, encryption, or payload offloading. -| | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) | -| --- | --- | --- | --- | -| **Purpose** | Serialize types to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store | -| **Must be deterministic** | Yes | No | No | -| **Default** | JSON serialization | None (passthrough) | None (passthrough) | - -By default, Temporal uses JSON serialization with no codec and no external storage. -You only need to customize these layers when your application requires non-JSON types, encryption, or payload offloading. 
+| | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) | +| ------------------------- | ----------------------------------------------------------------- | ------------------------------------------------------------- | ---------------------------------------------------------------------- | +| **Purpose** | Serialize application data to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store | +| **Must be deterministic** | Yes | No | No | +| **Default** | JSON serialization | None (passthrough) | None (passthrough) | For a deeper conceptual explanation, see the [Data Conversion encyclopedia](/dataconversion). diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx index a3eff6250a..77d4f44bb4 100644 --- a/docs/develop/python/data-handling/large-payload-storage.mdx +++ b/docs/develop/python/data-handling/large-payload-storage.mdx @@ -3,7 +3,7 @@ id: large-payload-storage title: Large payload storage - Python SDK sidebar_label: Large payload storage slug: /develop/python/data-handling/large-payload-storage -toc_max_heading_level: 2 +toc_max_heading_level: 3 tags: - Python SDK - Temporal SDKs @@ -11,9 +11,9 @@ tags: description: Offload large payloads to external storage using the claim check pattern in the Python SDK. --- -The Temporal Service enforces a ~2 MB per payload limit. -When your Workflows or Activities handle data larger than this, you can offload payloads to external storage (such as S3) and pass a small reference token through the event history instead. -This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). +The Temporal Service enforces a ~2 MB per payload limit. 
When your Workflows or Activities handle data larger than the
+limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the event
+history instead. This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern).
 
 External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec:
 
@@ -21,16 +21,21 @@
 User code → PayloadConverter → PayloadCodec → External Storage → Temporal Service
 ```
 
-When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external store and replaces it with a lightweight reference.
-Payloads below the threshold stay inline in the event history.
-On the way back, reference payloads are retrieved from external storage before the codec decodes them.
+When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external
+store and replaces it with a lightweight reference. Payloads below the threshold stay inline in the event history. On
+the way back, reference payloads are retrieved from external storage before the codec decodes them.
 
-Because external storage runs after the codec, payloads are already encrypted (if you use an encryption codec) before they're uploaded to your store.
+Because external storage runs after the codec, payloads are already encrypted before they're uploaded to your store (if
+you use an encryption codec).
 
 ## Store and retrieve large payloads using external storage
 
-To offload large payloads, implement a `StorageDriver` and configure it on your `DataConverter`.
-The driver needs a `store()` method to upload payloads and a `retrieve()` method to fetch them back.
+To offload large payloads, implement a `StorageDriver` and configure it on your `DataConverter`.
The driver needs a
+`store()` method to upload payloads and a `retrieve()` method to fetch them back.
+
+Once you implement a storage driver, configure it on your `DataConverter` and use it when creating your Client and
+Worker. All Workflows and Activities running on the Worker will use the storage driver automatically without changes to
+your business logic. You can also configure the size threshold and use multiple storage drivers.
 
 ### Implement a storage driver
 
@@ -66,8 +71,9 @@ class S3StorageDriver(StorageDriver):
 
 ### Store payloads
 
-The `store()` method receives a sequence of payloads and must return exactly one `StorageDriverClaim` per payload.
-A claim is a set of string key-value pairs that the driver uses to locate the payload later — typically a storage key or URL.
+The `store()` method receives a sequence of payloads and must return exactly one `StorageDriverClaim` per payload. A
+claim is a set of string key-value pairs that the driver uses to locate the payload later — typically a storage key or
+URL.
 
 ```python
 Sample implementation:
 ```
@@ -109,8 +115,8 @@ converter = DataConverter(
 
 ### Adjust the size threshold
 
-The `payload_size_threshold` controls which payloads get offloaded.
-Payloads smaller than this value stay inline in the event history.
+The `payload_size_threshold` controls which payloads get offloaded. Payloads smaller than this value stay inline in the
+event history.
 
 ```python
 ExternalStorage(
@@ -123,7 +129,8 @@
 Set it to `None` to externalize all payloads regardless of size.
### Use multiple storage drivers -When you have multiple drivers (for example, hot and cold storage tiers), provide a `driver_selector` function that chooses which driver handles each payload: +When you have multiple drivers (for example, hot and cold storage tiers), provide a `driver_selector` function that +chooses which driver handles each payload: ```python hot_driver = S3StorageDriver("hot-bucket") diff --git a/docs/encyclopedia/data-conversion/dataconversion.mdx b/docs/encyclopedia/data-conversion/dataconversion.mdx index 6171458474..c94cb32042 100644 --- a/docs/encyclopedia/data-conversion/dataconversion.mdx +++ b/docs/encyclopedia/data-conversion/dataconversion.mdx @@ -2,7 +2,9 @@ id: dataconversion title: How does Temporal handle application data? sidebar_label: Data conversion -description: This guide explores Data Converters in the Temporal Platform, detailing how they handle serialization and encoding for Workflow inputs and outputs, ensuring data stays secure and manageable. +description: + This guide explores Data Converters in the Temporal Platform, detailing how they handle serialization and encoding for + Workflow inputs and outputs, ensuring data stays secure and manageable. slug: /dataconversion toc_max_heading_level: 4 keywords: @@ -23,25 +25,31 @@ import { CaptionedImage } from '@site/src/components'; This guide provides an overview of data handling using a Data Converter on the Temporal Platform. -Data Converters in Temporal are SDK components that handle the serialization and encoding of data entering and exiting a Temporal Service. -Workflow inputs and outputs need to be serialized and deserialized so they can be sent as JSON to a Temporal Service. +Data Converters in Temporal are SDK components that handle the serialization and encoding of data entering and exiting a +Temporal Service. Workflow inputs and outputs need to be serialized and deserialized so they can be sent as JSON to a +Temporal Service. 
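As an illustration of that serialization — plain Python, not the SDK API — a dataclass argument becomes JSON bytes plus metadata naming the encoding, roughly the shape of the Payload the default converter produces:

```python
import dataclasses
import json


@dataclasses.dataclass
class Greeting:
    name: str
    count: int


# Roughly the shape of a Payload: JSON-serialized bytes plus
# metadata identifying the encoding used.
value = Greeting(name="ada", count=3)
payload = {
    "metadata": {"encoding": "json/plain"},
    "data": json.dumps(dataclasses.asdict(value)).encode(),
}
```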
- + -The Data Converter encodes data from your application to a [Payload](/dataconversion#payload) before it is sent to the Temporal Service in the Client call. -When the Temporal Server sends the encoded data back to the Worker, the Data Converter decodes it for processing within your application. -This ensures that all your sensitive data exists in its original format only on hosts that you control. +The Data Converter encodes data from your application to a [Payload](/dataconversion#payload) before it is sent to the +Temporal Service in the Client call. When the Temporal Server sends the encoded data back to the Worker, the Data +Converter decodes it for processing within your application. This ensures that all your sensitive data exists in its +original format only on hosts that you control. -Data Converter steps are followed when data is sent to a Temporal Service (as input to a Workflow) and when it is returned from a Workflow (as output). -Due to how Temporal provides access to Workflow output, this implementation is asymmetric: +Data Converter steps are followed when data is sent to a Temporal Service (as input to a Workflow) and when it is +returned from a Workflow (as output). Due to how Temporal provides access to Workflow output, this implementation is +asymmetric: -- Data encoding is performed automatically using the default converter provided by Temporal or your custom Data Converter when passing input to a Temporal Service. For example, plain text input is usually serialized into a JSON object. -- Data decoding may be performed by your application logic during your Workflows or Activities as necessary, but decoded Workflow results are never persisted back to the Temporal Service. Instead, they are stored encoded on the Temporal Service, and you need to provide an additional parameter when using [`temporal workflow show`](/cli/workflow#show) or when browsing the Web UI to view output. 
+- Data encoding is performed automatically using the default converter provided by Temporal or your custom Data + Converter when passing input to a Temporal Service. For example, plain text input is usually serialized into a JSON + object. +- Data decoding may be performed by your application logic during your Workflows or Activities as necessary, but decoded + Workflow results are never persisted back to the Temporal Service. Instead, they are stored encoded on the Temporal + Service, and you need to provide an additional parameter when using [`temporal workflow show`](/cli/workflow#show) or + when browsing the Web UI to view output. -Each piece of data (like a single argument or return value) is encoded as a [Payload](/dataconversion#payload), which consists of binary data and key-value metadata. +Each piece of data (like a single argument or return value) is encoded as a [Payload](/dataconversion#payload), which +consists of binary data and key-value metadata. For details, see the API references: @@ -52,10 +60,16 @@ For details, see the API references: ### What is a Payload? {#payload} -A [Payload](https://api-docs.temporal.io/#temporal.api.common.v1.Payload) represents binary data such as input and output from Activities and Workflows. -Payloads also contain metadata that describe their data type or other parameters for use by custom encoders/converters. +A [Payload](https://api-docs.temporal.io/#temporal.api.common.v1.Payload) represents binary data such as input and +output from Activities and Workflows. Payloads also contain metadata that describe their data type or other parameters +for use by custom encoders/converters. -When processed through the SDK, the [default Data Converter](/default-custom-data-converters#default-data-converter) serializes your data/value to a Payload before sending it to the Temporal Server. -The default Data Converter processes supported type values to Payloads. 
You can create a custom [Payload Converter](/payload-converter) to apply different conversion steps. +When processed through the SDK, the [default Data Converter](/default-custom-data-converters#default-data-converter) +serializes your data/value to a Payload before sending it to the Temporal Server. The default Data Converter processes +supported type values to Payloads. You can create a custom [Payload Converter](/payload-converter) to apply different +conversion steps. You can additionally apply [custom codecs](/payload-codec), such as for encryption or compression, on your Payloads. + +When Payloads are too large for the Temporal Service's ~2 MB limit, you can use [External Storage](/external-storage) to +offload them to an external store like S3 and keep only a reference in the Event History. diff --git a/docs/encyclopedia/data-conversion/external-storage.mdx b/docs/encyclopedia/data-conversion/external-storage.mdx new file mode 100644 index 0000000000..3e2f952c14 --- /dev/null +++ b/docs/encyclopedia/data-conversion/external-storage.mdx @@ -0,0 +1,67 @@ +--- +id: external-storage +title: External Storage +sidebar_label: External Storage +description: + External Storage offloads large payloads to an external store like S3, keeping only a small reference in the event + history. +slug: /external-storage +toc_max_heading_level: 4 +keywords: + - external-storage + - storage-driver + - large-payloads + - claim-check + - data-converters + - payloads +tags: + - Concepts + - Data Converters +--- + +The Temporal Service enforces a ~2 MB per-payload limit. When your Workflows or Activities handle data larger than this +limit, you can use External Storage to offload payloads to an external store (such as S3) and pass a small reference +token through the Event History instead. This is sometimes called the +[claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). 
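The pattern itself is simple. As an illustrative sketch — plain Python, with an in-memory dict standing in for S3 or another blob store — the large value is stored out of band and only a small claim token travels with the message:

```python
import uuid

# Hypothetical in-memory store standing in for S3 or another blob store.
_blob_store: dict[str, bytes] = {}


def check_in(blob: bytes) -> str:
    """Store the blob out of band and return a small claim token."""
    claim = str(uuid.uuid4())
    _blob_store[claim] = blob
    return claim


def check_out(claim: str) -> bytes:
    """Redeem the claim token for the original blob."""
    return _blob_store[claim]


large = b"x" * (3 * 1024 * 1024)  # too big to pass inline
token = check_in(large)           # only this small token travels onward
assert check_out(token) == large
```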
+ +## How External Storage fits in the data pipeline {#data-pipeline} + +External Storage sits at the end of the data pipeline, after both the [Payload Converter](/payload-converter) and the +[Payload Codec](/payload-codec): + +``` +User code → Payload Converter → Payload Codec → External Storage → Temporal Service +``` + +When a payload exceeds a configurable size threshold, the storage driver uploads it to your external store and replaces +it with a lightweight reference. Payloads below the threshold stay inline in the Event History. On the way back, the +codec receives reference payloads from external storage before decoding them. + +Because External Storage runs after the Payload Codec, if you use an encryption codec, payloads are already encrypted +before they're uploaded to your store. + +## Storage drivers + +A Storage Driver is the part you implement to connect External Storage to your backing store. Each driver provides two +operations: + +- **Store**. Upload payloads and return a claim, which is a set of key-value pairs the driver uses to locate the payload + later. +- **Retrieve**. Download payloads using the claims that `store` produced. + +You can configure multiple storage drivers and use a selector function to route payloads to different drivers based on +size, type, or other criteria such as hot and cold storage tiers. + +## Configuration + +Configure External Storage on the Data Converter. The key settings are: + +- **Drivers**. One or more storage driver implementations. +- **Size threshold**. The driver offloads payloads larger than this value, which typically defaults to 256 KiB. Turn off + the threshold to externalize all payloads regardless of size. +- **Driver selector**. When using multiple drivers, a function that chooses which driver handles each payload. 
+ +For SDK-specific implementation details, see: + +- [Python SDK: Large payload storage](/develop/python/data-handling/large-payload-storage) +- [TypeScript SDK: Large payload storage](/develop/typescript/data-handling/large-payload-storage) diff --git a/docs/troubleshooting/blob-size-limit-error.mdx b/docs/troubleshooting/blob-size-limit-error.mdx index f8c66473eb..07ed659123 100644 --- a/docs/troubleshooting/blob-size-limit-error.mdx +++ b/docs/troubleshooting/blob-size-limit-error.mdx @@ -29,22 +29,22 @@ To resolve this error, reduce the size of the blob so that it is within the 4 MB There are multiple strategies you can use to avoid this error: -1. Use compression with a [custom payload codec](/payload-codec) for large payloads. +1. Use [External Storage](/external-storage) to offload large payloads to an object store like S3. The Temporal SDKs support this natively through the claim check pattern: when a payload exceeds a size threshold, a storage driver uploads it to your external store and replaces it with a small reference token in the Event History. Your Workflow and Activity code doesn't need to change. Even if your payloads are within the limit today, consider implementing External Storage if their size could grow over time. - - This addresses the immediate issue of the blob size limit; however, if blob sizes continue to grow this problem can arise again. + For SDK-specific guides, see: + - [Python: Large payload storage](/develop/python/data-handling/large-payload-storage) + - [TypeScript: Large payload storage](/develop/typescript/data-handling/large-payload-storage) -2. Break larger batches of commands into smaller batch sizes: +2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads. This addresses the immediate issue, but if payload sizes continue to grow, the problem can arise again. + +3. Break larger batches of commands into smaller batch sizes: - Workflow-level batching: - 1. 
Modify the Workflow to process Activities or Child Workflows into smaller batches. + 1. Change the Workflow to process Activities or Child Workflows into smaller batches. 2. Iterate through each batch, waiting for completion before moving to the next. - Workflow Task-level batching: 1. Execute Activities in smaller batches within a single Workflow Task. - 2. Introduce brief pauses or sleeps (for example, 1ms) between batches. - -3. Consider offloading large payloads to an object store to reduce the risk of exceeding blob size limits: - 1. Pass references to the stored payloads within the Workflow instead of the actual data. - 2. Retrieve the payloads from the object store when needed during execution. + 2. Introduce brief pauses or sleeps between batches. ## Workflow termination due to oversized response diff --git a/sidebars.js b/sidebars.js index c867100a51..c57f3ee55b 100644 --- a/sidebars.js +++ b/sidebars.js @@ -906,6 +906,7 @@ module.exports = { 'encyclopedia/data-conversion/failure-converter', 'encyclopedia/data-conversion/remote-data-encoding', 'encyclopedia/data-conversion/codec-server', + 'encyclopedia/data-conversion/external-storage', 'encyclopedia/data-conversion/key-management', ], }, From 0ba6bf3204a279de04757a79ce072b6c218b2add Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Thu, 26 Mar 2026 13:52:38 -0700 Subject: [PATCH 06/14] docs: add diagram for data converter --- .../data-conversion/dataconversion.mdx | 23 ++++++++++++++++++- ...a-converter-flow-with-external-storage.svg | 1 + 2 files changed, 23 insertions(+), 1 deletion(-) create mode 100644 static/diagrams/data-converter-flow-with-external-storage.svg diff --git a/docs/encyclopedia/data-conversion/dataconversion.mdx b/docs/encyclopedia/data-conversion/dataconversion.mdx index c94cb32042..82d9b97c5d 100644 --- a/docs/encyclopedia/data-conversion/dataconversion.mdx +++ b/docs/encyclopedia/data-conversion/dataconversion.mdx @@ -51,6 +51,23 @@ asymmetric: Each piece of data (like a single 
argument or return value) is encoded as a [Payload](/dataconversion#payload), which consists of binary data and key-value metadata. +## Components of a Data Converter {#data-converter-components} + +A Data Converter has the following layers: + +| | [PayloadConverter](/payload-converter) | [PayloadCodec](/payload-codec) | [ExternalStorage](/external-storage) | +| ------------------------- | -------------------------------------- | ---------------------------------------------- | ---------------------------------------- | +| **Purpose** | Serialize application data to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store | +| **Must be deterministic** | Yes | No | No | +| **Default** | JSON serialization | None (passthrough) | None (passthrough) | + +The following diagram shows how data flows through a Data Converter: + + + For details, see the API references: - [Go](https://pkg.go.dev/go.temporal.io/sdk/converter#DataConverter) @@ -58,7 +75,7 @@ For details, see the API references: - [Python](https://python.temporal.io/temporalio.converter.DataConverter.html) - [TypeScript](https://typescript.temporal.io/api/interfaces/common.DataConverter) -### What is a Payload? {#payload} +## What is a Payload? {#payload} A [Payload](https://api-docs.temporal.io/#temporal.api.common.v1.Payload) represents binary data such as input and output from Activities and Workflows. Payloads also contain metadata that describe their data type or other parameters @@ -73,3 +90,7 @@ You can additionally apply [custom codecs](/payload-codec), such as for encrypti When Payloads are too large for the Temporal Service's ~2 MB limit, you can use [External Storage](/external-storage) to offload them to an external store like S3 and keep only a reference in the Event History. + +## What are the components of a Data Converter? 
{#data-converter-components} + +A Data Converter has the following layers: diff --git a/static/diagrams/data-converter-flow-with-external-storage.svg b/static/diagrams/data-converter-flow-with-external-storage.svg new file mode 100644 index 0000000000..169f60f0df --- /dev/null +++ b/static/diagrams/data-converter-flow-with-external-storage.svg @@ -0,0 +1 @@ + \ No newline at end of file From 74910eb49f24abb556dc1a4bf74221b9a2968521 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Thu, 26 Mar 2026 16:26:08 -0700 Subject: [PATCH 07/14] docs: add s3 how-to --- .../data-handling/large-payload-storage.mdx | 206 ++++++++++++------ .../data-conversion/dataconversion.mdx | 25 +-- .../data-conversion/external-storage.mdx | 19 +- 3 files changed, 161 insertions(+), 89 deletions(-) diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx index 77d4f44bb4..616e420829 100644 --- a/docs/develop/python/data-handling/large-payload-storage.mdx +++ b/docs/develop/python/data-handling/large-payload-storage.mdx @@ -11,80 +11,169 @@ tags: description: Offload large payloads to external storage using the claim check pattern in the Python SDK. --- +import { CaptionedImage } from '@site/src/components'; + The Temporal Service enforces a ~2 MB per payload limit. When your Workflows or Activities handle data larger than the -limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the event -history instead. This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). +limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event +History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). 
-External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec: +[External Storage](/external-storage) sits at the end of the data pipeline, after both the Payload Converter and the +Payload Codec: -``` -User code → PayloadConverter → PayloadCodec → External Storage → Temporal Service -``` + -When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external -store and replaces it with a lightweight reference. Payloads below the threshold stay inline in the event history. On -the way back, reference payloads are retrieved from external storage before the codec decodes them. +When a payload exceeds a configurable size threshold, the storage driver uploads it to your external store and replaces +it with a lightweight reference. Payloads below the threshold stay inline in the Event History. On the way back, +reference payloads are retrieved from external storage before the Payload Codec decodes them. -Because external storage runs after the codec. If you use an encryption codec, payloads are already encrypted before -they're uploaded to your store. +Because External Storage runs after the Payload Codec, if you use an encryption codec, payloads are already encrypted +before they're uploaded to your store. -## Store and retrieve large payloads using external storage +## Store and retrieve large payloads with S3 -To offload large payloads, implement a `StorageDriver` and configure it on your `DataConverter`. The driver needs a -`store()` method to upload payloads and a `retrieve()` method to fetch them back. +The Python SDK includes a contrib S3 storage driver. Follow these steps to set it up: -Once you implement a storage driver, configure it on your `DataConverter` and use it when creating your Client and -Worker. All Workflows and Activities running on the Worker will use the storage drive automatically without changes to -your business logic. 
You can also configure the size threshold and use multiple storage drivers. +1. Install the S3 driver dependency: -### Implement a storage driver + ```shell + pip install temporalio[s3driver] + ``` -Extend the `StorageDriver` abstract class: +2. Create an `S3StorageDriverClient` and an `S3StorageDriver`: -```python -from temporalio.converter import ( - StorageDriver, - StorageDriverClaim, - StorageDriverStoreContext, - StorageDriverRetrieveContext, -) -from temporalio.api.common.v1 import Payload -from typing import Sequence + ```python + from temporalio.contrib.aws.s3driver import S3StorageDriver, S3StorageDriverClient + driver_client = S3StorageDriverClient() + driver = S3StorageDriver(client=driver_client, bucket="my-temporal-payloads") + ``` -class S3StorageDriver(StorageDriver): - def __init__(self, bucket: str) -> None: - self._bucket = bucket - self._s3 = boto3.client("s3") + The driver generates S3 keys using a SHA-256 hash of the payload content. Identical payloads produce the same key, so + the driver skips the upload if the object already exists. The driver includes the namespace and Workflow ID in the S3 + key to group related payloads in your bucket. For example: `v0/ns/my-namespace/wfi/my-workflow/d/sha256/{hash}`. - def name(self) -> str: - return "s3" + To reject payloads above a hard limit before uploading, set `max_payload_size`: - async def store(self, context, payloads): - # Upload payloads, return claims (see below) - ... + ```python + driver = S3StorageDriver( + client=driver_client, + bucket="my-temporal-payloads", + max_payload_size=50 * 1024 * 1024, # 50 MiB + ) + ``` - async def retrieve(self, context, claims): - # Download payloads using claims (see below) - ... -``` +3. 
Configure the driver on your `DataConverter` and pass the converter to your Client and Worker: + + + + ```python + from temporalio.converter import DataConverter, ExternalStorage + + converter = DataConverter( + external_storage=ExternalStorage( + drivers=[driver], + payload_size_threshold=256 * 1024, # 256 KiB (default) + ), + ) -### Store payloads + client = await Client.connect("localhost:7233", data_converter=converter) -The `store()` method receives a sequence of payloads and must return exactly one `StorageDriverClaim` per payload. A -claim is a set of string key-value pairs that the driver uses to locate the payload later — typically a storage key or -URL. + worker = Worker( + client, + task_queue="my-task-queue", + workflows=[MyWorkflow], + activities=[my_activity], + ) + ``` + + + +All Workflows and Activities running on the Worker use the storage driver automatically without changes to your business +logic. The driver uploads and downloads payloads concurrently, and on retrieve it validates the SHA-256 hash of each +downloaded payload against the hash stored in the claim. + +### Route payloads to different buckets + +The `bucket` parameter accepts a callable that receives the store context and the payload. 
Use this to route payloads to +different buckets based on context, such as the Activity Task Queue: ```python -Sample implementation: +from temporalio.converter import StorageDriverStoreContext, ActivitySerializationContext +from temporalio.api.common.v1 import Payload + +QUEUE_BUCKETS = { + "queue-a": "bucket-queue-a", + "queue-b": "bucket-queue-b", +} + +def bucket_selector(ctx: StorageDriverStoreContext, payload: Payload) -> str: + if isinstance(ctx.serialization_context, ActivitySerializationContext): + queue = ctx.serialization_context.activity_task_queue + if queue and queue in QUEUE_BUCKETS: + return QUEUE_BUCKETS[queue] + return "default-bucket" + +driver = S3StorageDriver(client=driver_client, bucket=bucket_selector) ``` -### Retrieve payloads +## Implement a custom storage driver -The `retrieve()` method receives the claims that `store()` produced and must return the original payloads: +If you need a backend other than S3, implement your own storage driver by extending the `StorageDriver` abstract class. +The driver needs a `name()` method, a `store()` method to upload payloads, and a `retrieve()` method to fetch them back. 
```python -Sample implementation: +import hashlib +from temporalio.converter import ( + StorageDriver, + StorageDriverClaim, + StorageDriverStoreContext, + StorageDriverRetrieveContext, +) +from temporalio.api.common.v1 import Payload + + +class MyStorageDriver(StorageDriver): + def name(self) -> str: + return "my-storage" + + async def store( + self, + context: StorageDriverStoreContext, + payloads: list[Payload], + ) -> list[StorageDriverClaim]: + claims = [] + for payload in payloads: + data = payload.SerializeToString() + key = hashlib.sha256(data).hexdigest() + await self._upload(key, data) + claims.append( + StorageDriverClaim(claim_data={"key": key}) + ) + return claims + + async def retrieve( + self, + context: StorageDriverRetrieveContext, + claims: list[StorageDriverClaim], + ) -> list[Payload]: + payloads = [] + for claim in claims: + data = await self._download(claim.claim_data["key"]) + payload = Payload() + payload.ParseFromString(data) + payloads.append(payload) + return payloads + + async def _upload(self, key: str, data: bytes) -> None: + # Upload data to your storage backend + ... + + async def _download(self, key: str) -> bytes: + # Download data from your storage backend + ... ``` ### Configure external storage on the Data Converter @@ -96,27 +185,16 @@ from temporalio.converter import DataConverter, ExternalStorage converter = DataConverter( external_storage=ExternalStorage( - drivers=[S3StorageDriver("my-bucket")], + drivers=[MyStorageDriver()], payload_size_threshold=256 * 1024, # 256 KiB (default) ), ) ``` -Use this converter when creating your Client and Worker: - -```python -converter = DataConverter( - external_storage=ExternalStorage( - drivers=[LocalDiskStorageDriver()], - payload_size_threshold=1_000, # 1KB — low threshold for testing - ), -) -``` - ### Adjust the size threshold The `payload_size_threshold` controls which payloads get offloaded. Payloads smaller than this value stay inline in the -event history. +Event History. 
```python ExternalStorage( @@ -133,8 +211,8 @@ When you have multiple drivers (for example, hot and cold storage tiers), provid chooses which driver handles each payload: ```python -hot_driver = S3StorageDriver("hot-bucket") -cold_driver = S3StorageDriver("cold-bucket") +hot_driver = MyStorageDriver("hot-bucket") +cold_driver = MyStorageDriver("cold-bucket") ExternalStorage( drivers=[hot_driver, cold_driver], diff --git a/docs/encyclopedia/data-conversion/dataconversion.mdx b/docs/encyclopedia/data-conversion/dataconversion.mdx index 82d9b97c5d..6ce19fd5f8 100644 --- a/docs/encyclopedia/data-conversion/dataconversion.mdx +++ b/docs/encyclopedia/data-conversion/dataconversion.mdx @@ -36,18 +36,6 @@ Temporal Service in the Client call. When the Temporal Server sends the encoded Converter decodes it for processing within your application. This ensures that all your sensitive data exists in its original format only on hosts that you control. -Data Converter steps are followed when data is sent to a Temporal Service (as input to a Workflow) and when it is -returned from a Workflow (as output). Due to how Temporal provides access to Workflow output, this implementation is -asymmetric: - -- Data encoding is performed automatically using the default converter provided by Temporal or your custom Data - Converter when passing input to a Temporal Service. For example, plain text input is usually serialized into a JSON - object. -- Data decoding may be performed by your application logic during your Workflows or Activities as necessary, but decoded - Workflow results are never persisted back to the Temporal Service. Instead, they are stored encoded on the Temporal - Service, and you need to provide an additional parameter when using [`temporal workflow show`](/cli/workflow#show) or - when browsing the Web UI to view output. 
-

Each piece of data (like a single argument or return value) is encoded as a [Payload](/dataconversion#payload), which
consists of binary data and key-value metadata.

@@ -68,6 +56,15 @@ The following diagram shows how data flows through a Data Converter:
   title="The Flow of Data through a Data Converter"
 />

+Your application code uses the Temporal Client, which sends data through the Data Converter to the Temporal Service.
+The PayloadConverter first encodes the data, the PayloadCodec then transforms it, and finally the ExternalStorage layer
+offloads large payloads to your external store.
+
+When the Temporal Service dispatches Tasks to Workers, the Worker first retrieves offloaded payloads from external
+storage, then the PayloadCodec decodes them, and finally the PayloadConverter deserializes them.
+
+During the entire data conversion process, the Temporal Service never sees or persists the decoded data.
+
 For details, see the API references:

 - [Go](https://pkg.go.dev/go.temporal.io/sdk/converter#DataConverter)
@@ -90,7 +87,3 @@ You can additionally apply [custom codecs](/payload-codec), such as for encrypti

 When Payloads are too large for the Temporal Service's ~2 MB limit, you can use [External Storage](/external-storage) to
 offload them to an external store like S3 and keep only a reference in the Event History.
-
-## What are the components of a Data Converter? {#data-converter-components}
-
-A Data Converter has the following layers:
diff --git a/docs/encyclopedia/data-conversion/external-storage.mdx b/docs/encyclopedia/data-conversion/external-storage.mdx
index 3e2f952c14..417e72469a 100644
--- a/docs/encyclopedia/data-conversion/external-storage.mdx
+++ b/docs/encyclopedia/data-conversion/external-storage.mdx
@@ -21,17 +21,23 @@ The Temporal Service enforces a ~2 MB per-payload limit.
When your Workflows or Activities handle data larger than this limit, you can use External Storage to offload payloads to an external store (such as S3) and pass a small reference -token through the Event History instead. This is sometimes called the +token through the Event History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). +For SDK-specific implementation details, see: + +- [Python SDK: Large payload storage](/develop/python/data-handling/large-payload-storage) +- [TypeScript SDK: Large payload storage](/develop/typescript/data-handling/large-payload-storage) + ## How External Storage fits in the data pipeline {#data-pipeline} External Storage sits at the end of the data pipeline, after both the [Payload Converter](/payload-converter) and the [Payload Codec](/payload-codec): -``` -User code → Payload Converter → Payload Codec → External Storage → Temporal Service -``` + When a payload exceeds a configurable size threshold, the storage driver uploads it to your external store and replaces it with a lightweight reference. Payloads below the threshold stay inline in the Event History. On the way back, the @@ -60,8 +66,3 @@ Configure External Storage on the Data Converter. The key settings are: - **Size threshold**. The driver offloads payloads larger than this value, which typically defaults to 256 KiB. Turn off the threshold to externalize all payloads regardless of size. - **Driver selector**. When using multiple drivers, a function that chooses which driver handles each payload. 
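To make the selector concrete, here is an illustrative sketch in Python. The exact signature and return value are SDK-specific; this assumes the selector sees the payload and returns the name of the driver to use:

```python
from dataclasses import dataclass

ONE_MIB = 1024 * 1024


@dataclass
class FakePayload:
    """Stand-in for the SDK Payload type; only the data bytes matter here."""

    data: bytes


def driver_selector(ctx, payload) -> str:
    # Route payloads over 1 MiB to a cold-tier driver, everything else hot.
    return "cold" if len(payload.data) > ONE_MIB else "hot"
```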
- -For SDK-specific implementation details, see: - -- [Python SDK: Large payload storage](/develop/python/data-handling/large-payload-storage) -- [TypeScript SDK: Large payload storage](/develop/typescript/data-handling/large-payload-storage) From 148a8b2be598e350ce77693b291a58fb5ef95bfe Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Thu, 26 Mar 2026 17:13:27 -0700 Subject: [PATCH 08/14] docs: add complete storage driver sample --- .../data-handling/large-payload-storage.mdx | 131 +++++++++++------- 1 file changed, 80 insertions(+), 51 deletions(-) diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx index 616e420829..f28612662f 100644 --- a/docs/develop/python/data-handling/large-payload-storage.mdx +++ b/docs/develop/python/data-handling/large-payload-storage.mdx @@ -92,93 +92,122 @@ The Python SDK includes a contrib S3 storage driver. Follow these steps to set i All Workflows and Activities running on the Worker use the storage driver automatically without changes to your business -logic. The driver uploads and downloads payloads concurrently, and on retrieve it validates the SHA-256 hash of each -downloaded payload against the hash stored in the claim. +logic. The driver uploads and downloads payloads concurrently and validates payload integrity on retrieve. -### Route payloads to different buckets +4. (Optional) To route payloads to different buckets, pass a function as the `bucket` parameter instead of a string. The + function takes the store context and the payload as arguments, and returns a bucket name, so you can route based on + the Activity Task Queue or other context: -The `bucket` parameter accepts a callable that receives the store context and the payload. 
Use this to route payloads to -different buckets based on context, such as the Activity Task Queue: - -```python -from temporalio.converter import StorageDriverStoreContext, ActivitySerializationContext -from temporalio.api.common.v1 import Payload - -QUEUE_BUCKETS = { - "queue-a": "bucket-queue-a", - "queue-b": "bucket-queue-b", -} - -def bucket_selector(ctx: StorageDriverStoreContext, payload: Payload) -> str: - if isinstance(ctx.serialization_context, ActivitySerializationContext): - queue = ctx.serialization_context.activity_task_queue - if queue and queue in QUEUE_BUCKETS: - return QUEUE_BUCKETS[queue] - return "default-bucket" - -driver = S3StorageDriver(client=driver_client, bucket=bucket_selector) -``` + ```python + from temporalio.converter import StorageDriverStoreContext, ActivitySerializationContext + from temporalio.api.common.v1 import Payload + + QUEUE_BUCKETS = { + "queue-a": "bucket-queue-a", + "queue-b": "bucket-queue-b", + } + + def bucket_selector(ctx: StorageDriverStoreContext, payload: Payload) -> str: + if isinstance(ctx.serialization_context, ActivitySerializationContext): + queue = ctx.serialization_context.activity_task_queue + if queue and queue in QUEUE_BUCKETS: + return QUEUE_BUCKETS[queue] + return "default-bucket" + + driver = S3StorageDriver(client=driver_client, bucket=bucket_selector) + ``` ## Implement a custom storage driver -If you need a backend other than S3, implement your own storage driver by extending the `StorageDriver` abstract class. -The driver needs a `name()` method, a `store()` method to upload payloads, and a `retrieve()` method to fetch them back. +If you need a storage system other than S3, you can implement your own storage driver. 
The following example shows a +complete custom driver implementation that uses local disk as the backing store: + + ```python -import hashlib +import os +import uuid +from typing import Sequence + +from temporalio.api.common.v1 import Payload from temporalio.converter import ( StorageDriver, StorageDriverClaim, StorageDriverStoreContext, StorageDriverRetrieveContext, ) -from temporalio.api.common.v1 import Payload -class MyStorageDriver(StorageDriver): +class LocalDiskStorageDriver(StorageDriver): + def __init__(self, store_dir: str = "/tmp/temporal-payload-store") -> None: + self._store_dir = store_dir + def name(self) -> str: - return "my-storage" + return "local-disk" async def store( self, context: StorageDriverStoreContext, - payloads: list[Payload], + payloads: Sequence[Payload], ) -> list[StorageDriverClaim]: + os.makedirs(self._store_dir, exist_ok=True) + + prefix = self._store_dir + sc = context.serialization_context + if sc is not None and hasattr(sc, "workflow_id"): + prefix = os.path.join(self._store_dir, sc.namespace, sc.workflow_id) + os.makedirs(prefix, exist_ok=True) + claims = [] for payload in payloads: - data = payload.SerializeToString() - key = hashlib.sha256(data).hexdigest() - await self._upload(key, data) - claims.append( - StorageDriverClaim(claim_data={"key": key}) - ) + key = f"{uuid.uuid4()}.bin" + file_path = os.path.join(prefix, key) + with open(file_path, "wb") as f: + f.write(payload.SerializeToString()) + claims.append(StorageDriverClaim(claim_data={"path": file_path})) return claims async def retrieve( self, context: StorageDriverRetrieveContext, - claims: list[StorageDriverClaim], + claims: Sequence[StorageDriverClaim], ) -> list[Payload]: payloads = [] for claim in claims: - data = await self._download(claim.claim_data["key"]) + file_path = claim.claim_data["path"] + with open(file_path, "rb") as f: + data = f.read() payload = Payload() payload.ParseFromString(data) payloads.append(payload) return payloads +``` - async def 
_upload(self, key: str, data: bytes) -> None: - # Upload data to your storage backend - ... + - async def _download(self, key: str) -> bytes: - # Download data from your storage backend - ... -``` +### Extend the StorageDriver class + +A custom driver extends the `StorageDriver` abstract class and implements three methods: + +- `name()` returns a unique string that identifies the driver. +- `store()` receives a list of payloads and returns one `StorageDriverClaim` per payload. A claim is a set of string + key-value pairs that the driver uses to locate the payload later. +- `retrieve()` receives the claims that `store()` produced and returns the original payloads. + +### Store payloads + +In `store()`, serialize each payload to bytes with `payload.SerializeToString()`, upload the bytes to your storage +system, and return a `StorageDriverClaim` with enough information to find the payload later. Using a content-addressable +key like a SHA-256 hash gives you deduplication for free. + +### Retrieve payloads + +In `retrieve()`, download the bytes using the claim data, then reconstruct the payload with +`payload.ParseFromString(data)`. -### Configure external storage on the Data Converter +### Configure the Data Converter -Pass an `ExternalStorage` instance to your `DataConverter`: +Pass an `ExternalStorage` instance to your `DataConverter` and use the converter when creating your Client and Worker: ```python from temporalio.converter import DataConverter, ExternalStorage @@ -205,10 +234,10 @@ ExternalStorage( Set it to `None` to externalize all payloads regardless of size. 
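The content-addressable keying suggested above (a SHA-256 hash as the storage key, which the contrib S3 driver also uses in its `.../d/sha256/{hash}` keys) can be sketched independently of any backend — the dict here is a hypothetical stand-in for your storage system:

```python
import hashlib

uploaded: dict[str, bytes] = {}  # stands in for your storage backend

def content_key(data: bytes) -> str:
    # Identical payloads hash to the same key, so the key itself tells you
    # whether the object is already stored.
    return "d/sha256/" + hashlib.sha256(data).hexdigest()

def upload_once(data: bytes) -> str:
    key = content_key(data)
    if key not in uploaded:  # duplicate payloads skip the upload entirely
        uploaded[key] = data
    return key

assert upload_once(b"same bytes") == upload_once(b"same bytes")
assert len(uploaded) == 1  # stored once despite two store calls
```

This is the "deduplication for free" mentioned above: re-storing an identical payload produces the same claim without a second upload.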
-### Use multiple storage drivers +## Use multiple storage drivers -When you have multiple drivers (for example, hot and cold storage tiers), provide a `driver_selector` function that -chooses which driver handles each payload: +When you have multiple drivers, for example hot and cold storage tiers, pass a `driver_selector` function that chooses +which driver handles each payload: ```python hot_driver = MyStorageDriver("hot-bucket") From c698f34049a3a9b10a310aeb912b3b4f71885798 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Thu, 26 Mar 2026 17:22:24 -0700 Subject: [PATCH 09/14] fix captioned image issue --- docs/encyclopedia/data-conversion/external-storage.mdx | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/encyclopedia/data-conversion/external-storage.mdx b/docs/encyclopedia/data-conversion/external-storage.mdx index 417e72469a..0b32b4a7ca 100644 --- a/docs/encyclopedia/data-conversion/external-storage.mdx +++ b/docs/encyclopedia/data-conversion/external-storage.mdx @@ -19,6 +19,8 @@ tags: - Data Converters --- +import { CaptionedImage } from '@site/src/components'; + The Temporal Service enforces a ~2 MB per-payload limit. When your Workflows or Activities handle data larger than this limit, you can use External Storage to offload payloads to an external store (such as S3) and pass a small reference token through the Event History instead. 
This is called the From b661a0f068efadb7d201fdec614c7780a7d51dff Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Thu, 26 Mar 2026 17:47:46 -0700 Subject: [PATCH 10/14] docs: edits --- .../data-handling/large-payload-storage.mdx | 57 +++++-------------- .../data-conversion/external-storage.mdx | 9 +-- .../troubleshooting/blob-size-limit-error.mdx | 37 ++++++++---- 3 files changed, 44 insertions(+), 59 deletions(-) diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx index f28612662f..c54e705ae5 100644 --- a/docs/develop/python/data-handling/large-payload-storage.mdx +++ b/docs/develop/python/data-handling/large-payload-storage.mdx @@ -13,28 +13,16 @@ description: Offload large payloads to external storage using the claim check pa import { CaptionedImage } from '@site/src/components'; -The Temporal Service enforces a ~2 MB per payload limit. When your Workflows or Activities handle data larger than the +The Temporal Service enforces a 2 MB per payload limit. When your Workflows or Activities handle data larger than the limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event -History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). +History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). This page +shows you how to set up External Storage with AWS S3 and how to implement a custom storage driver. -[External Storage](/external-storage) sits at the end of the data pipeline, after both the Payload Converter and the -Payload Codec: +For more information about how External Storage fits in the data pipeline, see [External Storage](/external-storage). 
- +## Store and retrieve large payloads with AWS S3 -When a payload exceeds a configurable size threshold, the storage driver uploads it to your external store and replaces -it with a lightweight reference. Payloads below the threshold stay inline in the Event History. On the way back, -reference payloads are retrieved from external storage before the Payload Codec decodes them. - -Because External Storage runs after the Payload Codec, if you use an encryption codec, payloads are already encrypted -before they're uploaded to your store. - -## Store and retrieve large payloads with S3 - -The Python SDK includes a contrib S3 storage driver. Follow these steps to set it up: +The Python SDK includes an S3 storage driver. Follow these steps to set it up: 1. Install the S3 driver dependency: @@ -42,7 +30,8 @@ The Python SDK includes a contrib S3 storage driver. Follow these steps to set i pip install temporalio[s3driver] ``` -2. Create an `S3StorageDriverClient` and an `S3StorageDriver`: +2. Import the classes `S3StorageDriverClient` and `S3StorageDriver` and create an instance of each. When you create the + driver instance, pass the `client` argument to the constructor as well as the name of your S3 bucket: ```python from temporalio.contrib.aws.s3driver import S3StorageDriver, S3StorageDriverClient @@ -55,16 +44,6 @@ The Python SDK includes a contrib S3 storage driver. Follow these steps to set i the driver skips the upload if the object already exists. The driver includes the namespace and Workflow ID in the S3 key to group related payloads in your bucket. For example: `v0/ns/my-namespace/wfi/my-workflow/d/sha256/{hash}`. - To reject payloads above a hard limit before uploading, set `max_payload_size`: - - ```python - driver = S3StorageDriver( - client=driver_client, - bucket="my-temporal-payloads", - max_payload_size=50 * 1024 * 1024, # 50 MiB - ) - ``` - 3. 
Configure the driver on your `DataConverter` and pass the converter to your Client and Worker: @@ -91,8 +70,8 @@ The Python SDK includes a contrib S3 storage driver. Follow these steps to set i -All Workflows and Activities running on the Worker use the storage driver automatically without changes to your business -logic. The driver uploads and downloads payloads concurrently and validates payload integrity on retrieve. + All Workflows and Activities running on the Worker use the storage driver automatically without changes to your + business logic. The driver uploads and downloads payloads concurrently and validates payload integrity on retrieve. 4. (Optional) To route payloads to different buckets, pass a function as the `bucket` parameter instead of a string. The function takes the store context and the payload as arguments, and returns a bucket name, so you can route based on @@ -220,23 +199,13 @@ converter = DataConverter( ) ``` -### Adjust the size threshold - The `payload_size_threshold` controls which payloads get offloaded. Payloads smaller than this value stay inline in the -Event History. - -```python -ExternalStorage( - drivers=[driver], - payload_size_threshold=100 * 1024, # 100 KiB -) -``` - -Set it to `None` to externalize all payloads regardless of size. +Event History. If you don't provide a value, the default value is 256 KiB. Set it to `None` to externalize all payloads +regardless of size. 
## Use multiple storage drivers -When you have multiple drivers, for example hot and cold storage tiers, pass a `driver_selector` function that chooses +When you have multiple drivers, such as for hot and cold storage tiers, pass a `driver_selector` function that chooses which driver handles each payload: ```python diff --git a/docs/encyclopedia/data-conversion/external-storage.mdx b/docs/encyclopedia/data-conversion/external-storage.mdx index 0b32b4a7ca..7ac5840545 100644 --- a/docs/encyclopedia/data-conversion/external-storage.mdx +++ b/docs/encyclopedia/data-conversion/external-storage.mdx @@ -21,7 +21,7 @@ tags: import { CaptionedImage } from '@site/src/components'; -The Temporal Service enforces a ~2 MB per-payload limit. When your Workflows or Activities handle data larger than this +The Temporal Service enforces a 2 MB per-payload limit. When your Workflows or Activities handle data larger than this limit, you can use External Storage to offload payloads to an external store (such as S3) and pass a small reference token through the Event History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). @@ -41,9 +41,10 @@ External Storage sits at the end of the data pipeline, after both the [Payload C title="The Flow of Data through a Data Converter" /> -When a payload exceeds a configurable size threshold, the storage driver uploads it to your external store and replaces -it with a lightweight reference. Payloads below the threshold stay inline in the Event History. On the way back, the -codec receives reference payloads from external storage before decoding them. +When a Temporal Client sends a payload that exceeds a configurable size threshold, the storage driver uploads it to your +external store and replaces it with a lightweight reference. Payloads below the threshold stay inline in the Event +History. 
When the Temporal Service dispatches Tasks to the Worker, the Worker retrieves the reference payloads from +external storage and loads them into memory to execute the Task. Because External Storage runs after the Payload Codec, if you use an encryption codec, payloads are already encrypted before they're uploaded to your store. diff --git a/docs/troubleshooting/blob-size-limit-error.mdx b/docs/troubleshooting/blob-size-limit-error.mdx index 07ed659123..51c913028b 100644 --- a/docs/troubleshooting/blob-size-limit-error.mdx +++ b/docs/troubleshooting/blob-size-limit-error.mdx @@ -2,7 +2,9 @@ id: blob-size-limit-error title: Troubleshoot the blob size limit error sidebar_label: Blob size limit error -description: The BlobSizeLimitError occurs when a Workflow's payload exceeds the 2 MB request limit or the 4 MB Event History transaction limit set by Temporal. Reduce blob size via compression or batching. +description: + The BlobSizeLimitError occurs when a Workflow's payload exceeds the 2 MB request limit or the 4 MB Event History + transaction limit set by Temporal. Reduce blob size via compression or batching. toc_max_heading_level: 4 keywords: - error @@ -12,7 +14,8 @@ tags: - Failures --- -The `BlobSizeLimitError` is an error that occurs when the size of a blob (payloads including Workflow context and each Workflow and Activity argument and return value) exceeds the set limit in Temporal. +The `BlobSizeLimitError` is an error that occurs when the size of a blob (payloads including Workflow context and each +Workflow and Activity argument and return value) exceeds the set limit in Temporal. - The max payload for a single request is 2 MB. - The max size limit for any given [Event History](/workflow-execution/event#event-history) transaction is 4 MB. @@ -21,24 +24,31 @@ The `BlobSizeLimitError` is an error that occurs when the size of a blob (payloa This error occurs when the size of the blob exceeds the maximum size allowed by Temporal. 
-
-This limit helps ensure that the Temporal Service prevents excessive resource usage and potential performance issues when handling large payloads. +This limit helps ensure that the Temporal Service prevents excessive resource usage and potential performance issues +when handling large payloads. ## How do I resolve this error? -To resolve this error, reduce the size of the blob so that it is within the 4 MB limit. +To resolve this error, reduce the size of the blob so that it is within these limits. There are multiple strategies you can use to avoid this error: -1. Use [External Storage](/external-storage) to offload large payloads to an object store like S3. The Temporal SDKs support this natively through the claim check pattern: when a payload exceeds a size threshold, a storage driver uploads it to your external store and replaces it with a small reference token in the Event History. Your Workflow and Activity code doesn't need to change. Even if your payloads are within the limit today, consider implementing External Storage if their size could grow over time. +1. (Recommended) Use [External Storage](/external-storage) to offload large payloads to an object store like S3. The + Temporal SDKs support this natively (available in the + [Python](/develop/python/data-handling/large-payload-storage) and + [TypeScript](/develop/typescript/data-handling/large-payload-storage) SDKs): when a payload exceeds a size threshold, + a storage driver uploads it to your external store and replaces it with a small reference token in the Event History. + Your Workflow and Activity code doesn't need to change. Even if your payloads are within the limit today, consider + implementing External Storage if their size could grow over time.
For SDK-specific guides, see: - [Python: Large payload storage](/develop/python/data-handling/large-payload-storage) - [TypeScript: Large payload storage](/develop/typescript/data-handling/large-payload-storage) -2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads. This addresses the immediate issue, but if payload sizes continue to grow, the problem can arise again. +2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads. This addresses the immediate issue, + but if payload sizes continue to grow, the problem can arise again. 3. Break larger batches of commands into smaller batch sizes: - - Workflow-level batching: 1. Change the Workflow to process Activities or Child Workflows into smaller batches. 2. Iterate through each batch, waiting for completion before moving to the next. @@ -48,12 +58,17 @@ There are multiple strategies you can use to avoid this error: ## Workflow termination due to oversized response -When a Workflow Task response exceeds the 4 MB gRPC message size limit, Temporal automatically terminates the Workflow Execution. This is a non-recoverable error. The Workflow can't progress if it generates a response that's too large, so retrying won't help. +When a Workflow Task response exceeds the 4 MB gRPC message size limit, Temporal automatically terminates the Workflow +Execution. This is a non-recoverable error. The Workflow can't progress if it generates a response that's too large, so +retrying won't help. -This typically happens when a Workflow schedules too many Activities, Child Workflows, or other commands in a single Workflow Task. The total size of all commands generated by the Workflow Task must fit within the 4 MB limit. +This typically happens when a Workflow schedules too many Activities, Child Workflows, or other commands in a single +Workflow Task. The total size of all commands generated by the Workflow Task must fit within the 4 MB limit. 
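The Workflow-level batching strategy above can be sketched as a simple chunking helper. The batch size of 25 is arbitrary, and the commented-out `workflow.execute_activity` loop only indicates where this would sit inside a Workflow:

```python
from typing import Iterator, TypeVar

T = TypeVar("T")

def batches(items: list[T], size: int) -> Iterator[list[T]]:
    # Yield fixed-size slices so each Workflow Task schedules a bounded
    # number of commands instead of everything at once.
    for i in range(0, len(items), size):
        yield items[i : i + size]

# Inside a Workflow you would await each batch before starting the next,
# for example:
#   for batch in batches(work_items, 25):
#       await asyncio.gather(*(workflow.execute_activity(...) for w in batch))

chunks = list(batches(list(range(103)), 25))
assert [len(c) for c in chunks] == [25, 25, 25, 25, 3]
```

Waiting for each batch to complete before scheduling the next keeps the commands generated by any single Workflow Task bounded.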
-If your Workflow was terminated for this reason, you'll see a `WorkflowExecutionTerminated` event in the Event History with the cause `WORKFLOW_TASK_FAILED_CAUSE_GRPC_MESSAGE_TOO_LARGE`. +If your Workflow was terminated for this reason, you'll see a `WorkflowExecutionTerminated` event in the Event History +with the cause `WORKFLOW_TASK_FAILED_CAUSE_GRPC_MESSAGE_TOO_LARGE`. -To prevent this, use the batching strategies described above to split work across multiple Workflow Tasks instead of scheduling everything at once. +To prevent this, use the batching strategies described above to split work across multiple Workflow Tasks instead of +scheduling everything at once. See the [gRPC Message Too Large error reference](/references/errors#grpc-message-too-large) for more details. From c583f77145292fbddb0ae74f38e430c8efb062b5 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Thu, 26 Mar 2026 17:48:25 -0700 Subject: [PATCH 11/14] docs: removes unnecessary import --- docs/develop/python/data-handling/large-payload-storage.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx index c54e705ae5..d5ad1fdabb 100644 --- a/docs/develop/python/data-handling/large-payload-storage.mdx +++ b/docs/develop/python/data-handling/large-payload-storage.mdx @@ -11,8 +11,6 @@ tags: description: Offload large payloads to external storage using the claim check pattern in the Python SDK. --- -import { CaptionedImage } from '@site/src/components'; - The Temporal Service enforces a 2 MB per payload limit. When your Workflows or Activities handle data larger than the limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). 
This page From 2266b660bae94256cbdd119f6f1db5d0422e1ba0 Mon Sep 17 00:00:00 2001 From: Lenny Chen Date: Thu, 26 Mar 2026 23:14:46 -0700 Subject: [PATCH 12/14] docs: remove TS references --- .../data-handling/data-conversion.mdx | 290 ------------------ .../data-handling/data-encryption.mdx | 176 ----------- .../typescript/data-handling/index.mdx | 30 -- .../data-handling/large-payload-storage.mdx | 64 ---- .../data-conversion/external-storage.mdx | 1 - .../troubleshooting/blob-size-limit-error.mdx | 6 +- sidebars.js | 14 - 7 files changed, 2 insertions(+), 579 deletions(-) delete mode 100644 docs/develop/typescript/data-handling/data-conversion.mdx delete mode 100644 docs/develop/typescript/data-handling/data-encryption.mdx delete mode 100644 docs/develop/typescript/data-handling/index.mdx delete mode 100644 docs/develop/typescript/data-handling/large-payload-storage.mdx diff --git a/docs/develop/typescript/data-handling/data-conversion.mdx b/docs/develop/typescript/data-handling/data-conversion.mdx deleted file mode 100644 index 478292685d..0000000000 --- a/docs/develop/typescript/data-handling/data-conversion.mdx +++ /dev/null @@ -1,290 +0,0 @@ ---- -id: data-conversion -title: Payload conversion - TypeScript SDK -sidebar_label: Payload conversion -slug: /develop/typescript/data-handling/data-conversion -toc_max_heading_level: 2 -tags: - - Data Converters - - TypeScript SDK - - Temporal SDKs -description: Customize how Temporal serializes application objects using Payload Converters in the TypeScript SDK, including EJSON and protobuf examples. ---- - -Payload Converters serialize your application objects into a `Payload` and deserialize them back. -A `Payload` is a binary form with metadata that Temporal uses to transport data. - -By default, Temporal uses a `DefaultPayloadConverter` that handles `undefined`, binary data, and JSON. 
-You only need a custom Payload Converter when your application uses types that aren't natively JSON-serializable, such as `BigInt`, `Date`, or protobufs. - -## How the default converter works - -The default converter is a `CompositePayloadConverter` that tries each converter in order until one handles the value: - -```typescript -export class DefaultPayloadConverter extends CompositePayloadConverter { - constructor() { - super( - new UndefinedPayloadConverter(), - new BinaryPayloadConverter(), - new JsonPayloadConverter(), - ); - } -} -``` - -During serialization, the Data Converter tries each Payload Converter in sequence until one returns a non-null `Payload`. - -## Custom Payload Converters - -To handle custom data types, implement the [`PayloadConverter`](https://typescript.temporal.io/api/interfaces/common.PayloadConverter) interface: - -```typescript -interface PayloadConverter { - toPayload(value: T): Payload; - fromPayload(payload: Payload): T; -} -``` - -Then provide it to the Worker and Client through a `CompositePayloadConverter`: - -```typescript -export const payloadConverter = new CompositePayloadConverter( - new UndefinedPayloadConverter(), - new EjsonPayloadConverter(), -); -``` - -```typescript -// Worker -const worker = await Worker.create({ - workflowsPath: require.resolve('./workflows'), - taskQueue: 'ejson', - dataConverter: { - payloadConverterPath: require.resolve('./payload-converter'), - }, -}); -``` - -```typescript -// Client -const client = new Client({ - dataConverter: { - payloadConverterPath: require.resolve('./payload-converter'), - }, -}); -``` - -### EJSON example - -The [samples-typescript/ejson](https://github.com/temporalio/samples-typescript/tree/main/ejson) sample creates a custom `PayloadConverter` using EJSON, which supports types like `Date`, `RegExp`, `Infinity`, and `Uint8Array`. 
- - -[ejson/src/ejson-payload-converter.ts](https://github.com/temporalio/samples-typescript/blob/main/ejson/src/ejson-payload-converter.ts) -```ts -import { - EncodingType, - METADATA_ENCODING_KEY, - Payload, - PayloadConverterWithEncoding, - PayloadConverterError, -} from '@temporalio/common'; -import EJSON from 'ejson'; -import { decode, encode } from '@temporalio/common/lib/encoding'; - -/** - * Converts between values and [EJSON](https://docs.meteor.com/api/ejson.html) Payloads. - */ -export class EjsonPayloadConverter implements PayloadConverterWithEncoding { - // Use 'json/plain' so that Payloads are displayed in the UI - public encodingType = 'json/plain' as EncodingType; - - public toPayload(value: unknown): Payload | undefined { - if (value === undefined) return undefined; - let ejson; - try { - ejson = EJSON.stringify(value); - } catch (e) { - throw new UnsupportedEjsonTypeError( - `Can't run EJSON.stringify on this value: ${value}. Either convert it (or its properties) to EJSON-serializable values (see https://docs.meteor.com/api/ejson.html ), or create a custom data converter. EJSON.stringify error message: ${errorMessage( - e, - )}`, - e as Error, - ); - } - - return { - metadata: { - [METADATA_ENCODING_KEY]: encode('json/plain'), - // Include an additional metadata field to indicate that this is an EJSON payload - format: encode('extended'), - }, - data: encode(ejson), - }; - } - - public fromPayload(content: Payload): T { - return content.data ? EJSON.parse(decode(content.data)) : content.data; - } -} - -export class UnsupportedEjsonTypeError extends PayloadConverterError { - public readonly name: string = 'UnsupportedJsonTypeError'; - - constructor( - message: string | undefined, - public readonly cause?: Error, - ) { - super(message ?? undefined); - } -} -``` - - -### Protobufs - -To serialize values as [Protocol Buffers](https://protobuf.dev/): - -- Use [protobufjs](https://protobufjs.github.io/protobuf.js/). 
-- Use runtime-loaded messages (not generated classes) and `MessageClass.create` (not `new MessageClass()`). -- Generate `json-module.js`: - - ```sh - pbjs -t json-module --workflow-id commonjs -o protos/json-module.js protos/*.proto - ``` - -- Patch `json-module.js`: - - -[protobufs/protos/root.js](https://github.com/temporalio/samples-typescript/blob/main/protobufs/protos/root.js) -```js -const { patchProtobufRoot } = require('@temporalio/common/lib/protobufs'); -const unpatchedRoot = require('./json-module'); -module.exports = patchProtobufRoot(unpatchedRoot); -``` - - -- Generate `root.d.ts`: - - ```sh - pbjs -t static-module protos/*.proto | pbts -o protos/root.d.ts - - ``` - -- Create a [`DefaultPayloadConverterWithProtobufs`](https://typescript.temporal.io/api/classes/protobufs.DefaultPayloadConverterWithProtobufs/): - - -[protobufs/src/payload-converter.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/payload-converter.ts) -```ts -import { DefaultPayloadConverterWithProtobufs } from '@temporalio/common/lib/protobufs'; -import root from '../protos/root'; - -export const payloadConverter = new DefaultPayloadConverterWithProtobufs({ protobufRoot: root }); -``` - - -For binary-encoded protobufs (saves space but can't be viewed in the Web UI): - -```ts -import { ProtobufBinaryPayloadConverter } from '@temporalio/common/lib/protobufs'; -import root from '../protos/root'; - -export const payloadConverter = new ProtobufBinaryPayloadConverter(root); -``` - -For binary-encoded protobufs alongside other default types: - -```ts -import { - BinaryPayloadConverter, - CompositePayloadConverter, - JsonPayloadConverter, - UndefinedPayloadConverter, -} from '@temporalio/common'; -import { ProtobufBinaryPayloadConverter } from '@temporalio/common/lib/protobufs'; -import root from '../protos/root'; - -export const payloadConverter = new CompositePayloadConverter( - new UndefinedPayloadConverter(), - new BinaryPayloadConverter(), - new 
ProtobufBinaryPayloadConverter(root), - new JsonPayloadConverter(), -); -``` - -Provide the converter to the Worker and Client: - - -[protobufs/src/worker.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/worker.ts) -```ts -const worker = await Worker.create({ - workflowsPath: require.resolve('./workflows'), - activities, - taskQueue: 'protobufs', - dataConverter: { payloadConverterPath: require.resolve('./payload-converter') }, -}); -``` - - - -[protobufs/src/client.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/client.ts) -```ts -import { Client, Connection } from '@temporalio/client'; -import { loadClientConnectConfig } from '@temporalio/envconfig'; -import { v4 as uuid } from 'uuid'; -import { foo, ProtoResult } from '../protos/root'; -import { example } from './workflows'; - -async function run() { - const config = loadClientConnectConfig(); - const connection = await Connection.connect(config.connectionOptions); - const client = new Client({ - connection, - dataConverter: { payloadConverterPath: require.resolve('./payload-converter') }, - }); - - const handle = await client.workflow.start(example, { - args: [foo.bar.ProtoInput.create({ name: 'Proto', age: 2 })], - // can't do: - // args: [new foo.bar.ProtoInput({ name: 'Proto', age: 2 })], - taskQueue: 'protobufs', - workflowId: 'my-business-id-' + uuid(), - }); - - console.log(`Started workflow ${handle.workflowId}`); - - const result: ProtoResult = await handle.result(); - console.log(result.toJSON()); -} -``` - - -Use protobufs in your Workflows and Activities: - - -[protobufs/src/workflows.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/workflows.ts) -```ts -import { proxyActivities } from '@temporalio/workflow'; -import { foo, ProtoResult } from '../protos/root'; -import type * as activities from './activities'; - -const { protoActivity } = proxyActivities({ - startToCloseTimeout: '1 minute', -}); - -export async 
function example(input: foo.bar.ProtoInput): Promise { - const result = await protoActivity(input); - return result; -} -``` - - - -[protobufs/src/activities.ts](https://github.com/temporalio/samples-typescript/blob/main/protobufs/src/activities.ts) -```ts -import { foo, ProtoResult } from '../protos/root'; - -export async function protoActivity(input: foo.bar.ProtoInput): Promise { - return ProtoResult.create({ sentence: `${input.name} is ${input.age} years old.` }); -} -``` - diff --git a/docs/develop/typescript/data-handling/data-encryption.mdx b/docs/develop/typescript/data-handling/data-encryption.mdx deleted file mode 100644 index 0d05293356..0000000000 --- a/docs/develop/typescript/data-handling/data-encryption.mdx +++ /dev/null @@ -1,176 +0,0 @@ ---- -id: data-encryption -title: Payload encryption - TypeScript SDK -sidebar_label: Payload encryption -slug: /develop/typescript/data-handling/data-encryption -toc_max_heading_level: 2 -tags: - - Security - - Encryption - - Codec Server - - TypeScript SDK - - Temporal SDKs -description: Encrypt data sent to and from the Temporal Service using a custom Payload Codec in the TypeScript SDK. ---- - -Payload Codecs transform `Payload` bytes after serialization (by the Payload Converter) and before the data is sent to the Temporal Service. -Unlike Payload Converters, codecs run outside the Workflow sandbox, so they can call external services and use non-deterministic modules. - -The most common use case is encryption: encrypting payloads before they reach the Temporal Service so that sensitive data is never stored in plaintext. - -## PayloadCodec interface - -Implement the [`PayloadCodec`](https://typescript.temporal.io/api/interfaces/common.PayloadCodec) interface: - -```ts -interface PayloadCodec { - encode(payloads: Payload[]): Promise; - decode(payloads: Payload[]): Promise; -} -``` - -`encode` is called before data is sent to the Temporal Service. 
-`decode` is called when data is received from the Temporal Service.
-
-## Encryption example
-
-The following example implements AES-GCM encryption as a `PayloadCodec`:
-
-
-[encryption/src/encryption-codec.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/encryption-codec.ts)
-```ts
-import { webcrypto as crypto } from 'node:crypto';
-import { METADATA_ENCODING_KEY, Payload, PayloadCodec, ValueError } from '@temporalio/common';
-import { temporal } from '@temporalio/proto';
-import { decode, encode } from '@temporalio/common/lib/encoding';
-import { decrypt, encrypt } from './crypto';
-
-const ENCODING = 'binary/encrypted';
-const METADATA_ENCRYPTION_KEY_ID = 'encryption-key-id';
-
-export class EncryptionCodec implements PayloadCodec {
-  constructor(
-    protected readonly keys: Map<string, crypto.CryptoKey>,
-    protected readonly defaultKeyId: string,
-  ) {}
-
-  static async create(keyId: string): Promise<EncryptionCodec> {
-    const keys = new Map<string, crypto.CryptoKey>();
-    keys.set(keyId, await fetchKey(keyId));
-    return new this(keys, keyId);
-  }
-
-  async encode(payloads: Payload[]): Promise<Payload[]> {
-    return Promise.all(
-      payloads.map(async (payload) => ({
-        metadata: {
-          [METADATA_ENCODING_KEY]: encode(ENCODING),
-          [METADATA_ENCRYPTION_KEY_ID]: encode(this.defaultKeyId),
-        },
-        // Encrypt entire payload, preserving metadata
-        data: await encrypt(
-          temporal.api.common.v1.Payload.encode(payload).finish(),
-          this.keys.get(this.defaultKeyId)!, // eslint-disable-line @typescript-eslint/no-non-null-assertion
-        ),
-      })),
-    );
-  }
-
-  async decode(payloads: Payload[]): Promise<Payload[]> {
-    return Promise.all(
-      payloads.map(async (payload) => {
-        if (!payload.metadata || decode(payload.metadata[METADATA_ENCODING_KEY]) !== ENCODING) {
-          return payload;
-        }
-        if (!payload.data) {
-          throw new ValueError('Payload data is missing');
-        }
-
-        const keyIdBytes = payload.metadata[METADATA_ENCRYPTION_KEY_ID];
-        if (!keyIdBytes) {
-          throw new ValueError('Unable to decrypt Payload without encryption key id');
-        }
-
-        const keyId = decode(keyIdBytes);
-        let key = this.keys.get(keyId);
-        if (!key) {
-          key = await fetchKey(keyId);
-          this.keys.set(keyId, key);
-        }
-        const decryptedPayloadBytes = await decrypt(payload.data, key);
-        console.log('Decrypting payload.data:', payload.data);
-        return temporal.api.common.v1.Payload.decode(decryptedPayloadBytes);
-      }),
-    );
-  }
-}
-
-async function fetchKey(_keyId: string): Promise<crypto.CryptoKey> {
-  // In production, fetch key from a key management system (KMS). You may want to memoize requests if you'll be decoding
-  // Payloads that were encrypted using keys other than defaultKeyId.
-  const key = Buffer.from('test-key-test-key-test-key-test!');
-  const cryptoKey = await crypto.subtle.importKey(
-    'raw',
-    key,
-    {
-      name: 'AES-GCM',
-    },
-    true,
-    ['encrypt', 'decrypt'],
-  );
-
-  return cryptoKey;
-}
-```
-
-
-Provide the codec to the Client and Worker through the Data Converter:
-
-
-[encryption/src/client.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/client.ts)
-```ts
-const client = new Client({
-  connection,
-  dataConverter: await getDataConverter(),
-});
-
-const handle = await client.workflow.start(example, {
-  args: ['Alice: Private message for Bob.'],
-  taskQueue: 'encryption',
-  workflowId: `my-business-id-${uuid()}`,
-});
-
-console.log(`Started workflow ${handle.workflowId}`);
-console.log(await handle.result());
-```
-
-
-
-[encryption/src/worker.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/worker.ts)
-```ts
-const worker = await Worker.create({
-  workflowsPath: require.resolve('./workflows'),
-  taskQueue: 'encryption',
-  dataConverter: await getDataConverter(),
-});
-```
-
-When the Client sends `'Alice: Private message for Bob.'` to the Workflow, it gets encrypted on the Client and decrypted in the Worker.
-The Workflow receives the decrypted message and appends another message.
-When it returns that longer string, the string gets encrypted by the Worker and decrypted by the Client.
-
-
-[encryption/src/workflows.ts](https://github.com/temporalio/samples-typescript/blob/main/encryption/src/workflows.ts)
-```ts
-export async function example(message: string): Promise<string> {
-  return `${message}\nBob: Hi Alice, I'm Workflow Bob.`;
-}
-```
-
-
-## Codec Server
-
-A Codec Server is an HTTP server that runs your `PayloadCodec` remotely, so that the Temporal Web UI and CLI can decode encrypted payloads for display.
-
-For more information, see [Codec Server](/codec-server).
diff --git a/docs/develop/typescript/data-handling/index.mdx b/docs/develop/typescript/data-handling/index.mdx
deleted file mode 100644
index ec56b5337e..0000000000
--- a/docs/develop/typescript/data-handling/index.mdx
+++ /dev/null
@@ -1,30 +0,0 @@
----
-id: data-handling
-title: Data handling - TypeScript SDK
-sidebar_label: Data handling
-slug: /develop/typescript/data-handling
-description: Learn how Temporal handles data through the Data Converter, including payload conversion, encryption, and large payload storage.
-toc_max_heading_level: 2
-tags:
-  - TypeScript SDK
-  - Temporal SDKs
-  - Data Converters
----
-
-All data sent to and from the Temporal Service passes through the **Data Converter**.
-The Data Converter has three layers that handle different concerns:
-
-```
-User code → PayloadConverter → PayloadCodec → ExternalStorage → Temporal Service
-```
-
-| | [PayloadConverter](/develop/typescript/data-handling/data-conversion) | [PayloadCodec](/develop/typescript/data-handling/data-encryption) | [ExternalStorage](/develop/typescript/data-handling/large-payload-storage) |
-| --- | --- | --- | --- |
-| **Purpose** | Serialize types to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store |
-| **Must be deterministic** | Yes | No | No |
-| **Default** | JSON serialization | None (passthrough) | None (passthrough) |
-
-By default, Temporal uses JSON serialization with no codec and no external storage.
-You only need to customize these layers when your application requires non-JSON types, encryption, or payload offloading.
-
-For a deeper conceptual explanation, see the [Data Conversion encyclopedia](/dataconversion).
diff --git a/docs/develop/typescript/data-handling/large-payload-storage.mdx b/docs/develop/typescript/data-handling/large-payload-storage.mdx
deleted file mode 100644
index 950b998453..0000000000
--- a/docs/develop/typescript/data-handling/large-payload-storage.mdx
+++ /dev/null
@@ -1,64 +0,0 @@
----
-id: large-payload-storage
-title: Large payload storage - TypeScript SDK
-sidebar_label: Large payload storage
-slug: /develop/typescript/data-handling/large-payload-storage
-toc_max_heading_level: 2
-tags:
-  - TypeScript SDK
-  - Temporal SDKs
-  - Data Converters
-description: Offload large payloads to external storage using the claim check pattern in the TypeScript SDK.
----
-
-:::info This feature is in development
-
-Large payload storage support is not yet available in the TypeScript SDK.
-This page will be updated when the feature is released.
-See the [Python SDK](/develop/python/data-handling/large-payload-storage) for a working implementation.
-
-:::
-
-The Temporal Service enforces a ~2 MB per payload limit.
-When your Workflows or Activities handle data larger than this, you can offload payloads to external storage (such as S3) and pass a small reference token through the event history instead.
-This is sometimes called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern).
-
-External storage sits at the end of the data pipeline, after both the Payload Converter and the Payload Codec:
-
-```
-User code → PayloadConverter → PayloadCodec → External Storage → Temporal Service
-```
-
-When a payload exceeds a configurable size threshold (default 256 KiB), the storage driver uploads it to your external store and replaces it with a lightweight reference.
-Payloads below the threshold stay inline in the event history.
-On the way back, reference payloads are retrieved from external storage before the codec decodes them.
-
-Because external storage runs after the codec, payloads are already encrypted (if you use an encryption codec) before they're uploaded to your store.
-
-## Store and retrieve large payloads using external storage
-
-_This section will document how to implement a storage driver and configure external storage on the TypeScript `DataConverter` once the feature ships._
-
-### Implement a storage driver
-
-_Stub: storage driver interface and example._
-
-### Store payloads
-
-_Stub: store method implementation._
-
-### Retrieve payloads
-
-_Stub: retrieve method implementation._
-
-### Configure external storage on the Data Converter
-
-_Stub: DataConverter configuration._
-
-### Adjust the size threshold
-
-_Stub: payload_size_threshold configuration._
-
-### Use multiple storage drivers
-
-_Stub: driver_selector configuration._
diff --git a/docs/encyclopedia/data-conversion/external-storage.mdx b/docs/encyclopedia/data-conversion/external-storage.mdx
index 7ac5840545..0d3befae3b 100644
--- a/docs/encyclopedia/data-conversion/external-storage.mdx
+++ b/docs/encyclopedia/data-conversion/external-storage.mdx
@@ -29,7 +29,6 @@ token through the Event History instead.
 This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern).
 
 For SDK-specific implementation details, see:
 
 - [Python SDK: Large payload storage](/develop/python/data-handling/large-payload-storage)
-- [TypeScript SDK: Large payload storage](/develop/typescript/data-handling/large-payload-storage)
 
 ## How External Storage fits in the data pipeline {#data-pipeline}
diff --git a/docs/troubleshooting/blob-size-limit-error.mdx b/docs/troubleshooting/blob-size-limit-error.mdx
index 51c913028b..8c832cb7b4 100644
--- a/docs/troubleshooting/blob-size-limit-error.mdx
+++ b/docs/troubleshooting/blob-size-limit-error.mdx
@@ -34,16 +34,14 @@ To resolve this error, reduce the size of the blob so that it is within the limi
 
 There are multiple strategies you can use to avoid this error:
 
 1. (Recommended) Use [External Storage](/external-storage) to offload large payloads to an object store like S3. The
-   Temporal SDKs support this natively through [External Storage](/external-storage)(available in the
-   [Python](/develop/python/data-handling/large-payload-storage) and
-   [TypeScript](/develop/typescript/data-handling/large-payload-storage) SDKs): when a payload exceeds a size threshold,
+   Temporal SDKs support this natively through [External Storage](/external-storage) (available in the
+   [Python SDK](/develop/python/data-handling/large-payload-storage)): when a payload exceeds a size threshold,
    a storage driver uploads it to your external store and replaces it with a small reference token in the Event
    History. Your Workflow and Activity code doesn't need to change. Even if your payloads are within the limit today,
    consider implementing External Storage if their size could grow over time. For SDK-specific guides, see:
 
    - [Python: Large payload storage](/develop/python/data-handling/large-payload-storage)
-   - [TypeScript: Large payload storage](/develop/typescript/data-handling/large-payload-storage)
 
 2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads.
    This addresses the immediate issue, but if payload sizes continue to grow, the problem can arise again.
diff --git a/sidebars.js b/sidebars.js
index c57f3ee55b..34dd3badf6 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -268,20 +268,6 @@ module.exports = {
         'develop/typescript/debugging',
         'develop/typescript/schedules',
         'develop/typescript/converters-and-encryption',
-        {
-          type: 'category',
-          label: 'Data handling',
-          collapsed: true,
-          link: {
-            type: 'doc',
-            id: 'develop/typescript/data-handling/data-handling',
-          },
-          items: [
-            'develop/typescript/data-handling/data-conversion',
-            'develop/typescript/data-handling/data-encryption',
-            'develop/typescript/data-handling/large-payload-storage',
-          ],
-        },
         'develop/typescript/timers',
         'develop/typescript/nexus',
         'develop/typescript/child-workflows',

From 5c8bc2cf3153ebee30f9aa344a49aebcccaf294d Mon Sep 17 00:00:00 2001
From: Lenny Chen
Date: Thu, 26 Mar 2026 23:25:53 -0700
Subject: [PATCH 13/14] copyedits

---
 .../python/data-handling/large-payload-storage.mdx | 10 ++++++++--
 docs/encyclopedia/data-conversion/dataconversion.mdx | 8 ++++----
 docs/troubleshooting/blob-size-limit-error.mdx | 5 ++---
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx
index d5ad1fdabb..51452e5d56 100644
--- a/docs/develop/python/data-handling/large-payload-storage.mdx
+++ b/docs/develop/python/data-handling/large-payload-storage.mdx
@@ -174,8 +174,14 @@ A custom driver extends the `StorageDriver` abstract class and implements three
 
 ### Store payloads
 
 In `store()`, serialize each payload to bytes with `payload.SerializeToString()`, upload the bytes to your storage
-system, and return a `StorageDriverClaim` with enough information to find the payload later. Using a content-addressable
-key like a SHA-256 hash gives you deduplication for free.
+system, and return a `StorageDriverClaim` with enough information to find the payload later.
+
+:::tip
+
+For simplicity, the code example uses a UUID for the key. In production systems, consider using a content-addressable
+key like a SHA-256 hash, which can help you deduplicate payloads and reduce storage costs.
+
+:::
 
 ### Retrieve payloads
diff --git a/docs/encyclopedia/data-conversion/dataconversion.mdx b/docs/encyclopedia/data-conversion/dataconversion.mdx
index 6ce19fd5f8..43934e9829 100644
--- a/docs/encyclopedia/data-conversion/dataconversion.mdx
+++ b/docs/encyclopedia/data-conversion/dataconversion.mdx
@@ -57,11 +57,11 @@ The following diagram shows how data flows through a Data Converter:
 />
 
 Your application code controls the Temporal Client, which sends data through the Data Converter to the Temporal Service.
-The data is first encoded by the PayloadConverter, then transformed by the PayloadCodec, and finally offloaded to
-external storage by the ExternalStorage.
+The PayloadConverter encodes the data first, then the PayloadCodec transforms it, and finally ExternalStorage offloads
+large payloads to an external store.
 
-When the Temporal Service dispatches the Tasks to the Workers, the Worker first retrieves the payloads from external
-storage, then decodes them by the PayloadCodec, and finally deserializes them by the PayloadConverter.
+When the Temporal Service dispatches Tasks to Workers, the process reverses: ExternalStorage retrieves offloaded
+payloads, the PayloadCodec decodes them, and the PayloadConverter deserializes them back into application types.
 
 During the entire data conversion process, the Temporal Service never sees or persists the decoded data.
 
diff --git a/docs/troubleshooting/blob-size-limit-error.mdx b/docs/troubleshooting/blob-size-limit-error.mdx
index 8c832cb7b4..7080ff9f42 100644
--- a/docs/troubleshooting/blob-size-limit-error.mdx
+++ b/docs/troubleshooting/blob-size-limit-error.mdx
@@ -33,9 +33,8 @@ To resolve this error, reduce the size of the blob so that it is within the limi
 
 There are multiple strategies you can use to avoid this error:
 
-1. (Recommended) Use [External Storage](/external-storage) to offload large payloads to an object store like S3. The
-   Temporal SDKs support this natively through [External Storage](/external-storage) (available in the
-   [Python SDK](/develop/python/data-handling/large-payload-storage)): when a payload exceeds a size threshold,
+1. (Recommended) Use [External Storage](/external-storage) to offload large payloads to an object store like S3.
+   Currently available in the [Python SDK](/develop/python/data-handling/large-payload-storage). When a payload exceeds a size threshold,
    a storage driver uploads it to your external store and replaces it with a small reference token in the Event
    History. Your Workflow and Activity code doesn't need to change. Even if your payloads are within the limit today,
    consider implementing External Storage if their size could grow over time.
From 1a65f557729c48aa82470dd7948a819d58e6443f Mon Sep 17 00:00:00 2001
From: Lenny Chen
Date: Thu, 26 Mar 2026 23:34:27 -0700
Subject: [PATCH 14/14] remove install step

---
 .../python/data-handling/large-payload-storage.mdx | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/docs/develop/python/data-handling/large-payload-storage.mdx b/docs/develop/python/data-handling/large-payload-storage.mdx
index 51452e5d56..1e7d568bb3 100644
--- a/docs/develop/python/data-handling/large-payload-storage.mdx
+++ b/docs/develop/python/data-handling/large-payload-storage.mdx
@@ -22,13 +22,7 @@ For more information about how External Storage fits in the data pipeline, see [
 
 The Python SDK includes an S3 storage driver. Follow these steps to set it up:
 
-1. Install the S3 driver dependency:
-
-   ```shell
-   pip install temporalio[s3driver]
-   ```
-
-2. Import the classes `S3StorageDriverClient` and `S3StorageDriver` and create an instance of each. When you create the
+1. Import the classes `S3StorageDriverClient` and `S3StorageDriver` and create an instance of each. When you create the
    driver instance, pass the `client` argument to the constructor as well as the name of your S3 bucket:
 
    ```python
@@ -42,7 +36,7 @@ the driver skips the upload if the object already exists. The driver includes the
    namespace and Workflow ID in the S3 key to group related payloads in your bucket. For example:
    `v0/ns/my-namespace/wfi/my-workflow/d/sha256/{hash}`.
 
-3. Configure the driver on your `DataConverter` and pass the converter to your Client and Worker:
+2. Configure the driver on your `DataConverter` and pass the converter to your Client and Worker:
@@ -71,7 +65,7 @@ All Workflows and Activities running on the Worker use the storage driver automatically
    without changes to your business logic.
   The driver uploads and downloads payloads concurrently and validates payload integrity on retrieve.
 
-4. (Optional) To route payloads to different buckets, pass a function as the `bucket` parameter instead of a string. The
+3. (Optional) To route payloads to different buckets, pass a function as the `bucket` parameter instead of a string. The
    function takes the store context and the payload as arguments, and returns a bucket name, so you can route based on
    the Activity Task Queue or other context: