
docs: large payload storage; data handling refactor#4333

Open
lennessyy wants to merge 20 commits into main from feat/large-payload-storage

Conversation

@lennessyy
Contributor

@lennessyy lennessyy commented Mar 25, 2026

What does this PR do?

  • Restructures current docs on data conversion and encryption
  • Adds a new page for large payload storage

Notes to reviewers

Attachments: EDU-6097 (docs: Add Data handling section for Python and TypeScript SDKs)

New section with overview, data conversion, data encryption, and
large payload storage pages for both SDKs. Keeps existing
converters-and-encryption pages in place.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Mar 25, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| temporal-documentation | Ready | Preview, Comment | Mar 31, 2026 1:53am |


@github-actions
Contributor

github-actions bot commented Mar 25, 2026

@lennessyy changed the title from "docs: Add Data handling section for Python and TypeScript SDKs" to "docs: large payload storage; data handling refactor" on Mar 25, 2026
@lennessyy lennessyy marked this pull request as ready for review March 27, 2026 00:22
1. Install the S3 driver dependency:

```shell
pip install temporalio[s3driver]
```
Contributor Author

@lennessyy lennessyy Mar 27, 2026


Does this need to be installed separately or can it be imported directly? @jmaeagle99

Contributor

@drewhoskins-temporal drewhoskins-temporal left a comment


I've reviewed

  • Encyclopedia/data-conversion
  • data-handling/large-payload-storage.

Overall, those two are looking good and cover a lot of important ground, but there's a lot to work through and more ground to cover! This is a BIG topic. Also, thanks for actually dogfooding this and snipsyncing as well. That makes me feel a lot more confident.

Ran out of time today to review the rest, but I imagine it's best to iterate on those two (ideally in separate PRs to scope down each one) and then come back to the others in another couple of PRs.

Please Slack me if you make any comments here so I'm sure to notice quickly. Also quite happy to Zoom, whatever works for you.

---

The Temporal Service enforces a 2 MB per payload limit. When your Workflows or Activities handle data larger than the
limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event
Contributor


Suggested change

```diff
-limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event
+limit, you can offload payloads to external storage, such as Amazon S3, and pass a small reference token through the Event
```

The Temporal Service enforces a 2 MB per payload limit. When your Workflows or Activities handle data larger than the
limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event
History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). This page
shows you how to set up External Storage with AWS S3 and how to implement a custom storage driver.
Contributor


Suggested change

```diff
-shows you how to set up External Storage with AWS S3 and how to implement a custom storage driver.
+shows you how to set up External Storage with Amazon S3 and how to implement a custom storage driver.
```
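For anyone reading along who is new to the pattern the excerpt references, here is a minimal sketch of the claim check idea in plain Python. An in-memory dict stands in for S3, and none of these names (`check_in`, `check_out`, `object_store`) are the SDK's real API:

```python
import uuid

# Stand-in for an external object store such as S3 (illustrative only).
object_store: dict[str, bytes] = {}

PAYLOAD_SIZE_THRESHOLD = 256 * 1024  # 256 KiB

def check_in(payload: bytes) -> bytes:
    """If the payload exceeds the threshold, store it externally and
    return a small reference token; otherwise pass it through unchanged."""
    if len(payload) <= PAYLOAD_SIZE_THRESHOLD:
        return payload
    key = str(uuid.uuid4())
    object_store[key] = payload
    return b"claim-check:" + key.encode()

def check_out(payload: bytes) -> bytes:
    """Resolve a reference token back to the original payload."""
    if payload.startswith(b"claim-check:"):
        key = payload[len(b"claim-check:"):].decode()
        return object_store[key]
    return payload
```

Small payloads travel through the Event History as-is; large ones are replaced by a short token, which is what keeps the per-payload size under the Service limit.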


import { CaptionedImage } from '@site/src/components';

The Temporal Service enforces a 2 MB per-payload limit. When your Workflows or Activities handle data larger than this
Contributor


Need a big warning about pre-release here and in the Python docs.

Suggested change

```diff
-The Temporal Service enforces a 2 MB per-payload limit. When your Workflows or Activities handle data larger than this
+The Temporal Service enforces a 2 MB per-payload limit. If your Workflows or Activities might handle data larger than this
```

title="The Flow of Data through a Data Converter"
/>

When a Temporal Client sends a payload that exceeds a configurable size threshold, the storage driver uploads it to your
Contributor


nit: I kinda felt like this was worth a diagram to show the decision tree, but it's low-pri.

```python
driver = S3StorageDriver(client=driver_client, bucket=bucket_selector)
```

## Implement a custom storage driver
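The driver interface itself isn't visible in this excerpt, so as a rough illustration only: a custom storage driver boils down to two operations, put bytes under a key and get them back. The class and method names below are hypothetical, not the SDK's actual API:

```python
from typing import Protocol

class StorageDriver(Protocol):
    """Hypothetical shape of a storage driver: put bytes, get bytes."""

    async def put(self, key: str, data: bytes) -> None: ...
    async def get(self, key: str) -> bytes: ...

class InMemoryStorageDriver:
    """Toy driver backed by a dict, useful for tests."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> None:
        self._store[key] = data

    async def get(self, key: str) -> bytes:
        return self._store[key]
```

A real driver would swap the dict for S3, GCS, a database, or similar, and would need to think about retries, auth, and key ownership (see the content-addressing discussion below in this thread).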
Contributor


We should also cover custom drivers from the Extensibility section of the encyclopedia just so that people who are extending understand their options. Maybe that content can link here?

Contributor Author


Added backlink here: https://github.com/temporalio/documentation/pull/4333/changes#diff-0c7a1240530a855fc14bf1b15d463832cc040bbd4b7ecd42146885a733a331e2R99

Or did you mean the Extensibility index page, which links to data conversion, which is a parent-level abstraction of this?


:::tip

For simplicity, the code example uses a UUID for the key. In production systems, consider using a content-addressable
Contributor


Let's fix this once @jmaeagle99 settles on a scheme for S3. Mistakes here are pretty brutal, as I learned from DataDog: you may be left with a bunch of data that you can't figure out ownership of or how to delete.

:::tip

For simplicity, the code example uses a UUID for the key. In production systems, consider using a content-addressable
key like a SHA-256 hash, which can help you deduplicate payloads and reduce storage costs.
Contributor


However, DataDog did this globally and couldn't delete their old data because they couldn't figure out which workflows owned which data.
Likely, content-addressing should be scoped within a namespace/workflow ID and content-hashed inside of that.
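That scoping idea could look something like the sketch below. This is illustrative only; `payload_key` is not a real SDK function, and the final scheme is still up for discussion in this thread:

```python
import hashlib

def payload_key(namespace: str, workflow_id: str, data: bytes) -> str:
    """Content-address the payload, but scope the key under the Namespace
    and Workflow Id so ownership stays traceable and deletable."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{namespace}/{workflow_id}/{digest}"
```

Dedup then happens per Workflow rather than globally, trading some storage savings for the ability to answer "which Workflow owns this object?" at cleanup time.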

```python
converter = DataConverter(
    external_storage=ExternalStorage(
        drivers=[driver],
        payload_size_threshold=256 * 1024,  # 256 KiB (default)
```
Contributor


I vote not to include it and leave it to more "advanced" use cases (e.g. all data should go to external storage). We want this to be a default value (so not required).

```python
from temporalio.contrib.aws.s3driver import S3StorageDriver, S3StorageDriverClient

driver_client = S3StorageDriverClient()
```
Contributor


Yeah, the code should look more like:

```python
import aioboto3
from temporalio.contrib.aws.s3driver import S3StorageDriver
from temporalio.contrib.aws.s3driver.aioboto3 import new_aioboto3_client

session = aioboto3.Session(profile_name=AWS_PROFILE, region_name=AWS_REGION)
async with session.client("s3") as s3_client:
    driver = S3StorageDriver(
        client=new_aioboto3_client(s3_client),
        bucket="my-temporal-payloads",
    )
```

```python
    ),
)

client = await Client.connect("localhost:7233", data_converter=converter)
```
Contributor


Should we be promoting env config over explicitly setting the target?
