docs: large payload storage; data handling refactor #4333
Conversation
New section with overview, data conversion, data encryption, and large payload storage pages for both SDKs. Keeps existing converters-and-encryption pages in place. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
> 1. Install the S3 driver dependency:
>
>    ```shell
>    pip install temporalio[s3driver]
>    ```
Does this need to be installed separately or can it be imported directly? @jmaeagle99
I've reviewed
- Encyclopedia/data-conversion
- data-handling/large-payload-storage.
Overall, those two are looking good and cover a lot of important ground, but there's a lot to work through and more ground to cover! This is a BIG topic. Also, thanks for actually dogfooding this and snipsyncing as well. Makes me feel a lot more confident.
Ran out of time today to review the rest, but I imagine it's best to iterate on those two (ideally in separate PRs, to scope down each one) and then come back to the others in another couple of PRs.
Please Slack me if you make any comments here so I'm sure to notice quickly. Also quite happy to Zoom, whatever works for you.
> ---
>
> The Temporal Service enforces a 2 MB per payload limit. When your Workflows or Activities handle data larger than the limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event
Suggested change:

```diff
-limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event
+limit, you can offload payloads to external storage, such as Amazon S3, and pass a small reference token through the Event
```
> The Temporal Service enforces a 2 MB per payload limit. When your Workflows or Activities handle data larger than the limit, you can offload payloads to external storage, such as S3, and pass a small reference token through the Event History instead. This is called the [claim check pattern](https://en.wikipedia.org/wiki/Claim_check_pattern). This page shows you how to set up External Storage with AWS S3 and how to implement a custom storage driver.
Suggested change:

```diff
-shows you how to set up External Storage with AWS S3 and how to implement a custom storage driver.
+shows you how to set up External Storage with Amazon S3 and how to implement a custom storage driver.
```
> `import { CaptionedImage } from '@site/src/components';`
> The Temporal Service enforces a 2 MB per-payload limit. When your Workflows or Activities handle data larger than this
Need a big warning about pre-release here and in the Python docs.
Suggested change:

```diff
-The Temporal Service enforces a 2 MB per-payload limit. When your Workflows or Activities handle data larger than this
+The Temporal Service enforces a 2 MB per-payload limit. If your Workflows or Activities might handle data larger than this
```
> ```
>   title="The Flow of Data through a Data Converter"
> />
> ```
>
> When a Temporal Client sends a payload that exceeds a configurable size threshold, the storage driver uploads it to your
nit: I kinda felt like this was worth a diagram to show the decision tree, but it's low-pri.
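Not a diagram, but the decision the driver makes can be sketched in a few lines. The threshold value and names here are illustrative, not the SDK's API:

```python
DEFAULT_THRESHOLD = 256 * 1024  # 256 KiB, the default discussed elsewhere in this PR

def route_payload(data: bytes, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Decide where a payload goes: small payloads stay inline in Event
    History; large ones are offloaded to external storage and replaced
    by a reference token."""
    return "inline" if len(data) <= threshold else "external"

assert route_payload(b"small") == "inline"
assert route_payload(b"x" * (1024 * 1024)) == "external"  # 1 MiB > 256 KiB
```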
> ```python
> driver = S3StorageDriver(client=driver_client, bucket=bucket_selector)
> ```
>
> ## Implement a custom storage driver
We should also cover custom drivers in the Extensibility section of the encyclopedia, just so that people who are extending understand their options. Maybe that content can link here?
Added backlink here: https://github.com/temporalio/documentation/pull/4333/changes#diff-0c7a1240530a855fc14bf1b15d463832cc040bbd4b7ecd42146885a733a331e2R99
Or did you mean the Extensibility index page, which links to Data Conversion, which is a parent-level abstraction of this?
> :::tip
>
> For simplicity, the code example uses a UUID for the key. In production systems, consider using a content-addressable
Let's fix this once @jmaeagle99 settles on a scheme for S3. Mistakes here are pretty brutal, as I learned at DataDog: you may be left with a bunch of data that you can't figure out ownership of or how to delete.
> :::tip
>
> For simplicity, the code example uses a UUID for the key. In production systems, consider using a content-addressable key like a SHA-256 hash, which can help you deduplicate payloads and reduce storage costs.
However, DataDog did this globally and couldn't delete their old data, because they couldn't figure out which Workflows owned which data.
Likely, content-addressing should be scoped within a namespace/Workflow ID and content-hashed inside of that.
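A sketch of what namespace/Workflow-scoped content addressing could look like. The key layout here is one possible suggestion, not a settled scheme:

```python
import hashlib

def scoped_payload_key(namespace: str, workflow_id: str, payload: bytes) -> str:
    # Scoping the content hash under namespace/workflow ID keeps ownership
    # recoverable: deleting one workflow's data becomes a prefix delete,
    # and deduplication still works within that scope.
    digest = hashlib.sha256(payload).hexdigest()
    return f"{namespace}/{workflow_id}/{digest}"

key = scoped_payload_key("default", "order-42", b"big payload bytes")
assert key.startswith("default/order-42/")
```

The trade-off versus a global content hash: you give up cross-workflow deduplication, but you can always answer "who owns this object?" from the key alone.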
> ```python
> converter = DataConverter(
>     external_storage=ExternalStorage(
>         drivers=[driver],
>         payload_size_threshold=256 * 1024,  # 256 KiB (default)
> ```
I vote not to include it and to leave it to more "advanced" use cases (e.g., all data should go to external storage). We want this to be a default value (so not required).
> ```python
> from temporalio.contrib.aws.s3driver import S3StorageDriver, S3StorageDriverClient
>
> driver_client = S3StorageDriverClient()
> ```
Yeah, the code should look more like:

```python
import aioboto3
from temporalio.contrib.aws.s3driver import S3StorageDriver
from temporalio.contrib.aws.s3driver.aioboto3 import new_aioboto3_client

session = aioboto3.Session(profile_name=AWS_PROFILE, region_name=AWS_REGION)
async with session.client("s3") as s3_client:
    driver = S3StorageDriver(
        client=new_aioboto3_client(s3_client),
        bucket="my-temporal-payloads",
    )
```
> ```python
> )
>
> client = await Client.connect("localhost:7233", data_converter=converter)
> ```
Should we be promoting env config over explicitly setting the target?
Force-pushed from 15e8f55 to 1a65f55
What does this PR do?
Notes to reviewers
Attachments: EDU-6097 docs: Add Data handling section for Python and TypeScript SDKs