
Use proto-serialized context metadata in gRPC trailers #10269

Merged

nikki-dag merged 4 commits into temporalio:main from nikki-dag:nikki/proto-context-metadata-trailer on May 18, 2026

nikki-dag (Contributor) commented May 14, 2026

What changed

  • Replaced the per-key trailer format with a single protobuf ContextMetadata message serialized into the contextmetadata-bin trailer key
  • gRPC automatically base64-encodes the -bin value, making arbitrary bytes (including HTTP/2-unsafe control chars) transport-safe
  • Writer emits both proto format and legacy per-key format for backward compatibility during rolling deploys
  • Reader prefers proto key, falls back to legacy per-key format for old writers
  • Wired TrailerToContextMetadataInterceptor in test server to match production behavior

Why

Workflow type names containing control characters (newlines, NUL, etc.) cause the gRPC HTTP/2 framer to reject trailer values. A single proto message in a -bin key is simpler than per-key -bin suffixes: one trailer key, one serialization, no key naming constraints, and a cleaner removal of the backward-compatibility path later.
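To make the failure mode concrete, the sketch below checks a value against the gRPC protocol's ASCII-Value grammar (space and printable ASCII, 0x20 to 0x7E) and shows that base64, which gRPC applies to -bin values, turns a workflow type name containing NUL and newline into a safe string. isASCIIValue is an approximation for illustration, not Temporal's isHTTP2SafeValue. encodeBin and decodeBin are hypothetical helpers that follow the gRPC HTTP/2 protocol note that senders should emit unpadded base64 and receivers must accept both padded and unpadded values.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// isASCIIValue reports whether s matches the gRPC ASCII-Value grammar
// (space and printable ASCII, 0x20-0x7E). An approximation for illustration.
func isASCIIValue(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] < 0x20 || s[i] > 0x7e {
			return false
		}
	}
	return true
}

// encodeBin encodes a -bin value the way senders should: base64 without padding.
func encodeBin(b []byte) string {
	return base64.RawStdEncoding.EncodeToString(b)
}

// decodeBin accepts both padded and unpadded base64, as receivers must.
// A length that is a multiple of 4 decodes identically either way.
func decodeBin(s string) ([]byte, error) {
	if len(s)%4 == 0 {
		return base64.StdEncoding.DecodeString(s)
	}
	return base64.RawStdEncoding.DecodeString(s)
}

func main() {
	name := "Order\x00Workflow\n"   // workflow type with NUL and newline
	fmt.Println(isASCIIValue(name)) // false: not a valid ASCII-Value

	wire := encodeBin([]byte(name))
	fmt.Println(isASCIIValue(wire)) // true: base64 output is printable ASCII

	back, err := decodeBin(wire)
	fmt.Println(err == nil && string(back) == name) // true: lossless round trip
}
```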

How tested

  • Unit tests for proto round-trip, dual-format emission, reader preference, legacy fallback, HTTP/2 safety
  • Integration test suite (TestWorkflowTypeEncodingSuite) with control chars, UTF-8, long names, -bin suffix workflow types
  • All existing tests pass

Risks

  • During a rolling deploy, old writers emit only legacy keys; new readers handle this via the fallback path, so no data is lost.
  • After full rollout, legacy key emission can be removed in a follow-up.

Note

Medium Risk
Changes how context metadata is encoded/decoded in gRPC trailers, which can affect cross-version compatibility and observability of propagated metadata. Backward-compatible legacy fallback and extensive unit/integration tests reduce the rollout risk.

Overview
Switches context-metadata propagation in gRPC trailers to a single proto-encoded payload. Server-side ContextMetadataInterceptor now serializes all context metadata into a new ContextMetadata protobuf and emits it under contextmetadata-bin, avoiding HTTP/2-unsafe control characters in values.

Maintains rolling-deploy compatibility. Writers still emit legacy per-key trailers (skipping unsafe values), and the client-side TrailerToContextMetadataInterceptor now prefers the proto trailer and falls back to legacy keys (including unprefixed well-known keys) when needed.
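The reader's preference order can be sketched the same way. This is a hypothetical illustration, not the interceptor's code: the ctxmeta- prefix, the wellKnownUnprefixed list, and fakeUnmarshal (a stand-in for proto.Unmarshal that accepts a single key=value pair) are all assumptions. It relies on grpc-go handing the receiver already-decoded bytes for -bin keys.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

const protoTrailerKey = "contextmetadata-bin"

// HYPOTHETICAL: legacy key prefix and well-known unprefixed keys are illustrative.
const legacyKeyPrefix = "ctxmeta-"

var wellKnownUnprefixed = []string{"workflow-type"}

// unmarshalFunc stands in for proto.Unmarshal into ContextMetadata.
type unmarshalFunc func([]byte) (map[string]string, error)

// readTrailer prefers the proto key and falls back to legacy per-key entries.
func readTrailer(trailer map[string][]string, unmarshal unmarshalFunc) map[string]string {
	if vals := trailer[protoTrailerKey]; len(vals) > 0 {
		entries, err := unmarshal([]byte(vals[0]))
		if err == nil {
			return entries
		}
		log.Printf("WARN failed to unmarshal proto metadata, using legacy keys: %v", err)
	}
	out := map[string]string{}
	for k, vals := range trailer {
		if len(vals) > 0 && strings.HasPrefix(k, legacyKeyPrefix) {
			out[strings.TrimPrefix(k, legacyKeyPrefix)] = vals[0]
		}
	}
	// Legacy fallback also accepts unprefixed well-known keys.
	for _, k := range wellKnownUnprefixed {
		if _, seen := out[k]; !seen && len(trailer[k]) > 0 {
			out[k] = trailer[k][0]
		}
	}
	return out
}

// fakeUnmarshal is a stand-in for proto.Unmarshal: it accepts one "key=value" pair.
func fakeUnmarshal(b []byte) (map[string]string, error) {
	parts := strings.SplitN(string(b), "=", 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("malformed payload %q", b)
	}
	return map[string]string{parts[0]: parts[1]}, nil
}

func main() {
	newWriter := map[string][]string{
		"contextmetadata-bin": {"workflow-type=Order\nWorkflow"},
		"ctxmeta-namespace":   {"default"},
	}
	fmt.Println(len(readTrailer(newWriter, fakeUnmarshal))) // 1: only the proto payload is used

	oldWriter := map[string][]string{
		"ctxmeta-namespace": {"default"},
		"workflow-type":     {"OrderWorkflow"},
	}
	fmt.Println(readTrailer(oldWriter, fakeUnmarshal)) // map[namespace:default workflow-type:OrderWorkflow]
}
```

The proto key wins whenever it parses; legacy keys are consulted only when it is absent or malformed, which is what keeps old-writer/new-reader pairs working during a rolling deploy.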

Adds the new contextpropagation/v1 proto + generated Go types, plus unit tests around proto/legacy behavior and an integration suite (WorkflowTypeEncodingSuite) covering control characters, UTF-8, long names, and -bin suffix workflow types.

Reviewed by Cursor Bugbot for commit e1d772f. Bugbot is set up for automated code reviews on this repo.

nikki-dag requested review from a team as code owners May 14, 2026 19:30

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 5555616.

Comment thread common/rpc/interceptor/context_metadata_interceptor.go
nikki-dag force-pushed the nikki/proto-context-metadata-trailer branch from 5709a34 to cbd44fc May 14, 2026 21:09

simvlad (Contributor) left a comment

LGTM

protoMsg.Entries[key] = fmt.Sprint(value)
}
if protoBytes, err := proto.Marshal(protoMsg); err != nil {
c.throttledLogger.Warn("ContextMetadataInterceptor: Failed to marshal proto metadata, falling back to legacy-only",
Contributor

nit: This would be quite hard to check when we want to remove the legacy format. Pass metricsHandler if possible.

Contributor Author

This information is not currently used in OSS. We should be able to track SaaS with version numbers.

// Skip entries with HTTP/2-unsafe values (the proto key handles those).
for key, value := range allMetadata {
valStr := fmt.Sprint(value)
if !isHTTP2SafeValue(valStr) {
Contributor

This means you can now potentially miss some of the values (whereas in the past they would fail to start the workflow). It's probably fine, but I'm unsure exactly how those are used in metering, so flagging.

Contributor Author

This should be ok.

nikki-dag and others added 3 commits May 18, 2026 09:12
Replace per-key trailer format with a single protobuf ContextMetadata
message serialized into a contextmetadata-bin trailer key. gRPC
automatically base64-encodes -bin values, making arbitrary bytes
(including HTTP/2-unsafe control chars) transport-safe.

Writer emits both proto and legacy per-key format for backward
compatibility during rolling deploys. Reader prefers proto key,
falls back to legacy per-key format for old writers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename contextpropagationpb → contextpropagationspb per project lint rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Writer now logs a Warn when proto.Marshal fails instead of silently
dropping the proto key. Reader logs a Warn when proto.Unmarshal fails
before falling back to legacy format. Both use throttled loggers to
prevent log storms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nikki-dag force-pushed the nikki/proto-context-metadata-trailer branch from cbd44fc to 155a4e4 May 18, 2026 14:13
nikki-dag enabled auto-merge (squash) May 18, 2026 14:48
nikki-dag merged commit 573c2f7 into temporalio:main May 18, 2026
50 checks passed