[core] Combine initial run fetch, event fetch, and run_started event creation#1569
[core] Combine initial run fetch, event fetch, and run_started event creation#1569VaguelySerious wants to merge 2 commits intomainfrom
Conversation
…creation Signed-off-by: Peter Wielander <mittgfu@gmail.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🦋 Changeset detectedLatest commit: 42ffc92 The changes in this PR will be included in the next version bump. This PR includes changesets to release 20 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests🌍 Community Worlds (59 failed)mongodb (2 failed):
redis (2 failed):
turso (55 failed):
Details by Category✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
|
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 10 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express workflow with 25 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 50 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 10 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express workflow with 25 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 50 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) stream pipeline with 5 transform steps (1MB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express 10 parallel streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) fan-out fan-in 10 streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
❌ Some benchmark jobs failed:
Check the workflow run for details. |
TooTallNate
left a comment
There was a problem hiding this comment.
Nice optimization — combining the run fetch, event fetch, and run_started creation into a single round-trip is a clean TTFB win. The overall approach is sound, but there's a correctness bug in the postgres preloaded events path that needs fixing before merge, plus a couple of non-blocking concerns.
Summary of findings:
- Blocking: Missing
eventData ||= eventDataJsonlegacy fallback in postgres preloaded events (see inline comment) - Blocking: Early return for already-running runs in postgres re-fetches the run but doesn't return preloaded events, creating an asymmetry with the local world (see inline comment)
- Non-blocking: The
(result as any).eventDatadelete in postgres is redundant after the spread logic already excludes it — minor style nit - Non-blocking:
world-verceldoesn't surface theeventsfield yet, but since it's optional and the runtime falls back gracefully, this is fine for now
| .from(Schema.events) | ||
| .where(eq(Schema.events.runId, effectiveRunId)) | ||
| .orderBy(Schema.events.eventId); | ||
| allEvents = eventRows.map((e) => EventSchema.parse(compact(e))); |
There was a problem hiding this comment.
Blocking: Missing eventData ||= eventDataJson legacy fallback.
Every other read path in this file (events.get, events.list, events.listByCorrelationId) applies v.eventData ||= v.eventDataJson before parsing. This preloaded events path skips it, which means legacy events that only have data in the payload (jsonb) column — and not in payload_cbor — will silently have eventData: undefined during replay, breaking those workflows.
Suggested fix:
allEvents = eventRows.map((e) => {
e.eventData ||= e.eventDataJson;
return EventSchema.parse(compact(e));
});(The Vercel bot comment also suggests applying stripEventDataRefs here. Whether that's needed depends on whether the runtime expects resolved or raw refs — getAllWorkflowRunEvents uses the default resolveData: 'all' so applying it would be consistent, though the runtime would work either way since it resolves everything.)
There was a problem hiding this comment.
Fixed — added eventData ||= eventDataJson and stripEventDataRefs to match the pattern in events.list and other read paths. Good catch on the legacy fallback.
| .limit(1); | ||
| if (fullRun) { | ||
| return { run: deserializeRunError(compact(fullRun)) }; | ||
| } |
There was a problem hiding this comment.
Blocking: When the run is already running, this returns { run } with no events field. The runtime will then fall back to getAllWorkflowRunEvents(), which is fine functionally — but it means re-invocations of an already-running workflow get no TTFB benefit from the preloaded events optimization.
More importantly, this early return happens before the event insertion, so it short-circuits the rest of the create method. That means it also skips the requestId idempotency check. If two concurrent invocations race, the first one transitions pending→running and continues, but the second hits this branch and also continues with a valid { run } result. Is that the intended behavior? Normally a duplicate run_started would be guarded by the DB unique constraint on eventId, but here we skip insertion entirely.
If this is intentional (i.e., it's safe for multiple invocations to proceed since replay is deterministic), a comment explaining that would help. If not, this should at minimum return the preloaded events for consistency, or throw an EntityConflictError to signal the duplicate.
There was a problem hiding this comment.
Added a comment explaining the idempotency semantics. The early return is intentional — replay is deterministic so concurrent callers proceeding with the same run is safe. Skipping preloaded events here is acceptable since this is a rare race-condition path and the runtime falls back to getAllWorkflowRunEvents().
| // If already running, return the run directly without | ||
| // creating a duplicate event. | ||
| if (currentRun.status === 'running') { | ||
| return { run: currentRun }; |
There was a problem hiding this comment.
Non-blocking (same concern as postgres): This early return for already-running also returns { run: currentRun } with no event and no events. The runtime will fall back to getAllWorkflowRunEvents() which is fine, but worth a comment explaining why skipping the preloaded events is acceptable here (presumably because this path is a rare race condition).
There was a problem hiding this comment.
Added the comment — same rationale as the postgres path (rare race condition, runtime falls back to full event fetch).
| specVersion: SPEC_VERSION_CURRENT, | ||
| }, | ||
| { requestId } | ||
| ); |
There was a problem hiding this comment.
Non-blocking observation: Now that runs.get is removed and the runtime always calls events.create(run_started), the behavior changes for runs that are already running. Previously, runs.get would succeed and the if (status === 'pending') guard would skip the run_started creation. Now, every invocation attempts run_started regardless of status.
This works because the world implementations have an early-return for already-running runs — but it's a semantic change worth noting in the PR description. The contract is now: events.create('run_started') must be idempotent for running status (return the run without error), not just for pending → running transitions.
There was a problem hiding this comment.
Good callout — added a comment documenting this contract: events.create('run_started') must be idempotent for runs already in running status, not just for pending → running transitions.
…ents Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This PR extracts some of the safer performance improvements originally added as part of #1537