Skip to content

Fix optimistic-root persistence breaking conversation restore (QUALITY-774)#11814

Merged
cephalonaut merged 2 commits into
masterfrom
matthew/fix-optimistic-root-persistence
May 28, 2026
Merged

Fix optimistic-root persistence breaking conversation restore (QUALITY-774)#11814
cephalonaut merged 2 commits into
masterfrom
matthew/fix-optimistic-root-persistence

Conversation

@cephalonaut
Copy link
Copy Markdown
Contributor

@cephalonaut cephalonaut commented May 28, 2026

Description

Fix-forward for the bug introduced by #10794 where Agent Mode conversations can fail to restore after a quit/restart, sometimes producing the server error Response stream finished unexpectedly with internal error: Not found: Tried to add messages to non-existent task (id=).

Root cause. PR #10794 added eager persist_conversation_state calls at several sites (start_new_child_conversation, initialize_output_for_response_stream, assign_run_id_for_conversation, mark_conversation_as_remote_child) that fire while the root task is still Optimistic(Root). Task::source_for_persistence was synthesizing a stub api::Task for the optimistic root, which was written to agent_tasks keyed by the client-generated UUID. After the server upgraded the root, a second row was written keyed by the server task id; the stub row was never deleted. On restore, AIConversation::new_restored saw two parentless tasks and HashMap iteration order non-deterministically picked between them — when the stub won, restore left the conversation stuck with an empty root and subtask parent_task_id references pointing at a server root id that no longer existed in the task store.

Fix. Two coordinated changes:

  1. Writer side (Task::source_for_persistence): returns None for Optimistic(Root), matching the existing Optimistic(CLIAgent(_)) behavior. No new stub rows are written. AgentConversationData.root_task_is_optimistic is retained on the persistence model for backward-compatible deserialization of existing local DB rows, but is no longer written and is ignored on restore.

  2. Restore side (AIConversation::new_restored):

    • Two single-purpose constructors. new_restored is strict (returns Err(NoRootTask) for an empty tasks list) and is used by cloud-restore and fork-insert, where an empty payload is malformed input. new_restored_synthesizing_on_empty is a small wrapper that synthesizes a fresh in-progress Optimistic(Root) conversation when tasks is empty (mirroring AIConversation::new()) and is used by the local-DB restore path, where an empty agent_tasks row is the normal shape for child conversations persisted before their first server response.
    • Restore-side dedupe: when multiple parentless tasks exist (legacy DB case), prefer the candidate with non-empty messages. Silently heals pre-fix rows without a SQLite migration.

Removes Task::new_restored_optimistic_root and Task::is_optimistic_root_task; replaces the previous PersistedConversationFields helper by inlining the overlay logic into the single constructor. Adds a #[cfg(test)] pub(crate) helper for driving root upgrades in tests.

Linked Issue

QUALITY-774 — fix-forward of #10794.

  • The linked issue is labeled ready-to-spec or ready-to-implement.
  • Where appropriate, screenshots or a short video of the implementation are included below (especially for user-visible or UI changes).

Testing

  • New deterministic fail-first writer-side test test_persist_with_optimistic_root_emits_event_with_no_task_rows — proves the writer bug exists on master and disappears with the fix.
  • New defense-in-depth tests in app/src/ai/blocklist/history_model_tests.rs:
    • test_optimistic_root_upgrade_then_persist_emits_event_with_single_server_task_row
    • test_optimistic_root_restore_round_trip_yields_in_progress_optimistic_root
    • test_truncate_from_exchange_to_empty_persist_event_has_empty_updated_tasks
    • test_two_restart_cycles_keep_exactly_one_server_root_task_row
  • New restore-semantics tests in app/src/ai/agent/conversation_tests.rs:
    • restored_conversation_with_empty_task_list_creates_in_progress_optimistic_root
    • restored_conversation_ignores_legacy_root_task_is_optimistic_flag_with_non_empty_tasks
    • restored_conversation_ignores_legacy_root_task_is_optimistic_flag_with_empty_tasks
    • new_restored_with_empty_task_list_returns_no_root_task_error
    • test_new_restored_prefers_parentless_task_with_messages_over_empty_stub (50-iteration loop per Vec ordering to catch HashMap-order regressions)
  • Existing agent_conversation_data_roundtrips_optimistic_root_marker in crates/persistence/src/model.rs still passes, pinning back-compat deserialization of the legacy field.
  • ./script/presubmit passes end-to-end.

Note on legacy data: This PR does not physically remove pre-existing orphan stub rows in dev/preview local SQLite, but the restore-side dedupe silently ignores them. Affected internal users should not need to clear local AI history.

  • I have manually tested my changes locally with ./script/run

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

Conversation: https://staging.warp.dev/conversation/3fc61b76-c75b-4e86-bc4a-6f4534e44abb
Plan: https://staging.warp.dev/drive/notebook/EAwKh7EVES70otTsIdkZ9u

@cla-bot cla-bot Bot added the cla-signed label May 28, 2026
…Y-774)

PR #10794 introduced eager `persist_conversation_state` calls at several
sites (start_new_child_conversation, initialize_output_for_response_stream,
assign_run_id_for_conversation, mark_conversation_as_remote_child) that fire
while the root task is still `Optimistic(Root)`. Combined with
`Task::source_for_persistence` synthesizing a stub `api::Task` for
optimistic roots, this wrote orphan rows to `agent_tasks` keyed by the
client-generated UUID. After the server upgraded the root, a second row
was written keyed by the server task id; on restore, HashMap iteration
order non-deterministically picked between the two parentless tasks,
sometimes leaving the conversation stuck with an empty stub root.

Two-pronged fix:

1. Writer side: `Task::source_for_persistence` returns `None` for
   `Optimistic(Root)` (matching `Optimistic(CLIAgent(_))`). No new stub
   rows are written. `AgentConversationData.root_task_is_optimistic` is
   retained for backward-compatible deserialization but is no longer
   written and is ignored on restore.

2. Restore side:
   - Two-constructor pattern. `AIConversation::new_restored` is strict
     (`Err(NoRootTask)` on empty input) — used for cloud-restore and
     fork-insert. `AIConversation::new_restored_synthesizing_on_empty`
     synthesizes a fresh in-progress optimistic root when `tasks` is
     empty — used for local-DB restore, where the empty shape is now
     normal for child conversations persisted before their first server
     response.
   - Dedupe heuristic: when multiple parentless tasks exist (legacy DB
     case), prefer the candidate with non-empty `messages`. Heals
     pre-fix rows without a migration.

Removes `Task::new_restored_optimistic_root`, `Task::is_optimistic_root_task`,
and the prior `PersistedConversationFields` helper; adds a
`#[cfg(test)]` helper for driving root upgrades in tests.

Co-Authored-By: Oz <oz-agent@warp.dev>
@cephalonaut cephalonaut force-pushed the matthew/fix-optimistic-root-persistence branch from 6a540dc to fc1135e Compare May 28, 2026 05:45
@cephalonaut cephalonaut marked this pull request as ready for review May 28, 2026 05:46
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 28, 2026

@cephalonaut

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This PR stops persisting optimistic root task stubs, adds a lenient local-DB restore path for empty task lists, and tries to prefer the real root when legacy rows contain both an empty optimistic stub and a server root.

Concerns

  • Returning no persisted task rows for an optimistic root does not clear any previously persisted task rows because the SQLite writer only upserts tasks. Truncating/resetting a previously server-backed conversation can still restore stale rows after restart.
  • The new legacy parentless-task dedupe is bypassed by the existing historical-conversation restorable prefilter, so affected legacy rows with two parentless tasks can still be dropped before new_restored gets a chance to heal them.

Verdict

Found: 0 critical, 2 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

// real server root) over an empty stub. If multiple have messages
// or none have messages, fall back to the first-encountered
// candidate.
let root_task_pick = parentless_candidates
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This dedupe is not reached by the historical startup path: initialize_historical_conversations still calls AgentConversation::is_restorable() first, and that rejects legacy rows with the empty stub plus real root as two parentless tasks. Apply the legacy-root filtering before that prefilter or relax is_restorable for this shape so affected conversations are not dropped from history/metadata before restore.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one needs a careful relaxation of is_restorable to allow the stub/real conversation case to restore.

Comment thread app/src/ai/agent/task.rs
// produces an orphan row in `agent_tasks` that survives the later
// server-side upgrade and breaks restore by competing with the
// real server root for parentless-task selection. See QUALITY-774.
TaskImpl::Optimistic(optimistic::Task::Root) => None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] upsert_agent_conversation only upserts provided tasks and never deletes omitted rows, so returning None here leaves previously persisted task rows behind when an existing conversation is reset back to an optimistic root (for example truncate-to-empty). After restart the local DB loader can restore the stale server row instead of the intended empty in-progress conversation; the persist path needs replace/delete semantics for rows not present in updated_tasks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something we can defer. Basically I think this only applies to conversations that get rewound to zero and then have no new exchanges before restart.

…Y-774)

Pre-QUALITY-774, the optimistic-root writer bug could leave a conversation
with two parentless rows in agent_tasks: an empty stub keyed by the
optimistic UUID alongside the real server-rooted task. The fix for the
writer (this PR) plus the restore-side dedupe in AIConversation::new_restored
deterministically pick the real root in that shape.

However, the history-list restore path goes through
BlocklistAIHistoryModel::initialize_historical_conversations, which
prefilters via AgentConversation::is_restorable before invoking the
constructor. The old predicate rejected any conversation with more than
one parentless task, so the dedupe never had a chance to run for legacy
conversations on the history list — they were silently dropped.

Relax the predicate to permit multi-root conversations iff exactly one
parentless task has non-empty messages (the [stub + real] shape).
Multi-root conversations with multiple real roots or no real roots
remain rejected (genuinely ambiguous or malformed).

Co-Authored-By: Oz <oz-agent@warp.dev>
@cephalonaut cephalonaut requested a review from advait-m May 28, 2026 14:18
@szgupta szgupta requested a review from kjankov May 28, 2026 16:54
Copy link
Copy Markdown
Contributor

@seemeroland seemeroland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for the fix

@cephalonaut cephalonaut merged commit 82ec31f into master May 28, 2026
26 checks passed
@cephalonaut cephalonaut deleted the matthew/fix-optimistic-root-persistence branch May 28, 2026 17:41
cephalonaut added a commit that referenced this pull request May 29, 2026
Upstream PR #11814 ("Fix optimistic-root persistence breaking conversation restore", QUALITY-774) covers the optimistic-stub case via two coordinated changes: Task::source_for_persistence returns None for Optimistic(Root) so no new stub rows are written, and AIConversation::new_restored deduplicates multi-root payloads by preferring the parentless task with non-empty messages. AgentConversation::is_restorable was relaxed to accept the multi-root [stub + real] shape so legacy DB rows still load.

This commit updates specs/QUALITY-768/TECH.md to mark change 3 (the read-time optimistic-stub filter) as superseded by upstream and to drop the corresponding test coverage from the doc. The code change itself happened during the rebase by dropping commit d5ca6ab and accepting upstream's version of model.rs and conversation_loader.rs for the three subsequent commits that touched them.

Co-Authored-By: Oz <oz-agent@warp.dev>
cephalonaut added a commit that referenced this pull request May 29, 2026
Upstream PR #11814 ("Fix optimistic-root persistence breaking conversation restore", QUALITY-774) covers the optimistic-stub case via two coordinated changes: Task::source_for_persistence returns None for Optimistic(Root) so no new stub rows are written, and AIConversation::new_restored deduplicates multi-root payloads by preferring the parentless task with non-empty messages. AgentConversation::is_restorable was relaxed to accept the multi-root [stub + real] shape so legacy DB rows still load.

This commit updates specs/QUALITY-768/TECH.md to mark change 3 (the read-time optimistic-stub filter) as superseded by upstream and to drop the corresponding test coverage from the doc. The code change itself happened during the rebase by dropping commit d5ca6ab and accepting upstream's version of model.rs and conversation_loader.rs for the three subsequent commits that touched them.

Co-Authored-By: Oz <oz-agent@warp.dev>
cephalonaut added a commit that referenced this pull request May 29, 2026
Upstream PR #11814 ("Fix optimistic-root persistence breaking conversation restore", QUALITY-774) covers the optimistic-stub case via two coordinated changes: Task::source_for_persistence returns None for Optimistic(Root) so no new stub rows are written, and AIConversation::new_restored deduplicates multi-root payloads by preferring the parentless task with non-empty messages. AgentConversation::is_restorable was relaxed to accept the multi-root [stub + real] shape so legacy DB rows still load.

This commit updates specs/QUALITY-768/TECH.md to mark change 3 (the read-time optimistic-stub filter) as superseded by upstream and to drop the corresponding test coverage from the doc. The code change itself happened during the rebase by dropping commit d5ca6ab and accepting upstream's version of model.rs and conversation_loader.rs for the three subsequent commits that touched them.

Co-Authored-By: Oz <oz-agent@warp.dev>
cephalonaut added a commit that referenced this pull request Jun 2, 2026
#11722)

## Description

Demo: https://www.loom.com/share/46634b5c2513469aa4e69cb0f09cd03a

Restarting Warp could leave orchestration sessions in four broken
states: child agents rendered as "Unknown agent" in the pill bar,
restored remote-child panes showed a blank "New conversation"
placeholder instead of the cloud transcript, a cloud-parent
orchestration restored correctly on the first restart but lost its
cloud-mode shape on the second, and the on-disk prune split
orchestration trees across restarts. This PR addresses all four.

**Eager orchestration-child hydration at startup.**
`initialize_historical_conversations` now inserts orchestration children
into `BlocklistAIHistoryModel::conversations_by_id` at load time, so the
pill bar and transcript name resolver find them before any pane
materializes.

**Direct `AmbientAgentTask` inspection for remote-child hydration.**
Restored remote-child placeholders went through
`AgentConversationsModel::resolve_open_action`, which once the local
placeholder existed collapsed "navigate to local" and "hydrate cloud
transcript" into the same outcome — so the transcript was never loaded.
Hidden-pane hydration now inspects the `AmbientAgentTask` directly via a
pure `decide_remote_child_hydration_action` and routes all four arms
through a single tombstone gate keyed on whether the task is in a
terminal state.

**Cloud-mode shared-session viewer snapshot/restore loop.** A
cloud-parent orchestration's pane is a shared-session viewer locally.
The restore path used the non-deferred
`viewer::TerminalManager::new(..)` which hardcoded `is_cloud_mode =
false`, so the restored `TerminalView` had no
`ambient_agent_view_model`. Restart 1 still rendered correctly because
the network's `JoinedSuccessfully` handler spins up an
`OrchestrationViewerModel` regardless, but the snapshot path keys off
`ambient_agent_view_model` to choose between `LeafContents::AmbientAgent
{ task_id }` and the fallback empty `LeafContents::Terminal` — so the
next shutdown silently dropped the `task_id` and restart 2
re-materialized the parent as a plain local terminal with no
orchestration UI. The fix threads `is_cloud_mode: bool` through
`create_shared_session_viewer` and passes `true` from the two
ambient-agent restoration call sites; the restored pane's shape now
matches a freshly created cloud-mode pane.

**Tree-aware persisted-conversation prune.** The on-disk prune treated
rows individually, so parents and children of the same orchestration
session could be evicted independently and the next boot would
re-encounter a split tree. The prune is now tree-aware: orchestration
trees are sorted freshest-first by `max(member.last_modified_at)` and
kept atomically, with the freshest tree always retained intact. The
retention cap moved from 100 to 200 to fit typical orchestration session
sizes.

A fifth issue originally in scope — local-no-harness Oz children
persisting an optimistic stub task alongside the real upgraded root —
was independently addressed by PR #11814 (QUALITY-774); this PR was
rebased onto that fix and drops its earlier read-time stub filter.

The post-facto technical spec at `specs/QUALITY-768/TECH.md` documents
the implementation as it ships.

## Testing

- [x] I have manually tested my changes locally with `./script/run`

Manual validation: local-local restart shows correct agent names;
local-remote restart shows the cloud transcript instead of the blank
placeholder; cloud-parent restart-restart preserves the orchestration UI
on both restarts; tree-aware prune keeps orchestration trees together on
disk; baseline single-conversation restore unchanged.

Automated coverage added across the dispatch decision, the
cloud-transcript merge seam, the cloud-mode viewer snapshot contract,
and the tree-aware eviction. `cargo fmt`, `cargo clippy --all-targets
--features local_fs -- -D warnings`, and `cargo nextest run` across the
affected crates all pass.

## Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp's AI Agent Mode

CHANGELOG-BUG-FIX: Fixed orchestration session restoration so the pill
bar, child transcript name resolution, restored remote-child cloud
transcripts, and cloud-parent panes all work correctly after a Warp
restart.

---------

Co-authored-by: Oz <oz-agent@warp.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants