Add client-side retry for oz starting shared sessions #12111

Merged
seemeroland merged 9 commits into master from roland/shared-session-creation-retry on Jun 4, 2026

@seemeroland seemeroland commented Jun 2, 2026

Description

Motivation for this: https://linear.app/warpdotdev/issue/REMOTE-1802/session-sharing-server-times-out-creating-session

When starting a shared session with session-sharing-server fails for a retryable reason in an oz run, the client now makes up to 3 attempts, each with a 5s timeout (the oz agent driver waits up to 20s in total).
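
A minimal sketch of the retry shape described above: a bounded number of attempts, a per-attempt timeout, and immediate retry on retryable failures. This is not the PR's code. The names `create_with_retry` and `AttemptFailure` are hypothetical, and the sketch assumes a blocking attempt function run on a worker thread instead of the client's websocket flow.

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Why a single attempt failed (hypothetical; loosely mirrors the `reason=` log field).
#[derive(Debug, Clone, PartialEq)]
pub enum AttemptFailure {
    Timeout,
    Transport(String),
    ServerInternal,
    /// Stand-in for any failure that should not be retried.
    NonRetryable(String),
}

impl AttemptFailure {
    fn is_retryable(&self) -> bool {
        !matches!(self, AttemptFailure::NonRetryable(_))
    }
}

/// Calls `attempt` up to `max_attempts` times, waiting at most `per_attempt`
/// for each call. Retries are immediate (no backoff). On success, returns the
/// value together with the 1-based attempt number that produced it.
pub fn create_with_retry<T, F>(
    max_attempts: usize,
    per_attempt: Duration,
    attempt: F,
) -> Result<(T, usize), AttemptFailure>
where
    T: Send + 'static,
    F: Fn(usize) -> Result<T, AttemptFailure> + Send + Sync + 'static,
{
    let attempt = Arc::new(attempt);
    // Returned unchanged if `max_attempts` is 0.
    let mut last = AttemptFailure::Timeout;
    for n in 1..=max_attempts {
        let (tx, rx) = mpsc::channel();
        let f = Arc::clone(&attempt);
        // Run the attempt on a worker thread so a hang cannot block the caller.
        // A timed-out worker is left detached and its late result is discarded.
        thread::spawn(move || {
            let _ = tx.send(f(n));
        });
        let failure = match rx.recv_timeout(per_attempt) {
            Ok(Ok(value)) => return Ok((value, n)),
            Ok(Err(failure)) => failure,
            // No result within the window (or the worker died): treat as a timeout.
            Err(_) => AttemptFailure::Timeout,
        };
        if !failure.is_retryable() {
            return Err(failure);
        }
        last = failure;
    }
    Err(last)
}
```

With `max_attempts = 3` and a 5s `per_attempt`, the worst case is roughly 15s of waiting, which fits inside the 20s driver wait mentioned in the description.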

Testing

Tested against a local session-sharing-server with injected errors. The first two scenarios below succeed on the third attempt and the run proceeds normally. Manual user sharing also still works.

Inject two internal server errors

2026-06-03T18:14:04Z [INFO] [warp::terminal::view] Terminal bootstrapped with pending shared session; attempting to share
2026-06-03T18:14:04Z [INFO] [warp::terminal::view::shared_session::view_impl] Emitting request to start sharing current session
2026-06-03T18:14:04Z [INFO] [warpui_core::core::app] dispatching global action for workspace:save_app
2026-06-03T18:14:04Z [INFO] [warp::terminal::local_tty::terminal_manager] Starting shared session
2026-06-03T18:14:04Z [INFO] [warp::terminal::local_tty::terminal_manager] Shared session local lifecycle: event=start_requested session_id=None source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") trigger=terminal_view_start_sharing
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=1 max_attempts=3
2026-06-03T18:14:04Z [INFO] [warp::terminal::model::session] Loading history from file /Users/rolandhuang/.zsh_history for shell zsh
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Connected to session sharing server; preparing initialization
2026-06-03T18:14:04Z [INFO] [warp::ai::agent_sdk::driver] Starting 0 existing and 0 ephemeral MCP servers
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Sent session sharing initialization message
2026-06-03T18:14:04Z [INFO] [warp::ai::agent_sdk::driver] Selecting base agent model auto-efficient (from agent driver)
2026-06-03T18:14:04Z [INFO] [warpui_core::core::app] dispatching global action for workspace:save_app
2026-06-03T18:14:04Z [INFO] [warp::terminal::model::ansi] Received CommandFinished hook
2026-06-03T18:14:04Z [INFO] [warp::terminal::model::terminal_model] Tried to exit the alternate screen, but it was already inactive
2026-06-03T18:14:04Z [INFO] [warp::terminal::model::block] Block finished with new state DoneWithNoExecution
2026-06-03T18:14:04Z [INFO] [warp::terminal::model::blocks] Incrementing stage from Bootstrapped to PostBootstrapPrecmd
2026-06-03T18:14:04Z [INFO] [warpui_core::core::app] dispatching typed action: warp::pane_group::PaneGroupAction::HandleFocusChange
2026-06-03T18:14:04Z [WARN] [warpui_core::core::app] Action HandleFocusChange was dispatched, but no view handled it
2026-06-03T18:14:04Z [WARN] [warp::terminal::shared_session::sharer::network] Failed to initialize session: InternalServerError { details: "" }
2026-06-03T18:14:04Z [WARN] [warp::terminal::shared_session::sharer::network] Shared session creation attempt failed; source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=1 max_attempts=3 reason=server_internal_error outcome=retry_pending
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=1 max_attempts=3 reason=server_internal_error outcome=retry_pending
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=2 max_attempts=3
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Closing websocket to session sharing server as sharer
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Connected to session sharing server; preparing initialization
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Sent session sharing initialization message
2026-06-03T18:14:04Z [INFO] [warp::terminal::model::ansi] Received Precmd hook
2026-06-03T18:14:04Z [WARN] [warp::terminal::shared_session::sharer::network] Failed to initialize session: InternalServerError { details: "" }
2026-06-03T18:14:04Z [WARN] [warp::terminal::shared_session::sharer::network] Shared session creation attempt failed; source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=2 max_attempts=3 reason=server_internal_error outcome=retry_pending
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=2 max_attempts=3 reason=server_internal_error outcome=retry_pending
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Closing websocket to session sharing server as sharer
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=3 max_attempts=3
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Connected to session sharing server; preparing initialization
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Sent session sharing initialization message
2026-06-03T18:14:04Z [INFO] [warp::terminal::model::ansi] Received InputBuffer hook
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Successfully created shared session.
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_succeeded session_id=None source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") attempt=3 max_attempts=3
2026-06-03T18:14:04Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=session_initialized session_id=Some(SessionId(a8d6e014-c43c-42ce-b713-7c929315745a)) source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") outcome=active_sharer
2026-06-03T18:14:04Z [INFO] [warp::terminal::local_tty::terminal_manager] Shared session local lifecycle: event=session_established session_id=Some(SessionId(a8d6e014-c43c-42ce-b713-7c929315745a)) source_type=ambient_agent source_task_id=Some("019e8eb1-3119-7119-9616-17506db20222") outcome=active_sharer

Inject 2 hangs (no session initialized sent)

2026-06-03T18:17:29Z [INFO] [warp::terminal::view] Terminal bootstrapped with pending shared session; attempting to share
2026-06-03T18:17:29Z [INFO] [warp::terminal::view::shared_session::view_impl] Emitting request to start sharing current session
2026-06-03T18:17:29Z [INFO] [warpui_core::core::app] dispatching global action for workspace:save_app
2026-06-03T18:17:29Z [INFO] [warp::terminal::local_tty::terminal_manager] Starting shared session
2026-06-03T18:17:29Z [INFO] [warp::terminal::local_tty::terminal_manager] Shared session local lifecycle: event=start_requested session_id=None source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") trigger=terminal_view_start_sharing
2026-06-03T18:17:29Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=1 max_attempts=3
2026-06-03T18:17:29Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:17:29Z [INFO] [warp::terminal::shared_session::sharer::network] Connected to session sharing server; preparing initialization
2026-06-03T18:17:29Z [INFO] [warp::terminal::shared_session::sharer::network] Sent session sharing initialization message
2026-06-03T18:17:29Z [INFO] [warp::ai::agent_sdk::driver] Starting 0 existing and 0 ephemeral MCP servers
2026-06-03T18:17:29Z [INFO] [warp::ai::agent_sdk::driver] Selecting base agent model auto-efficient (from agent driver)
2026-06-03T18:17:29Z [INFO] [warpui_core::core::app] dispatching global action for workspace:save_app
2026-06-03T18:17:29Z [INFO] [warp::terminal::model::ansi] Received CommandFinished hook
2026-06-03T18:17:29Z [INFO] [warp::terminal::model::terminal_model] Tried to exit the alternate screen, but it was already inactive
2026-06-03T18:17:29Z [INFO] [warp::terminal::model::block] Block finished with new state DoneWithNoExecution
2026-06-03T18:17:29Z [INFO] [warp::terminal::model::blocks] Incrementing stage from Bootstrapped to PostBootstrapPrecmd
2026-06-03T18:17:29Z [INFO] [warpui_core::core::app] dispatching typed action: warp::pane_group::PaneGroupAction::HandleFocusChange
2026-06-03T18:17:29Z [WARN] [warpui_core::core::app] Action HandleFocusChange was dispatched, but no view handled it
2026-06-03T18:17:29Z [INFO] [warp::terminal::model::ansi] Received Precmd hook
2026-06-03T18:17:29Z [INFO] [warp::terminal::model::ansi] Received InputBuffer hook
2026-06-03T18:17:34Z [WARN] [warp::terminal::shared_session::sharer::network] Shared session creation attempt failed; source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=1 max_attempts=3 reason=timeout outcome=retry_pending
2026-06-03T18:17:34Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=1 max_attempts=3 reason=timeout outcome=retry_pending
2026-06-03T18:17:34Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=2 max_attempts=3
2026-06-03T18:17:34Z [INFO] [warp::terminal::shared_session::sharer::network] Closing websocket to session sharing server as sharer
2026-06-03T18:17:34Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:17:34Z [INFO] [warp::terminal::shared_session::sharer::network] Connected to session sharing server; preparing initialization
2026-06-03T18:17:34Z [INFO] [warp::terminal::shared_session::sharer::network] Sent session sharing initialization message
2026-06-03T18:17:39Z [WARN] [warp::terminal::shared_session::sharer::network] Shared session creation attempt failed; source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=2 max_attempts=3 reason=timeout outcome=retry_pending
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=2 max_attempts=3 reason=timeout outcome=retry_pending
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=3 max_attempts=3
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Closing websocket to session sharing server as sharer
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Connected to session sharing server; preparing initialization
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Sent session sharing initialization message
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Successfully created shared session.
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_succeeded session_id=None source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") attempt=3 max_attempts=3
2026-06-03T18:17:39Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=session_initialized session_id=Some(SessionId(89d2e8fd-9d9c-4fe8-b2e7-5229814cd62b)) source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") outcome=active_sharer
2026-06-03T18:17:39Z [INFO] [warp::terminal::local_tty::terminal_manager] Shared session local lifecycle: event=session_established session_id=Some(SessionId(89d2e8fd-9d9c-4fe8-b2e7-5229814cd62b)) source_type=ambient_agent source_task_id=Some("019e8eb4-565f-7ab6-b880-fc98116cf02d") outcome=active_sharer

No session-sharing-server running (expected failure after 3 attempts):

2026-06-03T18:20:36Z [INFO] [warp::terminal::view] Terminal bootstrapped with pending shared session; attempting to share
2026-06-03T18:20:36Z [INFO] [warp::terminal::view::shared_session::view_impl] Emitting request to start sharing current session
2026-06-03T18:20:36Z [INFO] [warpui_core::core::app] dispatching global action for workspace:save_app
2026-06-03T18:20:36Z [INFO] [warp::terminal::local_tty::terminal_manager] Starting shared session
2026-06-03T18:20:36Z [INFO] [warp::terminal::local_tty::terminal_manager] Shared session local lifecycle: event=start_requested session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") trigger=terminal_view_start_sharing
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=1 max_attempts=3
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=initial_websocket_connect_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") outcome=transport_error
2026-06-03T18:20:36Z [WARN] [warp::terminal::shared_session::sharer::network] Shared session creation attempt failed; source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=1 max_attempts=3 reason=transport_error outcome=retry_pending cause=Failed to create shared session: IO error: Connection refused (os error 61): Connection refused (os error 61)
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=1 max_attempts=3 reason=transport_error outcome=retry_pending
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=2 max_attempts=3
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=initial_websocket_connect_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") outcome=transport_error
2026-06-03T18:20:36Z [WARN] [warp::terminal::shared_session::sharer::network] Shared session creation attempt failed; source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=2 max_attempts=3 reason=transport_error outcome=retry_pending cause=Failed to create shared session: IO error: Connection refused (os error 61): Connection refused (os error 61)
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=2 max_attempts=3 reason=transport_error outcome=retry_pending
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_attempt_started session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=3 max_attempts=3
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Connecting to session sharing server
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=initial_websocket_connect_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") outcome=transport_error
2026-06-03T18:20:36Z [WARN] [warp::terminal::shared_session::sharer::network] Shared session creation failed; source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=3 max_attempts=3 reason=transport_error outcome=final_failure cause=Failed to create shared session: IO error: Connection refused (os error 61): Connection refused (os error 61)
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=create_failed session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") attempt=3 max_attempts=3 reason=transport_error outcome=final_failure
2026-06-03T18:20:36Z [ERROR] [errors::report_error] Failed to create shared session: IO error: Connection refused (os error 61): Connection refused (os error 61)
2026-06-03T18:20:36Z [WARN] [warp::terminal::local_tty::terminal_manager] Failed to create shared session: reason=InternalServerError { details: "" }, cause=Some(Failed to create shared session

Caused by:
    0: IO error: Connection refused (os error 61)
    1: Connection refused (os error 61))
2026-06-03T18:20:36Z [INFO] [warp::terminal::shared_session::sharer::network] Shared session sharer lifecycle: event=network_dropped session_id=None source_type=ambient_agent source_task_id=Some("019e8eb7-2efb-791f-8e72-183984bc07e8") stage=finished

@cla-bot cla-bot Bot added the cla-signed label Jun 2, 2026
@seemeroland seemeroland force-pushed the roland/shared-session-creation-retry branch from ec26006 to 04224d1 Compare June 3, 2026 21:14
@seemeroland seemeroland marked this pull request as ready for review June 3, 2026 21:14

oz-for-oss Bot commented Jun 3, 2026

@seemeroland

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

}
}

fn log_diagnostic(&self, event: &'static str, details: impl std::fmt::Display) {
Contributor Author

@captainsafia I'm removing this log because it seemed too verbose and duplicated some existing logs. I added ad hoc logs to replace some of the removals.

But I'm curious about the motivation: are the session id, source type, and source task id needed on all logs for something? I'd be down to add them back, or to enforce that they are included in the logs, if needed.

Contributor

I included the source ID and the task ID in the log originally to make it easier to correlate this log with the task-specific logs that are emitted from the server. I think it might be helpful at minimum to include the task ID if the session is a cloud mode session?

Contributor Author

I thought we would generally know the task and session ID by the point we're looking at worker logs, but I don't see a harm in adding them back. Will add back a version of this that includes those values

oz-for-oss[bot]

This comment was marked as resolved.

@seemeroland
Contributor Author

/oz-review

oz-for-oss Bot commented Jun 3, 2026

@seemeroland

I'm re-reviewing this pull request in response to a review request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

@oz-for-oss oz-for-oss Bot left a comment

Overview

This PR adds ambient-agent-only retry handling around shared-session creation, including per-attempt timeouts, retry bookkeeping, heartbeat cleanup between attempts, and an extended agent-driver wait window.

Concerns

  • The startup attempt guard remains active after SessionInitialized clears startup_retry, so the established websocket drops all subsequent downstream messages and skips the close/reconnect path.

Verdict

Found: 0 critical, 1 important, 0 suggestions

Request changes
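
The concern above can be illustrated with a small sketch. It assumes a guard that tags each downstream message with the attempt that produced it and drops messages from superseded attempts. All types and method names here (`Sharer`, `StartupRetry`, `accepts`) are hypothetical and not taken from the PR. The point is only that once `startup_retry` is cleared, the guard has to let messages through instead of continuing to filter them.

```rust
/// Hypothetical stand-in for the startup retry bookkeeping.
#[derive(Debug)]
pub struct StartupRetry {
    pub current_attempt: usize,
}

#[derive(Debug, Default)]
pub struct Sharer {
    /// `Some` only while session creation is still being attempted.
    pub startup_retry: Option<StartupRetry>,
    /// Messages that passed the guard, kept here for illustration.
    pub delivered: Vec<String>,
}

impl Sharer {
    /// Returns true if a message produced by `attempt` should be processed.
    pub fn accepts(&self, attempt: usize) -> bool {
        match &self.startup_retry {
            // During startup, ignore messages from superseded attempts.
            Some(retry) => retry.current_attempt == attempt,
            // After startup has finished, the guard no longer applies.
            None => true,
        }
    }

    /// Ends the startup phase if the message belongs to the current attempt.
    pub fn on_session_initialized(&mut self, attempt: usize) {
        if self.accepts(attempt) {
            self.startup_retry = None;
        }
    }

    /// Delivers a downstream message unless the guard rejects it.
    pub fn on_message(&mut self, attempt: usize, msg: &str) {
        if self.accepts(attempt) {
            self.delivered.push(msg.to_string());
        }
    }
}
```

If the `None` arm returned `false` instead, every message after initialization would be dropped, which is the failure mode the review describes.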

Comment thread app/src/terminal/shared_session/sharer/network.rs Outdated
@seemeroland
Contributor Author

/oz-review

oz-for-oss Bot commented Jun 4, 2026

@seemeroland

I'm re-reviewing this pull request in response to a review request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

@oz-for-oss oz-for-oss Bot left a comment

Overview

This PR adds client-side retry handling for ambient-agent shared-session creation, including per-attempt timeout handling and tests for retryability/state filtering.

Concerns

  • The new retry/final-failure log messages drop the source_type/source_task_id diagnostic context that previously made shared-session failures traceable to a specific Oz task.

Verdict

Found: 0 critical, 0 important, 1 suggestions

Approve with nits

if self.should_retry_startup_failure(&failure) {
    if let Some(cause) = cause.as_ref() {
        log::warn!(
            "Shared session creation attempt failed, will retry; attempt={attempt} max_attempts={max_attempts} reason={reason} cause={cause:#}"

💡 [SUGGESTION] Preserve the ambient-agent source context in retry/final-failure logs so these failures can still be correlated to a specific Oz task; include startup_config.source here or keep equivalent diagnostic context on Network.

@seemeroland seemeroland requested a review from captainsafia June 4, 2026 00:18

@captainsafia captainsafia left a comment

This approach seems good overall! I left one question inline about whether it's worth adding back-offs here. I think we want to optimize for a snappier connection experience in the event the connection recovers, but I'm leaving it as food for thought.


impl StartupRetryState {
    #[cfg_attr(any(test, feature = "integration_tests"), allow(dead_code))]
    fn new(max_attempts: usize) -> Self {
Contributor

We're not doing any backoff coupled with the retries here AFAICT. Is that intentional or do we want to incorporate backoffs here?

Contributor Author

This was intentional. My current belief is that the request gets stuck for some reason and that retrying immediately would help, and we want to keep the time to session share low. I'll revisit this if needed, though.
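
To make the trade-off in this thread concrete, here is a small worked calculation of the worst-case wait before giving up, with and without backoff. The function names and the 1s/2s backoff schedule are hypothetical. Only the 3 attempts, the 5s per-attempt timeout, and the 20s driver wait come from the PR description.

```rust
use std::time::Duration;

/// Worst-case time before giving up, assuming every attempt runs to its full
/// timeout. `delay_before_retry(n)` is the pause after failed attempt `n`.
pub fn worst_case_wait(
    max_attempts: u32,
    per_attempt: Duration,
    delay_before_retry: impl Fn(u32) -> Duration,
) -> Duration {
    let mut total = Duration::ZERO;
    for attempt in 1..=max_attempts {
        total += per_attempt;
        // No pause is needed after the final attempt.
        if attempt < max_attempts {
            total += delay_before_retry(attempt);
        }
    }
    total
}

/// Immediate retry, as described in the reply above.
pub fn no_backoff(_failed_attempt: u32) -> Duration {
    Duration::ZERO
}

/// Hypothetical exponential backoff: 1s, 2s, 4s, ...
pub fn exponential_backoff(failed_attempt: u32) -> Duration {
    Duration::from_secs(1u64 << failed_attempt.saturating_sub(1))
}
```

With 3 attempts of 5s each, immediate retry gives a worst case of 15s, while the hypothetical 1s/2s schedule gives 18s. Both fit in a 20s wait, but any backoff adds directly to the time to session share when the first attempt is merely stuck.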

@seemeroland
Contributor Author

I added sharer_warn and matching macros for the other log levels, which include the session id and source_task_id on every log emitted through them. Otherwise they have the same usage as log::warn.
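
A rough sketch of how a context-carrying log macro of this kind could look. It is not the macro from the PR: `sharer_line!` and `SharerLogContext` are hypothetical names, and the sketch returns the formatted line as a `String` so it stays free of external crates. A real version would presumably forward the line to `log::warn!` and its siblings.

```rust
use std::fmt;

/// Hypothetical identifiers attached to every sharer log line.
pub struct SharerLogContext {
    pub session_id: Option<String>,
    pub source_task_id: Option<String>,
}

impl fmt::Display for SharerLogContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Debug formatting keeps the `Some("...")` / `None` shape seen in the logs.
        write!(
            f,
            "session_id={:?} source_task_id={:?}",
            self.session_id, self.source_task_id
        )
    }
}

/// Formats a message like `format!` and appends the context fields.
/// The first argument is the context; the rest are ordinary format arguments.
macro_rules! sharer_line {
    ($ctx:expr, $($arg:tt)+) => {
        format!("{}; {}", format_args!($($arg)+), $ctx)
    };
}

/// Example call site: builds the line for a failed creation attempt.
pub fn attempt_failed_line(ctx: &SharerLogContext, attempt: usize, max_attempts: usize) -> String {
    sharer_line!(
        ctx,
        "Shared session creation attempt failed; attempt={} max_attempts={}",
        attempt,
        max_attempts
    )
}
```

Because the context is appended inside the macro, call sites keep the same shape as a plain `log::warn!` call and cannot forget the correlation fields.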

@seemeroland seemeroland enabled auto-merge (squash) June 4, 2026 18:24
@seemeroland seemeroland merged commit b3bde35 into master Jun 4, 2026
25 checks passed
@seemeroland seemeroland deleted the roland/shared-session-creation-retry branch June 4, 2026 18:30