Skip to content

Invert SCP fallback to deny-list and increase install timeout to 3 minutes#10533

Merged
alokedesai merged 4 commits into
masterfrom
orchestrator/scp-fallback-inversion
May 11, 2026
Merged

Invert SCP fallback to deny-list and increase install timeout to 3 minutes#10533
alokedesai merged 4 commits into
masterfrom
orchestrator/scp-fallback-inversion

Conversation

@alokedesai
Copy link
Copy Markdown
Member

@alokedesai alokedesai commented May 8, 2026

Description

I used orchestration to analye 1k remote server installation errors from telemetry.

~65% of all failures are caused by the remote host being unable to reach the CDN — but the install succeeds when we bypass the remote's network by downloading on the client and uploading via SCP. Today, SCP fallback only triggers for one specific case (exit code 3 = no curl/wget). The other 650+ errors per 1,000 installs fail unnecessarily.

Breakdown of errors this PR fixes:

Error Count % Curl Exit Code
Script timeout 477 47.7% n/a (our 60s hard cap)
DNS resolution failure 111 11.1% 6
Connection refused / unreachable 32 3.2% 7
SSL/TLS errors 24 2.4% 35, 60, 77
Connection reset by peer 12 1.2% 56
Failure writing output 7 0.7% 23
HTTP 403 (corporate proxy) 5 0.5% 22
Total fixed 668 ~65%

All of these errors share the same root cause: the remote host can't download from app.warp.dev. The SCP fallback path (which already exists for the no-curl/wget case) downloads on the client machine and uploads via the existing SSH tunnel, completely bypassing the remote's network. This PR extends that fallback to cover all network failures.

Solution: Invert the SCP fallback logic from an allowlist to a deny-list:

  • Before: Only exit code 3 triggers SCP fallback. Everything else fails.
  • After: ALL script failures trigger SCP fallback, EXCEPT exit code 2 (unsupported arch/OS) where SCP wouldn't help.

Additional changes:

  • Timeout increased from 60s to 180s (3 minutes) — VS Code doesn't hard-cap at all; our 60s was too aggressive
  • --connect-timeout 15 added to curl so hung connections fail fast, leaving budget for SCP fallback
  • Timeout now triggers SCP fallback instead of returning Error::TimedOut

Testing

I had the agent test by creating docker containers that would demonstrate the issues and then verify the changes fixed the issue.

Live Demo: DNS Failure on Remote Host

Built Warp from this branch in a cloud environment and SSH'd into a Docker container with broken DNS to demonstrate the SCP fallback end-to-end.

Setup:

  • Docker container: Ubuntu 22.04 with SSH server, DNS broken (nameserver 192.0.2.1 in /etc/resolv.conf) so curl cannot resolve any hosts
  • Warp built from this branch (cargo build --bin dev)
  • SSH extension install mode set to always_install
  • Connected via ssh -o StrictHostKeyChecking=no root@localhost -p 2201

Result: SCP fallback triggered correctly. Warp logs show the full sequence:

[INFO] Installing remote server binary to ~/.warp-dev/remote-server/oz-dev-0.1.0
[INFO] Install script failed (exit 6), falling back to SCP upload
[INFO] Downloading tarball locally from https://staging.warp.dev/download/cli?package=tar&os=linux&arch=x86_64&channel=dev&version=0.1.0
  1. Remote install script ran on the container — curl could not resolve CDN hostname (DNS broken)
  2. Script exited with code 6 (DNS resolution failure)
  3. SCP fallback triggered automatically"Install script failed (exit 6), falling back to SCP upload"
  4. Warp attempted the local download + SCP upload path

Before vs After:

  • Before (allowlist): Only exit code 3 (no curl/wget) triggered SCP fallback. Exit code 6 (DNS failure) would show a raw error to the user with no recovery path.
  • After (deny-list): Exit code 6 now correctly triggers SCP fallback. All network-related failures are automatically retried via SCP.

Agent Mode

  • Warp Agent Mode - This PR was created via Warp AI Agent Mode

Co-Authored-By: Oz oz-agent@warp.dev

CHANGELOG-IMPROVEMENT: Improved SSH extension install reliability — network failures now automatically retry via SCP fallback

- Increase INSTALL_TIMEOUT from 60s to 180s to accommodate slow hosts
- Add --connect-timeout 15 to curl in install script so it fails fast
  on unreachable CDN hosts instead of blocking for the full TCP timeout
- Refactor install_binary() SCP fallback from allow-list (only exit
  code 3) to deny-list (skip only exit code 2 = unsupported arch/OS).
  All other failures now trigger SCP fallback, covering DNS failures,
  SSL errors, connection refused, and other network issues (~65% of
  install errors)
- Also trigger SCP fallback on timeout (previously returned TimedOut)
- Add should_skip_scp_fallback() helper with unit tests
- Add setup_tests for INSTALL_TIMEOUT value and curl connect-timeout

Co-Authored-By: Oz <oz-agent@warp.dev>
@cla-bot cla-bot Bot added the cla-signed label May 8, 2026
@alokedesai alokedesai changed the title Invert SCP fallback to deny-list and increase install timeout Invert SCP fallback to deny-list and increase install timeout to 3 minutes May 9, 2026
@alokedesai alokedesai marked this pull request as ready for review May 11, 2026 13:56
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 11, 2026

@alokedesai

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Co-Authored-By: Oz <oz-agent@warp.dev>
@alokedesai alokedesai requested a review from kevinyang372 May 11, 2026 13:58
Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This PR broadens SSH remote-server install recovery so non-successful install-script exits and script timeouts fall back to the existing local-download-plus-SCP install path, adds a curl connect timeout, and raises the direct install timeout.

Concerns

  • The direct install timeout was raised to 180s, but the existing SCP fallback timeout remains 120s even though the fallback is documented as the longer budget and is now invoked for more failure modes. Slow SSH uploads can still fail prematurely.

Verdict

Found: 0 critical, 1 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread crates/remote_server/src/setup.rs
fn should_skip_scp_fallback(exit_code: i32) -> bool {
match exit_code {
// Unsupported arch/OS — SCP won't change the architecture
2 => true,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is unsupported arch / OS the only error where we should skip fallback?

I thought you mentioned things like the permission errors and missing tar will also cause SCP install to fail too. I am not sure exit_code alone is sufficient here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there are cases where falling back to SCP will also fail but I still think it's better to be overly permissive to fallback to SCP than under permissive.

The tradeoff is slightly more latency for a session where we'd fail to install anyway, but that seems better than the other way around and we can always add more cases where early exit and don't do SCP fallback.

Fwiw: VSCode does the same. If the remote box doesn't have tar installed it will still fallback to SCP and then fail again.

@alokedesai alokedesai merged commit 654402f into master May 11, 2026
31 of 32 checks passed
@alokedesai alokedesai deleted the orchestrator/scp-fallback-inversion branch May 11, 2026 20:51
dagmfactory pushed a commit that referenced this pull request May 11, 2026
…nutes (#10533)

## Description
I used orchestration to analye 1k remote server installation errors from
telemetry.

**~65% of all failures** are caused by the remote host being unable to
reach the CDN — but the install succeeds when we bypass the remote's
network by downloading on the client and uploading via SCP. Today, SCP
fallback only triggers for one specific case (exit code 3 = no
curl/wget). The other 650+ errors per 1,000 installs fail unnecessarily.

**Breakdown of errors this PR fixes:**

| Error | Count | % | Curl Exit Code |
|-------|------:|--:|----------------|
| Script timeout | 477 | 47.7% | n/a (our 60s hard cap) |
| DNS resolution failure | 111 | 11.1% | 6 |
| Connection refused / unreachable | 32 | 3.2% | 7 |
| SSL/TLS errors | 24 | 2.4% | 35, 60, 77 |
| Connection reset by peer | 12 | 1.2% | 56 |
| Failure writing output | 7 | 0.7% | 23 |
| HTTP 403 (corporate proxy) | 5 | 0.5% | 22 |
| **Total fixed** | **668** | **~65%** | |

All of these errors share the same root cause: the remote host can't
download from `app.warp.dev`. The SCP fallback path (which already
exists for the no-curl/wget case) downloads on the client machine and
uploads via the existing SSH tunnel, completely bypassing the remote's
network. This PR extends that fallback to cover all network failures.

**Solution:** Invert the SCP fallback logic from an allowlist to a
deny-list:
- **Before:** Only exit code 3 triggers SCP fallback. Everything else
fails.
- **After:** ALL script failures trigger SCP fallback, EXCEPT exit code
2 (unsupported arch/OS) where SCP wouldn't help.

Additional changes:
- **Timeout increased** from 60s to 180s (3 minutes) — VS Code doesn't
hard-cap at all; our 60s was too aggressive
- **`--connect-timeout 15`** added to curl so hung connections fail
fast, leaving budget for SCP fallback
- **Timeout now triggers SCP fallback** instead of returning
`Error::TimedOut`

## Testing
I had the agent test by creating docker containers that would
demonstrate the issues and then verify the changes fixed the issue.

### Live Demo: DNS Failure on Remote Host

Built Warp from this branch in a cloud environment and SSH'd into a
Docker container with broken DNS to demonstrate the SCP fallback
end-to-end.

**Setup:**
- Docker container: Ubuntu 22.04 with SSH server, DNS broken
(`nameserver 192.0.2.1` in `/etc/resolv.conf`) so curl cannot resolve
any hosts
- Warp built from this branch (`cargo build --bin dev`)
- SSH extension install mode set to `always_install`
- Connected via `ssh -o StrictHostKeyChecking=no root@localhost -p 2201`

**Result: SCP fallback triggered correctly.** Warp logs show the full
sequence:

```
[INFO] Installing remote server binary to ~/.warp-dev/remote-server/oz-dev-0.1.0
[INFO] Install script failed (exit 6), falling back to SCP upload
[INFO] Downloading tarball locally from https://staging.warp.dev/download/cli?package=tar&os=linux&arch=x86_64&channel=dev&version=0.1.0
```

1. Remote install script ran on the container — curl could not resolve
CDN hostname (DNS broken)
2. Script exited with code 6 (DNS resolution failure)
3. **SCP fallback triggered automatically** — `"Install script failed
(exit 6), falling back to SCP upload"`
4. Warp attempted the local download + SCP upload path


**Before vs After:**
- **Before (allowlist):** Only exit code 3 (no curl/wget) triggered SCP
fallback. Exit code 6 (DNS failure) would show a raw error to the user
with no recovery path.
- **After (deny-list):** Exit code 6 now correctly triggers SCP
fallback. All network-related failures are automatically retried via
SCP.

## Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp AI Agent Mode

Co-Authored-By: Oz <oz-agent@warp.dev>

CHANGELOG-IMPROVEMENT: Improved SSH extension install reliability —
network failures now automatically retry via SCP fallback

---------

Co-authored-by: Oz <oz-agent@warp.dev>
cephalonaut pushed a commit that referenced this pull request May 12, 2026
…nutes (#10533)

## Description
I used orchestration to analye 1k remote server installation errors from
telemetry.

**~65% of all failures** are caused by the remote host being unable to
reach the CDN — but the install succeeds when we bypass the remote's
network by downloading on the client and uploading via SCP. Today, SCP
fallback only triggers for one specific case (exit code 3 = no
curl/wget). The other 650+ errors per 1,000 installs fail unnecessarily.

**Breakdown of errors this PR fixes:**

| Error | Count | % | Curl Exit Code |
|-------|------:|--:|----------------|
| Script timeout | 477 | 47.7% | n/a (our 60s hard cap) |
| DNS resolution failure | 111 | 11.1% | 6 |
| Connection refused / unreachable | 32 | 3.2% | 7 |
| SSL/TLS errors | 24 | 2.4% | 35, 60, 77 |
| Connection reset by peer | 12 | 1.2% | 56 |
| Failure writing output | 7 | 0.7% | 23 |
| HTTP 403 (corporate proxy) | 5 | 0.5% | 22 |
| **Total fixed** | **668** | **~65%** | |

All of these errors share the same root cause: the remote host can't
download from `app.warp.dev`. The SCP fallback path (which already
exists for the no-curl/wget case) downloads on the client machine and
uploads via the existing SSH tunnel, completely bypassing the remote's
network. This PR extends that fallback to cover all network failures.

**Solution:** Invert the SCP fallback logic from an allowlist to a
deny-list:
- **Before:** Only exit code 3 triggers SCP fallback. Everything else
fails.
- **After:** ALL script failures trigger SCP fallback, EXCEPT exit code
2 (unsupported arch/OS) where SCP wouldn't help.

Additional changes:
- **Timeout increased** from 60s to 180s (3 minutes) — VS Code doesn't
hard-cap at all; our 60s was too aggressive
- **`--connect-timeout 15`** added to curl so hung connections fail
fast, leaving budget for SCP fallback
- **Timeout now triggers SCP fallback** instead of returning
`Error::TimedOut`

## Testing
I had the agent test by creating docker containers that would
demonstrate the issues and then verify the changes fixed the issue.

### Live Demo: DNS Failure on Remote Host

Built Warp from this branch in a cloud environment and SSH'd into a
Docker container with broken DNS to demonstrate the SCP fallback
end-to-end.

**Setup:**
- Docker container: Ubuntu 22.04 with SSH server, DNS broken
(`nameserver 192.0.2.1` in `/etc/resolv.conf`) so curl cannot resolve
any hosts
- Warp built from this branch (`cargo build --bin dev`)
- SSH extension install mode set to `always_install`
- Connected via `ssh -o StrictHostKeyChecking=no root@localhost -p 2201`

**Result: SCP fallback triggered correctly.** Warp logs show the full
sequence:

```
[INFO] Installing remote server binary to ~/.warp-dev/remote-server/oz-dev-0.1.0
[INFO] Install script failed (exit 6), falling back to SCP upload
[INFO] Downloading tarball locally from https://staging.warp.dev/download/cli?package=tar&os=linux&arch=x86_64&channel=dev&version=0.1.0
```

1. Remote install script ran on the container — curl could not resolve
CDN hostname (DNS broken)
2. Script exited with code 6 (DNS resolution failure)
3. **SCP fallback triggered automatically** — `"Install script failed
(exit 6), falling back to SCP upload"`
4. Warp attempted the local download + SCP upload path


**Before vs After:**
- **Before (allowlist):** Only exit code 3 (no curl/wget) triggered SCP
fallback. Exit code 6 (DNS failure) would show a raw error to the user
with no recovery path.
- **After (deny-list):** Exit code 6 now correctly triggers SCP
fallback. All network-related failures are automatically retried via
SCP.

## Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp AI Agent Mode

Co-Authored-By: Oz <oz-agent@warp.dev>

CHANGELOG-IMPROVEMENT: Improved SSH extension install reliability —
network failures now automatically retry via SCP fallback

---------

Co-authored-by: Oz <oz-agent@warp.dev>
cephalonaut pushed a commit that referenced this pull request May 12, 2026
…nutes (#10533)

## Description
I used orchestration to analye 1k remote server installation errors from
telemetry.

**~65% of all failures** are caused by the remote host being unable to
reach the CDN — but the install succeeds when we bypass the remote's
network by downloading on the client and uploading via SCP. Today, SCP
fallback only triggers for one specific case (exit code 3 = no
curl/wget). The other 650+ errors per 1,000 installs fail unnecessarily.

**Breakdown of errors this PR fixes:**

| Error | Count | % | Curl Exit Code |
|-------|------:|--:|----------------|
| Script timeout | 477 | 47.7% | n/a (our 60s hard cap) |
| DNS resolution failure | 111 | 11.1% | 6 |
| Connection refused / unreachable | 32 | 3.2% | 7 |
| SSL/TLS errors | 24 | 2.4% | 35, 60, 77 |
| Connection reset by peer | 12 | 1.2% | 56 |
| Failure writing output | 7 | 0.7% | 23 |
| HTTP 403 (corporate proxy) | 5 | 0.5% | 22 |
| **Total fixed** | **668** | **~65%** | |

All of these errors share the same root cause: the remote host can't
download from `app.warp.dev`. The SCP fallback path (which already
exists for the no-curl/wget case) downloads on the client machine and
uploads via the existing SSH tunnel, completely bypassing the remote's
network. This PR extends that fallback to cover all network failures.

**Solution:** Invert the SCP fallback logic from an allowlist to a
deny-list:
- **Before:** Only exit code 3 triggers SCP fallback. Everything else
fails.
- **After:** ALL script failures trigger SCP fallback, EXCEPT exit code
2 (unsupported arch/OS) where SCP wouldn't help.

Additional changes:
- **Timeout increased** from 60s to 180s (3 minutes) — VS Code doesn't
hard-cap at all; our 60s was too aggressive
- **`--connect-timeout 15`** added to curl so hung connections fail
fast, leaving budget for SCP fallback
- **Timeout now triggers SCP fallback** instead of returning
`Error::TimedOut`

## Testing
I had the agent test by creating docker containers that would
demonstrate the issues and then verify the changes fixed the issue.

### Live Demo: DNS Failure on Remote Host

Built Warp from this branch in a cloud environment and SSH'd into a
Docker container with broken DNS to demonstrate the SCP fallback
end-to-end.

**Setup:**
- Docker container: Ubuntu 22.04 with SSH server, DNS broken
(`nameserver 192.0.2.1` in `/etc/resolv.conf`) so curl cannot resolve
any hosts
- Warp built from this branch (`cargo build --bin dev`)
- SSH extension install mode set to `always_install`
- Connected via `ssh -o StrictHostKeyChecking=no root@localhost -p 2201`

**Result: SCP fallback triggered correctly.** Warp logs show the full
sequence:

```
[INFO] Installing remote server binary to ~/.warp-dev/remote-server/oz-dev-0.1.0
[INFO] Install script failed (exit 6), falling back to SCP upload
[INFO] Downloading tarball locally from https://staging.warp.dev/download/cli?package=tar&os=linux&arch=x86_64&channel=dev&version=0.1.0
```

1. Remote install script ran on the container — curl could not resolve
CDN hostname (DNS broken)
2. Script exited with code 6 (DNS resolution failure)
3. **SCP fallback triggered automatically** — `"Install script failed
(exit 6), falling back to SCP upload"`
4. Warp attempted the local download + SCP upload path


**Before vs After:**
- **Before (allowlist):** Only exit code 3 (no curl/wget) triggered SCP
fallback. Exit code 6 (DNS failure) would show a raw error to the user
with no recovery path.
- **After (deny-list):** Exit code 6 now correctly triggers SCP
fallback. All network-related failures are automatically retried via
SCP.

## Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp AI Agent Mode

Co-Authored-By: Oz <oz-agent@warp.dev>

CHANGELOG-IMPROVEMENT: Improved SSH extension install reliability —
network failures now automatically retry via SCP fallback

---------

Co-authored-by: Oz <oz-agent@warp.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants