Skip to content

Commit e5be146

Browse files
committed
Updater: cap hangs at 30s, surface real error causes
- Add `connect_timeout(10s)` + overall `timeout(30s)` to the manifest fetch in `check_for_update`. Was unbounded — observed to hang ~2.5 min on a TCP handshake that never completed against the redirect target, looking like a hung app and tripping the auto error reporter. - Add `connect_timeout(10s)` + `read_timeout(30s)` to the tarball fetch in `download_update`. No overall `timeout` — legitimate slow downloads on a hotel/mobile network still work; mid-download stalls now fail in 30s instead of minutes. - New `describe_error_chain` walks `reqwest::Error::source()` so logs and the Settings UI show the real cause (`tcp connect error: deadline has elapsed`, `dns error: failed to lookup ...`) instead of just `error sending request for url (...)`. - Split the `checkForUpdates` catch into a check phase (warn — transient, expected) and a download/install phase (error — real failure, auto-reports). Helper `finishCheckWithFailure(err, phase)` keeps duplication minimal. The previous single-catch silently downgraded install/signature failures to warn too — actual bugs we'd want auto-reported. - Two synthetic unit tests for the chain walker (always run); two `#[ignore]`'d network tests with the captured chain content for ad-hoc verification.
1 parent bee3ed0 commit e5be146

4 files changed

Lines changed: 281 additions & 57 deletions

File tree

apps/desktop/src-tauri/src/updater/CLAUDE.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,18 @@ signatures use base64(minisign-text-format) encoding, matching Tauri's internal
3838
`do shell script ... with administrator privileges` shows the native macOS auth dialog. `rsync` is used because it
3939
expresses the full sync (copy + delete stale) in a single shell command.
4040

41+
**Decision**: Bounded `connect_timeout` (10 s) and overall `timeout` (30 s) on the manifest fetch.
42+
**Why**: `reqwest::get` uses a default client with no overall timeout. A stuck TCP handshake to the redirect target
43+
(`getcmdr.com/latest.json`) was observed to hang for 2.5 min before reporting `error sending request for url …`,
44+
which made transient network blips look like a hung app and tripped the auto error reporter. The bounds keep the
45+
periodic check honest. Download/install paths are intentionally NOT timed out — they run with user attention and
46+
can legitimately take a while.
47+
48+
**Decision**: Walk `reqwest::Error::source()` for log-friendly messages (`describe_error_chain`).
49+
**Why**: `reqwest::Error`'s `Display` only prints the outermost layer (`error sending request for url …`), so logs
50+
hid the real cause (DNS lookup, TCP connect timeout, TLS handshake). Walking the source chain surfaces the
51+
underlying error class without pulling in `anyhow`.
52+
4153
**Decision**: Atomic rename (write to temp file, then `rename()`) instead of in-place `fs::copy`.
4254
**Why**: `fs::copy` overwrites the destination in-place, keeping the same inode. macOS's kernel code signing cache
4355
keys on inode — it validates the new binary's pages against the old binary's cached code directory, causing

apps/desktop/src-tauri/src/updater/mod.rs

Lines changed: 157 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -16,8 +16,37 @@ mod signature;
1616
use manifest::UpdateInfo;
1717
use std::path::PathBuf;
1818
use std::sync::Mutex;
19+
use std::time::Duration;
1920
use tauri::State;
2021

22+
// Per-call timeouts for the manifest fetch. The default `reqwest::get` client has no
23+
// overall timeout — a stuck TCP handshake against the redirect target can hang for
24+
// minutes before the OS gives up. These bounds keep a flaky network from looking like
25+
// a hung app and stop the auto-error-reporter from firing on long hangs.
26+
const MANIFEST_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
27+
const MANIFEST_REQUEST_TIMEOUT: Duration = Duration::from_secs(30);
28+
29+
// Per-call timeouts for the tarball download. No overall `timeout` here — a 60+ MB
30+
// download on a slow connection can legitimately take minutes. `read_timeout` bounds
31+
// "no bytes received in N seconds" instead, which catches mid-download stalls without
32+
// punishing slow-but-working networks.
33+
const DOWNLOAD_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
34+
const DOWNLOAD_READ_TIMEOUT: Duration = Duration::from_secs(30);
35+
36+
/// Renders an error and its full `source()` chain. `reqwest::Error`'s `Display` only
37+
/// prints the outermost layer (`error sending request for url …`), which hides the
38+
/// real cause (DNS lookup, TCP connect timeout, TLS handshake, etc.).
39+
fn describe_error_chain(err: &(dyn std::error::Error + 'static)) -> String {
40+
let mut out = err.to_string();
41+
let mut src = err.source();
42+
while let Some(cause) = src {
43+
out.push_str(": ");
44+
out.push_str(&cause.to_string());
45+
src = cause.source();
46+
}
47+
out
48+
}
49+
2150
/// Shared state between `download_update` and `install_update`.
2251
/// Holds the path to the downloaded (and verified) tarball.
2352
pub struct UpdateState {
@@ -52,14 +81,22 @@ pub async fn check_for_update() -> Result<Option<UpdateInfo>, String> {
5281
let arch = manifest::platform_key().strip_prefix("darwin-").unwrap_or("unknown");
5382
let url = format!("https://api.getcmdr.com/update-check/{current_version}?arch={arch}");
5483

55-
let response = reqwest::get(&url)
84+
let client = reqwest::Client::builder()
85+
.connect_timeout(MANIFEST_CONNECT_TIMEOUT)
86+
.timeout(MANIFEST_REQUEST_TIMEOUT)
87+
.build()
88+
.map_err(|e| format!("Couldn't build update HTTP client: {}", describe_error_chain(&e)))?;
89+
90+
let response = client
91+
.get(&url)
92+
.send()
5693
.await
57-
.map_err(|e| format!("Couldn't fetch update manifest: {e}"))?;
94+
.map_err(|e| format!("Couldn't fetch update manifest: {}", describe_error_chain(&e)))?;
5895

5996
let manifest: manifest::UpdateManifest = response
6097
.json()
6198
.await
62-
.map_err(|e| format!("Couldn't parse update manifest: {e}"))?;
99+
.map_err(|e| format!("Couldn't parse update manifest: {}", describe_error_chain(&e)))?;
63100

64101
Ok(manifest::check_manifest(&manifest, current_version))
65102
}
@@ -71,14 +108,22 @@ pub async fn check_for_update() -> Result<Option<UpdateInfo>, String> {
71108
pub async fn download_update(url: String, signature: String, state: State<'_, UpdateState>) -> Result<(), String> {
72109
log::info!("Downloading update from {url}");
73110

74-
let response = reqwest::get(&url)
111+
let client = reqwest::Client::builder()
112+
.connect_timeout(DOWNLOAD_CONNECT_TIMEOUT)
113+
.read_timeout(DOWNLOAD_READ_TIMEOUT)
114+
.build()
115+
.map_err(|e| format!("Couldn't build update HTTP client: {}", describe_error_chain(&e)))?;
116+
117+
let response = client
118+
.get(&url)
119+
.send()
75120
.await
76-
.map_err(|e| format!("Couldn't download update: {e}"))?;
121+
.map_err(|e| format!("Couldn't download update: {}", describe_error_chain(&e)))?;
77122

78123
let bytes = response
79124
.bytes()
80125
.await
81-
.map_err(|e| format!("Couldn't read update response: {e}"))?;
126+
.map_err(|e| format!("Couldn't read update response: {}", describe_error_chain(&e)))?;
82127

83128
log::info!("Downloaded {} bytes, verifying signature", bytes.len());
84129
signature::verify(&bytes, &signature)?;
@@ -120,3 +165,109 @@ pub async fn install_update(state: State<'_, UpdateState>) -> Result<(), String>
120165
.await
121166
.map_err(|e| format!("Install task panicked: {e}"))?
122167
}
168+
169+
#[cfg(test)]
170+
mod tests {
171+
use super::*;
172+
use std::error::Error;
173+
use std::fmt;
174+
175+
#[derive(Debug)]
176+
struct ChainErr {
177+
msg: &'static str,
178+
source: Option<Box<dyn Error + 'static>>,
179+
}
180+
181+
impl fmt::Display for ChainErr {
182+
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
183+
f.write_str(self.msg)
184+
}
185+
}
186+
187+
impl Error for ChainErr {
188+
fn source(&self) -> Option<&(dyn Error + 'static)> {
189+
self.source.as_deref()
190+
}
191+
}
192+
193+
#[test]
194+
fn describe_error_chain_renders_only_outer_when_no_source() {
195+
let err = ChainErr {
196+
msg: "outer",
197+
source: None,
198+
};
199+
assert_eq!(describe_error_chain(&err), "outer");
200+
}
201+
202+
#[test]
203+
fn describe_error_chain_walks_full_source_chain() {
204+
let inner = ChainErr {
205+
msg: "io broken pipe",
206+
source: None,
207+
};
208+
let middle = ChainErr {
209+
msg: "hyper transport",
210+
source: Some(Box::new(inner)),
211+
};
212+
let outer = ChainErr {
213+
msg: "reqwest send",
214+
source: Some(Box::new(middle)),
215+
};
216+
assert_eq!(
217+
describe_error_chain(&outer),
218+
"reqwest send: hyper transport: io broken pipe"
219+
);
220+
}
221+
222+
/// Sanity-check against an actual `reqwest::Error` for a name that can never resolve
223+
/// (`.invalid` TLD per RFC 6761). `#[ignore]`'d because it depends on the local resolver
224+
/// — run with
225+
/// `cargo nextest run -p cmdr describe_error_chain --run-ignored=ignored-only --no-capture`
226+
/// to see what reqwest 0.13's source() chain actually surfaces. The `eprintln!` is
227+
/// allowed locally because the whole point of these tests is to render the chain into
228+
/// stderr for human inspection — they're verification harnesses, not production code.
229+
#[tokio::test]
230+
#[ignore = "network-dependent; run manually to verify reqwest chain content"]
231+
#[allow(clippy::print_stderr, reason = "verification harness — see fn doc")]
232+
async fn describe_error_chain_unwraps_reqwest_dns_failure() {
233+
let err = reqwest::Client::builder()
234+
.connect_timeout(Duration::from_secs(2))
235+
.timeout(Duration::from_secs(5))
236+
.build()
237+
.unwrap()
238+
.get("http://nonexistent-host-for-cmdr-tests.invalid/")
239+
.send()
240+
.await
241+
.expect_err("request to .invalid should fail");
242+
let msg = describe_error_chain(&err);
243+
eprintln!("DNS-failure chain: {msg}");
244+
assert!(msg.len() > 60, "chain too short, source() likely empty: {msg}");
245+
}
246+
247+
/// Sanity-check against an actual connect timeout (RFC 5737 unreachable address).
248+
/// `#[ignore]` for the same reason as the DNS test.
249+
#[tokio::test]
250+
#[ignore = "network-dependent; run manually to verify reqwest chain content"]
251+
#[allow(clippy::print_stderr, reason = "verification harness — see fn doc")]
252+
async fn describe_error_chain_unwraps_reqwest_connect_timeout() {
253+
let err = reqwest::Client::builder()
254+
.connect_timeout(Duration::from_millis(500))
255+
.build()
256+
.unwrap()
257+
.get("http://10.255.255.1/")
258+
.send()
259+
.await
260+
.expect_err("connect to 10.255.255.1 should time out");
261+
let msg = describe_error_chain(&err);
262+
eprintln!("connect-timeout chain: {msg}");
263+
// reqwest 0.13 wording, captured from a one-shot run on macOS:
264+
// error sending request for url (http://10.255.255.1/): client error (Connect): tcp connect error: deadline has elapsed
265+
// Match on the "tcp connect" cause rather than a "timeout" keyword — reqwest words it
266+
// as "deadline has elapsed", not "timeout".
267+
let lower = msg.to_lowercase();
268+
assert!(
269+
lower.contains("tcp connect") || lower.contains("deadline") || lower.contains("timed out"),
270+
"expected connect/deadline-shaped cause in chain: {msg}"
271+
);
272+
}
273+
}

apps/desktop/src/lib/updates/CLAUDE.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -125,6 +125,9 @@ interval is acceptable.
125125
- The update manifest endpoint is hardcoded in the Rust backend (`https://getcmdr.com/latest.json`), not in TypeScript.
126126
- The `check_for_update` command returns `None` when `CI` env var is set — no network calls in CI.
127127
- No retry or backoff on error — the next interval fires a fresh attempt.
128+
- The catch in `checkForUpdates()` logs at `warn`, not `error`, so transient network failures during the periodic
129+
background check don't trip the auto error reporter (Flow B). The Settings UI still surfaces the message via
130+
`updateState.error`. See `apps/desktop/src-tauri/src/error_reporter/CLAUDE.md` § convention.
128131
- Default interval: 60 minutes. Configurable in settings from 5 minutes to 24 hours.
129132
- Unit tests in `updater.test.ts` cover the gating logic via the pure `shouldShowUpdateToast` predicate plus the
130133
`notifyOnboardingComplete` and `setFdaPromptShowing` triggers. The download-and-install path is still untested — it

apps/desktop/src/lib/updates/updater.svelte.ts

Lines changed: 109 additions & 51 deletions
Original file line numberDiff line numberDiff line change
@@ -85,59 +85,117 @@ export async function checkForUpdates(): Promise<void> {
8585
updateState.status = 'checking'
8686
updateState.error = null
8787

88+
log.debug('Checking for updates (current: v{version})...', { version: currentVersion })
89+
90+
// Platform branches diverge significantly: macOS runs three custom commands (split download +
91+
// install phases, preserves TCC), non-macOS uses the Tauri plugin's fused `downloadAndInstall`.
92+
// The two-phase error handling (warn on check, error on download/install) lives inside each.
93+
if (isMacOS) {
94+
await runMacUpdateFlow(currentVersion)
95+
} else {
96+
await runPluginUpdateFlow(currentVersion)
97+
}
98+
}
99+
100+
/**
101+
* macOS path: custom updater that preserves TCC/Full Disk Access permissions by syncing files
102+
* into the existing `.app` bundle. Three Tauri commands; download and install are distinct
103+
* phases so the UI can show separate `downloading` and `installing` states.
104+
*/
105+
async function runMacUpdateFlow(currentVersion: string): Promise<void> {
106+
let update: UpdateInfo | null
88107
try {
89-
log.debug('Checking for updates (current: v{version})...', { version: currentVersion })
90-
91-
if (isMacOS) {
92-
// macOS: custom updater preserves TCC/Full Disk Access permissions.
93-
// Platform asymmetry: the macOS path runs `download_update` and `install_update` as two
94-
// distinct invokes, so we can split the UI into a `downloading` phase and an `installing`
95-
// phase. The non-macOS path uses one fused `downloadAndInstall()` and stays in `downloading`.
96-
const update = await invoke<UpdateInfo | null>('check_for_update')
97-
98-
if (update !== null) {
99-
log.info('Update available: v{current} -> v{next}', { current: currentVersion, next: update.version })
100-
updateState.nextVersion = update.version
101-
updateState.status = 'downloading'
102-
await invoke('download_update', { url: update.url, signature: update.signature })
103-
updateState.status = 'installing'
104-
await invoke('install_update')
105-
log.info('v{version} installed, restart to apply', { version: update.version })
106-
updateState.status = 'ready'
107-
updateState.update = update
108-
showUpdateToast()
109-
} else {
110-
log.debug('v{version} is up to date', { version: currentVersion })
111-
updateState.status = 'idle'
112-
updateState.nextVersion = null
113-
}
114-
} else {
115-
// Non-macOS: delegate to the Tauri updater plugin. `downloadAndInstall()` is a single fused
116-
// call, so we can't expose a separate `installing` phase here — we stay in `downloading`
117-
// throughout. The Settings/menu UIs accept this asymmetry.
118-
const { check } = await import('@tauri-apps/plugin-updater')
119-
const update = await check()
120-
121-
if (update) {
122-
log.info('Update available: v{current} -> v{next}', { current: currentVersion, next: update.version })
123-
updateState.nextVersion = update.version
124-
updateState.status = 'downloading'
125-
await update.downloadAndInstall()
126-
log.info('v{version} installed, restart to apply', { version: update.version })
127-
updateState.status = 'ready'
128-
updateState.update = { version: update.version, url: '', signature: '' }
129-
showUpdateToast()
130-
} else {
131-
log.debug('v{version} is up to date', { version: currentVersion })
132-
updateState.status = 'idle'
133-
updateState.nextVersion = null
134-
}
135-
}
108+
update = await invoke<UpdateInfo | null>('check_for_update')
136109
} catch (error) {
137-
updateState.status = 'idle'
138-
updateState.nextVersion = null
139-
updateState.error = error instanceof Error ? error.message : String(error)
140-
log.error('Check failed: {error}', { error: updateState.error })
110+
finishCheckWithFailure(error, 'check')
111+
return
112+
}
113+
114+
if (update === null) {
115+
finishCheckWithNoUpdate(currentVersion)
116+
return
117+
}
118+
119+
log.info('Update available: v{current} -> v{next}', { current: currentVersion, next: update.version })
120+
updateState.nextVersion = update.version
121+
updateState.status = 'downloading'
122+
123+
try {
124+
await invoke('download_update', { url: update.url, signature: update.signature })
125+
updateState.status = 'installing'
126+
await invoke('install_update')
127+
} catch (error) {
128+
finishCheckWithFailure(error, 'download-install')
129+
return
130+
}
131+
132+
log.info('v{version} installed, restart to apply', { version: update.version })
133+
updateState.status = 'ready'
134+
updateState.update = update
135+
showUpdateToast()
136+
}
137+
138+
/**
139+
* Non-macOS path: Tauri updater plugin. `downloadAndInstall()` is fused so we stay in
140+
* `downloading` throughout the second phase (no separate `installing` state).
141+
*/
142+
async function runPluginUpdateFlow(currentVersion: string): Promise<void> {
143+
let update: Awaited<ReturnType<typeof import('@tauri-apps/plugin-updater').check>>
144+
try {
145+
const { check } = await import('@tauri-apps/plugin-updater')
146+
update = await check()
147+
} catch (error) {
148+
finishCheckWithFailure(error, 'check')
149+
return
150+
}
151+
152+
if (!update) {
153+
finishCheckWithNoUpdate(currentVersion)
154+
return
155+
}
156+
157+
log.info('Update available: v{current} -> v{next}', { current: currentVersion, next: update.version })
158+
updateState.nextVersion = update.version
159+
updateState.status = 'downloading'
160+
161+
try {
162+
await update.downloadAndInstall()
163+
} catch (error) {
164+
finishCheckWithFailure(error, 'download-install')
165+
return
166+
}
167+
168+
log.info('v{version} installed, restart to apply', { version: update.version })
169+
updateState.status = 'ready'
170+
updateState.update = { version: update.version, url: '', signature: '' }
171+
showUpdateToast()
172+
}
173+
174+
function finishCheckWithNoUpdate(currentVersion: string): void {
175+
log.debug('v{version} is up to date', { version: currentVersion })
176+
updateState.status = 'idle'
177+
updateState.nextVersion = null
178+
}
179+
180+
/**
181+
* Reset state and log the failure at the right level for the phase.
182+
*
183+
* - `'check'` failures (network, DNS, bad manifest) are transient and expected on the periodic
184+
* background tick — log at warn so they don't trip the auto error reporter on a momentary blip.
185+
* - `'download-install'` failures (signature mismatch, FS errors, partial writes) reach a code
186+
* path the user already opted into — log at error so they DO trip auto-report. The Settings
187+
* UI surfaces both via `updateState.error` regardless of log level.
188+
*
189+
* See `apps/desktop/src-tauri/src/error_reporter/CLAUDE.md` § convention.
190+
*/
191+
function finishCheckWithFailure(error: unknown, phase: 'check' | 'download-install'): void {
192+
updateState.status = 'idle'
193+
updateState.nextVersion = null
194+
updateState.error = error instanceof Error ? error.message : String(error)
195+
if (phase === 'check') {
196+
log.warn('Check failed: {error}', { error: updateState.error })
197+
} else {
198+
log.error('Download/install failed: {error}', { error: updateState.error })
141199
}
142200
}
143201

0 commit comments

Comments
 (0)