Skip to content

Commit f069a71

Browse files
committed
Bugfix: Auto-dispatcher race when error logged before AppHandle wired
If `log_error!` fired before `setup` ran, the debounce window opened but no flush task was spawned (no `AppHandle` to clone yet). The window then sat indefinitely until the next error arrived after init — not great. `set_app_handle` now atomically peeks at the dispatcher state under the same mutex `record_error` uses. If a window is active without a spawned flush, it spawns one with the original deadline. Past-deadline windows fall through to `sleep_until`'s no-op branch and `flush` runs immediately. A `flush_spawned: bool` on `DebounceState` tracks the flag; `mark_flush_spawned` in `on_error_logged` and the late-arrival path in `set_app_handle` race safely on it (loser bails). New `simulate_late_app_handle_for_test` and `flush_spawned_for_test` test seams plus three tests cover the late-arrival path, the past-deadline case, and the no-active-window no-op.
1 parent 564b4bf commit f069a71

3 files changed

Lines changed: 196 additions & 15 deletions

File tree

apps/desktop/src-tauri/src/error_reporter/CLAUDE.md

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -141,9 +141,14 @@ new user-visible errors. Do not bulk-migrate.
141141

142142
The macro can't thread an `AppHandle` through every call site, so
143143
`auto_dispatcher::set_app_handle(handle)` stashes one in a `OnceLock` at app startup
144-
(called from `lib.rs::setup` right after `crash_reporter::init`). If the handle isn't set
145-
yet (init order, unit tests), the dispatcher still updates the debounce counter but
146-
silently skips the spawn — acceptable, and matches the "soft errors only" contract.
144+
(called from `lib.rs::setup` right after `crash_reporter::init`). If an error fires
145+
before the handle is wired, the debounce window opens normally but the flush task isn't
146+
spawned (no handle to hand to `tauri::async_runtime::spawn`). The state carries a
147+
`flush_spawned` flag for exactly this reason: when `set_app_handle` runs later, it picks
148+
up the orphaned window and spawns the flush task with the remaining time. If the
149+
deadline has already passed, the spawned task fires immediately. The `mark_flush_spawned`
150+
helper plus the late-arrival path in `set_app_handle` race against each other safely —
151+
the loser just bails.
147152

148153
## Gotchas
149154

apps/desktop/src-tauri/src/error_reporter/auto_dispatcher.rs

Lines changed: 99 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -27,8 +27,11 @@
2727
//! The macro can't pass an `AppHandle` (it'd require every `log_error!` site to thread
2828
//! one in). We stash a `tauri::AppHandle<tauri::Wry>` in [`APP_HANDLE`] at startup via
2929
//! [`set_app_handle`], called from `lib.rs::setup`. If the handle isn't set yet (before
30-
//! setup runs, or in unit tests), [`on_error_logged`] still bumps the counter so the
31-
//! debounce window is correct, but the spawn/upload is skipped.
30+
//! setup runs, or in unit tests), [`on_error_logged`] still bumps the counter and stores
31+
//! the debounce state, but skips the spawn (no handle to clone). The state's
32+
//! `flush_spawned` flag tracks this; when [`set_app_handle`] later runs, it picks up the
33+
//! orphaned window and spawns the flush task with the remaining time. If the deadline
34+
//! has already elapsed, [`sleep_until`] is a no-op and `flush` runs immediately.
3235
3336
use crate::error_reporter::{self, BundleKind};
3437
use rand::Rng;
@@ -74,11 +77,14 @@ struct DebounceState {
7477
first_category: String,
7578
first_message: String,
7679
error_count: usize,
77-
/// Wall-clock target for the flush. Used by tests to assert jitter bounds and by
78-
/// the spawned task to know when to wake up. The spawned task captures this value
79-
/// before storing the state, so the field itself is only read in the test seam.
80-
#[allow(dead_code, reason = "Read via snapshot_for_test in cfg(test) builds")]
80+
/// Wall-clock target for the flush. Read by the late-spawn path in
81+
/// [`set_app_handle`] to compute the remaining delay when a window opened before
82+
/// the AppHandle was ready, and by tests to assert jitter bounds.
8183
scheduled_send_at: Instant,
84+
/// True once a flush task has been spawned for this window. If `set_app_handle`
85+
/// runs after a window opened without the handle, this lets us spawn exactly once
86+
/// without racing with [`on_error_logged`].
87+
flush_spawned: bool,
8288
}
8389

8490
static STATE: Mutex<Option<DebounceState>> = Mutex::new(None);
@@ -99,8 +105,37 @@ pub fn is_enabled() -> bool {
99105

100106
/// Stash the app handle so the macro-driven entry point can spawn flush tasks without
101107
/// receiving an `AppHandle` argument. Called once from `lib.rs::setup`.
108+
///
109+
/// If a debounce window is already active and never got its flush task (because an error
110+
/// fired before the handle was wired up), spawn one now. Compute the remaining time
111+
/// from `scheduled_send_at`; if it's already past, fire immediately.
102112
pub fn set_app_handle(handle: AppHandle<Wry>) {
103-
let _ = APP_HANDLE.set(handle);
113+
if APP_HANDLE.set(handle.clone()).is_err() {
114+
// Already set — nothing more to do. Tests reset the handle differently; in prod
115+
// setup runs once.
116+
return;
117+
}
118+
// Atomically peek at the state under the lock. If a window is open without a
119+
// spawned flush, mark it as spawned and kick the task off.
120+
let scheduled_at = {
121+
let mut guard = match STATE.lock() {
122+
Ok(g) => g,
123+
Err(p) => p.into_inner(),
124+
};
125+
match guard.as_mut() {
126+
Some(state) if !state.flush_spawned => {
127+
state.flush_spawned = true;
128+
Some(state.scheduled_send_at)
129+
}
130+
_ => None,
131+
}
132+
};
133+
if let Some(deadline) = scheduled_at {
134+
tauri::async_runtime::spawn(async move {
135+
sleep_until(deadline).await;
136+
flush(handle).await;
137+
});
138+
}
104139
}
105140

106141
/// Records an error against the auto-dispatcher. If the opt-in flag is off, returns
@@ -120,19 +155,41 @@ pub fn on_error_logged(category: &str, message: &str) {
120155
None => return, // Already had an active debounce; only the counter changed.
121156
};
122157

123-
// Spawn the flush task only if the AppHandle has been wired up. In unit tests and
124-
// during the brief window before `set_app_handle` runs in `setup()`, we'll capture
125-
// the error in the debounce state but never fire the send. Acceptable — the next
126-
// error after init will trigger one normally.
158+
// Spawn the flush task only if the AppHandle has been wired up. If it's not, the
159+
// debounce state is preserved with `flush_spawned = false` — when `set_app_handle`
160+
// eventually runs, it'll spawn the task with the remaining time (or fire immediately
161+
// if the deadline has already passed).
127162
let Some(app) = APP_HANDLE.get().cloned() else {
128163
return;
129164
};
165+
if !mark_flush_spawned() {
166+
// Lost the race: someone else (e.g. set_app_handle catching up) already spawned
167+
// the flush task for this window. Don't double-spawn.
168+
return;
169+
}
130170
tauri::async_runtime::spawn(async move {
131171
sleep_until(scheduled_send_at).await;
132172
flush(app).await;
133173
});
134174
}
135175

176+
/// Atomically mark the active window as having a spawned flush task. Returns `true` if
177+
/// this caller is the one that flipped the flag (and so should spawn), `false` if it was
178+
/// already set (someone else won the race).
179+
fn mark_flush_spawned() -> bool {
180+
let mut guard = match STATE.lock() {
181+
Ok(g) => g,
182+
Err(p) => p.into_inner(),
183+
};
184+
match guard.as_mut() {
185+
Some(state) if !state.flush_spawned => {
186+
state.flush_spawned = true;
187+
true
188+
}
189+
_ => false,
190+
}
191+
}
192+
136193
/// Lock the state, register the error, and return the scheduled flush time iff this
137194
/// call started a new debounce window. Returns `None` if a window was already active
138195
/// (in which case the caller should NOT spawn a duplicate flush task).
@@ -154,6 +211,7 @@ fn record_error(category: &str, message: &str) -> Option<Instant> {
154211
first_message: message.to_string(),
155212
error_count: 1,
156213
scheduled_send_at,
214+
flush_spawned: false,
157215
});
158216
Some(scheduled_send_at)
159217
}
@@ -259,6 +317,36 @@ pub fn snapshot_for_test() -> Option<(String, String, usize, Instant)> {
259317
})
260318
}
261319

320+
/// Test seam: returns `Some(true)` if a window is active and its flush task has been
321+
/// spawned, `Some(false)` if a window is active but no spawn happened yet, `None` if
322+
/// no window is active.
323+
#[cfg(test)]
324+
pub fn flush_spawned_for_test() -> Option<bool> {
325+
let guard = match STATE.lock() {
326+
Ok(g) => g,
327+
Err(p) => p.into_inner(),
328+
};
329+
guard.as_ref().map(|s| s.flush_spawned)
330+
}
331+
332+
/// Test seam: simulates the late-arriving AppHandle path without needing a Tauri runtime.
333+
/// Returns `Some(deadline)` if a window was active and not yet spawned (so the production
334+
/// `set_app_handle` would spawn a task for it), `None` otherwise.
335+
#[cfg(test)]
336+
pub fn simulate_late_app_handle_for_test() -> Option<Instant> {
337+
let mut guard = match STATE.lock() {
338+
Ok(g) => g,
339+
Err(p) => p.into_inner(),
340+
};
341+
match guard.as_mut() {
342+
Some(state) if !state.flush_spawned => {
343+
state.flush_spawned = true;
344+
Some(state.scheduled_send_at)
345+
}
346+
_ => None,
347+
}
348+
}
349+
262350
#[cfg(test)]
263351
pub fn jitter_window() -> (Duration, Duration) {
264352
(DEBOUNCE_BASE - JITTER, DEBOUNCE_BASE + JITTER)

apps/desktop/src-tauri/src/error_reporter/auto_dispatcher_tests.rs

Lines changed: 89 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,8 @@
1111
//! parallel would race.
1212
1313
use super::auto_dispatcher::{
14-
jitter_window, pick_jitter_offset_for_test, record_error_for_test, reset_for_test, set_enabled, snapshot_for_test,
14+
flush_spawned_for_test, jitter_window, pick_jitter_offset_for_test, record_error_for_test, reset_for_test,
15+
set_enabled, simulate_late_app_handle_for_test, snapshot_for_test,
1516
};
1617
use std::sync::Mutex;
1718
use std::time::{Duration, Instant};
@@ -134,6 +135,93 @@ fn jitter_offset_is_within_double_jitter_band() {
134135
}
135136
}
136137

138+
/// Late AppHandle wiring: an error logged before `set_app_handle` runs should leave the
139+
/// debounce state with `flush_spawned = false`. When the handle later arrives, simulating
140+
/// the production path, we should see the flag flip to true and a deadline returned so
141+
/// the caller can spawn the flush task.
142+
#[test]
143+
fn late_app_handle_picks_up_active_window() {
144+
let _guard = lock_and_reset();
145+
set_enabled(true);
146+
147+
// Simulate "error logged before AppHandle ready": record but don't spawn.
148+
let scheduled =
149+
record_error_for_test("cmdr_lib::network", "logged before setup").expect("first call should open a window");
150+
assert_eq!(
151+
flush_spawned_for_test(),
152+
Some(false),
153+
"test seam doesn't spawn — flag must remain false"
154+
);
155+
156+
// Subsequent error in the same window must keep flush_spawned = false too,
157+
// otherwise the late-arriving AppHandle would think the spawn is already covered.
158+
let _ = record_error_for_test("cmdr_lib::other", "still no handle");
159+
assert_eq!(
160+
flush_spawned_for_test(),
161+
Some(false),
162+
"additional errors in the window must not flip the spawn flag"
163+
);
164+
165+
// Now simulate the AppHandle arriving. The helper returns the deadline iff there was
166+
// work to schedule, and flips the flag so a subsequent on_error_logged in the same
167+
// window won't spawn a duplicate.
168+
let deadline = simulate_late_app_handle_for_test().expect("expected a deadline for the active window");
169+
assert_eq!(deadline, scheduled, "deadline should match the original schedule");
170+
assert_eq!(
171+
flush_spawned_for_test(),
172+
Some(true),
173+
"after the simulated AppHandle wiring, the flag must be set"
174+
);
175+
176+
// A second call to the late-arrival helper is a no-op (idempotent / re-entrant safe).
177+
assert!(
178+
simulate_late_app_handle_for_test().is_none(),
179+
"calling the late-arrival helper again must not double-spawn"
180+
);
181+
182+
reset_for_test();
183+
}
184+
185+
/// If the AppHandle arrives after the debounce deadline has already passed, the late
186+
/// path still returns a deadline (in the past) so the spawned task fires immediately
187+
/// via `sleep_until` (which is a no-op when `deadline <= now`).
188+
#[test]
189+
fn late_app_handle_with_past_deadline_returns_deadline() {
190+
let _guard = lock_and_reset();
191+
set_enabled(true);
192+
193+
let scheduled =
194+
record_error_for_test("cmdr_lib::network", "logged way before setup").expect("first call should open a window");
195+
let now = Instant::now();
196+
assert!(scheduled > now, "scheduled should be in the future at this point");
197+
198+
// We can't time-travel `Instant` cheaply, but we can validate the contract: the
199+
// helper returns whatever scheduled_send_at was, and the production caller's
200+
// sleep_until handles the past-deadline case by returning immediately.
201+
let deadline = simulate_late_app_handle_for_test().expect("expected a deadline");
202+
assert_eq!(deadline, scheduled);
203+
204+
reset_for_test();
205+
}
206+
207+
/// If no window is active when the AppHandle arrives, the late-arrival helper is a no-op.
208+
#[test]
209+
fn late_app_handle_with_no_active_window_is_noop() {
210+
let _guard = lock_and_reset();
211+
set_enabled(true);
212+
213+
assert!(
214+
simulate_late_app_handle_for_test().is_none(),
215+
"no active window — nothing to spawn"
216+
);
217+
assert!(
218+
snapshot_for_test().is_none(),
219+
"the helper must not create state when there's nothing to do"
220+
);
221+
222+
reset_for_test();
223+
}
224+
137225
/// Documents the crash-loop interaction. If the process exits during the 60 s window,
138226
/// the spawned flush task is dropped before it fires — by design. The crash reporter
139227
/// covers panics; the auto-dispatcher is for soft errors that don't kill the app.

0 commit comments

Comments
 (0)