fix(http): log upload byte size on complete#24
Merged
Conversation
poll_existing dropped config::save errors with let _ = on all three studio responses (404, approved, rejected), so a read-only or full disk left no operator-visible breadcrumb. - approved credentials save failure now logs at error (a Sentry event via sentry-tracing): the session runs in memory but the worker silently re-registers on the next restart otherwise. - 404 + rejected cleanup save failures log at warn with the config path, matching the existing pattern in this module. - regression tests force a save failure (config path under a regular file) and assert the breadcrumb fires without leaking the auth token.
- WS client emitted zero tracing despite being the only comms channel - add structured breadcrumbs on target `studio_worker::ws::client` (op, worker_id, elapsed_ms; debug on success, warn on fault), mirroring the `studio_worker::http` contract - surfaces genuine silent failures: session maps recv errors to a generic Disconnected(_) and fires `let _ = sender.send(...)` for accept/reject/fail/completeJson - extract `classify_incoming` so split + non-split recv loops share one logging site and stay identical; behaviour preserved - cover with capture()-based unit tests; doc target list updated
- AGENTS.md/README/overview no longer present GradioEngine or the `gradio` engine; document sd-cpp + MultiEngine instead - README config example now matches the Config struct: drop engine, gradio_endpoint_url, label, auto_enabled, supported_models_override; add models_root - remove dead tests/gradio_engine.rs reference; Cargo keyword gradio -> inference - keep intentional 'legacy gone' notes (overview 383/517) and historical plan records
- Windows minted the install id + 256-bit registration_secret from a SHA-256-mixed timestamp, not the OS RNG: the secret was predictable. - The doc comment claimed BCryptGenRandom on Windows; it was never implemented (only unix read /dev/urandom). - Replace the per-OS hand-rolled RNG + weak fallback with one getrandom::fill call (getrandom 0.3, already transitively in tree). - Panic loudly if the OS entropy source is unavailable instead of minting a guessable secret; Sentry captures it. - Add entropy-primitive contract tests (distinctness + bit coverage) that now cover the formerly-untested Windows path.
- Promote src/ui/autostart.rs to top-level src/autostart.rs (always compiled, like service.rs) so the deferred disk-write round-trip is testable in the default CI matrix without the ui feature or HOME mutation. - enable/disable now emit structured tracing on target studio_worker::autostart (info ok / warn fail), matching http.rs + service.rs; a failed toggle is no longer silently swallowed. - Config tab surfaces the failure in the UI instead of let _ = ... - Drop the module-doc claim of a RegistryWriter trait that was never built; document the honest Windows marker-file status. - Add disk-write round-trip + tracing-emission tests.
- The video cargo-feature engine emits ext=gif, but complete()'s inline MIME match only knew png/webp/wav/mp3/mp4, so real GIFs were uploaded as application/octet-stream (wrong stored content-type). - Extract a single-source-of-truth mime_for_ext() and add the missing gif => image/gif arm; complete() now routes through it. - Add fast unit tests (none existed for the mapping); one locks the contract that every engine-emitted ext maps to a real MIME type.
- config::save logged a debug breadcrumb on success but emitted nothing on failure, relying on each caller to log the Err. The UI Save button discards it (let _ = draft.save(...)), so a failed persist was invisible in journalctl / Sentry. - Emit a warn! on the failure branch (path + error only, no secret fields) so every save failure is surfaced regardless of caller. - Split the IO into write_config() so both branches log without duplicating the happy path. - Add tracing-contract tests for the failure path incl. secret redaction (portable file-as-parent trigger, no /proc reliance).
- run_cli emits an INFO startup breadcrumb (target studio_worker::cli) naming AGENT_VERSION + the subcommand about to run - with no SENTRY_DSN the worker logged nothing at launch; this anchors which version/subcommand a process is running, key right after an auto-update re-exec or service restart - add Command::name() stable kebab-case labels, exhaustively unit-tested; breadcrumb proven via a capture()-based test
- Under `cargo test --features ui` the capture-based config_tracing
tests intermittently saw empty buffers ("expected ... got: ").
- Root cause: a tracing callsite first evaluated by a parallel test in
the narrow window around the one-time global-subscriber install can
cache its Interest as `never` and then silently drop the captured
event. `try_init` rebuilds the cache once on install, which only
helps the caller that wins the race.
- Fix: call `tracing::callsite::rebuild_interest_cache()` at the start
of every `capture()` so each caller re-evaluates all callsites
against the now-stable subscriber before emitting. Idempotent + cheap.
- Verified: 2 consecutive `--features ui` runs (act) that previously
flaked now pass clean; default suite green.
- Tray icon is the operator's at-a-glance health indicator, but the variant transitions (idle/busy/disconnected) were unlogged and the `set_icon`/`set_tooltip` updates were swallowed with `let _ =`, so a stale indicator left the operator with zero diagnostics. - app.rs: log idle/busy/disconnected transitions on studio_worker::ui::app and warn when a tray icon/tooltip update fails; log the hide-to-tray interception. - mod.rs: log the open-window and quit tray-menu clicks (pause was already logged), completing the operator audit trail. - Add a unit test for the transition-logging helper; verified under --features ui via act (clippy + check + test green).
- download_file renamed the .part into place with no length check, so a short read (reset/flaky CDN) cached a corrupt model that every later job failed to load with no retry - despite the doc claiming HEAD-checked length. - Add pure verify_download_len() comparing streamed bytes to the response Content-Length; mismatch removes the .part and bails clearly so the next tick re-downloads. None (chunked) => accept, as before. - Clean up the .part on a mid-stream io error (was leaked); drop the handle before remove/rename so cleanup works on Windows. - Correct the stale module doc to match actual behaviour.
- worker only handled SIGINT; systemd/launchd stop sends SIGTERM, so the graceful stop path never fired under the documented deployment - now waits on both SIGINT + SIGTERM (Ctrl-C off-Unix) and logs a named shutdown breadcrumb, mirroring the startup banner - WS session now closes cleanly + last log batch flushes on stop - TDD: request_shutdown unit-tested (flag + breadcrumb); signal wait excluded from coverage like other OS-signal code
- worker never recorded what it offered the studio; only config (startup banner) and per-job picks were logged, so 'why won't it claim X jobs' was unanswerable from the logs - add pure runtime::summarize_capabilities(kinds/models/vram/auto_enabled), unit-tested - log it via push_log_with_observers right before Hello, so it ships to the studio and shows in the UI Logs tab - prove it end-to-end: new ws_session_full_loop handshake test asserts the summary lands in recent_logs
- Add path-injectable enable_at/disable_at/is_enabled_at seams, mirroring the existing write_entry/remove_entry design; public enable/disable/is_enabled now delegate (signatures unchanged). - Cover the enable->is_enabled->disable round-trip, the unresolved-path branch, platform artefact selection, and the Linux XDG path layout against a tempdir, with no env mutation or real-FS side effects. - Lifts autostart.rs coverage: lines 76% -> 91%, functions 64% -> 84%.
- route the four best-effort `remove_file` cleanups (per-job sd-cli output, init image, half-written .part downloads) through a new `remove_temp_file` helper that warns on failure instead of silently swallowing the error - a removal that keeps failing leaks temp files and can quietly fill a long-running worker's disk; now it shows up in the logs - NotFound stays silent (already the desired end state) - covers the helper with TDD: quiet success, ignored-missing, and a surfaced failure that names the path + cleanup op
- About tab "Check for updates" left no trace on success/error - Emit structured INFO (up-to-date / newer) + WARN (failure) - Stable target studio_worker::ui::about for journal/Sentry filtering - Cover all three outcomes with capture-based tests
- JSON (LLM/STT) completion path now logs a breadcrumb on success, symmetric with the binary path's "binary upload ok" - branch on the completeJson send result so a dropped frame records Failed instead of a false-positive Completed - cover both with a full-loop test asserting the per-job breadcrumb
- dispatch_with_source bailed without a breadcrumb on the studio_worker::engine::multi target when the studio asked for an engine this binary lacks - pick_for already warns on its no-match path; mirror that so every routing decision (selected or rejected) is filterable on one target - add a tracing-capture regression test for the rejection breadcrumb
- RealRunner::download streamed the installer with no length check, so a truncated fetch (reset / flaky CDN) wrote a partial script that the next step hands straight to sh / powershell. - Add pure verify_download_len() comparing streamed bytes to the response Content-Length; mismatch bails before run_installer (tempdir drop cleans the partial file). None (chunked) => accept, mirroring the sdcpp guard. - Cover the previously-untested real download path end-to-end via wiremock (happy + truncation-rejection); update.rs line cov 85 -> 90%.
- emit structured breadcrumbs for DesktopNotifier: debug on success (previously silent), warn with an `error` field on failure (previously string-interpolated, off-convention) - extract a testable `log_show_outcome` seam and add unit tests for both branches via the shared tracing capture helper - correct the stale NotificationPrefs doc: the toggles are session only, not persisted to Config
- mp4 crate was added speculatively for video output but the video engine encodes GIF instead (no H.264 encoder shipped) - never referenced in source; removes 4 crates from the resolve (mp4, num-bigint, num-integer, num-rational) - shrinks build graph + supply-chain surface
- base64 was used by the old Gradio engine to decode response images; that engine was removed in the stable-diffusion.cpp migration - no longer referenced in source; manifest now reflects actual direct deps - base64 stays available transitively via reqwest, so the lockfile resolve is unchanged
- build_client_options was untested: lock that a valid DSN yields options carrying release / environment / server_name with perf tracing disabled. - Lock the invalid-DSN path: a malformed SENTRY_DSN must warn and return None (init propagates it via ?), never panic the worker at startup. - telemetry.rs line cov 90.8 -> 93.2%, region 86.5 -> 91.0%.
- surface dropped accept/reject offer-response sends at session level - reaches UI Logs tab + studio-shipped logs with the job_id attached - previously only the transport layer logged these, locally and without job context - pure offer_response_breadcrumb helper, unit-tested for both outcomes
- surface dropped Fail-frame sends at session level (upload error, dispatch error, dispatch panic arms of run_offered_job) - previously swallowed by let _ = sender.send(...): a dropped Fail left the studio holding the job until timeout with no local record - reaches UI Logs tab + studio-shipped logs with the job_id attached - pure fail_send_breadcrumb helper + record_fail_send glue, unit-tested for both outcomes
- a prior edit inserted remove_temp_file inside verify_download_len's doc comment, splitting it and leaving remove_temp_file documented as download verification - move each doc block back above its own function so both read coherently again (no behaviour change)
- auto-updater slept a full auto_update_tick (60s) before re-checking stop, so run_loops' join blocked shutdown for up to a tick after SIGTERM/SIGINT, undoing graceful shutdown - idle wait now re-polls stop every AUTO_UPDATE_SHUTDOWN_TICK (250ms) via the shared wait_with_stop helper - consolidate wait_with_stop into runtime; ws::session reuses it - tests: prompt-stop during idle wait + helper short-circuit
- Add dispatch_with_source to the Engine trait block - Drop the removed enable/disable subcommands; register has no label flag - MultiEngine routes strictly by ModelSource.engine (no kind fallback) - video.rs is an animated-GIF stand-in (not ffmpeg); tts.rs is formant-synth (not Piper) - Video TaskResult is webp/gif, never mp4
- load() read/parse failures now emit a source-level WARN breadcrumb (op=load, config_path), mirroring save(); previously only main()'s generic error surfaced them, lacking the structured studio_worker::config fields - parse breadcrumb omits the toml span on purpose: it can echo a secret on the failing line (e.g. an unterminated auth_token) - tests cover read-fail, parse-fail, and secret redaction
- build() now emits an info breadcrumb naming the backends this worker routes across (op=build, engine_count, engines=...) on the studio_worker::engine target - previously the roster was implicit: operators could not tell from the logs whether e.g. the sdcpp backend self-registered or was skipped for a missing sd-cli - roster logging split into a pure helper, unit-tested for shape; the build-level test stays environment-tolerant (no count pin)
- replace a dead empty if-block (computed values, did nothing) that silently ignored prompt + max_tokens overflowing the context window - emit a structured warn so a truncated long chat is diagnosable from the logs; we don't trim the operator's prompt (no behaviour change) - compare against n_ctx (the KV-cache size), not n_batch, so the guard fires on real overflow rather than crying wolf on normal prompts - pure exceeds_context_window helper, unit-tested at the boundary
- config.toml holds worker identity + secrets (auth_token, registration_secret) but used a plain fs::write - non-atomic: a crash/power-loss/full-disk mid-write truncated the file, wiping registration and forcing re-approval - world-readable: fs::write honours umask (typically 0644), exposing secrets to other local users - now stream to a same-dir temp file, fsync, then rename over the target; tempfile gives 0600 and persist keeps it - tests: owner-only mode (unix) + atomic replace without litter
- Status-tab Pause/Resume button flipped the flag silently; the tray menu toggle already logs. Now both paths emit a structured breadcrumb. - Extract toggle_pause() so the logging is unit-testable without egui. - Add a both-directions test (target/message/paused field, flag flip).
- add missing src modules: cli, auto_register, telemetry, update, autostart, test_support - expand tests/ list from 6 to all 22 files, grouped by area - note real_* E2E tests are feature-gated and off on free-tier CI
- push_log breadcrumbs now carry job_id as a structured field so operators can pivot shipped studio logs / Sentry by job - Option<&str> records the field only when present; jobless logs (startup, heartbeat, auto-update) stay free of an empty job_id - covered by two capture-based tests (present + absent)
- Reserve the worker atomically before accepting an offer so a second concurrent offer is rejected instead of clobbering the in-flight job. - Send Accept inside the job task and bail if it fails, releasing the busy flag. - Report the actual current job id in heartbeats instead of a placeholder; thread SessionSchedule through LoopSchedule.
- Parse api_base_url, drop query/fragment, and strip a trailing /graphics/api so a configured base that already includes the API prefix no longer double-prefixes request paths.
- Validate the installer download URL before fetching; only https is allowed, with loopback http permitted for tests.
- Resolve model cache files through a guard that rejects nested, absolute, or ../ filenames, keeping downloads inside models_root.
- config tab render now returns whether a save succeeded; the live Arc<Mutex<Config>> is only updated on a successful save so unsaved draft edits never leak into the running loops.
- Debug builds position the window at the left monitor origin; release builds defer to the window manager (behaviour unchanged). STUDIO_WORKER_WINDOW_POS=x,y overrides in any build.
- macOS plist is gg.minis.studio-worker.plist and the Windows task template is minis-studio-worker.task.xml; sync overview with code.
- run cargo audit --deny warnings on Cargo manifest changes + weekly cron - pin 11 accepted informational advisories in .cargo/audit.toml (ui GTK3 + build-macro transitive deps) - document the workflow in AGENTS.md CI section
- drain_notifications compared ring.len() to a seen-count, so once the capped recent_jobs ring saturated (50 jobs) its length stopped changing and desktop notifications silently stopped for the rest of the session - track the last-notified job by (job_id, finished_at) identity and walk newest-first until that marker, so new arrivals are detected regardless of ring saturation - add regression tests: per-job, idempotent re-drain, post-saturation
- guard out + init-image scratch via RAII so every exit path removes them - prevents temp-dir leak on sd-cli failure / unreadable output
- ImageParams gains maskUrl; sdcpp downloads the mask alongside the init image and passes sd-cli --mask so only the white region is repainted. - Powers the Find the Differences studio pipeline (draw red shapes -> inpaint the differences variant). Unit-tested arg building (mask + no-mask).
- cover run_cli's Ui arm + the non-ui run_ui bail branch - lifts lib.rs coverage from 88.6% to 100% lines
- sdcpp: a 'model' role file is a full checkpoint, loaded via sd-cli -m/--model; a 'diffusion-model' role stays on --diffusion-model (split weights). Lets the studio hand SDXL-inpaint as a single checkpoint. - Unit-tested both flag paths.
- number_prefix + paste come from the image-candle feature (hf-hub/candle), not the ui GTK3 stack - proc-macro-error is the ui stack's glib-macros build macro - header + review guidance now cover both ui and image-candle trees so stale ignores can not hide
- The reconnect attempt counter never reset after a successful connect, so a long-lived worker was killed once it accumulated DEFAULT_RECONNECT_ATTEMPTS (5) lifetime disconnects — even healthy ones. Now only consecutive failures-to-connect count toward the cap; a welcomed session resets it. - Fixes workers silently going stale after transient drops (e.g. local wrangler-dev WS recycles).
- Patch/minor bumps within existing constraints (cargo update) - Includes openssl 0.10.80, hyper 1.10.1, http 1.4.1, reqwest 0.13.4 - egui/eframe 0.34.3, serde_json 1.0.150 - fmt, clippy, tests, cargo-audit all green
The session only learned of a drop from the reader, which blocks forever on a half-open socket (post-job WS drops, dead peers). Two root-cause fixes so the worker reconnects on its own instead of hanging until an external restart: - Read-idle-timeout: the reader treats no-frames-within-N-seconds as a disconnect. The studio acks every heartbeat (~5s) so a live link always yields a frame; only a dead socket elapses it. - Abort the heartbeat/log-shipper/observer pumps on teardown: they only broke on the global stop flag, so on a silent-but-open socket (sends still buffer) they looped forever and blocked run_one_session from returning -> reconnect. - Connect timeout so a stalled upgrade can't hang the reconnect loop either. Covered by a new silent-server integration test (was hanging, now recovers) plus a connect-timeout unit test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
All quality gates verified locally: fmt, clippy (-D warnings), cargo test, cargo audit.