Per-fetch telemetry + site_cache wiring + opt-out #5
Merged
myleshorton merged 3 commits into main on Apr 22, 2026
Wick now reports an anonymized per-fetch record so we can answer
"which sites does it work on, with which strategy, and which need
work." Site cache is finally wired in — each install learns which
strategy works per host and reuses it on subsequent fetches.
analytics.rs:
- New `report_fetch(FetchEvent)` posts to /v1/events with
{host, strategy, escalated_from, ok, status, timing_ms, version, os}
- `is_opted_out()` checks WICK_TELEMETRY=0 or ~/.wick/no-telemetry
- All three telemetry fns (ping, report_failure, report_fetch) honor
the opt-out.
- No URL paths, no content, no IPs stored.
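The opt-out rule above can be sketched as a small pure function plus a thin wrapper. This is a minimal illustration, not the code in analytics.rs: the helper name `opted_out`, the `wick_home` parameter, and the exact comparison against `"0"` are assumptions made for the example.

```rust
use std::env;
use std::path::Path;

/// Pure decision (hypothetical helper): telemetry is off when the
/// env value is exactly "0" or the marker file exists.
fn opted_out(env_value: Option<&str>, marker_exists: bool) -> bool {
    env_value == Some("0") || marker_exists
}

/// Hypothetical wrapper that reads the real environment and filesystem.
/// `wick_home` is assumed to be the directory holding Wick state (~/.wick).
fn is_opted_out(wick_home: &Path) -> bool {
    let env_value = env::var("WICK_TELEMETRY").ok();
    let marker = wick_home.join("no-telemetry");
    opted_out(env_value.as_deref(), marker.exists())
}
```

Keeping the decision separate from the I/O makes the rule testable without touching the environment; every telemetry entry point would check it before doing any work.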
fetch.rs:
- site_cache consulted at entry; cached "cef" → go straight to CEF.
- New default: Cronet-first, CEF only as escalation when blocked.
Previously "CEF if installed" meant example.com took 10s; now 100ms.
- Every terminal return records the outcome to site_cache and
report_fetch. Strategies: cronet, cef, cef-after-cronet,
captcha-auto, captcha-interactive, cronet-blocked, cronet-error,
captcha-blocked. Same telemetry on fetch_html (crawl path).
site_cache.rs resolves the dead-code PR #4 review comment.
docs.html:
- New Telemetry section explaining schema, opt-out, retention,
and what's deliberately not collected.
The /v1/events Worker endpoint doesn't exist yet — PR #2 lands it.
Events sent against main today 404 harmlessly (fire-and-forget).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying wickproject

| Field | Value |
|---|---|
| Latest commit | b007598 |
| Status | ✅ Deploy successful! |
| Preview URL | https://2aef41de.wickproject.pages.dev |
| Branch Preview URL | https://telemetry-fetch-events.wickproject.pages.dev |
This was referenced Apr 22, 2026
Pull request overview
This PR wires per-fetch telemetry and hooks up site_cache to learn and reuse the best fetch strategy per host, with an explicit telemetry opt-out mechanism.
Changes:
- Adds structured per-fetch telemetry (`FetchEvent`) and opt-out gating (`WICK_TELEMETRY=0` or `~/.wick/no-telemetry`).
- Updates fetch routing to consult `site_cache`, prefer Cronet-first, and escalate to CEF on block (plus telemetry tagging).
- Documents telemetry collection/opt-out/retention in the site docs.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| site/docs.html | Adds a “Telemetry” section and sidebar link describing collection scope + opt-out. |
| rust/src/fetch.rs | Implements site-cache driven strategy selection, Cronet→CEF escalation, and per-branch fetch telemetry reporting. |
| rust/src/analytics.rs | Introduces per-fetch event reporting endpoint and unified opt-out gating for all telemetry. |
| rust/Cargo.lock | Updates the locked crate version to 0.9.1. |
analytics.rs (#3, #4, #8, #9):
- Replace per-event std::thread::spawn + per-event reqwest::Client
with a single long-lived background worker thread + bounded channel
(cap 512). Telemetry never applies backpressure; on a full queue
events are dropped.
- One reused reqwest::blocking::Client for the worker's lifetime.
- Build JSON payloads via serde_json::json! instead of format! string
interpolation. Same dependency we already pull in transitively.
- Shared wick_home() helper resolves $HOME/.wick (or /tmp/.wick) so
the WICK_TELEMETRY opt-out marker file is honored consistently
whether HOME is set or not. Matches what site_cache and ping_marker
already do.
- Drop the misleading "registrable hostname" wording on extract_host;
it returns whatever host_str() gives, subdomains and all. No PSL
normalization.
fetch.rs (#2, #5, #6, #7):
- Strategy-selection rule extracted as a pure helper
`should_use_cef_first(cached_strategy, cef_installed)`. Five unit
tests lock in the documented behavior (Cronet-first by default, CEF
only when cache explicitly says "cef" and CEF is installed).
- Fix doc comment to reflect the actual rule (no more "prefer CEF if
installed"). Note the robots.txt early-return exception to the
"every terminal point records telemetry" claim.
- site_cache::record() is now only called with values in the
documented set ("cef", "cronet"). CAPTCHA-solved retries cache as
"cronet" (the underlying transport that succeeded after the solve)
and tag the per-fetch event as "captcha-auto" /
"captcha-interactive" for analysis. Stops writing unsupported
strategy values.
- fetch_html() now mirrors fetch()'s 403/503 → CEF escalation when CEF
is installed and we haven't tried it yet. Crawl/map no longer
silently underperform on JS-heavy or stealth-required sites that
don't have a cache entry yet.
site/docs.html (#10):
- Reword the telemetry section so it doesn't link to /stats.html
(which lands in the follow-up PR #7). Says the public stats page is
coming in a follow-up release.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
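The two routing rules described in this commit can be sketched as pure functions. The name `should_use_cef_first` and its two parameters come from the commit message; the parameter types, and the whole of `should_escalate_to_cef`, are assumptions made for illustration rather than the code in fetch.rs.

```rust
/// Cronet-first by default: go to CEF first only when the cache
/// explicitly says "cef" and CEF is installed.
/// The `Option<&str>` type for the cached value is an assumption.
fn should_use_cef_first(cached_strategy: Option<&str>, cef_installed: bool) -> bool {
    cef_installed && cached_strategy == Some("cef")
}

/// Hypothetical helper: escalate a blocked Cronet response (403/503)
/// to CEF only if CEF is installed and has not been tried yet.
fn should_escalate_to_cef(status: u16, cef_installed: bool, cef_already_tried: bool) -> bool {
    let blocked = matches!(status, 403 | 503);
    blocked && cef_installed && !cef_already_tried
}
```

Because neither function touches the network or the cache file, the documented behavior can be locked in with plain unit tests, which is the point of extracting the rule.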
Pull request overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.
fetch.rs:
- Record `cef_timeout` in site_cache when CEF-first render fails so the
next fetch doesn't keep paying the CEF-first cost (both fetch() and
fetch_html()). The cached strategy flips back to "cef" only after a
real CEF success records over it.
- Replace bare `?` on `client.get(url).await?` with explicit error
arms that emit a `cronet-transport-error` (or per-captcha-strategy)
FetchEvent before propagating, so transport failures still produce
telemetry. Applies to the main fetch, both captcha retries, and
fetch_html.
- captcha-blocked now reports `escalated_from: Some("cronet")` to
match the captcha-auto and captcha-interactive branches; the
`captcha-*` strategies are enrichments of the cronet attempt, not
transport switches.
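The "explicit error arms" change can be illustrated with a synchronous sketch: instead of a bare `?`, the error path emits an event and then propagates the error. The trimmed `FetchEvent` and the `with_error_event` helper are hypothetical; the real struct carries more fields and the real code is async.

```rust
/// Trimmed, hypothetical stand-in for the real FetchEvent.
#[derive(Debug, PartialEq)]
struct FetchEvent {
    strategy: &'static str,
    ok: bool,
}

/// On Err, report a failure event tagged with `strategy`, then hand
/// the original error back to the caller unchanged.
fn with_error_event<T, E>(
    result: Result<T, E>,
    strategy: &'static str,
    mut report: impl FnMut(FetchEvent),
) -> Result<T, E> {
    match result {
        Ok(value) => Ok(value),
        Err(err) => {
            report(FetchEvent { strategy, ok: false });
            Err(err)
        }
    }
}
```

A caller would wrap the transport call and still use `?` on the returned result, so propagation is unchanged while the failure is no longer invisible to telemetry.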
analytics.rs:
- worker_sender no longer panics when the background thread can't be
spawned. It returns `Option<&SyncSender>`, and enqueue() silently
no-ops if spawn failed — telemetry never fails loudly, even on
constrained platforms.
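A minimal sketch of the worker design described for analytics.rs, using only the standard library. The function names `worker_sender` and `enqueue` follow the commit message; the `String` payload type, the `try_enqueue` helper, the `bool` return values, and the empty worker body are assumptions for illustration (the real worker posts events over HTTP).

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::OnceLock;
use std::thread;

const QUEUE_CAP: usize = 512;

static SENDER: OnceLock<Option<SyncSender<String>>> = OnceLock::new();

/// Lazily starts one background worker. Returns None if the thread
/// could not be spawned, so callers never panic.
fn worker_sender() -> Option<&'static SyncSender<String>> {
    SENDER
        .get_or_init(|| {
            let (tx, rx) = sync_channel::<String>(QUEUE_CAP);
            thread::Builder::new()
                .name("telemetry-worker".into())
                .spawn(move || {
                    for _payload in rx {
                        // A real worker would send the payload here
                        // using one long-lived HTTP client.
                    }
                })
                .ok()
                .map(|_handle| tx)
        })
        .as_ref()
}

/// Non-blocking send: a full (or closed) queue drops the event.
fn try_enqueue(tx: &SyncSender<String>, payload: String) -> bool {
    tx.try_send(payload).is_ok()
}

/// Silently no-ops when the worker is unavailable.
fn enqueue(payload: String) -> bool {
    worker_sender().map_or(false, |tx| try_enqueue(tx, payload))
}
```

Using `try_send` rather than `send` is what keeps telemetry from applying backpressure: the fetch path never waits on the queue, it only loses an event when the queue is full.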
site_cache.rs:
- cache_path() now uses the shared `analytics::wick_home()` helper, so
`~/.wick` vs `/tmp/.wick` fallback is consistent across all on-disk
Wick state and the wick_home doc claim is accurate.
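The shared directory lookup might look like the following sketch. `wick_home` is named in the commit message; the `wick_home_from` helper and the choice to treat an empty `HOME` like an unset one are assumptions made here.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical pure helper: $HOME/.wick when HOME is usable,
/// otherwise the /tmp/.wick fallback.
fn wick_home_from(home: Option<&str>) -> PathBuf {
    match home {
        // Assumption: an empty HOME is treated the same as no HOME.
        Some(h) if !h.is_empty() => PathBuf::from(h).join(".wick"),
        _ => PathBuf::from("/tmp/.wick"),
    }
}

/// Resolves the directory from the real environment.
fn wick_home() -> PathBuf {
    wick_home_from(env::var("HOME").ok().as_deref())
}
```

Routing the site cache path and the opt-out marker through one helper is what keeps the `~/.wick` and `/tmp/.wick` locations from drifting apart.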
site/docs.html:
- Document the `/tmp/.wick/no-telemetry` opt-out fallback for
services/containers without HOME.
All 17 unit + integration tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
myleshorton added a commit that referenced this pull request Apr 22, 2026
Minor release covering the per-fetch telemetry work merged in #5–#7:
- Per-fetch events posted to /v1/events (host, strategy, ok, status,
timing). Site cache wired into fetch.rs so per-host strategy is
learned and reused on subsequent fetches.
- Default fetch flow is now Cronet-first; CEF is the escalation path
on 403/503 (or when the cache explicitly says "cef"). The old
"CEF-first when installed" default is gone.
- `WICK_TELEMETRY=0` env var and `~/.wick/no-telemetry` marker fully
disable outbound telemetry.
- Public stats page at https://getwick.dev/stats.html fed by the new
KV-backed `/v1/events` + `/v1/stats/summary` endpoints.
npm/scripts/install.js sha256s will be updated in a follow-up commit
once the release workflow publishes the tarballs and we can hash them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Wire up continuous fetch telemetry so we can answer "which sites does Wick work on, with which strategy, and which need work." Also finally wires the `site_cache` module that's been sitting unused since the unification PR.

Paired with PR #2 (Worker) and PR #3 (public stats page) per the approved plan at `~/.claude/plans/humming-spinning-owl.md`.

What changes for users
- `wick fetch https://example.com` goes from ~10s (CEF cold start) to ~100ms (direct HTTP via Chrome TLS fingerprint).
- The strategy that works for each host is stored in `~/.wick/site-cache.json` and reused on subsequent fetches.
- Each fetch posts `{host, strategy, escalated_from, ok, status, timing_ms, version, os}` to `releases.getwick.dev/v1/events`. Hostnames only: no paths, no content, no IPs persisted.
- `WICK_TELEMETRY=0` or `~/.wick/no-telemetry` suppresses all telemetry.

Details
`rust/src/analytics.rs`:
- `FetchEvent` struct and `report_fetch()` function
- `is_opted_out()` checks env var and marker file
- Existing telemetry (`ping`, `report_failure`) also honors the opt-out
- `extract_host(url)` helper for URL → hostname extraction

`rust/src/fetch.rs`:
- Consults `site_cache::get(host)` at entry
- Cached `cef` → go straight to CEF; otherwise Cronet-first with CEF-escalation-on-block
- Same telemetry on `fetch_html()` for the crawl path

`site/docs.html`:
- New Telemetry section covering schema, opt-out, retention, and what is deliberately not collected

Backward compat
- `/v1/events` doesn't exist yet (PR #2 lands it). Events fired now fail silently via the existing fire-and-forget thread, so there is no user impact.
- `/ping` and `/ping` error-event paths unchanged, still land in the legacy KV-backed dashboard.

Test plan
- `cargo build --release --features cronet`: clean, no new warnings
- `wick fetch https://example.com`: 100ms via Cronet, cached as `cronet`
- `wick fetch https://www.cloudflare.com`: 494ms via Cronet, cached as `cronet`
- `site-cache.json` populated correctly after fetches
- `WICK_TELEMETRY=0 wick fetch ...`: no thread spawned to report

🤖 Generated with Claude Code