Per-fetch telemetry + site_cache wiring + opt-out #5

Merged: myleshorton merged 3 commits into main from telemetry-fetch-events on Apr 22, 2026
@myleshorton

Summary

Wire up continuous fetch telemetry so we can answer "which sites does Wick work on, with which strategy, and which need work." Also finally wires the site_cache module that's been sitting unused since the unification PR.

Paired with PR #2 (Worker) and PR #3 (public stats page) per the approved plan at ~/.claude/plans/humming-spinning-owl.md.

What changes for users

  • Faster default. Cronet-first, CEF only when needed. Example: wick fetch https://example.com goes from ~10s (CEF cold start) to ~100ms (direct HTTP via Chrome TLS fingerprint).
  • Learned per-site strategy. First fetch may probe Cronet → escalate to CEF if blocked. The winning strategy is cached in ~/.wick/site-cache.json and reused on subsequent fetches.
  • Moderate telemetry. Each fetch sends {host, strategy, escalated_from, ok, status, timing_ms, version, os} to releases.getwick.dev/v1/events. Hostnames only — no paths, no content, no IPs persisted.
  • Honest opt-out. WICK_TELEMETRY=0 or ~/.wick/no-telemetry suppresses all telemetry.

Details

rust/src/analytics.rs:

  • New FetchEvent struct and report_fetch() function
  • is_opted_out() checks env var and marker file
  • All existing telemetry (ping, report_failure) also honors the opt-out
  • extract_host(url) helper for URL → hostname extraction
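
As a rough illustration of the opt-out check described above, here is a minimal sketch. The function name `is_opted_out_with` and its parameters are hypothetical (the PR only names `is_opted_out()`), and the directory argument stands in for wherever `~/.wick` resolves to; the real implementation may differ.

```rust
use std::path::Path;

/// Returns true when telemetry should be suppressed.
/// `env_value` is the value of WICK_TELEMETRY, if set.
/// `wick_home` is the directory that may hold the `no-telemetry` marker file.
/// (Hypothetical signature, split out so the rule can be tested in isolation.)
fn is_opted_out_with(env_value: Option<&str>, wick_home: &Path) -> bool {
    // WICK_TELEMETRY=0 disables telemetry outright.
    if env_value == Some("0") {
        return true;
    }
    // Otherwise the presence of the marker file is enough.
    wick_home.join("no-telemetry").exists()
}

/// Convenience wrapper that reads the real environment variable.
fn is_opted_out(wick_home: &Path) -> bool {
    let env_value = std::env::var("WICK_TELEMETRY").ok();
    is_opted_out_with(env_value.as_deref(), wick_home)
}
```

Passing the environment value and directory as arguments keeps the rule pure, so each telemetry function can call one shared gate before doing any work.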

rust/src/fetch.rs:

  • Consults site_cache::get(host) at entry
  • Strategy selection: cache says cef → go straight to CEF; otherwise Cronet-first with CEF-escalation-on-block
  • Records outcome at every terminal return branch: 8 distinct strategies tagged
  • Mirrors the same changes on fetch_html() for the crawl path
  • Addresses the dead-code warnings from PR #4 (Unify Free and Pro into a single open-source project)
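
The eight strategy tags are spelled out in this PR's first commit message. One way to keep them from drifting is a small enum that maps each variant to its wire string. This is only a sketch: the `Strategy` type and its method names are hypothetical, and the actual code may simply pass string literals.

```rust
/// The eight strategy tags named in the first commit message of this PR.
/// (Hypothetical type; later commits add further tags not covered here.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Cronet,
    Cef,
    CefAfterCronet,
    CaptchaAuto,
    CaptchaInteractive,
    CronetBlocked,
    CronetError,
    CaptchaBlocked,
}

impl Strategy {
    /// Every variant, in the order the commit message lists them.
    const ALL: [Strategy; 8] = [
        Strategy::Cronet,
        Strategy::Cef,
        Strategy::CefAfterCronet,
        Strategy::CaptchaAuto,
        Strategy::CaptchaInteractive,
        Strategy::CronetBlocked,
        Strategy::CronetError,
        Strategy::CaptchaBlocked,
    ];

    /// The string sent in the `strategy` field of a fetch event.
    fn as_str(self) -> &'static str {
        match self {
            Strategy::Cronet => "cronet",
            Strategy::Cef => "cef",
            Strategy::CefAfterCronet => "cef-after-cronet",
            Strategy::CaptchaAuto => "captcha-auto",
            Strategy::CaptchaInteractive => "captcha-interactive",
            Strategy::CronetBlocked => "cronet-blocked",
            Strategy::CronetError => "cronet-error",
            Strategy::CaptchaBlocked => "captcha-blocked",
        }
    }
}
```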

site/docs.html:

  • New Telemetry section explaining exactly what's collected, what's not, how to opt out, retention

Backward compat

  • The Worker endpoint /v1/events doesn't exist yet (PR #2, "Add Apify marketplace Actor", lands it). Events fired now fail silently via the existing fire-and-forget thread, so there is no user impact.
  • Existing /ping and /ping error-event paths unchanged, still land in the legacy KV-backed dashboard.

Test plan

  • cargo build --release --features cronet — clean, no new warnings
  • wick fetch https://example.com — 100ms via Cronet, cached as cronet
  • wick fetch https://www.cloudflare.com — 494ms via Cronet, cached as cronet
  • site-cache.json populated correctly after fetches
  • WICK_TELEMETRY=0 wick fetch ... — no thread spawned to report

🤖 Generated with Claude Code

Wick now reports an anonymized per-fetch record so we can answer
"which sites does it work on, with which strategy, and which need
work." Site cache is finally wired in — each install learns which
strategy works per host and reuses it on subsequent fetches.

analytics.rs:
- New `report_fetch(FetchEvent)` posts to /v1/events with
  {host, strategy, escalated_from, ok, status, timing_ms, version, os}
- `is_opted_out()` checks WICK_TELEMETRY=0 or ~/.wick/no-telemetry
- All three telemetry fns (ping, report_failure, report_fetch) honor
  the opt-out.
- No URL paths, no content, no IPs stored.

fetch.rs:
- site_cache consulted at entry; cached "cef" → go straight to CEF.
- New default: Cronet-first, CEF only as escalation when blocked.
  Previously "CEF if installed" meant example.com took 10s; now 100ms.
- Every terminal return records the outcome to site_cache and
  report_fetch. Strategies: cronet, cef, cef-after-cronet,
  captcha-auto, captcha-interactive, cronet-blocked, cronet-error,
  captcha-blocked. Same telemetry on fetch_html (crawl path).

site_cache.rs resolves the dead-code PR #4 review comment.

docs.html:
- New Telemetry section explaining schema, opt-out, retention,
  and what's deliberately not collected.

The /v1/events Worker endpoint doesn't exist yet — PR #2 lands it.
Events sent against main today 404 harmlessly (fire-and-forget).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Apr 22, 2026

Deploying wickproject with Cloudflare Pages

Latest commit: b007598
Status: ✅  Deploy successful!
Preview URL: https://2aef41de.wickproject.pages.dev
Branch Preview URL: https://telemetry-fetch-events.wickproject.pages.dev


Copilot AI left a comment

Pull request overview

This PR wires per-fetch telemetry and hooks up site_cache to learn and reuse the best fetch strategy per host, with an explicit telemetry opt-out mechanism.

Changes:

  • Adds structured per-fetch telemetry (FetchEvent) and opt-out gating (WICK_TELEMETRY=0 or ~/.wick/no-telemetry).
  • Updates fetch routing to consult site_cache, prefer Cronet-first, and escalate to CEF on block (plus telemetry tagging).
  • Documents telemetry collection/opt-out/retention in the site docs.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| site/docs.html | Adds a "Telemetry" section and sidebar link describing collection scope + opt-out. |
| rust/src/fetch.rs | Implements site-cache driven strategy selection, Cronet→CEF escalation, and per-branch fetch telemetry reporting. |
| rust/src/analytics.rs | Introduces per-fetch event reporting endpoint and unified opt-out gating for all telemetry. |
| rust/Cargo.lock | Updates the locked crate version to 0.9.1. |

Comment thread rust/src/fetch.rs Outdated
Comment thread rust/src/fetch.rs
Comment thread rust/src/analytics.rs Outdated
Comment thread rust/src/analytics.rs Outdated
Comment thread rust/src/fetch.rs Outdated
Comment thread rust/src/fetch.rs
Comment thread rust/src/fetch.rs Outdated
Comment thread rust/src/analytics.rs Outdated
Comment thread rust/src/analytics.rs Outdated
Comment thread site/docs.html Outdated
analytics.rs (#3, #4, #8, #9):
- Replace per-event std::thread::spawn + per-event reqwest::Client
  with a single long-lived background worker thread + bounded channel
  (cap 512). Telemetry never applies backpressure; on a full queue
  events are dropped.
- One reused reqwest::blocking::Client for the worker's lifetime.
- Build JSON payloads via serde_json::json! instead of format! string
  interpolation. Same dependency we already pull in transitively.
- Shared wick_home() helper resolves $HOME/.wick (or /tmp/.wick) so
  the no-telemetry opt-out marker file is honored consistently
  whether HOME is set or not. Matches what site_cache and ping_marker
  already do.
- Drop the misleading "registrable hostname" wording on extract_host;
  it returns whatever host_str() gives, subdomains and all. No PSL
  normalization.
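
The drop-on-full queue described above can be sketched with the standard library's bounded channel. This is an illustration under assumptions, not the code from analytics.rs: events are plain `String`s here, and `telemetry_queue`, `enqueue`, and `run_worker` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Queue capacity quoted in the commit message.
const QUEUE_CAP: usize = 512;

/// Creates the bounded queue; the receiver would be moved into the worker thread.
fn telemetry_queue() -> (SyncSender<String>, Receiver<String>) {
    sync_channel(QUEUE_CAP)
}

/// Non-blocking enqueue. Returns false when the event was dropped,
/// either because the queue is full or because the worker has gone away.
fn enqueue(tx: &SyncSender<String>, event: String) -> bool {
    match tx.try_send(event) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Worker loop: drains events until every sender has been dropped.
/// `deliver` stands in for the HTTP POST a real worker would perform.
fn run_worker(rx: Receiver<String>, mut deliver: impl FnMut(String)) {
    for event in rx {
        deliver(event);
    }
}
```

Because `try_send` never blocks, a slow or unreachable endpoint can only cost dropped events, never fetch latency.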

fetch.rs (#2, #5, #6, #7):
- Strategy-selection rule extracted as a pure helper
  `should_use_cef_first(cached_strategy, cef_installed)`. Five unit
  tests lock in the documented behavior (Cronet-first by default,
  CEF only when cache explicitly says "cef" and CEF is installed).
- Fix doc comment to reflect the actual rule (no more "prefer CEF
  if installed"). Note the robots.txt early-return exception to the
  "every terminal point records telemetry" claim.
- site_cache::record() is now only called with values in the
  documented set ("cef", "cronet"). CAPTCHA-solved retries cache as
  "cronet" (the underlying transport that succeeded after the solve)
  and tag the per-fetch event as "captcha-auto" / "captcha-interactive"
  for analysis. Stops writing unsupported strategy values.
- fetch_html() now mirrors fetch()'s 403/503 → CEF escalation when
  CEF is installed and we haven't tried it yet. Crawl/map no longer
  silently underperform on JS-heavy or stealth-required sites that
  don't have a cache entry yet.
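
For reference, the selection and escalation rules described above might look roughly like this. `should_use_cef_first` and its two parameters are named in the commit message, but the parameter types are assumed; `should_escalate_to_cef` is a hypothetical helper added only to illustrate the 403/503 rule.

```rust
/// CEF goes first only when the cache explicitly says "cef" AND CEF is installed.
/// Everything else (no entry, "cronet", unknown values) starts with Cronet.
fn should_use_cef_first(cached_strategy: Option<&str>, cef_installed: bool) -> bool {
    cef_installed && cached_strategy == Some("cef")
}

/// Hypothetical helper: escalate a blocked Cronet response to CEF
/// when the status is 403 or 503, CEF is installed, and CEF has not
/// already been tried for this fetch.
fn should_escalate_to_cef(status: u16, cef_installed: bool, cef_already_tried: bool) -> bool {
    matches!(status, 403 | 503) && cef_installed && !cef_already_tried
}
```

Keeping both rules as pure functions is what makes them cheap to lock in with unit tests.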

site/docs.html (#10):
- Reword the telemetry section so it doesn't link to /stats.html
  (which lands in the follow-up PR #7). Says the public stats page
  is coming in a follow-up release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.


Comment thread rust/src/fetch.rs
Comment thread rust/src/fetch.rs
Comment thread rust/src/fetch.rs Outdated
Comment thread rust/src/fetch.rs
Comment thread rust/src/analytics.rs Outdated
Comment thread rust/src/analytics.rs
Comment thread site/docs.html Outdated
fetch.rs:
  - Record `cef_timeout` in site_cache when CEF-first render fails so the
    next fetch doesn't keep paying the CEF-first cost (both fetch() and
    fetch_html()). The cached strategy flips back to "cef" only after a
    real CEF success records over it.
  - Replace bare `?` on `client.get(url).await?` with explicit error
    arms that emit a `cronet-transport-error` (or per-captcha-strategy)
    FetchEvent before propagating, so transport failures still produce
    telemetry. Applies to the main fetch, both captcha retries, and
    fetch_html.
  - captcha-blocked now reports `escalated_from: Some("cronet")` to
    match the captcha-auto and captcha-interactive branches; the
    `captcha-*` strategies are enrichments of the cronet attempt, not
    transport switches.
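
The "explicit error arms" change can be pictured as a small wrapper that reports a failure tag before handing the error back to the caller. This is a sketch only: `report_on_error` is a hypothetical name, and the real code may use inline `match` arms around `client.get(url).await` rather than a helper.

```rust
/// Passes `result` through unchanged, but calls `report` with `failure_tag`
/// first when it is an error. This replaces a bare `?`, which would have
/// returned early without emitting any telemetry.
fn report_on_error<T, E>(
    result: Result<T, E>,
    failure_tag: &'static str,
    mut report: impl FnMut(&'static str),
) -> Result<T, E> {
    match result {
        Ok(value) => Ok(value),
        Err(err) => {
            // Emit the event, then propagate the original error.
            report(failure_tag);
            Err(err)
        }
    }
}
```

A caller would then write `report_on_error(response, "cronet-transport-error", emit)?`, so the `?` still propagates the failure after the event has been recorded.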

analytics.rs:
  - worker_sender no longer panics when the background thread can't be
    spawned. It returns `Option<&SyncSender>`, and enqueue() silently
    no-ops if spawn failed — telemetry never fails loudly, even on
    constrained platforms.
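
A non-panicking worker setup along these lines is possible with `std::sync::OnceLock` and `std::thread::Builder`, whose `spawn` returns an `io::Result` instead of panicking. The sketch below assumes `String` events, a hypothetical thread name, and a static named `WORKER`; only `worker_sender` and `enqueue` are names taken from the commit message.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::OnceLock;
use std::thread;

/// Holds Some(sender) if the worker thread started, None if spawning failed.
static WORKER: OnceLock<Option<SyncSender<String>>> = OnceLock::new();

/// Lazily starts the background worker on first use.
/// Returns None (instead of panicking) when the thread could not be spawned.
fn worker_sender() -> Option<&'static SyncSender<String>> {
    WORKER
        .get_or_init(|| {
            let (tx, rx) = sync_channel::<String>(512);
            thread::Builder::new()
                .name("wick-telemetry".into()) // hypothetical thread name
                .spawn(move || {
                    for _event in rx {
                        // A real worker would POST the event here.
                    }
                })
                .ok() // spawn failure becomes None
                .map(|_handle| tx)
        })
        .as_ref()
}

/// Fire-and-forget: silently does nothing if the worker is unavailable
/// or the queue is full.
fn enqueue(event: String) {
    if let Some(tx) = worker_sender() {
        let _ = tx.try_send(event);
    }
}
```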

site_cache.rs:
  - cache_path() now uses the shared `analytics::wick_home()` helper, so
    `~/.wick` vs `/tmp/.wick` fallback is consistent across all on-disk
    Wick state and the wick_home doc claim is accurate.
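
The shared fallback could be as simple as the following sketch. `wick_home()` is the name used in this PR; `wick_home_from` is a hypothetical testable core, and treating an empty HOME like an unset one is an assumption.

```rust
use std::path::PathBuf;

/// Resolves the Wick state directory from an optional HOME value:
/// `$HOME/.wick` when HOME is usable, `/tmp/.wick` otherwise.
/// (Assumption: an empty HOME is treated the same as an unset one.)
fn wick_home_from(home: Option<&str>) -> PathBuf {
    match home {
        Some(h) if !h.is_empty() => PathBuf::from(h).join(".wick"),
        _ => PathBuf::from("/tmp/.wick"),
    }
}

/// Reads HOME from the real environment.
fn wick_home() -> PathBuf {
    wick_home_from(std::env::var("HOME").ok().as_deref())
}
```

Routing the site cache, the ping marker, and the opt-out marker through one helper means they agree on the directory in containers or services where HOME is missing.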

site/docs.html:
  - Document the `/tmp/.wick/no-telemetry` opt-out fallback for
    services/containers without HOME.

All 17 unit + integration tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@myleshorton myleshorton merged commit 9e89d35 into main Apr 22, 2026
4 checks passed
myleshorton added a commit that referenced this pull request Apr 22, 2026
Minor release covering the per-fetch telemetry work merged in #5, #7:

  - Per-fetch events posted to /v1/events (host, strategy, ok, status,
    timing). Site cache wired into fetch.rs so per-host strategy is
    learned and reused on subsequent fetches.
  - Default fetch flow is now Cronet-first; CEF is the escalation
    path on 403/503 (or when the cache explicitly says "cef"). The
    old "CEF-first when installed" default is gone.
  - `WICK_TELEMETRY=0` env var and `~/.wick/no-telemetry` marker
    fully disable outbound telemetry.
  - Public stats page at https://getwick.dev/stats.html fed by the
    new KV-backed `/v1/events` + `/v1/stats/summary` endpoints.

npm/scripts/install.js sha256s will be updated in a follow-up commit
once the release workflow publishes the tarballs and we can hash them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>