Worker: /v1/events + /v1/stats/summary (KV-backed, free tier) #6
Merged
myleshorton merged 6 commits into main on Apr 22, 2026
Conversation
Moves the Cloudflare Worker backing releases.getwick.dev from the archived wick-pro repo into the public wick repo. No behavior change to the legacy /ping, /analytics/:key, or release-download paths.

New endpoints for the per-fetch telemetry work (pairs with PR #5):

- POST /v1/events — accepts {host, strategy, escalated_from, ok, status, timing_ms, version, os}. Writes to a new Cloudflare Analytics Engine dataset (`wick_events`) bound as `WICK_EVENTS`. Does not persist caller IP as a data point.
- GET /v1/stats/summary — queries AE via the SQL API for a 7-day rollup of fetches + success rate + p50 timing per (host, strategy). Caches the JSON response for 5 minutes in the existing SUBSCRIPTIONS KV. Public (no auth).

Requires two new secrets on the Worker (see worker/README.md):

- CF_ANALYTICS_ACCOUNT_ID
- CF_ANALYTICS_TOKEN (AE read)

Plus one new binding in wrangler.toml:

- [[analytics_engine_datasets]] binding = WICK_EVENTS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying wickproject with Cloudflare Pages

| Latest commit: | b02d8a9 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://5ec7ca69.wickproject.pages.dev |
| Branch Preview URL: | https://worker-analytics-engine.wickproject.pages.dev |
Analytics Engine needs Workers Paid ($5/mo) to bind. KV works on
Workers Free and matches the storage pattern the existing /ping
endpoint already uses — so there's no reason to pull in a paid
product for the scale Wick currently operates at.
Storage model:
- Key: evt:{YYYY-MM-DD}:{host}:{strategy}
- Value: {"fetches": N, "successes": M, "total_ms": T} (JSON)
- TTL: 30 days
Each POST /v1/events is a read-modify-write on a single key. At high
concurrency a few increments may be lost, which is fine for
telemetry — mirrors the behavior of the existing ping counters.
GET /v1/stats/summary scans the last 7 days of evt:* keys, aggregates
per (host, strategy), and returns shaped JSON. Cached 5 minutes in
the same KV namespace so the scan only runs ~288 times/day.
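The read-modify-write increment described above can be sketched as follows. This is a minimal illustration, not the actual `worker/src/index.js`: the key scheme and value fields follow the documented storage model, but the function and parameter names are assumptions.

```javascript
// Hypothetical sketch of the POST /v1/events KV counter update.
// Key scheme and value shape match the documented storage model;
// recordEvent/buildKey are illustrative names, not the real source.
function buildKey(date, host, strategy) {
  // evt:{YYYY-MM-DD}:{host}:{strategy}
  return `evt:${date}:${host}:${strategy}`;
}

async function recordEvent(kv, event) {
  const date = new Date().toISOString().slice(0, 10);
  const key = buildKey(date, event.host, event.strategy);
  let counts = { fetches: 0, successes: 0, total_ms: 0 };
  const raw = await kv.get(key);
  if (raw !== null) {
    try {
      const parsed = JSON.parse(raw);
      if (parsed && typeof parsed === "object") counts = parsed;
    } catch (_) {
      // Corrupted KV value: reset the counter rather than fail ingestion.
    }
  }
  counts.fetches += 1;
  if (event.ok === true) counts.successes += 1; // strict boolean check
  counts.total_ms += Number(event.timing_ms) || 0;
  // 30-day TTL, matching the documented storage model.
  await kv.put(key, JSON.stringify(counts), { expirationTtl: 30 * 24 * 3600 });
  return counts;
}
```

Because the get/put pair is not atomic, two concurrent POSTs can both read the same snapshot and one increment is lost — exactly the accepted race described above.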
Removed:
- Analytics Engine binding in wrangler.toml
- CF_ANALYTICS_ACCOUNT_ID / CF_ANALYTICS_TOKEN secrets
- AE SQL query in /v1/stats/summary
Tradeoffs:
- No real p50 timing — using mean_ms (total_ms / fetches) as a
reasonable approximation. Good enough for the public stats page.
- Cap: ~5000 KV keys scanned per day during summary refresh. Well
inside Workers Free limits for expected Wick scale.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
myleshorton added a commit that referenced this pull request on Apr 22, 2026
analytics.rs (#3, #4, #8, #9):

- Replace per-event std::thread::spawn + per-event reqwest::Client with a single long-lived background worker thread + bounded channel (cap 512). Telemetry never applies backpressure; on a full queue events are dropped.
- One reused reqwest::blocking::Client for the worker's lifetime.
- Build JSON payloads via serde_json::json! instead of format! string interpolation. Same dependency we already pull in transitively.
- Shared wick_home() helper resolves $HOME/.wick (or /tmp/.wick) so the WICK_TELEMETRY opt-out marker file is honored consistently whether HOME is set or not. Matches what site_cache and ping_marker already do.
- Drop the misleading "registrable hostname" wording on extract_host; it returns whatever host_str() gives, subdomains and all. No PSL normalization.

fetch.rs (#2, #5, #6, #7):

- Strategy-selection rule extracted as a pure helper `should_use_cef_first(cached_strategy, cef_installed)`. Five unit tests lock in the documented behavior (Cronet-first by default, CEF only when the cache explicitly says "cef" and CEF is installed).
- Fix doc comment to reflect the actual rule (no more "prefer CEF if installed"). Note the robots.txt early-return exception to the "every terminal point records telemetry" claim.
- site_cache::record() is now only called with values in the documented set ("cef", "cronet"). CAPTCHA-solved retries cache as "cronet" (the underlying transport that succeeded after the solve) and tag the per-fetch event as "captcha-auto" / "captcha-interactive" for analysis. Stops writing unsupported strategy values.
- fetch_html() now mirrors fetch()'s 403/503 → CEF escalation when CEF is installed and we haven't tried it yet. Crawl/map no longer silently underperform on JS-heavy or stealth-required sites that don't have a cache entry yet.

site/docs.html (#10):

- Reword the telemetry section so it doesn't link to /stats.html (which lands in the follow-up PR #7). Says the public stats page is coming in a follow-up release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
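The Rust helper `should_use_cef_first` itself isn't shown in this thread; as a rough restatement of the documented rule (in JavaScript for illustration, with names mirroring the commit message rather than the actual source):

```javascript
// Illustrative restatement of the documented strategy-selection rule:
// Cronet-first by default; CEF only when the site cache explicitly says
// "cef" AND CEF is actually installed. Not the real Rust implementation.
function shouldUseCefFirst(cachedStrategy, cefInstalled) {
  return cachedStrategy === "cef" && cefInstalled === true;
}
```

Any other cached value (including null, "cronet", or a legacy unsupported value) falls through to Cronet-first, which is what makes restricting site_cache::record() to "cef"/"cronet" safe.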
Pull request overview
Migrates the Cloudflare Worker for releases.getwick.dev into this repo and adds KV-backed per-fetch telemetry ingest plus a public 7‑day stats summary endpoint (designed to work on Workers Free).
Changes:
- Add worker/wrangler.toml with routes and KV/R2 bindings for releases.getwick.dev.
- Implement Worker handlers including POST /v1/events (KV counters w/ TTL) and GET /v1/stats/summary (KV scan + cached aggregate).
- Document the Worker, bindings, and telemetry schema in worker/README.md.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| worker/wrangler.toml | Declares Worker entrypoint, route, and KV/R2 bindings used by the new endpoints. |
| worker/src/index.js | Adds the full Worker implementation including KV-backed telemetry ingest and summary aggregation. |
| worker/README.md | Documents deployment, bindings, and the telemetry storage/query model. |
Critical (security + correctness):
- Stripe webhook now verifies `Stripe-Signature` via HMAC-SHA256
(5-min timestamp tolerance, constant-time compare). Before this,
anyone who could reach /pro/webhook could mint API keys by POSTing
a forged `checkout.session.completed` payload.
- Geo-proxy (/proxy/:key) now rejects private/loopback/link-local
IP literals so a paid key can't probe our internal networks.
Catches IPv4 (0/8, 10/8, 127/8, 169.254/16, 172.16/12, 192.168/16,
100.64/10, 192.0.2/24) and IPv6 (::, ::1, fc00::/7, fe80::/10,
IPv4-mapped ::ffff:). DNS-based targets are not resolved here —
documented as a known limitation.
- /pro/success page no longer interpolates the `?session=` query
param into an inline <script>; reads it client-side via
URLSearchParams + encodeURIComponent instead.
Correctness / ops:
- /v1/events wraps `JSON.parse(existingRaw)` in try/catch so a
corrupted KV value doesn't 500 ingestion.
- /v1/events validates `host` (/^[a-zA-Z0-9.-]+$/) and `strategy`
(/^[a-zA-Z0-9_-]+$/) before embedding them in the colon-delimited
KV key, so neither can inject a `:` that breaks parsing at summary
time.
- /v1/stats/summary: single global SCAN_CAP (5000) instead of
per-day, and `break outer` exits both the inner key loop AND the
pagination `do/while` so the cap genuinely stops work.
- Active `session:<id>` KV entries now get a 24h TTL (was unbounded).
Privacy:
- /releases download log no longer records caller IP (CF-Connecting-
IP). Matches the repo's stated "don't persist IP as a data point"
posture.
All changes pass `node --check`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
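The private/loopback/link-local rejection on /proxy/:key described above can be sketched like this. It is a simplified illustration under the same stated limitation (IP literals only, no DNS resolution); the function name and exact structure are assumptions, not the Worker source.

```javascript
// Hypothetical sketch of the /proxy/:key IP-literal SSRF check.
// Covers the ranges listed in the commit message; DNS-based targets
// are deliberately not resolved here (documented limitation).
function isBlockedIpLiteral(host) {
  const h = host.replace(/^\[|\]$/g, "").toLowerCase(); // strip URL brackets
  if (h.includes(":")) {
    // IPv6 literal
    if (h === "::" || h === "::1") return true;  // unspecified, loopback
    if (/^f[cd]/.test(h)) return true;           // fc00::/7 unique-local
    if (/^fe[89ab]/.test(h)) return true;        // fe80::/10 link-local
    if (h.startsWith("::ffff:")) return true;    // IPv4-mapped
    return false;
  }
  const m = h.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // hostname, not an IP literal
  const [a, b, c] = [Number(m[1]), Number(m[2]), Number(m[3])];
  if (a === 0 || a === 10 || a === 127) return true; // 0/8, 10/8, 127/8
  if (a === 169 && b === 254) return true;           // 169.254/16
  if (a === 172 && b >= 16 && b <= 31) return true;  // 172.16/12
  if (a === 192 && b === 168) return true;           // 192.168/16
  if (a === 100 && b >= 64 && b <= 127) return true; // 100.64/10 (CGNAT)
  if (a === 192 && b === 0 && c === 2) return true;  // 192.0.2/24 (TEST-NET-1)
  return false;
}
```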
Stripe purchases have been retired, so the entire subscription pipeline (`/pro/checkout`, `/pro/status/:session`, `/pro/webhook`, `/pro/validate/:key`, the `/pro/success` page, and the `verifyStripeSignature` helper) is dead code. This also removes the `STRIPE_SECRET_KEY`, `STRIPE_PRICE_ID`, and `STRIPE_WEBHOOK_SECRET` env-var requirements that were briefly enforced in the prior review-fix commit.

Kept:
- `/solve/:key`, `/proxy/:key`, `/analytics/:key`, `/releases/:key/:file` (auth via the `API_KEYS` secret — unrelated to Stripe).
- The SSRF block on `/proxy/:key` and the IP-stripped download log.
- The KV binding name `SUBSCRIPTIONS` (legacy entries still live under `key:` and `session:` prefixes; wrangler.toml comment updated to describe current usage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
Worker hardening:
- /v1/events validates the parsed KV value's shape (must be a plain
object with numeric fields) before incrementing — null, strings,
or manually-edited corrupt values reset to zero rather than throw
or write back a poisoned record.
- /v1/events uses `body.ok === true` (strict boolean) so the
unauthenticated endpoint can't be skewed by `"false"` (a string)
being truthy.
- /v1/stats/summary coerces stored fields via Number() and ignores
non-finite values, so a stringly-typed `"1"` can't turn arithmetic
into string concatenation.
- /proxy/:key no longer reflects the user-controlled URL back as
`X-Proxy-Url` (newlines could trigger invalid-header errors;
query-string secrets could leak into downstream tooling that
records response headers).
- /proxy/:key log entry records `host` + `path` (from the parsed
URL) instead of the raw `body.url`, so signed-URL tokens and other
query-string secrets stay out of worker logs.
- /releases/:key/:filename picks Content-Type by extension —
`.tar.bz2` files now serve as `application/x-bzip2` instead of
being mislabeled `application/gzip`. Unknown extensions fall back
to `application/octet-stream`.
Doc/comment fixes:
- Stats handler comment now says "single global cap of 5000 KV
keys" to match the actual SCAN_CAP (was "7*1000").
- README's stats schema table has named columns (was empty header).
- README's references to `site/stats.html` note that the renderer
ships in a follow-up PR (it isn't in this repo yet).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
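The extension-based Content-Type selection for /releases/:key/:filename could look roughly like this. The mapping table and helper name are illustrative; the actual table in worker/src/index.js may list different extensions.

```javascript
// Hypothetical sketch of extension -> Content-Type selection for release
// downloads. The key fix from the review: .tar.bz2 must not be labeled gzip.
const CONTENT_TYPES = {
  ".tar.gz": "application/gzip",
  ".tgz": "application/gzip",
  ".tar.bz2": "application/x-bzip2",
  ".zip": "application/zip",
  ".dmg": "application/x-apple-diskimage",
};

function contentTypeFor(filename) {
  const lower = filename.toLowerCase();
  // Check longer suffixes first so ".tar.bz2" wins over any shorter entry.
  const exts = Object.keys(CONTENT_TYPES).sort((a, b) => b.length - a.length);
  for (const ext of exts) {
    if (lower.endsWith(ext)) return CONTENT_TYPES[ext];
  }
  return "application/octet-stream"; // unknown extensions fall back
}
```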
Summary
Moves the Cloudflare Worker backing releases.getwick.dev from the archived wick-pro repo into the public wick repo, adds two new endpoints for the per-fetch telemetry work (pairs with #5), and strips out the Stripe-driven Pro-subscription flow (no longer used). Uses KV, not Analytics Engine: AE is gated behind Workers Paid, while KV works on Workers Free and matches the storage pattern the legacy /ping counters already use.

What's new
- POST /v1/events — accepts the per-fetch telemetry JSON: {"host":"nytimes.com","strategy":"cef","escalated_from":"cronet","ok":true,"status":200,"timing_ms":1840,"version":"0.9.2","os":"macos"}. Stored in KV as evt:YYYY-MM-DD:{host}:{strategy} → {fetches, successes, total_ms} JSON, 30-day TTL. Read-modify-write pattern; a few increments may race at high concurrency, which is fine for telemetry.
- GET /v1/stats/summary — public (no auth). Scans the last 7 days of evt:* keys, aggregates per (host, strategy), returns shaped JSON. Cached 5 minutes in KV. Single global scan cap of 5000 keys.

What's removed
Stripe-driven subscription flow is gone: /pro/checkout, /pro/status/:session, /pro/webhook, /pro/validate/:key, /pro/success, and the verifyStripeSignature helper. Also drops the STRIPE_SECRET_KEY, STRIPE_PRICE_ID, and STRIPE_WEBHOOK_SECRET env requirements.

What's kept
/solve/:key, /proxy/:key, /analytics/:key, and /releases/:key/:file still live — they auth against the API_KEYS secret, independent of the Stripe flow. The hardening from the review round is kept: SSRF block on /proxy/:key (IP-literal reject), the XSS-safe success page removed along with the endpoint, and no caller IP in download logs.

Correctness/ops from the review pass: /v1/events read-modify-write falls back on corrupted KV values instead of 500-ing ingestion. host and strategy are character-set-validated before reaching the colon-delimited KV key. /v1/stats/summary uses a single global SCAN_CAP = 5000 with break outer; so hitting the cap stops both the inner loop and KV.list() pagination.

Deployment
cd worker
npx wrangler deploy
No new secrets required. Uses the existing SUBSCRIPTIONS KV namespace and API_KEYS secret.

Tradeoffs
- No real p50 timing: only total_ms is stored, so we report mean_ms as p50_ms in the summary response. Good enough for the public stats page.
- /v1/stats/summary is hard-capped at 5000 keys/week scanned during refresh. At expected Wick scale this is no concern; the upgrade path is to move /v1/events writes to Analytics Engine.

🤖 Generated with Claude Code
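The single global scan cap with a labeled break, as described in the correctness notes, can be sketched as follows. The list/get calls mimic the Workers KV API shape; the function name, the `cap` parameter, and the aggregation structure are illustrative assumptions.

```javascript
// Hypothetical sketch of the /v1/stats/summary key scan: one global cap,
// and `break outer` exits both the per-key loop and the list() pagination.
const SCAN_CAP = 5000;

async function scanEventKeys(kv, prefixes, cap = SCAN_CAP) {
  const values = [];
  let scanned = 0;
  outer:
  for (const prefix of prefixes) {            // one prefix per day: evt:{date}:
    let cursor;
    do {
      const page = await kv.list({ prefix, cursor });
      for (const { name } of page.keys) {
        if (scanned >= cap) break outer;      // stops the do/while as well
        scanned += 1;
        const raw = await kv.get(name);
        if (raw !== null) values.push({ name, raw });
      }
      cursor = page.list_complete ? undefined : page.cursor;
    } while (cursor);
  }
  return { values, scanned };
}
```

Without the label, a plain `break` would only leave the inner `for` and the `do/while` would keep fetching pages, which is exactly the bug the review fix addressed.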