
Worker: /v1/events + /v1/stats/summary (KV-backed, free tier)#6

Merged
myleshorton merged 6 commits into main from worker-analytics-engine
Apr 22, 2026
myleshorton commented Apr 22, 2026

Summary

Moves the Cloudflare Worker backing releases.getwick.dev from the archived wick-pro repo into the public wick repo, adds two new endpoints for the per-fetch telemetry work (pairs with #5), and strips out the Stripe-driven Pro-subscription flow (no longer used).

Uses KV, not Analytics Engine. AE is gated behind Workers Paid; KV works on Workers Free and matches the storage pattern the legacy /ping counters already use.

What's new

POST /v1/events — accepts the per-fetch telemetry JSON:

{"host":"nytimes.com","strategy":"cef","escalated_from":"cronet",
 "ok":true,"status":200,"timing_ms":1840,
 "version":"0.9.2","os":"macos"}

Stored in KV under the key evt:YYYY-MM-DD:{host}:{strategy}, with a JSON value of {fetches, successes, total_ms} and a 30-day TTL. Writes follow a read-modify-write pattern; a few increments may be lost to races at high concurrency, which is fine for telemetry.
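As a rough illustration of the read-modify-write described above, here is a minimal sketch, not the code in worker/src/index.js. The names `makeMemoryKV`, `parseCounters`, and `recordEvent` are hypothetical, the in-memory store only stands in for a KV binding's `get`/`put`, and the use of the UTC date for the key is an assumption.

```javascript
// Hypothetical in-memory stand-in for a Workers KV binding (get/put only).
function makeMemoryKV() {
  const store = new Map();
  return {
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async put(key, value, _options) { store.set(key, value); },
  };
}

const EVENT_TTL_SECONDS = 30 * 24 * 60 * 60; // 30-day TTL from the description

// Return valid counters from a raw KV string, or zeros if missing or corrupt.
function parseCounters(raw) {
  const zero = { fetches: 0, successes: 0, total_ms: 0 };
  if (raw === null) return zero;
  try {
    const v = JSON.parse(raw);
    const valid = v !== null && typeof v === "object" && !Array.isArray(v) &&
      ["fetches", "successes", "total_ms"].every((f) => Number.isFinite(v[f]));
    return valid
      ? { fetches: v.fetches, successes: v.successes, total_ms: v.total_ms }
      : zero;
  } catch {
    return zero; // unparseable value: start again from zero instead of failing
  }
}

// Read-modify-write increment of one evt:{day}:{host}:{strategy} counter.
// Two concurrent calls can both read the same old value, so one increment
// may be lost; that is the accepted race mentioned above.
async function recordEvent(kv, body, now = new Date()) {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD, UTC (assumption)
  const key = `evt:${day}:${body.host}:${body.strategy}`;

  const counters = parseCounters(await kv.get(key));
  counters.fetches += 1;
  if (body.ok === true) counters.successes += 1; // strict boolean only
  if (Number.isFinite(body.timing_ms) && body.timing_ms >= 0) {
    counters.total_ms += body.timing_ms;
  }

  await kv.put(key, JSON.stringify(counters), { expirationTtl: EVENT_TTL_SECONDS });
  return key;
}
```

With this shape, an event carrying `"ok": "false"` (a string) counts as a fetch but not as a success, and a corrupted stored value resets the counter rather than rejecting the request.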

GET /v1/stats/summary — public (no auth). Scans the last 7 days of evt:* keys, aggregates per (host, strategy), returns shaped JSON. Cached 5 minutes in KV. Single global scan cap of 5000 keys.

What's removed

Stripe-driven subscription flow is gone: /pro/checkout, /pro/status/:session, /pro/webhook, /pro/validate/:key, /pro/success, and the verifyStripeSignature helper. Also drops the STRIPE_SECRET_KEY, STRIPE_PRICE_ID, and STRIPE_WEBHOOK_SECRET env requirements.

What's kept

/solve/:key, /proxy/:key, /analytics/:key, and /releases/:key/:file are still live; they authenticate against the API_KEYS secret, independent of the Stripe flow. The hardening from the review round is kept where it still applies: the SSRF block on /proxy/:key (IP-literal reject) and no caller IP in download logs. The XSS-safe success page was removed along with its endpoint.

Correctness/ops from the review pass:

  • JSON.parse on the /v1/events read-modify-write falls back on corrupted KV values instead of 500-ing ingestion.
  • host and strategy are character-set-validated before reaching the colon-delimited KV key.
  • /v1/stats/summary uses a single global SCAN_CAP = 5000 with `break outer`, so hitting the cap stops both the inner loop and KV.list() pagination.
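The capped scan in the last bullet could look roughly like the sketch below. It is an illustration under assumptions, not the merged handler: `makeMemoryKV`, `lastDays`, and `summarize` are hypothetical names, the `success_rate` field name is a guess at the response shape, and the in-memory store only mimics the `keys` / `list_complete` / `cursor` fields that `KV.list()` returns.

```javascript
const SCAN_CAP = 5000; // single global cap from the description

// Hypothetical in-memory stand-in for a KV binding with paginated list().
function makeMemoryKV(entries, pageSize = 2) {
  const store = new Map(entries);
  return {
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async list({ prefix, cursor }) {
      const names = [...store.keys()].filter((k) => k.startsWith(prefix)).sort();
      const start = cursor ? Number(cursor) : 0;
      const end = start + pageSize;
      const done = end >= names.length;
      return {
        keys: names.slice(start, end).map((name) => ({ name })),
        list_complete: done,
        cursor: done ? undefined : String(end),
      };
    },
  };
}

// The last `days` UTC dates as YYYY-MM-DD strings, newest first.
function lastDays(days, now = new Date()) {
  const out = [];
  for (let i = 0; i < days; i++) {
    out.push(new Date(now.getTime() - i * 86400000).toISOString().slice(0, 10));
  }
  return out;
}

async function summarize(kv, now = new Date(), cap = SCAN_CAP) {
  const totals = new Map(); // "host:strategy" -> running counters
  let scanned = 0;

  outer: for (const day of lastDays(7, now)) {
    let cursor;
    do {
      const page = await kv.list({ prefix: `evt:${day}:`, cursor });
      for (const { name } of page.keys) {
        // One labeled break leaves the key loop, the pagination loop,
        // and the per-day loop together.
        if (scanned >= cap) break outer;
        scanned += 1;

        const [, , host, strategy] = name.split(":");
        let value;
        try { value = JSON.parse(await kv.get(name)); } catch { continue; }
        if (value === null || typeof value !== "object") continue;

        // Coerce with Number() so a stored "1" cannot cause string concatenation.
        const fetches = Number(value.fetches);
        const successes = Number(value.successes);
        const totalMs = Number(value.total_ms);
        if (![fetches, successes, totalMs].every(Number.isFinite)) continue;

        const id = `${host}:${strategy}`;
        const t = totals.get(id) ?? { host, strategy, fetches: 0, successes: 0, total_ms: 0 };
        t.fetches += fetches;
        t.successes += successes;
        t.total_ms += totalMs;
        totals.set(id, t);
      }
      cursor = page.list_complete ? undefined : page.cursor;
    } while (cursor);
  }

  return [...totals.values()].map((t) => ({
    host: t.host,
    strategy: t.strategy,
    fetches: t.fetches,
    success_rate: t.fetches > 0 ? t.successes / t.fetches : 0,
    // The mean is reported under the p50_ms name, as noted under Tradeoffs.
    p50_ms: t.fetches > 0 ? Math.round(t.total_ms / t.fetches) : 0,
  }));
}
```

A plain `break` inside the key loop would only end the current page, and the `do/while` would keep requesting further pages; the label is what makes the cap bound the total work.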

Deployment

cd worker
npx wrangler deploy

No new secrets required. Uses the existing SUBSCRIPTIONS KV namespace and API_KEYS secret.

Tradeoffs

  • No true p50: the KV storage model only keeps the sum total_ms, so we report mean_ms (total_ms / fetches) as p50_ms in the summary response. Good enough for the public stats page.
  • Scan cap: /v1/stats/summary is hard-capped at 5000 keys scanned per refresh, across the whole 7-day window. At expected Wick scale this is no concern; the upgrade path is to move /v1/events writes to Analytics Engine.

🤖 Generated with Claude Code

Moves the Cloudflare Worker backing releases.getwick.dev from the
archived wick-pro repo into the public wick repo. No behavior change
to the legacy /ping, /analytics/:key, or release-download paths.

New endpoints for the per-fetch telemetry work (pairs with PR #5):

- POST /v1/events
  Accepts {host, strategy, escalated_from, ok, status, timing_ms,
  version, os}. Writes to a new Cloudflare Analytics Engine dataset
  (`wick_events`) bound as `WICK_EVENTS`. Does not persist caller IP
  as a data point.

- GET /v1/stats/summary
  Queries AE via the SQL API for a 7-day rollup of fetches + success
  rate + p50 timing per (host, strategy). Caches the JSON response
  for 5 minutes in the existing SUBSCRIPTIONS KV. Public (no auth).

Requires two new secrets on the Worker (see worker/README.md):
- CF_ANALYTICS_ACCOUNT_ID
- CF_ANALYTICS_TOKEN (AE read)

Plus one new binding in wrangler.toml:
- [[analytics_engine_datasets]] binding = WICK_EVENTS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cloudflare-workers-and-pages Bot commented Apr 22, 2026

Deploying wickproject with Cloudflare Pages

Latest commit: b02d8a9
Status: ✅  Deploy successful!
Preview URL: https://5ec7ca69.wickproject.pages.dev
Branch Preview URL: https://worker-analytics-engine.wickproject.pages.dev


Analytics Engine needs Workers Paid ($5/mo) to bind. KV works on
Workers Free and matches the storage pattern the existing /ping
endpoint already uses — so there's no reason to pull in a paid
product for the scale Wick currently operates at.

Storage model:
- Key:   evt:{YYYY-MM-DD}:{host}:{strategy}
- Value: {"fetches": N, "successes": M, "total_ms": T}  (JSON)
- TTL:   30 days

Each POST /v1/events is a read-modify-write on a single key. At high
concurrency a few increments may be lost, which is fine for
telemetry — mirrors the behavior of the existing ping counters.

GET /v1/stats/summary scans the last 7 days of evt:* keys, aggregates
per (host, strategy), and returns shaped JSON. Cached 5 minutes in
the same KV namespace so the scan only runs ~288 times/day.
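The 5-minute cache and the ~288 scans/day figure can be sketched as follows. This is a minimal sketch under assumptions: the cache key name and the `getSummaryCached` helper are hypothetical, and `kv` is any object exposing the KV binding's `get` and `put` (with the `expirationTtl` option, in seconds).

```javascript
const SUMMARY_CACHE_KEY = "cache:stats:summary"; // hypothetical key name
const SUMMARY_CACHE_TTL = 300;                   // 5 minutes, in seconds

// Serve the cached JSON if present; otherwise run the scan once and cache it.
async function getSummaryCached(kv, computeSummary) {
  const cached = await kv.get(SUMMARY_CACHE_KEY);
  if (cached !== null) return cached;

  const fresh = JSON.stringify(await computeSummary());
  await kv.put(SUMMARY_CACHE_KEY, fresh, { expirationTtl: SUMMARY_CACHE_TTL });
  return fresh;
}

// Upper bound on full scans per day if every cache window sees traffic:
// 86400 seconds / 300 seconds per window = 288.
const maxScansPerDay = (24 * 60 * 60) / SUMMARY_CACHE_TTL;
```

The 288 figure is a ceiling rather than a fixed rate: a window with no requests to /v1/stats/summary triggers no scan at all.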

Removed:
- Analytics Engine binding in wrangler.toml
- CF_ANALYTICS_ACCOUNT_ID / CF_ANALYTICS_TOKEN secrets
- AE SQL query in /v1/stats/summary

Tradeoffs:
- No real p50 timing — using mean_ms (total_ms / fetches) as a
  reasonable approximation. Good enough for the public stats page.
- Cap: ~5000 KV keys scanned per day during summary refresh. Well
  inside Workers Free limits for expected Wick scale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
myleshorton changed the title from "Worker: /v1/events + /v1/stats/summary (Analytics Engine)" to "Worker: /v1/events + /v1/stats/summary (KV-backed, free tier)" Apr 22, 2026
myleshorton added a commit that referenced this pull request Apr 22, 2026
analytics.rs (#3, #4, #8, #9):
- Replace per-event std::thread::spawn + per-event reqwest::Client
  with a single long-lived background worker thread + bounded channel
  (cap 512). Telemetry never applies backpressure; on a full queue
  events are dropped.
- One reused reqwest::blocking::Client for the worker's lifetime.
- Build JSON payloads via serde_json::json! instead of format! string
  interpolation. Same dependency we already pull in transitively.
- Shared wick_home() helper resolves $HOME/.wick (or /tmp/.wick) so
  the WICK_TELEMETRY opt-out marker file is honored consistently
  whether HOME is set or not. Matches what site_cache and ping_marker
  already do.
- Drop the misleading "registrable hostname" wording on extract_host;
  it returns whatever host_str() gives, subdomains and all. No PSL
  normalization.

fetch.rs (#2, #5, #6, #7):
- Strategy-selection rule extracted as a pure helper
  `should_use_cef_first(cached_strategy, cef_installed)`. Five unit
  tests lock in the documented behavior (Cronet-first by default,
  CEF only when cache explicitly says "cef" and CEF is installed).
- Fix doc comment to reflect the actual rule (no more "prefer CEF
  if installed"). Note the robots.txt early-return exception to the
  "every terminal point records telemetry" claim.
- site_cache::record() is now only called with values in the
  documented set ("cef", "cronet"). CAPTCHA-solved retries cache as
  "cronet" (the underlying transport that succeeded after the solve)
  and tag the per-fetch event as "captcha-auto" / "captcha-interactive"
  for analysis. Stops writing unsupported strategy values.
- fetch_html() now mirrors fetch()'s 403/503 → CEF escalation when
  CEF is installed and we haven't tried it yet. Crawl/map no longer
  silently underperform on JS-heavy or stealth-required sites that
  don't have a cache entry yet.
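For readers following the Worker side, the strategy-selection rule stated above can be restated as a tiny pure function. This is a JavaScript rendering for illustration only; the real helper is the Rust function `should_use_cef_first` in fetch.rs, and the treatment of a missing cache entry as `null` here is an assumption.

```javascript
// Cronet-first by default; CEF first only when the site cache explicitly
// says "cef" AND CEF is installed.
function shouldUseCefFirst(cachedStrategy, cefInstalled) {
  return cachedStrategy === "cef" && cefInstalled === true;
}

// The combinations the rule distinguishes:
const cases = [
  [null, false],     // no cache entry, CEF missing    -> Cronet first
  [null, true],      // no cache entry, CEF installed  -> Cronet first
  ["cronet", true],  // cache says cronet              -> Cronet first
  ["cef", false],    // cache says cef, CEF missing    -> Cronet first
  ["cef", true],     // cache says cef, CEF installed  -> CEF first
];
const decisions = cases.map(([cached, installed]) => shouldUseCefFirst(cached, installed));
```

Only the last combination selects CEF first, which is the behavior the commit says the unit tests lock in.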

site/docs.html (#10):
- Reword the telemetry section so it doesn't link to /stats.html
  (which lands in the follow-up PR #7). Says the public stats page
  is coming in a follow-up release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Migrates the Cloudflare Worker for releases.getwick.dev into this repo and adds KV-backed per-fetch telemetry ingest plus a public 7‑day stats summary endpoint (designed to work on Workers Free).

Changes:

  • Add worker/wrangler.toml with routes and KV/R2 bindings for releases.getwick.dev.
  • Implement Worker handlers including POST /v1/events (KV counters w/ TTL) and GET /v1/stats/summary (KV scan + cached aggregate).
  • Document the Worker, bindings, and telemetry schema in worker/README.md.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| worker/wrangler.toml | Declares Worker entrypoint, route, and KV/R2 bindings used by the new endpoints. |
| worker/src/index.js | Adds the full Worker implementation including KV-backed telemetry ingest and summary aggregation. |
| worker/README.md | Documents deployment, bindings, and the telemetry storage/query model. |


myleshorton and others added 2 commits April 22, 2026 06:51
Critical (security + correctness):
  - Stripe webhook now verifies `Stripe-Signature` via HMAC-SHA256
    (5-min timestamp tolerance, constant-time compare). Before this,
    anyone who could reach /pro/webhook could mint API keys by POSTing
    a forged `checkout.session.completed` payload.
  - Geo-proxy (/proxy/:key) now rejects private/loopback/link-local
    IP literals so a paid key can't probe our internal networks.
    Catches IPv4 (0/8, 10/8, 127/8, 169.254/16, 172.16/12, 192.168/16,
    100.64/10, 192.0.2/24) and IPv6 (::, ::1, fc00::/7, fe80::/10,
    IPv4-mapped ::ffff:). DNS-based targets are not resolved here —
    documented as a known limitation.
  - /pro/success page no longer interpolates the `?session=` query
    param into an inline <script>; reads it client-side via
    URLSearchParams + encodeURIComponent instead.
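A check along the lines of the IP-literal block in the second bullet might look like the sketch below. It is an assumption-laden illustration, not the code in worker/src/index.js: the function names are hypothetical, and only the ranges listed in the commit message are covered. As the commit notes, hostnames that resolve to private addresses via DNS are not caught by a literal check like this.

```javascript
// [network, prefixLength] pairs for the IPv4 ranges listed above.
const BLOCKED_V4 = [
  ["0.0.0.0", 8], ["10.0.0.0", 8], ["127.0.0.0", 8], ["169.254.0.0", 16],
  ["172.16.0.0", 12], ["192.168.0.0", 16], ["100.64.0.0", 10], ["192.0.2.0", 24],
];

// Parse a dotted quad into an unsigned 32-bit integer, or null if it is not one.
function ipv4ToInt(s) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(s);
  if (!m) return null;
  const o = m.slice(1).map(Number);
  if (o.some((n) => n > 255)) return null;
  return ((o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3]) >>> 0;
}

function isBlockedV4(s) {
  const ip = ipv4ToInt(s);
  if (ip === null) return false;
  return BLOCKED_V4.some(([net, bits]) => {
    const mask = (0xffffffff << (32 - bits)) >>> 0;
    return ((ip & mask) >>> 0) === ((ipv4ToInt(net) & mask) >>> 0);
  });
}

function isBlockedV6(s) {
  const h = s.toLowerCase();
  if (h === "::" || h === "::1") return true;   // unspecified, loopback
  if (h.startsWith("::ffff:")) return true;     // IPv4-mapped
  const first = parseInt(h.split(":")[0] || "0", 16);
  if (Number.isNaN(first)) return false;
  if ((first & 0xfe00) === 0xfc00) return true; // fc00::/7 unique local
  if ((first & 0xffc0) === 0xfe80) return true; // fe80::/10 link-local
  return false;
}

// `hostname` is expected in the form URL.hostname produces, where IPv6
// literals are wrapped in square brackets.
function isBlockedIpLiteral(hostname) {
  const bare = hostname.replace(/^\[|\]$/g, "");
  return bare.includes(":") ? isBlockedV6(bare) : isBlockedV4(bare);
}
```

For example, `isBlockedIpLiteral(new URL("http://[::1]:8080/x").hostname)` is rejected, while a public address such as 8.8.8.8 or an ordinary hostname passes through to the proxy.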

Correctness / ops:
  - /v1/events wraps `JSON.parse(existingRaw)` in try/catch so a
    corrupted KV value doesn't 500 ingestion.
  - /v1/events validates `host` (/^[a-zA-Z0-9.-]+$/) and `strategy`
    (/^[a-zA-Z0-9_-]+$/) before embedding them in the colon-delimited
    KV key, so neither can inject a `:` that breaks parsing at summary
    time.
  - /v1/stats/summary: single global SCAN_CAP (5000) instead of
    per-day, and `break outer` exits both the inner key loop AND the
    pagination `do/while` so the cap genuinely stops work.
  - Active `session:<id>` KV entries now get a 24h TTL (was unbounded).
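The key-validation bullet above can be illustrated with the two regular expressions quoted in the commit message. The helper names `eventKey` and `parseEventKey` are hypothetical, and this is a sketch of the idea rather than the merged code.

```javascript
const HOST_RE = /^[a-zA-Z0-9.-]+$/;     // pattern quoted in the commit message
const STRATEGY_RE = /^[a-zA-Z0-9_-]+$/; // pattern quoted in the commit message

// Build the colon-delimited KV key, or return null if either part could
// smuggle in a ':' (or any other character outside the allowed sets).
function eventKey(day, host, strategy) {
  if (typeof host !== "string" || !HOST_RE.test(host)) return null;
  if (typeof strategy !== "string" || !STRATEGY_RE.test(strategy)) return null;
  return `evt:${day}:${host}:${strategy}`;
}

// Inverse used at summary time: because neither part can contain ':',
// a plain split always yields exactly four fields.
function parseEventKey(key) {
  const parts = key.split(":");
  if (parts.length !== 4 || parts[0] !== "evt") return null;
  const [, day, host, strategy] = parts;
  return { day, host, strategy };
}
```

A host such as `evil:cef` would otherwise produce a five-field key, and the summary would attribute its counters to the wrong host and strategy.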

Privacy:
  - /releases download log no longer records caller IP (CF-Connecting-
    IP). Matches the repo's stated "don't persist IP as a data point"
    posture.

All changes pass `node --check`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stripe purchases have been retired, so the entire subscription pipeline
(`/pro/checkout`, `/pro/status/:session`, `/pro/webhook`, `/pro/validate/:key`,
`/pro/success` page, and the `verifyStripeSignature` helper) is dead code.
This also removes the `STRIPE_SECRET_KEY`, `STRIPE_PRICE_ID`, and
`STRIPE_WEBHOOK_SECRET` env-var requirements that were briefly enforced
in the prior review-fix commit.

Kept:
- `/solve/:key`, `/proxy/:key`, `/analytics/:key`, `/releases/:key/:file`
  (auth via the `API_KEYS` secret — unrelated to Stripe).
- The SSRF block on `/proxy/:key` and the IP-stripped download log.
- The KV binding name `SUBSCRIPTIONS` (legacy entries still live under
  `key:` and `session:` prefixes; wrangler.toml comment updated to
  describe current usage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.



Worker hardening:
  - /v1/events validates the parsed KV value's shape (must be a plain
    object with numeric fields) before incrementing — null, strings,
    or manually-edited corrupt values reset to zero rather than throw
    or write back a poisoned record.
  - /v1/events uses `body.ok === true` (strict boolean) so the
    unauthenticated endpoint can't be skewed by `"false"` (a string)
    being truthy.
  - /v1/stats/summary coerces stored fields via Number() and ignores
    non-finite values, so a stringly-typed `"1"` can't turn arithmetic
    into string concatenation.
  - /proxy/:key no longer reflects the user-controlled URL back as
    `X-Proxy-Url` (newlines could trigger invalid-header errors;
    query-string secrets could leak into downstream tooling that
    records response headers).
  - /proxy/:key log entry records `host` + `path` (from the parsed
    URL) instead of the raw `body.url`, so signed-URL tokens and other
    query-string secrets stay out of worker logs.
  - /releases/:key/:filename picks Content-Type by extension —
    `.tar.bz2` files now serve as `application/x-bzip2` instead of
    being mislabeled `application/gzip`. Unknown extensions fall back
    to `application/octet-stream`.
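The Content-Type selection in the last hardening bullet might be sketched like this. The `.tar.bz2` and fallback mappings come from the commit message; the helper name `contentTypeFor`, the `.tar.gz` / `.tgz` mapping, and the case-insensitive match are assumptions made for illustration.

```javascript
// Pick a Content-Type from the file extension of a release artifact.
function contentTypeFor(filename) {
  const name = filename.toLowerCase();
  if (name.endsWith(".tar.bz2")) return "application/x-bzip2";
  // Assumed mapping for gzip tarballs; not spelled out in the commit message.
  if (name.endsWith(".tar.gz") || name.endsWith(".tgz")) return "application/gzip";
  return "application/octet-stream"; // unknown extensions
}
```

Checking `.tar.bz2` before any gzip rule is what avoids the earlier mislabeling, since every artifact no longer falls through to a single default type.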

Doc/comment fixes:
  - Stats handler comment now says "single global cap of 5000 KV
    keys" to match the actual SCAN_CAP (was "7*1000").
  - README's stats schema table has named columns (was empty header).
  - README's references to `site/stats.html` note that the renderer
    ships in a follow-up PR (it isn't in this repo yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@myleshorton myleshorton merged commit 4e68445 into main Apr 22, 2026
4 checks passed