Add opt-in browser-scraper Worker for JS-rendered monitor pages by valuecodes · Pull Request #14 · valuecodes/switch-operator

valuecodes · 2026-05-02T05:25:59Z

What

Adds a separate apps/browser-scraper Worker that runs Playwright on Cloudflare Browser Rendering, called from operator via a Service Binding. Native fetch() stays the default; per-monitor use_browser flag opts into the browser path. Includes the previously committed logger metadata-key redaction.

New Worker switch-operator-browser-scraper with workers_dev: false (only reachable via service binding); SSRF guards on top-level URL, post-redirect URL, and every subresource via page.route()
Operator: new use_browser column (migration 0005), BROWSER_SCRAPER service binding, scrapeUrl switched to options-object signature
Shared packages/url-validator workspace, used by both Workers
CI: deploy step runs browser-scraper before operator so the binding resolves on first deploy

How to test

pnpm typecheck && pnpm lint && pnpm format:check && pnpm test — all green (210 tests)
Manual end-to-end requires deploying browser-scraper (no local emulator for Browser Rendering); see plan file for the migration → deploy → wrangler tail flow

Security review

Auth / network: new Worker is service-binding-only (workers_dev: false); SSRF validator blocks private/loopback/non-HTTPS on three paths (initial URL, post-redirect, subresources); 2 MB body cap and 15s navigation timeout protect the daily browser budget.
Deps: @cloudflare/playwright@1.2.0 added to the new Worker only; operator bundle unchanged.

Copilot

Pull request overview

This PR introduces a new apps/browser-scraper Cloudflare Worker so the operator can optionally scrape JS-rendered pages through Browser Rendering instead of the existing native fetch() path. It also factors URL validation into a shared workspace package, adds the operator-side service binding and schedule flag plumbing, and includes the logger metadata redaction update.

Changes:

Adds a new @repo/browser-scraper Worker that uses Playwright on Cloudflare Browser Rendering, plus its config, package metadata, and tests.
Moves URL safety checks into a shared @repo/url-validator package and switches operator scraping/validation code to use it.
Adds operator-side browser-scraper binding/migration wiring and updates CI deploy order so the new service exists before operator deploys.

Reviewed changes

Copilot reviewed 25 out of 28 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`pnpm-lock.yaml`	Locks new workspace/package dependencies for browser-scraper and url-validator.
`packages/url-validator/vitest.config.ts`	Adds Vitest config for the shared URL validator package.
`packages/url-validator/tsconfig.json`	Adds TS config for the shared URL validator package.
`packages/url-validator/src/url-validator.ts`	Introduces shared source URL validation logic used by operator and browser-scraper.
`packages/url-validator/src/url-validator.test.ts`	Adds unit tests for the shared URL validator.
`packages/url-validator/package.json`	Defines the new shared package metadata and scripts.
`packages/url-validator/eslint.config.ts`	Adds ESLint config for the new shared package.
`packages/logger/src/logger.ts`	Adds top-level metadata redaction for sensitive log keys.
`packages/logger/src/__tests__/logger.test.ts`	Tests the new logger redaction behavior.
`apps/operator/wrangler.jsonc`	Wires the browser-scraper service binding into the operator Worker.
`apps/operator/src/types/env.ts`	Extends operator environment typings with the browser-scraper binding.
`apps/operator/src/services/scrape.ts`	Refactors scraping to support native fetch or browser-scraper via options.
`apps/operator/src/services/scrape.test.ts`	Adds tests for the new browser-scraper scrape path.
`apps/operator/src/scheduled.ts`	Passes per-schedule browser options into monitor execution.
`apps/operator/src/modules/telegram/controller.ts`	Switches monitor URL validation import to the shared package.
`apps/operator/src/db/schema.ts`	Adds the `use_browser` column to schedules.
`apps/operator/package.json`	Adds the shared URL validator dependency to operator.
`apps/operator/migrations/meta/_journal.json`	Registers the new migration in Drizzle metadata.
`apps/operator/migrations/meta/0005_snapshot.json`	Captures the schema snapshot including `use_browser`.
`apps/operator/migrations/0005_nervous_scorpion.sql`	Adds the SQL migration for the new schedule column.
`apps/browser-scraper/wrangler.jsonc`	Configures the new browser-scraper Worker and Browser Rendering binding.
`apps/browser-scraper/tsconfig.json`	Adds TS config for the new Worker.
`apps/browser-scraper/src/services/playwright.ts`	Implements Playwright-based rendering and subresource URL checks.
`apps/browser-scraper/src/services/playwright.test.ts`	Adds unit tests for Playwright rendering behavior.
`apps/browser-scraper/src/index.ts`	Exposes the browser-scraper Worker request handler.
`apps/browser-scraper/package.json`	Defines package metadata/scripts/dependencies for the new Worker.
`apps/browser-scraper/eslint.config.ts`	Adds ESLint config for the new Worker.
`.github/workflows/main.yml`	Updates deploy ordering so browser-scraper deploys before operator.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

valuecodes · 2026-05-02T06:02:14Z

+    useBrowser: int("use_browser", { mode: "boolean" })
+      .notNull()
+      .default(false),


Intentional and called out as out-of-scope in the PR plan: enabling a schedule for the browser path is currently a manual step (wrangler d1 execute … "UPDATE schedules SET use_browser = 1 WHERE id = '…'"). Wiring it into createScheduleSchema / the OpenAI create_schedule tool / ScheduleService.create is a deliberate follow-up so we can validate the daily browser-time budget before exposing the toggle to LLM-generated schedules.

valuecodes · 2026-05-02T06:02:54Z

+      const html = await page.content();
+      const truncated = html.length > MAX_HTML_CHARS;
+      const finalHtml = truncated ? html.slice(0, MAX_HTML_CHARS) : html;


Intentional. page.content() has no streaming variant on the @cloudflare/playwright binding, so the only enforcement point the worker exposes is post-serialization. The cap is therefore on the response payload we send back over the service binding (and ultimately into the AI prompt), not on browser memory — the 60s hard timeout and 10 min/day daily budget bound the memory side. Char vs byte: agreed it's not a hard byte cap, but UTF-8 worst case is ~4× and downstream convertContent handles oversized input fine, so the slack is acceptable here.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d65cd20a7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-02T05:32:03Z

+      - name: Deploy operator to Cloudflare Workers
        run: pnpm --filter @repo/operator run deploy


Run DB migration before deploying the operator

This workflow deploys the updated operator immediately but never applies migration 0005_nervous_scorpion.sql, even though the commit adds a new schedules.use_browser column that is now part of the runtime schema (apps/operator/src/db/schema.ts) and read during scheduled execution. On environments where db:migrate:remote has not been run yet, the newly deployed worker can fail with no such column: use_browser as soon as it queries schedules. Add a migration step (for example pnpm --filter @repo/operator db:migrate:remote) before the operator deploy step.

Useful? React with 👍 / 👎.

A dedicated workflow at .github/workflows/migrations.yml already applies D1 migrations on push to main whenever apps/operator/migrations/** changes (wrangler d1 migrations apply switch-operator-db --remote). Duplicating that step inside the main deploy job would either run it twice or couple the two workflows. There is a known race window where the deploy can land before the migration workflow finishes; the current mitigation is to merge migration-only changes ahead of the feature commit when the gap matters. Happy to add an explicit guard if the race becomes a real problem.

valuecodes added 2 commits May 2, 2026 07:42

feat: redact sensitive metadata keys in logger

8043e1d

feat: add opt-in browser-scraper Worker for JS-rendered monitor pages

9d65cd2

Copilot AI review requested due to automatic review settings May 2, 2026 05:25

Copilot started reviewing on behalf of valuecodes May 2, 2026 05:26 View session

Copilot AI reviewed May 2, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 2, 2026

View reviewed changes

valuecodes added 2 commits May 2, 2026 09:00

fix: pr comments

0188ca4

chore: allow gh api in claude settings

2728464

valuecodes merged commit 33aa260 into main May 2, 2026
7 checks passed

valuecodes deleted the playwright-scraper branch May 2, 2026 06:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add opt-in browser-scraper Worker for JS-rendered monitor pages#14

Add opt-in browser-scraper Worker for JS-rendered monitor pages#14
valuecodes merged 4 commits intomainfrom
playwright-scraper

valuecodes commented May 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

valuecodes May 2, 2026

Uh oh!

valuecodes May 2, 2026

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Uh oh!

valuecodes May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		- name: Deploy operator to Cloudflare Workers
		run: pnpm --filter @repo/operator run deploy

Conversation

valuecodes commented May 2, 2026

What

How to test

Security review

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

valuecodes May 2, 2026

Choose a reason for hiding this comment

Uh oh!

valuecodes May 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

valuecodes May 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants