Skip to content

Conversation

d4mr
Copy link
Member

@d4mr d4mr commented Oct 4, 2025

  • Introduced a new script for cleaning up old webhook metadata in Redis, allowing for batch processing and dry run options.
  • Updated existing Redis cleanup script to handle additional patterns and results for user operations, EIP-7702, webhooks, and external bundler sends.
  • Improved error handling and statistics tracking for various cleanup operations.
  • Added tests to verify pruning behavior with randomly generated job IDs to ensure correctness in job metadata management.

Summary by CodeRabbit

  • New Features
    • Added maintenance scripts to clean up old webhook metadata and per-queue result hashes, with dry-run support and detailed summaries.
  • Bug Fixes
    • Improved job pruning reliability in Redis-backed queues, reducing false detections and ensuring clearer outcomes when no deletions occur.
  • Chores
    • Expanded operational cleanup reporting with per-queue statistics for errors and result fields.
  • Tests
    • Introduced an integration test simulating high-volume, random-ID workloads to validate aggressive pruning behavior and prevent leaks.

- Introduced a new script for cleaning up old webhook metadata in Redis, allowing for batch processing and dry run options.
- Updated existing Redis cleanup script to handle additional patterns and results for user operations, EIP-7702, webhooks, and external bundler sends.
- Improved error handling and statistics tracking for various cleanup operations.
- Added tests to verify pruning behavior with randomly generated job IDs to ensure correctness in job metadata management.
Copy link

coderabbitai bot commented Oct 4, 2025

Walkthrough

Adds a new TypeScript script to clean old webhook meta hashes in Redis, enhances an existing Redis cleanup script to handle result-hash counters per queue, updates Twmq pruning Lua logic to use type checks and adds debug logs, and introduces an integration test validating pruning with random job IDs.

Changes

Cohort / File(s) Summary
Redis webhook meta cleanup script
scripts/cleanup-webhook-meta.ts
New script scans twmq:engine-cloud_webhook:job:*:meta hashes, reads finished_at, and deletes those older than a cutoff. Supports batching, dry-run, per-batch logging, error handling, and final stats.
Expanded Redis simple cleanup
scripts/simple-redis-cleanup.ts
Adds per-queue result-hash cleanup for userop_confirm, eip7702_send, webhook, and external_bundler_send. Introduces cleanHash and updateStatsForHash, updates stats reporting, and integrates new flows into run().
Twmq pruning logic updates
twmq/src/lib.rs
Lua-based pruning now checks LPOS/ZSCORE return types (numeric) instead of nil comparisons; adds debug logs when pruning runs but deletes zero jobs. Mirrors changes across duplicated pruning blocks.
Integration test for pruning with random IDs
twmq/tests/prune_race_random_ids.rs
New test sets aggressive pruning (max_success = 1) with idempotency, processes 100 jobs with random IDs across two workers, logs Redis structures, and verifies pruning/metadata behavior.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Op as Operator
  participant Script as cleanup-webhook-meta.ts
  participant Redis

  Op->>Script: Start with REDIS_URL, maxAgeHours, batchSize, dryRun
  Script->>Redis: SCAN MATCH twmq:engine-cloud_webhook:job:*:meta COUNT N
  loop For each batch
    Script->>Redis: PIPELINE HMGET key finished_at
    Redis-->>Script: finished_at values
    Script->>Script: Filter keys older than cutoff
    alt dryRun
      Script->>Op: Log would-delete stats
    else delete
      Script->>Redis: PIPELINE DEL old-meta keys
      Redis-->>Script: del results
    end
    Script->>Op: Batch stats
  end
  Script->>Op: Final totals and errors
Loading
sequenceDiagram
  autonumber
  actor Op as Operator
  participant Clean as simple-redis-cleanup.ts
  participant Redis

  Op->>Clean: Run cleanup
  Clean->>Redis: Delete error-pattern keys (existing flow)
  par Per-queue result hashes
    Clean->>Redis: HLEN/HSCAN for userop_confirm results
    Clean->>Redis: HLEN/HSCAN for eip7702_send results
    Clean->>Redis: HLEN/HSCAN for webhook results
    Clean->>Redis: HLEN/HSCAN for external_bundler_send results
  end
  alt dryRun
    Clean->>Op: Report counts (no DEL)
  else execute
    Clean->>Redis: HDEL fields per queue
  end
  Clean->>Op: Updated per-queue and total stats
Loading
sequenceDiagram
  autonumber
  participant Worker1
  participant Worker2
  participant Twmq as Twmq Queue
  participant Redis
  participant Lua as Pruning Lua

  loop Push 100 jobs (random IDs)
    Twmq->>Redis: LPUSH/Meta set
  end
  par Workers process
    Worker1->>Twmq: Fetch/Process/Nack or Ack
    Worker2->>Twmq: Fetch/Process/Nack or Ack
  end
  Twmq->>Lua: Invoke prune (max_success=1)
  Lua->>Redis: LPOS/ZScore checks (type-based)
  alt Deletions found
    Lua->>Redis: DEL/Trim structures
  else None
    Lua->>Twmq: trimmed_count=0
    Twmq->>Twmq: Debug log "no deletions"
  end
  Twmq-->>Workers: Completion
  Note over Workers,Twmq: Test asserts/logs Redis lists, sets, hashes
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly describes the primary addition of a webhook metadata cleanup script and the related enhancements to Redis cleanup logic, matching the key changes in the PR without unnecessary detail or noise.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pb/pruning-fix

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • EIP-7702: Entity not found: Issue - Could not find referenced Issue.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 029bc40 and 7af5dcf.

📒 Files selected for processing (4)
  • scripts/cleanup-webhook-meta.ts (1 hunks)
  • scripts/simple-redis-cleanup.ts (4 hunks)
  • twmq/src/lib.rs (5 hunks)
  • twmq/tests/prune_race_random_ids.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
scripts/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (scripts/.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)

scripts/**/*.{ts,tsx,js,jsx}: Use bun <file> instead of node <file> or ts-node <file> to run JS/TS files
Do not use dotenv; Bun automatically loads .env files
Use Bun.serve() for HTTP/WebSocket servers; do not use express
Use bun:sqlite for SQLite; do not use better-sqlite3
Use Bun.redis for Redis; do not use ioredis
Use Bun.sql for Postgres; do not use pg or postgres.js
Use built-in WebSocket; do not use ws
Prefer Bun.file over node:fs readFile/writeFile for file IO
Use Bun.$ (shell) instead of execa

Files:

  • scripts/simple-redis-cleanup.ts
  • scripts/cleanup-webhook-meta.ts
scripts/**/*.{html,ts,tsx,css}

📄 CodeRabbit inference engine (scripts/.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)

Use bun build <file.html|file.ts|file.css> instead of webpack or esbuild

Files:

  • scripts/simple-redis-cleanup.ts
  • scripts/cleanup-webhook-meta.ts
🧬 Code graph analysis (3)
scripts/simple-redis-cleanup.ts (1)
twmq/src/lib.rs (1)
  • count (359-386)
twmq/tests/prune_race_random_ids.rs (1)
twmq/src/lib.rs (9)
  • redis (1243-1245)
  • redis (1251-1251)
  • redis (1343-1345)
  • redis (1351-1351)
  • redis (1381-1383)
  • redis (1389-1391)
  • job (157-162)
  • new (132-151)
  • name (174-176)
twmq/src/lib.rs (1)
twmq/src/queue.rs (1)
  • name (145-153)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: coverage
  • GitHub Check: test

@@ -0,0 +1,206 @@
#!/usr/bin/env bun

import Redis from "ioredis";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Switch to Bun.redis for Redis access

Scripts under scripts/ must rely on Bun’s built-in Redis client rather than ioredis. Please refactor this script to use Bun.redis so it aligns with the required runtime stack. As per coding guidelines

🤖 Prompt for AI Agents
In scripts/cleanup-webhook-meta.ts around line 3, the file currently imports
ioredis; replace that with Bun's built-in Redis client by removing the ioredis
import and creating a Bun.redis client configured from the same environment
variables (e.g. REDIS_URL/REDIS_PASSWORD) used previously. Update any
ioredis-specific usage to the Bun.redis equivalents (get/set/del/scan/expire or
pipeline patterns) and ensure you properly close/disconnect the Bun.redis client
at the end of the script or on errors; keep the rest of the cleanup logic
unchanged.

Comment on lines +967 to 971
local lpos_result = redis.call('LPOS', pending_list, j_id)
local is_pending = type(lpos_result) == "number"
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
local is_delayed = type(zscore_result) == "number"
Copy link

@coderabbitai coderabbitai bot Oct 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Prevent pruning delayed jobs incorrectly

redis.call('ZSCORE', ...) returns a bulk string, not a Lua number. With the new type(...) == "number" guard, is_delayed is never true—even when the job is still in the delayed zset—so the pruning loop will happily delete jobs that are merely scheduled, wiping their data/results. Please treat any non-false reply as “present” (and mirror the same change in the failed-list trim script lower in the file).

-        local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
-        local is_delayed = type(zscore_result) == "number"
+        local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
+        local is_delayed = zscore_result ~= false and zscore_result ~= nil
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
local lpos_result = redis.call('LPOS', pending_list, j_id)
local is_pending = type(lpos_result) == "number"
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
local is_delayed = type(zscore_result) == "number"
local lpos_result = redis.call('LPOS', pending_list, j_id)
local is_pending = type(lpos_result) == "number"
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
local is_delayed = zscore_result ~= false and zscore_result ~= nil
🤖 Prompt for AI Agents
In twmq/src/lib.rs around lines 967 to 971, the check for presence in the
delayed zset uses type(zscore_result) == "number" which fails because ZSCORE
returns a bulk string; change the guard to treat any non-false Redis reply as
present (e.g., is_delayed = zscore_result ~= false or equivalent), and make the
same adjustment in the failed-list trim script elsewhere in the file so ZSCORE
responses are correctly considered "present" and delayed/failed jobs are not
pruned incorrectly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but:

When executing Redis commands within a Lua script using redis.call(), the return values are converted from Redis's RESP (REdis Serialization Protocol) types to Lua data types. Specifically, a Redis bulk string reply is converted into a Lua string.
This means that if you execute a command like GET which typically returns a bulk string (the value associated with a key), the redis.call('GET', key_name) function within your Lua script will return a standard Lua string.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🌐 Web query:

What type does Redis ZSCORE command return in Lua scripts? Does it return a Lua number or Lua string?

💡 Result:

ZSCORE returns the score as a Redis bulk string, which in Lua scripts is converted to a Lua string (not a Lua number). [1][2]


@d4mr, thank you for sharing that documentation! Actually, this quote confirms the bug rather than refuting it. Let me explain:

The Problem

Your documentation states that bulk strings convert to Lua strings. When ZSCORE finds a member, it returns the score as a bulk string (e.g., "42.0"), which converts to a Lua string, not a Lua number.

This means:

local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
-- If job exists: zscore_result is a Lua STRING (e.g., "42.0")
-- If job doesn't exist: zscore_result is Lua FALSE (nil bulk reply → false)

local is_delayed = type(zscore_result) == "number"  -- ❌ ALWAYS false!

The check type(zscore_result) == "number" will never be true because:

  • Existing delayed jobs return a string (not number)
  • Non-existent jobs return false (not number)

Result: is_delayed is always false → delayed jobs get incorrectly pruned!

The Fix

Since nil bulk replies convert to false, the correct check is:

local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
local is_delayed = zscore_result ~= false and zscore_result ~= nil

This correctly identifies:

  • Delayed jobs present: zscore_result is a string → true
  • Delayed jobs absent: zscore_result is false → false

The same fix needs to be applied to the failed-list pruning script as well.


✏️ Learnings added
Learnt from: d4mr
PR: thirdweb-dev/engine-core#66
File: twmq/src/lib.rs:967-971
Timestamp: 2025-10-04T12:43:13.415Z
Learning: In Redis Lua scripts, ZSCORE returns a bulk string (converted to Lua string, not Lua number) when the member exists, or a nil bulk reply (converted to Lua false) when the member doesn't exist. Therefore, checking `type(result) == "number"` will never work for ZSCORE results.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@d4mr d4mr merged commit 5786fb5 into main Oct 4, 2025
4 of 5 checks passed
@d4mr d4mr deleted the pb/pruning-fix branch October 4, 2025 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant