-
Notifications
You must be signed in to change notification settings - Fork 7
Add webhook metadata cleanup script and enhance Redis cleanup logic #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Introduced a new script for cleaning up old webhook metadata in Redis, allowing for batch processing and dry run options. - Updated existing Redis cleanup script to handle additional patterns and results for user operations, EIP-7702, webhooks, and external bundler sends. - Improved error handling and statistics tracking for various cleanup operations. - Added tests to verify pruning behavior with randomly generated job IDs to ensure correctness in job metadata management.
WalkthroughAdds a new TypeScript script to clean old webhook meta hashes in Redis, enhances an existing Redis cleanup script to handle result-hash counters per queue, updates Twmq pruning Lua logic to use type checks and adds debug logs, and introduces an integration test validating pruning with random job IDs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Op as Operator
participant Script as cleanup-webhook-meta.ts
participant Redis
Op->>Script: Start with REDIS_URL, maxAgeHours, batchSize, dryRun
Script->>Redis: SCAN MATCH twmq:engine-cloud_webhook:job:*:meta COUNT N
loop For each batch
Script->>Redis: PIPELINE HMGET key finished_at
Redis-->>Script: finished_at values
Script->>Script: Filter keys older than cutoff
alt dryRun
Script->>Op: Log would-delete stats
else delete
Script->>Redis: PIPELINE DEL old-meta keys
Redis-->>Script: del results
end
Script->>Op: Batch stats
end
Script->>Op: Final totals and errors
sequenceDiagram
autonumber
actor Op as Operator
participant Clean as simple-redis-cleanup.ts
participant Redis
Op->>Clean: Run cleanup
Clean->>Redis: Delete error-pattern keys (existing flow)
par Per-queue result hashes
Clean->>Redis: HLEN/HSCAN for userop_confirm results
Clean->>Redis: HLEN/HSCAN for eip7702_send results
Clean->>Redis: HLEN/HSCAN for webhook results
Clean->>Redis: HLEN/HSCAN for external_bundler_send results
end
alt dryRun
Clean->>Op: Report counts (no DEL)
else execute
Clean->>Redis: HDEL fields per queue
end
Clean->>Op: Updated per-queue and total stats
sequenceDiagram
autonumber
participant Worker1
participant Worker2
participant Twmq as Twmq Queue
participant Redis
participant Lua as Pruning Lua
loop Push 100 jobs (random IDs)
Twmq->>Redis: LPUSH/Meta set
end
par Workers process
Worker1->>Twmq: Fetch/Process/Nack or Ack
Worker2->>Twmq: Fetch/Process/Nack or Ack
end
Twmq->>Lua: Invoke prune (max_success=1)
Lua->>Redis: LPOS/ZScore checks (type-based)
alt Deletions found
Lua->>Redis: DEL/Trim structures
else None
Lua->>Twmq: trimmed_count=0
Twmq->>Twmq: Debug log "no deletions"
end
Twmq-->>Workers: Completion
Note over Workers,Twmq: Test asserts/logs Redis lists, sets, hashes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsErrors were encountered while retrieving linked issues. Errors (1)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
scripts/cleanup-webhook-meta.ts
(1 hunks)scripts/simple-redis-cleanup.ts
(4 hunks)twmq/src/lib.rs
(5 hunks)twmq/tests/prune_race_random_ids.rs
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
scripts/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (scripts/.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)
scripts/**/*.{ts,tsx,js,jsx}
: Usebun <file>
instead ofnode <file>
orts-node <file>
to run JS/TS files
Do not usedotenv
; Bun automatically loads .env files
UseBun.serve()
for HTTP/WebSocket servers; do not useexpress
Usebun:sqlite
for SQLite; do not usebetter-sqlite3
UseBun.redis
for Redis; do not useioredis
UseBun.sql
for Postgres; do not usepg
orpostgres.js
Use built-inWebSocket
; do not usews
PreferBun.file
overnode:fs
readFile/writeFile for file IO
UseBun.$
(shell) instead ofexeca
Files:
scripts/simple-redis-cleanup.ts
scripts/cleanup-webhook-meta.ts
scripts/**/*.{html,ts,tsx,css}
📄 CodeRabbit inference engine (scripts/.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)
Use
bun build <file.html|file.ts|file.css>
instead ofwebpack
oresbuild
Files:
scripts/simple-redis-cleanup.ts
scripts/cleanup-webhook-meta.ts
🧬 Code graph analysis (3)
scripts/simple-redis-cleanup.ts (1)
twmq/src/lib.rs (1)
count
(359-386)
twmq/tests/prune_race_random_ids.rs (1)
twmq/src/lib.rs (9)
redis
(1243-1245)redis
(1251-1251)redis
(1343-1345)redis
(1351-1351)redis
(1381-1383)redis
(1389-1391)job
(157-162)new
(132-151)name
(174-176)
twmq/src/lib.rs (1)
twmq/src/queue.rs (1)
name
(145-153)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: coverage
- GitHub Check: test
@@ -0,0 +1,206 @@ | |||
#!/usr/bin/env bun | |||
|
|||
import Redis from "ioredis"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Switch to Bun.redis for Redis access
Scripts under scripts/
must rely on Bun’s built-in Redis client rather than ioredis
. Please refactor this script to use Bun.redis
so it aligns with the required runtime stack. As per coding guidelines
🤖 Prompt for AI Agents
In scripts/cleanup-webhook-meta.ts around line 3, the file currently imports
ioredis; replace that with Bun's built-in Redis client by removing the ioredis
import and creating a Bun.redis client configured from the same environment
variables (e.g. REDIS_URL/REDIS_PASSWORD) used previously. Update any
ioredis-specific usage to the Bun.redis equivalents (get/set/del/scan/expire or
pipeline patterns) and ensure you properly close/disconnect the Bun.redis client
at the end of the script or on errors; keep the rest of the cleanup logic
unchanged.
local lpos_result = redis.call('LPOS', pending_list, j_id) | ||
local is_pending = type(lpos_result) == "number" | ||
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id) | ||
local is_delayed = type(zscore_result) == "number" | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent pruning delayed jobs incorrectly
redis.call('ZSCORE', ...)
returns a bulk string, not a Lua number. With the new type(...) == "number"
guard, is_delayed
is never true—even when the job is still in the delayed zset—so the pruning loop will happily delete jobs that are merely scheduled, wiping their data/results. Please treat any non-false
reply as “present” (and mirror the same change in the failed-list trim script lower in the file).
- local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
- local is_delayed = type(zscore_result) == "number"
+ local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
+ local is_delayed = zscore_result ~= false and zscore_result ~= nil
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
local lpos_result = redis.call('LPOS', pending_list, j_id) | |
local is_pending = type(lpos_result) == "number" | |
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id) | |
local is_delayed = type(zscore_result) == "number" | |
local lpos_result = redis.call('LPOS', pending_list, j_id) | |
local is_pending = type(lpos_result) == "number" | |
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id) | |
local is_delayed = zscore_result ~= false and zscore_result ~= nil | |
🤖 Prompt for AI Agents
In twmq/src/lib.rs around lines 967 to 971, the check for presence in the
delayed zset uses type(zscore_result) == "number" which fails because ZSCORE
returns a bulk string; change the guard to treat any non-false Redis reply as
present (e.g., is_delayed = zscore_result ~= false or equivalent), and make the
same adjustment in the failed-list trim script elsewhere in the file so ZSCORE
responses are correctly considered "present" and delayed/failed jobs are not
pruned incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but:
When executing Redis commands within a Lua script using redis.call(), the return values are converted from Redis's RESP (REdis Serialization Protocol) types to Lua data types. Specifically, a Redis bulk string reply is converted into a Lua string.
This means that if you execute a command like GET which typically returns a bulk string (the value associated with a key), the redis.call('GET', key_name) function within your Lua script will return a standard Lua string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
What type does Redis ZSCORE command return in Lua scripts? Does it return a Lua number or Lua string?
💡 Result:
ZSCORE returns the score as a Redis bulk string, which in Lua scripts is converted to a Lua string (not a Lua number). [1][2]
@d4mr, thank you for sharing that documentation! Actually, this quote confirms the bug rather than refuting it. Let me explain:
The Problem
Your documentation states that bulk strings convert to Lua strings. When ZSCORE
finds a member, it returns the score as a bulk string (e.g., "42.0"
), which converts to a Lua string, not a Lua number.
This means:
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
-- If job exists: zscore_result is a Lua STRING (e.g., "42.0")
-- If job doesn't exist: zscore_result is Lua FALSE (nil bulk reply → false)
local is_delayed = type(zscore_result) == "number" -- ❌ ALWAYS false!
The check type(zscore_result) == "number"
will never be true because:
- Existing delayed jobs return a string (not number)
- Non-existent jobs return false (not number)
Result: is_delayed
is always false → delayed jobs get incorrectly pruned!
The Fix
Since nil bulk replies convert to false
, the correct check is:
local zscore_result = redis.call('ZSCORE', delayed_zset, j_id)
local is_delayed = zscore_result ~= false and zscore_result ~= nil
This correctly identifies:
- Delayed jobs present:
zscore_result
is a string → true - Delayed jobs absent:
zscore_result
is false → false
The same fix needs to be applied to the failed-list pruning script as well.
✏️ Learnings added
Learnt from: d4mr
PR: thirdweb-dev/engine-core#66
File: twmq/src/lib.rs:967-971
Timestamp: 2025-10-04T12:43:13.415Z
Learning: In Redis Lua scripts, ZSCORE returns a bulk string (converted to Lua string, not Lua number) when the member exists, or a nil bulk reply (converted to Lua false) when the member doesn't exist. Therefore, checking `type(result) == "number"` will never work for ZSCORE results.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Summary by CodeRabbit